Optimise PC build for CPDN
Author | Message |
---|---|
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
Use a Ramdisk http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=7602&nowrap=true#46446 In fact, even a first generation SSD wasn't fast enough for running CEP2 on all four cores of my Quad-Core a few years ago, and I picked up numerous errors. So I started with caching/ramdisk software then, and problem solved. (I don't think you will get errors with the current generation of SSDs, but the high write rate could kill them prematurely). |
Send message Joined: 17 Nov 07 Posts: 142 Credit: 4,271,370 RAC: 0 |
PrimoCache (the renamed FancyCache) looks interesting. There's also this (from 2008, which is why it talks about Vista): https://www.pokertracker.com/forums/viewtopic.php?f=45&t=10489 (The "fsutil" command is discussed on Microsoft TechNet, so this seems to be OK.) Enlarge Write-Ahead Cache (underline added). |
Send message Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
I do have a UPS also - ... I'm not currently running it because the power supply has been much improved and I no longer get powercuts. Having made the mistake of saying that I don't get power-cuts any more, yesterday the power went and I lost 5 of my 6 models... If this happens again, I might need to invest in a new set of batteries. I'm a volunteer and my views are my own. News and Announcements and FAQ |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,888,554 RAC: 1,481,373 |
About UPS and the shutdown scripts their software can trigger - This http://technet.microsoft.com/sv-se/sysinternals/bb897438%28en-us%29.aspx has a disk-flusher utility -- haven't tested it myself - probably works. Also maybe useful for making backups - may shorten the time to get the disk-queue clean after stopping all models and again after telling BOINC to terminate. |
Send message Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0 |
Thanks Mike, you make me feel better. Especially as I've just lost another 6 models because of yet another power cut. Thought things were safe again and the UPS is due next week, so I'd give it a go and run some more models - Wrong! Now not modelling till the UPS is installed and tested. Finally settled on the Eaton 5P after a lot of digging around; secondhand ones are just not available here. Modern PC power supplies are active Power Factor corrected, and as such it is hit and miss whether they work with the stepped waveform from most cheap UPS systems. According to the Seagate FAQ (my power supply) the only way to tell is by trial and error, and at this stage I've got better things to do. The 5P has pure sine wave output at a good price, and this is unusual for a Line-Interactive UPS. Hopefully this means it will work. The Line-Interactive UPS is good as it runs at about 98% efficiency. The most reliable type, with the smoothest output, is Online Double Conversion, but these only run at about 90% efficiency as the power always runs through the inverter. And they are expensive. The cheapest are Offline/Standby, but again these normally output a stepped sine wave and rely on a switch to move to the battery supply on power failure. Mixed reports on the reliability of the switching, so I decided not to bother. Regarding the disk-flusher utility suggested by Eirik: sized correctly, the UPS should give you plenty of time for write activities to finish. If you're going to buy one, why skimp on the capacity? I know nothing at all about this level of system utility, but personally I would be cautious about using it, as I reckon modern OSs and hard drives with their huge caches are complicated enough as it is. It might be fine, but with the rate that BOINC writes to the hard drive I'll leave things as they are. |
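To put the efficiency figures in the post above in perspective, here is a minimal sketch of the arithmetic. The 98% and 90% efficiencies are the approximate figures quoted in the post, the 190 W load is the draw reported for this PC further down the thread, and the function name is purely illustrative.

```python
def wall_draw_watts(load_watts: float, efficiency: float) -> float:
    """Power drawn from the mains to deliver load_watts through a UPS.

    efficiency is a fraction in (0, 1], e.g. 0.98 for 98%.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return load_watts / efficiency


load = 190.0  # watts, as reported by the UPS later in the thread
line_interactive = wall_draw_watts(load, 0.98)  # approximate figure from the post
online_double = wall_draw_watts(load, 0.90)     # approximate figure from the post

extra_watts = online_double - line_interactive
extra_kwh_per_year = extra_watts * 24 * 365 / 1000  # assumes the PC runs 24/7

print(f"Line-interactive: {line_interactive:.1f} W from the wall")
print(f"Online double conversion: {online_double:.1f} W from the wall")
print(f"Extra cost of double conversion: {extra_watts:.1f} W, "
      f"{extra_kwh_per_year:.0f} kWh per year")
```

Under these assumptions the difference is roughly 194 W versus 211 W at the wall, which only matters because a cruncher runs around the clock.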
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,888,554 RAC: 1,481,373 |
Noticed that your machine picked up 5 rapid-rapit models yesterday. Hoping the cooling and UPS issues are resolved. Me: I'm on an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz [Family 6 Model 58 Stepping 9] (Ivy Bridge). Been doing some stats -- CPU temp stays below 70 and the cores run at 3700 MHz (better than the advertised 3400), if I can trust the various programs that claim to measure it. I also collected stats on the i7-3770 (running Win8 in VirtualBox, so discount some overhead). Pushing the i7-3770 Ivy box to 8/8 (4 real cores + 4 hyper) didn't gain much, one or two percent either way. Using 6/8 seemed to be the sweet spot - extrapolate to your 8-real, 16-HT machine. But also, using all 4 real cores for CPDN didn't cost much from other work -- most apps don't use the floating point units. So when your machine stabilizes and the cooling is resolved, expect that 8-12 CPDN units can run without impacting whatever else you use the machine for. |
Send message Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0 |
Hi Eirik, Yes, I'm now underway again. The UPS took some time to arrive, and then it took quite a while to sort out their software and undertake testing. It seems a good UPS, but their implementation for running batch files through their web software interface is absolutely woeful. For some reason batch files do not run as you would normally expect (Win7 64-bit), and it was only by trial and error that I got it to work. If anyone else goes down this route with Eaton, they can always PM me for details. The upshot is that testing power outages (pull the plug) works fine, with the BOINC service stopping and closing down safely before the PC itself shuts down. There seem to be a few issues with BOINC itself, but this may be because of the Eaton interface, as I don't have this problem with other software doing the same thing. Even though the BOINC service is stopped, after a bit BOINC Manager seems to want to restart it. I removed boincmgr.exe and boinctray.exe from System Startup, which improved things but didn't entirely eliminate it, which seems a trifle weird. The restart thing seems a bit random, but I can live with it as you can ignore it or tell it no, and it does nothing, as the Windows UAC interface intercepts the run call. Still waiting on the cooler to arrive in the country! Guess NZ is way down the list on Corsair's shipping order. Still, I was preempting the summer weather, so there is some way to go yet before it becomes an issue. Once that is installed I'll gradually add a few more tasks. As I mentioned, I think it now becomes an issue of HD writes when too many tasks are running, but the tasks are now spread out so that each 25/50/75/100 completion point should be unique at any one time. Fingers crossed. Edit addition. With respect to your processor speed, it may be that your BIOS is set to use Intel Turbo Boost. This allows an auto boost in processor speed as long as certain parameters are met - mainly not overheating. 
Normal operation will see the processor running in Turbo Boost, as that seems to be the default in most BIOSes. |
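The shutdown sequence described above (stop the BOINC service first, then power down the PC) could be sketched roughly as follows. This is not the poster's actual batch file, which was never shared; the service name "BOINC" and the 60-second delay are assumptions, and the helper names are hypothetical. It defaults to a dry run so that nothing is stopped or shut down by accident.

```python
import subprocess
from typing import List


def ups_shutdown_commands(service_name: str = "BOINC",
                          delay_seconds: int = 60) -> List[List[str]]:
    """Build the Windows commands for an orderly shutdown on power failure.

    service_name is an assumption: check the name your BOINC install
    registered in services.msc before relying on it.
    """
    return [
        ["net", "stop", service_name],                 # let BOINC checkpoint and exit
        ["shutdown", "/s", "/t", str(delay_seconds)],  # then power off the PC
    ]


def run_shutdown(commands: List[List[str]], dry_run: bool = True) -> List[int]:
    """Run each command in order and return the exit codes.

    With dry_run=True (the default) the commands are only printed.
    A failed step does not stop the sequence: on battery power it is
    better to shut down anyway than to wait.
    """
    codes = []
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
            codes.append(0)
        else:
            codes.append(subprocess.run(cmd, check=False).returncode)
    return codes


run_shutdown(ups_shutdown_commands())  # dry run: prints the two commands
```

A script like this would be the thing the UPS software triggers when it switches to battery; call `run_shutdown(..., dry_run=False)` only once the dry run prints what you expect.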
Send message Joined: 15 May 09 Posts: 4346 Credit: 16,535,294 RAC: 5,887 |
A word of warning for those looking to optimise: the warnings about not using an SSD seem to be valid. The SSD in my netbook has just given up the ghost after crunching predictably for just over a year, though lately with WCG as no regional models have been available for CPDN. Fortunately the WCG data is all that I will have lost on it. I think I will probably go back to a slightly larger mechanical hard disk and accept the slightly slower start-up times. May well put an SSD onto the main machine for just the OS though. |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,888,554 RAC: 1,481,373 |
Hey MartinNZ, just had a peek at your machine on the CPDN website -- looks like you're running 10 models now, and going like Topsy. Great! Thinking the ECC will be a long-term asset. Time to consider a backup strategy. Thanks for your time and trouble building this box. e |
Send message Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
The machine is already #12 on CPDN's RAC list. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Send message Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0 |
Hi Guys, first I'd like to thank all those that helped me get the beast this far. Your input has been invaluable, and I think I now have a pretty reliable PC. It's been a pretty steep learning curve, and for this one I'm certainly glad I got someone else to build it. The water cooler (Corsair H100i) went in on Thurs and dropped CPU temperatures by 30C, so worth the investment. After a burn/soak test CPDN was started again and was running OK, so I added a couple more tasks, as Eirik noticed. Computation time has increased from around 0.865 sec/TS to around 0.96, with the new tasks running at around 1 sec/TS. There will be an optimum point where it will not be worth adding more tasks, as the RACs will not increase radically and work performance will drop. My RACs will be up and down for a while, as for some reason my backup software is playing up since the Corsair cooler and software was installed. When the BU runs it stops the BOINC service before running the BU, and is then supposed to restart it. For some reason it is stopping BOINC, then not running the BU scripts or restarting BOINC. Hmmm, it's likely to take a while to sort out. So Eirik, I have a BU strategy (pretty comprehensive really), but not for BOINC. The last time I did that was when running single models on an old Pentium 4. From what I remember then and have read since, it's pretty difficult to restore multiple models, but perhaps things have changed. There is a separate thread for BUs, but I see it hasn't been added to since 2008. BTW, since installing the UPS, there have been FOUR power cuts, so in my case it is an essential item. astroWX's observation on the RAC list is interesting and shows how times change. When I first built my old i7-920, I think it came in at around 7th in the list, but was around 130th(?) when it was retired. Looking at that position now, the RAC is around 1600 compared to 2600 that I'm achieving with the new PC. 
Can't grumble on that one, especially as it's still workable as a work PC and only drawing 190W (according to the UPS). |
Send message Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0 |
I see I have 2 more failures, one at 25% and the other at 50%. Each of these had tasks in close proximity to reaching the same point, but they were still about 3 hours apart. I'll suspend other similar tasks so that they are at least 5 hours apart, but this gets a bit tedious when trying to look at the 25/50/75/100% points so that they do not clash - if indeed that is the problem. Other machines with more tasks do not seem to have the same issue, and we shouldn't have to micro-manage tasks in this way anyway. If anyone has any thoughts, let me know. I notice one has error 255, which could be a Windows call about a file not being found. The other is 193, which according to the BOINC FAQ is an obsolete error message, but one which seems to feature quite a bit on CPDN. I'll see how it goes in the coming days, as everything else on the PC now seems to be sorted out and running OK. FYI the issue of BOINC not continuing after backups has been solved - a BU software issue. I'm having a week off soon, so I will leave the system running and hope for the best. |
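Checking by hand whether the 25/50/75/100% points of several tasks will land close together is tedious, so here is a rough sketch of how it could be automated. It assumes each task progresses at a constant rate, which is only approximately true; the task names, progress figures and run times in the example are made up, and the 5-hour gap is the separation mentioned in the post.

```python
from itertools import combinations
from typing import Dict, List, Tuple

MILESTONES = (0.25, 0.50, 0.75, 1.00)


def hours_to_milestones(progress: float, total_hours: float) -> List[float]:
    """Hours from now until each milestone the task has not yet reached.

    progress is the fraction done (0-1); total_hours is the estimated
    full run time. Assumes a constant rate of progress.
    """
    return [(m - progress) * total_hours for m in MILESTONES if m > progress]


def find_clashes(tasks: Dict[str, Tuple[float, float]],
                 min_gap_hours: float = 5.0) -> List[Tuple[str, str, float]]:
    """Return (task_a, task_b, gap_hours) for every pair of upcoming
    milestones, one from each task, that fall closer than min_gap_hours."""
    clashes = []
    for (name_a, a), (name_b, b) in combinations(tasks.items(), 2):
        for time_a in hours_to_milestones(*a):
            for time_b in hours_to_milestones(*b):
                gap = abs(time_a - time_b)
                if gap < min_gap_hours:
                    clashes.append((name_a, name_b, round(gap, 2)))
    return clashes


# Hypothetical tasks: name -> (fraction done, estimated total hours)
example_tasks = {
    "task_a": (0.20, 100.0),
    "task_b": (0.23, 100.0),
    "task_c": (0.60, 100.0),
}
for name_a, name_b, gap in find_clashes(example_tasks):
    print(f"{name_a} and {name_b} reach a milestone {gap} h apart")
```

Any pair it reports is a candidate for suspending one of the two tasks for a few hours, which is what the post describes doing manually.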
Send message Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0 |
Lies, damn lies, and statistics. Some of you expressed surprise earlier in this thread about the time it takes to finish the write activity to the hard drive when you stop CPDN/BOINC. I must admit I also found it odd, but hey, that's what the Win7 Resource Monitor was telling me, so it must be right. Right? Well, it turns out not quite so. A while back when we had no tasks I decided to update my backup procedures and do some testing. My backup hard drives are in removable racks that have activity LEDs so you can see what is going on. I expected some sort of delay to be shown in the Resource Monitor as it is not exactly a high priority item, but the delay can be huge. Even with the smallish amount of data I was using, Resource Monitor continued to show write activity for a minute after the LED showed that all activity had finished. I could even physically remove the hard drive while the Resource Monitor still showed read/write activity, and Resource Monitor would just keep on showing loads of activity. BUT, I have also just carried out a slightly different experiment. With BOINC Manager open and CPDN suspended, I check to see that there is no hard drive activity. I then resume CPDN, and within 10 secs Resource Monitor is showing BOINC-related hard drive activity, and it quickly builds after that. My guess is that when there is considerable CPU activity, Resource Monitor takes a really low priority; at other times, this appears not to be the case. So, ignore all that rubbish I talked earlier on, but as to how long it takes, who knows. I guess the processor fan slowing down gives a pretty good hint though, and that happens almost instantly. It would be quite interesting if anyone can isolate their CPDN hard drive activity and provide hard data. BTW, pretty happy with the installation. Now that we are back on one model, I'm just in the process of optimising the number of tasks and will report back later. 
Currently running 12 on the available 16 hyperthreads - with no degradation on work throughput. |
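One crude way to get some hard data on CPDN disk activity without relying on Resource Monitor is to snapshot the BOINC data directory twice and see which files changed in between. The sketch below does that with the standard library only. It is a proxy rather than a true measurement: it reports the current size of each new or modified file, not the bytes actually written, and it will miss a rewrite that leaves both size and modification time unchanged. The function names are illustrative.

```python
import os
from typing import Dict, Tuple

Snapshot = Dict[str, Tuple[int, float]]  # path -> (size in bytes, mtime)


def snapshot(root: str) -> Snapshot:
    """Record size and modification time of every file under root."""
    snap: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            snap[path] = (st.st_size, st.st_mtime)
    return snap


def changed_files(before: Snapshot, after: Snapshot) -> Dict[str, int]:
    """Map each new or modified file to its current size in bytes."""
    return {
        path: size
        for path, (size, mtime) in after.items()
        if before.get(path) != (size, mtime)
    }
```

To use it, take one snapshot of the data directory (on a default Windows 7 install this is probably `C:\ProgramData\BOINC`, but check your own setup), wait a fixed interval such as a minute with `time.sleep(60)`, take a second snapshot, and sum the values returned by `changed_files`. Repeating this with different numbers of running tasks would give a rough picture of how write activity scales.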
Send message Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
"...Currently running 12 on the available 16 hyperthreads - with no degradation on work throughput." Awesome, nice work. Regarding the disk activity indicator in the resource monitor, I wonder how much it is affected by disk caching. I'm a volunteer and my views are my own. News and Announcements and FAQ |
Send message Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0 |
Credits query. Posting here as this isn't a "where are my credits or I'll give up CPDN" war cry ;-) I use my local BOINC Manager stats graph to help keep an eye on things to see that they are running OK. Used to use it to tune the number of tasks to run, but found that is only useful if there is a steady stream of the one model. Since 2 May the RAC has dropped from 9200 to 7400 and is still heading south. 2 May is roughly when the bulk ANZ models ran out, so wondering if there is a link. I've been running 10 tasks for the last few months on 1290283, and checking the sec/TS of the three different models running, they are the same before and after 2 May. Any thoughts? |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,085,690 RAC: 2,334 |
"Since 2 May the racs have dropped from 9200 to 7400 and still heading south. 2 May is roughly when the bulk ANZ models ran out, so wondering if there is a link. I've been running 10 tasks for last few months on 1290283 & checking the sec/Ts of the three different models running, they are the same before and after 2 May. Any thoughts?" What type of models are you presently running? I have noticed the same drop in RAC as I have shifted back from running all ANZ models to several CM models. I have known for some time that CM models are compensated at a lower rate than any other type of model. Two weeks ago I was running all ANZ models. I now have 4 CM models running on 3 machines along with 5 ANZs and 1 EU. As the CMs have kicked in, my RAC has dropped by 900 credits. Maybe it is time to consider adjusting the credit award rate so that the CMs are more in line with other model types. This would be justified by the fact that the CM models are a bigger commitment of time and computer resources. After all, it is not like the credits cost them anything. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
One thing that needs to be kept in mind, is that there are 2 types of credit: 1) Incrementing credit 2) Temporary credit (RAC), the total of which will decrease with certain factors, such as not running models for a while. It was found some years ago that the RAC calcs aren't very good, and a correction factor, different for each type of model, was introduced. Private discussions occurred recently about one of the newer models requiring an adjustment. I forget which one it was, but I don't think that anything has been done yet. Credit matters still aren't part of core business. |
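To illustrate why RAC sinks when the credit rate falls, here is a simplified sketch of an exponentially weighted average. It assumes the one-week half-life that BOINC normally uses for Recent Average Credit; the real server updates the figure incrementally as credit is granted and applies the per-model correction factors mentioned above, so treat this as an approximation, not the project's actual calculation. The function name is hypothetical.

```python
HALF_LIFE_DAYS = 7.0  # assumption: BOINC's usual one-week RAC half-life


def update_rac(rac: float, daily_credit: float, days: float,
               half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Estimate RAC after `days` of earning `daily_credit` per day.

    The old value decays by half every half_life_days, and the new
    credit rate fills in the remainder.
    """
    weight = 0.5 ** (days / half_life_days)
    return rac * weight + (1.0 - weight) * daily_credit


# A host idle for one week keeps half of its RAC.
print(update_rac(9200, 0, 7))

# A host whose credit rate falls from 9200 to 7400 per day drifts
# towards the new rate week by week.
rac = 9200.0
for week in range(1, 5):
    rac = update_rac(rac, 7400, 7)
    print(f"week {week}: RAC ~ {rac:.0f}")
```

The point of the sketch is that a change in the mix of models, each credited at a different rate, shows up as a gradual slide in RAC over several weeks rather than a sudden step, even when the sec/TS figures are unchanged.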
Send message Joined: 17 Nov 07 Posts: 142 Credit: 4,271,370 RAC: 0 |
Martin - Jim has the right of it. With 8 hadcm3ns, my i7-2600 maxed out around 3400 RAC. With other models -- FAMOUS, Weather at Home -- it could get over 6000. There's some technical reason why they can't adjust the credits per trickle for hadcm3n. I see you're running 5 hadcm3ns; if in the past you were running all Weather at Home models, that would account for the drop in RAC. I currently have a similar situation. My PC is running only 7 models, 4 hadam3pm2s and 3 hadcm3ns, and its RAC is over 10,000: it's no. 4 on the "top computers" list as I write, which is just silly. I think the credit allocation for hadam3pm2 is also wrong, in the opposite direction. But as Les says, getting a Windows version of hadam3pm2 out the door and responding to scientists' demands are probably higher priorities than adjusting credits. We don't run CPDN work because of the project's focus on PR and community interaction. ;-) |
©2024 climateprediction.net