Optimise PC build for CPDN

Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 47081 - Posted: 17 Sep 2013, 14:02:30 UTC - in response to Message 46964.  
Last modified: 17 Sep 2013, 14:14:34 UTC


So far all my tasks have crashed and I'm suspending calculations for the moment.

Remembering other threads about this I assume this is a problem with the hard drives not getting the data out quickly enough as highlighted in Greg's post here. It seems a possibility that the PC is generating too much data for the older drive BOINC sits on.

Am I correct in this, and if so, what's the best approach? I'm quite happy to put a faster drive in and that includes a SSD if necessary. Yes I know the SSD life could be short, but an Intel 520 should be good for 2 years? By that time something else will be along anyway. A 120GB SSD around $240, 1TB Seagate HDD around $110. Would need the 1TB drive to get the higher data speeds.

Or, are there any ways in which Windows can be manipulated to speed things up? I do have a largish number of drives, don't know if that makes any difference.

Any thoughts anyone?

Martin

Use a Ramdisk
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=7602&nowrap=true#46446

In fact, even a first generation SSD wasn't fast enough for running CEP2 on all four cores of my Quad-Core a few years ago, and I picked up numerous errors. So I started with caching/ramdisk software then, and problem solved. (I don't think you will get errors with the current generation of SSDs, but the high write rate could kill them prematurely).
ID: 47081

Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 47085 - Posted: 17 Sep 2013, 21:56:57 UTC - in response to Message 47081.  

PrimoCache (formerly FancyCache) looks interesting.

There's also this (from 2008, that's why it talks about Vista):

https://www.pokertracker.com/forums/viewtopic.php?f=45&t=10489 (the "fsutil" command is discussed on Microsoft Technet, so this seems to be OK.)
Enlarge Write-Ahead Cache

This option is configurable in Vista or Windows 2003 server only.

Windows gives the NTFS filesystem a default cache to use for information, but if you are opening and closing a lot of different files in rapid succession, this cache can be exhausted, causing reads and writes to take longer than necessary. There are two setting sizes: normal, and large. From the Microsoft Documentation:

Increasing physical memory does not always increase the amount of paged pool memory available to NTFS. Setting memoryusage to 2 raises the limit of paged pool memory. This might improve performance if your system is opening and closing many files in the same file set and is not already using large amounts of system memory for other applications or for cache memory. If your computer is already using large amounts of system memory for other applications or for cache memory, increasing the limit of NTFS paged and non-paged pool memory reduces the available pool memory for other processes. This might reduce overall system performance.

To set the cache to its larger size, click Start --> Run, type 'cmd' and hit enter. Then type:

fsutil behavior set memoryusage 2

In the event of any issue, or degradation of performance as a result of this change, you can set the cache back to its normal size. To revert to the default configuration, click Start --> Run, type 'cmd' and hit enter. Then type:

fsutil behavior set memoryusage 1
(underline added).
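
For anyone who would rather script the change than type it, here is a minimal Python sketch that builds the two fsutil commands quoted above. The function names and the dry-run default are my own illustration, not part of the quoted guide, and actually running the command needs an elevated prompt on Windows.

```python
import subprocess

# The two levels described in the quoted guide.
VALID_LEVELS = {1: "default cache size", 2: "larger cache size"}

def memoryusage_command(level):
    """Return the fsutil argument list for memoryusage level 1 or 2."""
    if level not in VALID_LEVELS:
        raise ValueError("memoryusage must be 1 (default) or 2 (larger cache)")
    return ["fsutil", "behavior", "set", "memoryusage", str(level)]

def apply(level, dry_run=True):
    """Print the command; run it only when dry_run is False."""
    cmd = memoryusage_command(level)
    print(" ".join(cmd))
    if not dry_run:
        # Requires Windows and an administrator command prompt.
        subprocess.run(cmd, check=True)
    return cmd

if __name__ == "__main__":
    apply(2)  # dry run: only prints the command
```

Calling `apply(1, dry_run=False)` would put the setting back to its default.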
ID: 47085

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 47138 - Posted: 21 Sep 2013, 7:31:06 UTC - in response to Message 47077.  
Last modified: 21 Sep 2013, 7:32:35 UTC


I do have a UPS also - ... I'm not currently running it because the power supply has been much improved and I no longer get powercuts.


Having made the mistake of saying that I don't get power-cuts any more, yesterday the power went and I lost 5 of my 6 models...

If this happens again, I might need to invest in a new set of batteries.
I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 47138

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 47139 - Posted: 21 Sep 2013, 8:51:34 UTC

About UPS and the shutdown scripts their software can trigger -
This http://technet.microsoft.com/sv-se/sysinternals/bb897438%28en-us%29.aspx has a disk-flusher utility -- haven't tested it myself - probably works. Also maybe useful for making backups - may shorten the time to get the disk-queue clean after stopping all models and again after telling BOINC to terminate.

ID: 47139

MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 47151 - Posted: 21 Sep 2013, 23:19:42 UTC - in response to Message 47139.  

Thanks Mike, you make me feel better. Especially as I've just lost another 6 models because of yet another power cut. Thought things were safe again and UPS due next week, so I'd give it a go and run some more models - Wrong! Now not modelling till UPS installed and tested.

Finally settled on the Eaton 5P after a lot of digging around, secondhand ones just not available here. Modern PC power supplies are active Power Factor corrected and as such it is hit and miss if they work with the stepped wave form from most cheap UPS systems. According to the Seagate FAQ (my power supply) the only way to tell is by trial and error, and at this stage I've got better things to do. The 5P has pure sine wave output at a good price and this is unusual for a Line-Interactive UPS. Hopefully this means it will work. The Line-Interactive UPS is good as it runs at about 98% efficiency. The most reliable and smoothing UPS is Online Double Conversion, but these only run at about 90% efficiency as the power always runs through the inverter. And they are expensive. The cheapest are Offline/standby, but again these normally output stepped sine wave and rely on a switch to move to the battery supply on power failure. Mixed reports on reliability of the switching, so decided not to bother.
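
To put those efficiency figures in perspective, here is a rough worked example in Python. It assumes a constant 200 W load (a made-up figure, not a measurement) and treats the 98% and 90% efficiencies quoted above as exact, which real units won't be.

```python
HOURS_PER_YEAR = 24 * 365

def wall_power(load_watts, efficiency):
    """Power drawn from the mains to deliver load_watts through the UPS."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return load_watts / efficiency

def yearly_loss_kwh(load_watts, efficiency):
    """Energy lost inside the UPS over a year of continuous running, in kWh."""
    loss_watts = wall_power(load_watts, efficiency) - load_watts
    return loss_watts * HOURS_PER_YEAR / 1000.0

if __name__ == "__main__":
    load = 200.0  # assumed load in watts, not a measured value
    for name, eff in [("line-interactive", 0.98),
                      ("online double conversion", 0.90)]:
        print(f"{name}: {wall_power(load, eff):.1f} W from the wall, "
              f"{yearly_loss_kwh(load, eff):.0f} kWh/year lost")
```

Under those assumptions the double-conversion unit wastes roughly 18 W more than the line-interactive one, which adds up over a year of 24/7 crunching.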

> disk-flusher utility, suggested by Eirik
Sized correctly the UPS should give you plenty of time for write activities to finish. If you're going to buy one, why skimp on the capacity? I know nothing at all about this level of system utility, but personally I would be cautious about using it as I reckon modern OSs and hard drives with their huge caches are complicated enough as it is. It might be fine, but with the rate that BOINC writes to the hard drive I'll leave things as they are.
ID: 47151

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 47230 - Posted: 4 Oct 2013, 9:09:24 UTC

Noticed that your machine picked up 5 rapid-rapit models yesterday.
Hoping cooling and UPS issues resolved.

Me - with
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz [Family 6 Model 58 Stepping 9]
Ivy Bridge

Been doing some stats -- cpu temp lower than 70 - cores run at 3700 (hyper-better than advertised 3400) if I can trust the various softwares that claim to measure -- ??

I also collected stats on the i7-3770 (running Win8 in Virtualbox - so discount some overhead)

Pushing the i7-3770 Ivy box to 8/8 (4 real cores + 4 hyper) didn't gain much, one or two percent either way. Using 6/8 seemed to be the sweet spot - extrapolate to your 8-real 16 ht machine.
But also, using all 4 real cores for CPDN - didn't cost much from other work -- most apps don't use the floating point units.

So when your machine stabilizes and cooling resolved -- expect 8-12 CPDN units can run without impacting whatever else you use machine for.
ID: 47230

MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 47242 - Posted: 7 Oct 2013, 2:12:18 UTC - in response to Message 47230.  
Last modified: 7 Oct 2013, 2:19:17 UTC

Hi Eirik,

Yes I'm now underway again. The UPS took some time to arrive and then quite a while to sort out their software and undertake testing. It seems a good UPS, but their implementation to run batch files through their web software interface is absolutely woeful. For some reason batch files do not run as you would normally expect (win7-64bit), and it was only by trial and error that I got it to work. If anyone else goes down this route with Eaton, they can always pm me for details.

The upshot is, testing power outages (pull the plug) works fine with the BOINC service stopping and closing down safely before the PC itself shuts down. There seem to be a few issues with BOINC itself, but this may be because of the Eaton interface as I don't have this problem with other software doing the same thing. Even though the BOINC service is stopped, after a bit, BOINC Manager seems to want to restart it. I removed boincmgr.exe and boinctray.exe from System Startup, which improved things, but didn't entirely eliminate it which seems a trifle weird. The restart thing seems a bit random, but I can live with it as you can ignore it or tell it no, and it does nothing, as Windows' UAC interface intercepts the run call.
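
For reference, the sequence described above (stop the BOINC service, then shut the PC down) can be sketched in Python. This is an illustration only: the service name `BOINC` and the 60-second delay are assumptions, the Eaton software's own hooks are not shown, and by default the script just prints what it would run.

```python
import subprocess

def shutdown_plan(service_name="BOINC", delay_seconds=60):
    """Commands to run, in order, once the UPS reports it is on battery.

    service_name and delay_seconds are assumptions; check the real
    service name in services.msc before relying on this.
    """
    return [
        ["net", "stop", service_name],                  # stop the BOINC service first
        ["shutdown", "/s", "/t", str(delay_seconds)],   # then power the PC down
    ]

def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_plan(shutdown_plan())  # dry run: prints the two commands
```

Keeping the service stop as a separate first step means the shutdown countdown only starts once BOINC has been told to close.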

Still waiting on the cooler to arrive in the country! Guess NZ is way down the list on Corsair's shipping order. Still I was preempting the summer weather so some way to go yet before it becomes an issue.

Once that is installed I'll gradually add a few more tasks. As I mentioned, I think it now becomes an issue of HD writes when too many tasks are running, but the tasks are now spread out so that each 25/50/75/100 completion point should be unique at any one time.

Fingers crossed.

Edit addition.
With respect to your processor speed, it may be that your BIOS is set to use Intel Turbo Boost. This allows an automatic boost in processor speed as long as certain parameters are met, mainly not overheating. Normal operation will see the processor running in Turbo Boost, as that seems to be the default in most BIOSes.
ID: 47242

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4314
Credit: 16,378,503
RAC: 3,632
Message 47334 - Posted: 17 Oct 2013, 14:18:39 UTC

A word of warning for those looking to optimise, the warnings about not using an SSD seem to be valid. SSD on my netbook has just given up the ghost after crunching predictably for just over a year though recently with WCG as no regional models available for CPDN recently. Fortunately the WCG data is all that I will have lost on it. I think I will probably go back to a slightly larger mechanical hard disk and accept the slightly slower start up times. May well put an ssd onto main machine for just the OS though.
ID: 47334

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 47360 - Posted: 20 Oct 2013, 14:00:06 UTC - in response to Message 47242.  

Hey MartinNZ, just had a peek at your machine on the CPDN website -- looks like you're running 10 models now - and going like Topsy.
Great! Thinking the ECC will be a long-term asset.

Time to consider backup strategy.

Thanks for your time and trouble building this box.

e

ID: 47360

astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 47362 - Posted: 20 Oct 2013, 20:59:29 UTC - in response to Message 47360.  

The machine is already #12 on CPDN's RAC list.

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 47362

MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 47366 - Posted: 20 Oct 2013, 22:50:25 UTC
Last modified: 20 Oct 2013, 22:53:34 UTC

Hi Guys, first I'd like to thank all those that helped me get the beast this far. Your input has been invaluable, and I think I now have a pretty reliable PC. It's been a pretty steep learning curve, and for this one I'm certainly glad I got someone else to build it.

The water cooler (Corsair H100i) went in on Thurs and dropped CPU temperatures by 30C so worth the investment. After a burn/soak test CPDN was started again, and was running OK so I added a couple more tasks as Eirik noticed. Computation time has increased from around 0.865 sec/TS to around 0.96, with the new tasks running at around 1 sec/TS. There will be an optimum point where it will not be worth adding more tasks as the RACs will not increase radically and work performance will drop.
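
That trade-off is easy to put numbers on. Combined throughput is the number of tasks divided by the seconds each one needs per timestep, so adding tasks only pays off while the task count rises faster than sec/TS does. The Python sketch below uses the sec/TS figures quoted above; the task counts (8 before, 10 after) are assumptions for illustration only.

```python
def throughput(n_tasks, sec_per_ts):
    """Total model timesteps completed per second across all running tasks."""
    if sec_per_ts <= 0:
        raise ValueError("sec_per_ts must be positive")
    return n_tasks / sec_per_ts

def gain_percent(before, after):
    """Percentage change in throughput between two (n_tasks, sec_per_ts) setups."""
    t0 = throughput(*before)
    t1 = throughput(*after)
    return 100.0 * (t1 - t0) / t0

if __name__ == "__main__":
    before = (8, 0.865)   # assumed 8 tasks at the earlier sec/TS
    after = (10, 0.96)    # assumed 10 tasks at the later sec/TS
    print(f"{throughput(*before):.2f} -> {throughput(*after):.2f} TS/s "
          f"({gain_percent(before, after):+.1f}%)")
```

Once `gain_percent` drops to about zero for an extra task, that is the optimum point and anything beyond it just slows every model down.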

My RACs will be up and down for a while, as for some reason my backup software is playing up since the Corsair cooler and software was installed. When the BU runs it stops the BOINC service before running the BU, and is then supposed to restart it. For some reason it is stopping BOINC, then not running the BU scripts or restarting BOINC. Hmmm, it's likely to take a while to sort out. So Eirik, I have a BU strategy (pretty comprehensive really), but not for BOINC. The last time I did that was when running single models on an old Pentium 4. From what I remember then and read since, it's pretty difficult to restore multiple models, but perhaps things have changed. There is a separate thread for BUs, but I see it hasn't been added to since 2008.

BTW, since installing the UPS, there have been FOUR power cuts, so in my case it is an essential item.

astroWX's observation on the RAC list is interesting and shows how times change. When I first built my old i7-920, I think it came in at around 7th in the list, but was around 130th(?) when it was retired. Looking at that position now, the RAC is around 1600 compared to 2600 that I'm achieving with the new PC. Can't grumble on that one, especially as it's still workable as a work PC and only drawing 190W (according to the UPS).
ID: 47366

MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 47377 - Posted: 21 Oct 2013, 21:17:59 UTC

I see I have 2 more failures, one at 25% the other at 50%. Each of these had tasks in close proximity in reaching the same point, but were still about 3 hours apart. I'll suspend other similar tasks so that they are at least 5 hours apart, but this gets a bit tedious when trying to look at the 25/50/75/100% points so that they do not clash - if indeed that is the problem. Other machines with more tasks do not seem to have the same issue and we shouldn't have to micro-manage tasks in this way anyway.
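
Working out which tasks will hit a 25/50/75/100% point close together can be left to a script rather than done by eye. A rough Python sketch, assuming each task progresses at a steady rate and that you can read off its progress fraction and estimated total run time (the task names, inputs and 5-hour gap here are hypothetical). It only looks at each task's next milestone, not the ones after it.

```python
from itertools import combinations

MILESTONES = (0.25, 0.50, 0.75, 1.00)

def hours_to_next_milestone(progress, total_hours):
    """Hours until the task reaches its next 25/50/75/100% point."""
    for m in MILESTONES:
        if progress < m:
            return (m - progress) * total_hours
    return 0.0  # already finished

def clashes(tasks, min_gap_hours=5.0):
    """Pairs of task names whose next milestones fall within min_gap_hours.

    tasks maps a task name to (progress fraction, estimated total hours).
    """
    eta = {name: hours_to_next_milestone(p, t) for name, (p, t) in tasks.items()}
    return [(a, b) for a, b in combinations(sorted(eta), 2)
            if abs(eta[a] - eta[b]) < min_gap_hours]

if __name__ == "__main__":
    example = {
        "task_a": (0.20, 200.0),  # hypothetical tasks, not real CPDN data
        "task_b": (0.46, 200.0),
        "task_c": (0.60, 200.0),
    }
    print(clashes(example))
```

Any pair it reports is a candidate for suspending one task for a few hours.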

If anyone has any thoughts let me know, I notice one has error 255 which could be a windows call about file not being found. The other is 193, which according to the BOINC FAQ is an obsolete error message, but one which seems to feature quite a bit on CPDN.

I'll see how it goes in the coming days as everything else on the PC now seems to be sorted out and running OK. FYI the issue of BOINC not continuing after backups has been solved - a BU software issue.

If anyone has any thoughts let me know, but I'm having a week off soon. Will leave the system running and hope for the best.
ID: 47377

MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 47955 - Posted: 9 Jan 2014, 0:38:54 UTC - in response to Message 47377.  

Lies, damn lies, and statistics.

Some of you expressed surprise earlier in this thread about the time it takes to finish the write activity to the hard drive when you stop CPDN/BOINC. I must admit I also found it odd, but hey, that's what the Win7 Resource Monitor was telling me, so it must be right. Right?

Well it turns out not quite so. A while back when we had no tasks I decided to update my backup procedures and do some testing. My backup hard drives are in removable racks that have activity LEDs so you can see what is going on. I expected some sort of delay to be shown in the Resource Monitor as it is not exactly a high priority item, but the delay can be huge. Even with the smallish amount of data I was using, Resource Monitor continued to show write activity for a minute after the LED showed that all activity had finished. I could even physically remove the hard drive when the Resource Monitor still showed read/write activity and Resource Monitor would just keep on showing loads of activity.

BUT, I also have just carried out a slightly different experiment. With BOINC Manager open and CPDN suspended I check to see that there is no hard drive activity. I then Resume CPDN and within 10 secs Resource Monitor is showing BOINC-related hard drive activity and it quickly builds after that. My guess is that when there is considerable CPU activity, Resource Monitor takes a really low priority; at other times, this appears not to be the case.

So, ignore all that rubbish I talked earlier on, but as to how long it takes, who knows. I guess the processor fan slowing down gives a pretty good hint though, and that happens almost instantly.

It would be quite interesting if anyone can isolate their CPDN hard drive activity and provide hard data.

BTW, pretty happy with the installation. Now that we are back on one model, I'm just in the process of optimising the number of tasks and will report back later. Currently running 12 on the available 16 hyperthreads - with no degradation on work throughput.

ID: 47955

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 47960 - Posted: 9 Jan 2014, 15:30:44 UTC - in response to Message 47955.  

...Currently running 12 on the available 16 hyperthreads - with no degradation on work throughput.


Awesome, nice work.


Regarding the disk activity indicator in the resource monitor, I wonder how much it is affected by disk caching.

ID: 47960

MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 49107 - Posted: 14 May 2014, 22:28:21 UTC - in response to Message 47960.  

Credits query. Posting here as this isn't a "where are my credits or I'll give up CPDN" war cry ;-)

I use my local BOINC Manager stats graph to help keep an eye on things to see that they are running OK. Used to use it to tune the number of tasks to run, but found that is only useful if there is a steady stream of the one model.

Since 2 May the racs have dropped from 9200 to 7400 and still heading south. 2 May is roughly when the bulk ANZ models ran out, so wondering if there is a link. I've been running 10 tasks for last few months on 1290283 & checking the sec/Ts of the three different models running, they are the same before and after 2 May.

Any thoughts?
ID: 49107

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,053,321
RAC: 4,417
Message 49110 - Posted: 15 May 2014, 1:09:10 UTC - in response to Message 49107.  

Since 2 May the racs have dropped from 9200 to 7400 and still heading south. 2 May is roughly when the bulk ANZ models ran out, so wondering if there is a link. I've been running 10 tasks for last few months on 1290283 & checking the sec/Ts of the three different models running, they are the same before and after 2 May.

Any thoughts?

What type of models are you presently running? I have noticed the same drop in RAC as I have shifted back from running all ANZ models to several CM models. I have known for some time that CM models are compensated at a lower rate than any other type of model.

Two weeks ago I was running all ANZ models. I now have 4 CM models running on 3 machines along with 5 ANZs and 1 EU. As the CMs have kicked in my RAC has dropped by 900 credits.

Maybe it is time to consider adjusting the credit award rate so that the CMs are more in line with other model types. This would be justified by the fact that the CM models are a bigger commitment of time and computer resources. After all it is not like the credits cost them anything.

ID: 49110

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 49111 - Posted: 15 May 2014, 1:49:48 UTC

One thing that needs to be kept in mind, is that there are 2 types of credit:
1) Incrementing credit
2) Temporary credit (RAC), the total of which will decrease with certain factors, such as not running models for a while.

It was found some years ago that the RAC calcs aren't very good, and a correction factor, different for each type of model, was introduced.
Private discussions occurred recently about one of the newer models requiring an adjustment. I forget which one it was, but I don't think that anything has been done yet.
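
For anyone wondering how the "temporary" figure behaves: as far as I know, stock BOINC computes RAC as an exponentially weighted average of daily credit with a one-week half-life. The Python sketch below assumes that and ignores the per-model correction factors mentioned above, so treat it as an approximation and not as CPDN's actual code.

```python
HALF_LIFE_DAYS = 7.0  # assumed: the standard BOINC half-life for RAC

def decayed_rac(rac, days_idle):
    """RAC after days_idle days in which no new credit is granted."""
    return rac * 0.5 ** (days_idle / HALF_LIFE_DAYS)

def rac_after(rac, daily_credit, days):
    """RAC after `days` of earning a constant daily_credit.

    The old value decays while the average drifts toward daily_credit.
    """
    weight = 0.5 ** (days / HALF_LIFE_DAYS)
    return rac * weight + daily_credit * (1.0 - weight)

if __name__ == "__main__":
    print(f"{decayed_rac(9200, 12):.0f}")      # no credit at all for 12 days
    print(f"{rac_after(9200, 7000, 12):.0f}")  # hypothetical lower daily credit
```

Under these assumptions a host that stopped earning entirely would see its RAC halve in a week, so a slide from 9200 to 7400 over about twelve days looks more like a lower credit rate than a stall.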

Credit matters still aren't part of core business.


ID: 49111

Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 49112 - Posted: 15 May 2014, 2:47:55 UTC - in response to Message 49107.  

Martin-

Jim has the right of it. With 8 hadcm3ns, my i7-2600 maxed out around 3400 RAC. With other models--FAMOUS, Weather at Home--it could get over 6000. There's some technical reason why they can't adjust the credits per trickle for hadcm3n. I see you're running 5 hadcm3ns; if in the past you were running all Weather At Home models, that would account for the drop in RAC.

I currently have a similar situation. My PC is running only 7 models, 4 hadam3pm2s and 3 hadcm3ns, and its RAC is over 10,000: it's no. 4 on the "top computers" list as I write, which is just silly. I think the credit allocation for hadam3pm2 is also wrong, in the opposite direction.

But as Les says, getting a Windows version of hadam3pm2 out the door and responding to scientists' demands are probably higher priorities than adjusting credits. We don't run CPDN work because of the project's focus on PR and community interaction. ;-)
ID: 49112

©2024 climateprediction.net