Optimise PC build for CPDN

Questions and Answers : Windows : Optimise PC build for CPDN
Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 46413 - Posted: 13 Jun 2013, 6:28:37 UTC

I agree with Eirik, assuming he means UPS - uninterruptible power supply. That would probably have a bigger effect on the number of failures than ECC RAM versus non-ECC.

About placement - I think it's fine to put the BOINC programs on the system disk. The CPDN programs live in the data folder, though, IIRC. If you have lots of spare disks you could consider putting the paging file on its own disk - although nowadays, with RAM relatively cheap, paging is less common than it used to be.

MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 46421 - Posted: 13 Jun 2013, 20:54:54 UTC - in response to Message 46413.  

UPS. I have surge protection and have considered a UPS in the past; the thing putting me off is the efficiency loss of around 10% in a typical situation. It would help smooth out voltages, but my monitoring shows 217-238 VAC on the supply (within the NZ nominal 230 V +/-6%), and well within the capacity of any computer power supply. We do have power cuts (say 2/yr), but the level of model failure is far higher than this. So, if there are other reasons for using a UPS, I'm happy to be convinced. Oh, and a big yes to the other hints about OS updates etc. - that is what I practise.

Buying a new PC is doing my head in and getting in the way of doing work!!! Prices for a Xeon system are high but affordable, but getting info out of companies is like getting blood out of a stone. One thing I was wondering: would I be better off going for a slower 8-core or a fast 6-core, given that I will allow CPDN to run on half the cores? Like I said, I'm not really into the RAC race. HP priced a 6-core 3.2 GHz Xeon, but I could, for a mere small fortune, go to an 8-core 2.6 or 2.4 GHz Xeon. My gut feeling is that the total work unit throughput would be higher with the slower 8-core. Anyone got any data, or are we getting into the realms of 'does it really matter at this point'?

All systems have been priced with ECC memory.

Pagefile. Reading up on this earlier: even with loads of RAM, I believe it is still written to disk for protection and for things like hibernation. Like I said below, even with 12 GB of RAM its activity level is pretty high. But good advice, and yes, I'm going to put it on a separate HDD.

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1081
Credit: 7,007,720
RAC: 4,289
Message 46422 - Posted: 13 Jun 2013, 22:41:05 UTC - in response to Message 46405.  

[Eirik Redd wrote:]... right now Darwin runs fastest (but maybe less accurately) ...
I don't know of any suggestion that accuracy varies between platforms. I have certainly found differences in reproducibility of results between platforms (i.e. what would be 'validity', if CPDN validated), but even that doesn't equate to accuracy given the CPDN mantra of "output variations are equivalent to random input variations". It would surprise me if all run-times were equivalently accurate in an absolute sense (e.g. in computing a 'log', 'cos' or 'sqrt' function), but whether that translates into an identifiably more accurate final model state is quite another matter.

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 46423 - Posted: 14 Jun 2013, 8:53:45 UTC - in response to Message 46422.  

This is kinda techie, but on Linux (Ubuntu 12.04, anyhow) hadcm3n_6.07_i686-pc-linux-gnu depends on a shared library (libm.so.6 => /lib/i386-linux-gnu/libm.so.6) that is provided by the host. This is the well-documented standard math library, but the compiler has several options for how to call the host math libs - including, among others, -ffast-math, which takes shortcuts to run faster, and the much slower strict IEEE mode, which guarantees last-decimal accuracy.
I expect the compile is similarly dependent on the Windows and Darwin hosts. From what I remember of the extreme testing a few years ago to get the compile to work on all varieties of hosts, it all depends on compiler options and on what math capabilities the host - and its DLLs - has. And some of the math libs figure out what host they are on and optimize (or short-cut, if you see it that way).

But these are all really minor variations. I've seen that on some combinations of hardware, software, and DLLs, a model will sometimes fail a few timesteps earlier on one host than on another with different math libs.

Overall, there is variation dependent on platform and specific versions of math libs, but not much difference in overall results.
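To make the last-decimal point concrete: IEEE 754 floating-point arithmetic is not associative, so the reassociations a fast-math build is allowed to make can change the final bits of a result. A small illustrative sketch (Python floats are IEEE 754 doubles, so the effect shows up directly):

```python
# IEEE 754 addition is not associative: regrouping changes the last bits.
left = (0.1 + 0.2) + 0.3   # one evaluation order
right = 0.1 + (0.2 + 0.3)  # the reassociated order a fast-math build may use
print(left)                # 0.6000000000000001
print(right)               # 0.6
print(left == right)       # False

# With mixed magnitudes, reassociation can swallow a whole term.
print((1e16 + 1.0) - 1e16)  # 0.0 - the 1.0 is lost to rounding
print((1e16 - 1e16) + 1.0)  # 1.0
```

Timestep after timestep, tiny differences like these are enough to make one platform's model state drift away from another's without either being identifiably more "wrong".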




MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 46424 - Posted: 14 Jun 2013, 11:31:47 UTC - in response to Message 46421.  

UPS. I have surge protection and have considered UPS in the past,...


I used to run a UPS a few years ago ... at the time I was getting something like 8 power cuts / month (usually just a second or two). But the supply has dramatically improved and it doesn't seem to be needed now.

... HP priced a 6core 3.2GHz Xeon, but I could, for a mere small fortune, go to an 8 core 2.6 or 2.4GHz Xeon. My gut feeling is that the total work unit throughput would be higher with a slower 8 core. ...


Why not get the cheapest 6-core now, and get an 8-core in a couple of years when they are cheaper?



Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 46446 - Posted: 18 Jun 2013, 15:59:13 UTC
Last modified: 18 Jun 2013, 16:07:17 UTC

There is another option or two when addressing the high writes to an SSD, which I learned when dealing with the similarly high writes of the CEP2 project on World Community Grid.

The one I use at the moment is to place the BOINC data folder on a ramdisk. I currently use PrimoRamdisk (from Romex Software) for its relatively fast startup and shutdown, since you need to save and then reload the contents of the ramdisk each time you reboot. Another option is Dataram RAMDisk, which has a free version for disk sizes less than 4 GB, which should be plenty for most BOINC projects.

The second option is to use a caching program with a write-cache (the read cache is unnecessary for SSDs, but could be helpful for a mechanical disk drive). I have used FancyCache (also from Romex), which has a free beta at the moment. If you set the write cache to maybe 1 GB or so and set the latency to an hour or more (I usually used 24 hours), you get a very large (e.g., 99%) reduction in writes to the disk.

But remember you are then storing the BOINC data in main memory, so if you get a crash you lose it - at least in the case of a ramdisk. In the case of a write-cache you lose everything written since the last cache flush to disk, but you still retain the basic data, so it is easier to recover. I use an uninterruptible power supply (UPS) with automatic software shutdown of the PC to prevent loss in case of a power outage, and I have a very stable PC. If you overclock and crash a lot, these are not good options.
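The large write reduction Jim quotes comes from coalescing: CPDN rewrites the same checkpoint files over and over, so a write-back cache that holds dirty blocks for a long latency window only flushes the most recent version. A toy Python model of that effect (my illustration with made-up numbers, not how FancyCache or PrimoRamdisk actually work internally):

```python
def flushed_writes(write_events, latency):
    """Toy write-back cache. write_events is a list of (time_s, block_id);
    a dirty block is flushed once `latency` seconds have passed since it
    became dirty, and rewrites before that flush are absorbed.
    Returns the number of physical disk writes."""
    dirty = {}    # block_id -> time it became dirty
    flushes = 0
    for t, block in sorted(write_events):
        if block in dirty and t - dirty[block] >= latency:
            flushes += 1        # the previous version hit the disk
            dirty[block] = t    # this write starts a new dirty window
        elif block not in dirty:
            dirty[block] = t
        # else: absorbed - the cached copy is simply overwritten
    return flushes + len(dirty)  # remaining dirty blocks flush eventually

# A checkpoint file rewritten every 60 s for a day:
events = [(t, "checkpoint") for t in range(0, 86400, 60)]
print(flushed_writes(events, latency=0))     # 1440 - every write hits disk
print(flushed_writes(events, latency=3600))  # 24 - ~98% fewer disk writes
```

The trade-off is exactly the one described above: everything still sitting in the dirty window is lost if the machine goes down before the flush.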

MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 46447 - Posted: 18 Jun 2013, 20:07:13 UTC - in response to Message 46424.  
Last modified: 18 Jun 2013, 20:14:19 UTC

Mike suggested:

Why not get the cheapest 6-core now, and get an 8-core in a couple of years when they are cheaper?
Well, I guess that comes down to why I'm running CPDN in the first place. Basically, I believe the CPDN community needs answers sooner rather than later. I've been working on energy issues since 1985 for a variety of organisations, ranging from low-income communities to some of the largest corporations. During this time my driving force has been energy 'conservation' (to quote an old-fashioned term), but over the last 10 years the issue of climate change has become a significant motivator. I was a late starter on that one! I could go on forever about this, but that's not the point of this thread.

So, I decided that although the 8 core is disproportionally expensive, I can afford the system, it will not use significantly more energy (the Xeon processor is actually rated at lower wattage than my old i7), and as a business PC it gets written off against tax. But I would have done it anyway. :-)

UPS. Still undecided on that one, but as the Met Service is predicting the heaviest snow for the last 20 years in the next few days, perhaps I should think again. Almost certain to lose power, which is great, cause it means you can sit around the log burner reading the odd book or two. Oh, and throw snowballs. ;-))

Just for the record, I'm going for the following, built by a local company specialising in server builds, with the processor making up nearly 50% of the cost. Given the cost, I decided not to build it myself, and they provide a 3 yr guarantee.

    Xeon E5-2670 8 core.
    LGA 2011 motherboard (of course)
    Intel 520 series SSD for OS & programs (incl. BOINC), plus a second one as a working scratch disk for non-BOINC work
    Various work HDDs
    32GB ECC 1600MHz RAM (overkill I know, but it's cheap - although I have to wait till the end of the month before the memory arrives in the country! Geez, NZ is a hick place at times.)
    CPDN data files on their own HDD
    Page file on a separate HDD (one of the work data drives, probably)
    BOINC will run as a service, as I always have it. Means I can log out and leave CPDN running.
    Nightly backups. But this excludes any CPDN data, and the BOINC service is automatically stopped and restarted for this activity.
    Will allow CPDN to run on 50% of the cores.
    Hyperthreading? Probably, as I don't recall spotting this causing problems, although it only seems to give a small advantage.
    No overclocking.



MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 46448 - Posted: 18 Jun 2013, 20:21:42 UTC - in response to Message 46446.  

Jim said:

I use an uninterruptible power supply (UPS) with automatic software shutdown of the PC to prevent loss in case of a power outage...

With the auto shutdown, can you set the software to do certain tasks, e.g. stop the BOINC service (when running BOINC as a service, of course), or shut down the standard BOINC program, before shutting down the OS?

I ask as there has been previous discussion suggesting that you should always shut down BOINC before shutting down the OS, as the OS does not always allow BOINC sufficient time to shut down the CPDN threads safely.

Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 46452 - Posted: 18 Jun 2013, 21:58:13 UTC - in response to Message 46448.  

With the auto shutdown, can you set the software to do certain tasks, e.g. stop the BOINC service (when running BOINC as a service, of course), or shut down the standard BOINC program, before shutting down the OS?

I ask as there has been previous discussion suggesting that you should always shut down BOINC before shutting down the OS, as the OS does not always allow BOINC sufficient time to shut down the CPDN threads safely.

I don't see any specific provisions for shutting down particular programs in either my APC PowerChute or my CyberPower PowerPanel software, which initiates the PC shutdown in case of a power outage. But surely (?) you can shut down your PC from the Start button in Windows without problems. At least it has always worked fine for me with BOINC running WCG/CEP2 (and also Folding@home), though I don't have any specific experience of power outages with CPDN. I just started up CPDN again, and the thunderstorm season has yet to do much damage here. But I have certainly rebooted the PC manually with no problems, and it should be the same thing.

I wonder if it depends on your OS and/or disk drive, though. Win7 64-bit and a reasonably fast Samsung SSD work for me.


Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 46467 - Posted: 20 Jun 2013, 16:21:21 UTC - in response to Message 46446.  

I should mention, before getting too far down the road, that one potential problem with a ramdisk is that the size of the CPDN files (in the BOINC data folder) keeps growing. That can be accommodated by choosing a ramdisk large enough to begin with, if you have enough main memory, insofar as the operation of a given set of work units is concerned. The real problem comes later, since, according to the FAQ, the various projects do not clean out all their files, but leave them there, to varying degrees, for possible later use.

Unless you want to delete old files at the end of every run, it looks like a better solution is the cache (e.g., FancyCache). You could set the write cache to a large enough size (a few GB) to handle the work in progress, and only the remainder would get written to the disk drive when the cache is flushed. That is probably my next project.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 46468 - Posted: 20 Jun 2013, 22:13:10 UTC - in response to Message 46467.  

Unfortunately the re-write of the FAQ pages wasn't completed before the new front pages went live.
Most of it still dates from 2004, when the project went from classic CPDN to BOINC CPDN.

So, some facts:
1) In the beginning, when the research was all at the University of Oxford, and the people there were mostly exploring parameter space to see where models crashed, not all of the data was returned to the project. Some was retained on people's computers for possible return later, if the model proved interesting.

But there were so many tantrums from people saying that they never signed up to be a data store (in spite of being told that keeping it wasn't compulsory) that when the project moved on to the next phase in 2006, data was no longer left on people's computers.

Now, if a model completes its designed run length, it will delete its files afterwards.
But if computer problems crash it, then the clean-up routine is never reached.
Also, those that complete but won't stop running leave remnants behind when they are aborted.
In these cases, the user must manually delete the folders.

2) The Coupled Ocean models build up a large number of small data files before these get zipped and returned. This can amount to a few gigs each.

One person recently was missing the program that does the zipping and returning, so the files built up to over 10 GB.

And there are now monster machines with 24 processors running models, so a huge amount of data will be normal for them.




Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 46469 - Posted: 21 Jun 2013, 1:06:44 UTC - in response to Message 46468.  

OK, good. I was hoping they would adopt some sort of mandatory clean-up policy at some point, considering the limitations of SSDs. I just need to experimentally determine the maximum size when running four tasks simultaneously on an Ivy Bridge i5-3550. If a ramdisk will do it, that works for me.

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 46470 - Posted: 21 Jun 2013, 8:02:12 UTC
Last modified: 21 Jun 2013, 8:07:58 UTC

In my experience, the only thing that gets a new machine better productivity is a faster core.
Faster disks, faster memory - zero extra production. Zero.

My ancient Core 2 Duo at 3.0 GHz is still getting 25,920 seconds per trickle
(on Linux - see my post on math libs).

My new i7-3770 at 3.4 GHz - ignoring the "hyperthreading" - will, when I choke it down to only 4 WUs, i.e. one WU per real core,
be a little bit faster per core than the ancient Core 2 Duo at 3 GHz.

I've tried faster memory, faster disks - gets nothing.

The core speed is all that matters for CPDN.

Believe it. Or look at the fastest machines. Overclocking, maybe - but overclocking loses in even the short run.

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1081
Credit: 7,007,720
RAC: 4,289
Message 46471 - Posted: 21 Jun 2013, 8:49:31 UTC - in response to Message 46470.  

... My ancient Core 2 Duo at 3.0 GHz is still getting 25920 seconds per trickle. ...
That looks like the timestep itself, but 37,444 plays lowest i7 starting time 32,875 ==> same conclusion.

MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 46474 - Posted: 21 Jun 2013, 10:47:23 UTC - in response to Message 46470.  

Eirik writes:

...the only thing that gets a new machine better productivity is a faster core. ...

Well, yes and no. If you look at the 'Top Computers' list, it is led by a computer running an AMD Opteron(tm) 6176 processor. Curious, as I know nothing about AMD, I looked it up, and if I am reading it correctly it runs at a lowly 2.4 GHz, but the rig is listed as having 48 (!) cores - no wonder it tops the list. Forgive me if I got this wrong, but it serves as an illustration for the following.

OK, if you have a system and want to improve it without major changes, then yes a processor speed upgrade is an absolutely valid option.

If, on the other hand, you are looking at a completely new system, as I am, then a different approach is equally valid and will give higher total throughput. As mentioned earlier, I took the approach of running as many tasks as possible on one computer within my budget and energy envelope, and as such will hopefully have a higher throughput of tasks than if I had just concentrated on processor speed. Simple maths indicates this will be the case, if we assume the same percentage of cores is used on each CPU. So instead of running e.g. 4 tasks on my i7, I will be able to run 8 on the Xeon (with hyperthreading). My 'ancient' i7 runs at 2.67 GHz; the new system with a Xeon E5-2670 will run at 2.6 GHz. Therefore I should get double the number of tasks through to CPDN. If I had gone from, say, 2.67 to 3.1 GHz on my i7 (assuming all else the same), I would only have gained a 16% increase. But even this could not have been achieved, as no-one in NZ stocks LGA1366 processors anymore. That's built-in redundancy for you - grrrr.
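Martin's arithmetic can be sketched as a crude cores x clock estimate (my illustration of his reasoning; it deliberately ignores per-core IPC improvements between CPU generations, which is why he expects to do even better than this in practice):

```python
def relative_throughput(cores, ghz):
    # First-order estimate: tasks completed scales with cores * clock.
    return cores * ghz

old_i7 = relative_throughput(4, 2.67)    # 4 CPDN tasks on the i7-920
new_xeon = relative_throughput(8, 2.6)   # 8 tasks on the Xeon E5-2670
print(round(new_xeon / old_i7, 2))       # 1.95 - roughly double

faster_i7 = relative_throughput(4, 3.1)  # the 3.1 GHz upgrade path instead
print(round(faster_i7 / old_i7, 2))      # 1.16 - the ~16% gain mentioned
```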

In actual fact I would expect more than double the throughput, as the proposed Xeon processor is several generations newer and hopefully more efficient (computationally) than my i7-920, which was one of the first in the i7 series.

Time will tell I guess.


Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 46476 - Posted: 21 Jun 2013, 12:54:01 UTC - in response to Message 46474.  
Last modified: 21 Jun 2013, 13:01:36 UTC

What I was trying to say is that performance per core hasn't gotten much faster lately.
If you can get more cores at a reasonable price - and they do use less power per core these days - that's good.

--edit--
Looks like the rig you are planning is really good on the reliability factor - that's the important part.

Don't waste money on faster memory or disks, however - tried that, it doesn't help with the CPDN workload. Go for reliability.

Best luck - hope your new rig chomps the numbers.

e

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 46477 - Posted: 21 Jun 2013, 13:48:11 UTC

So there is speed, reliability, and affordability.
Can't have all three.
Posting this account of my dream machine -- when I have a few million to spare.

The IBM z-series has 5GHz+ cores, MTBF in decades, can virtualize almost anything, super-redundancy, support included with the price.

But the price per core is -- roughly - 2000 times the price of a reliable Xeon.

Oh well.

I'll go for cheap and reasonably reliable at 400-2000 USD per box, depending on speed and core count.




MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 46956 - Posted: 4 Sep 2013, 5:14:34 UTC - in response to Message 46271.  

An update for those that are interested. Quite a bit here if you manage to get to the end.

Well, after a long time I have a new Xeon workstation (ID 1285327), but the initial tasks have all crashed - the last two because I pulled the wrong plug out of the wall. Dumb, dumb, dumb! Two others, with code 22, definitely look to be model errors, and the other two, with 193, I can't determine. Hopefully we are past the crash stage and things will now go well.

I seem to remember a list of all the main exit codes as a sticky somewhere in the forum, but for the life of me I can't locate it, though I'm sure I'd spotted it a few weeks ago. Can anyone help on that one?

As to the build - this took over a month in itself, and I'm glad I didn't do it myself. The builder went for the Gigabyte GA-X79S-UP5-WIFI, as it really does tick all the boxes. HOWEVER, it would never get past 3 hours in a burn test before the power supply shut down. ALL components were changed and THREE motherboards tried, all with the same result. They finally gave up and went with an Asus board. Digging around, it is a known issue with this motherboard, and I CANNOT recommend it for use with CPDN-type tasks. The final straw was when Asus told them to start lowering the voltages on the board to make it more stable, at which point I said no. Things like this should just work.

For those that are interested, the final spec for the CPDN-related parts of the build is:
    Motherboard: Asus P9X79-WS LGA 2011
    CPU: Xeon E5-2680
    Memory: 32GB ECC (Kingston ValueRAM Server Premier KVR16E11/8, 4 x 8GB for 32GB total)
    HDD OS & programs: Intel 520 240GB SSD
    HDD CPDN/BOINC: Existing Seagate 500GB HDD. Nothing else runs from this HDD.
    6 other drives, including another SSD for use as a scratch disk.


Initial impressions: for general office work, no real difference, but for memory- and disk-intensive work a huge change is noticeable - though I imagine I would have achieved that with a cheaper system. BUT I'm currently running 8 CPDN tasks and don't even know they are there.

CPDN. Typical speed has gone from around 1.38 sec/TS on the old PC down to 0.93-0.98 sec/TS on the Xeon. I initially ran only 4 tasks, and when I brought the total to 8 tasks there was only a marginal slowdown. So I have managed to speed up the calculations by over 30% and add another 33% of tasks, which was the aim. Once I see the system is stable and have put a few successful runs through the computer over the next month or two, I may increase the number of tasks allowed. Currently I have limited it to 50% of the processors. Hyperthreading is on.

Energy. As you can see, even though the Xeon is not an efficient processor, it's actually using less than the old rig. The following figures are for the whole PC, excluding the screen. Idle does not have CPDN loaded. Prime95 and OCCT are both burn-in software.

    Idle: 100 Watts (110 W Old PC)
    Prime95: 180 W
    OCCT: 210 W
    CPDN 4 tasks: 130 W
    CPDN 6 tasks: N/A (180 W Old PC)
    CPDN 8 tasks: 155 W


My system is backed up each night, although I no longer back up CPDN, as the restore process is too difficult. But I do not want BOINC running while the backup is running, so it is stopped before the backup starts. I already had a 30-second delay between BOINC shutdown and the backup software starting, but decided to have a play around to see if this was adequate.

I run BOINC as a service so it runs when the user is logged out, and you can stop the service either through the Windows BOINC Manager or with a system-level service stop command. Both have similar results. When stopping BOINC, if you watch both the Windows Task Manager and the Resource Monitor: after 5 seconds the service says it is stopped; after 1 min 5 sec the tasks disappear from the memory allocation; but not until 2 min 25 sec do the tasks stop writing to the hard drive. This is way more than I thought, but it confirms everyone's advice to shut down BOINC before you hit the shutdown button on the computer. I've now built in a 5-minute delay after shutting down BOINC and before starting the backup. I may also look at putting the BOINC files on a faster hard drive, as the newer Seagate 7200.14 drives are almost twice the speed of the 7200.12 I'm currently using. From my own experience this makes a huge difference in photo work.
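The stop-wait-backup-restart sequence can be scripted so the settle time is never skipped. A sketch assuming BOINC runs as a Windows service named "BOINC" and that the backup is launched by a hypothetical run_backup.cmd (both names are assumptions - adjust for your setup):

```python
import subprocess
import time

BOINC_SERVICE = "BOINC"    # assumption: the Windows service name
SETTLE_SECONDS = 5 * 60    # tasks kept writing for ~2:25 after the stop

def stop_boinc_command(service=BOINC_SERVICE):
    """Build the Windows service-stop command (kept separate for testing)."""
    return ["net", "stop", service]

def backup_with_boinc_stopped():
    # 1. Ask Windows to stop the BOINC service.
    subprocess.run(stop_boinc_command(), check=True)
    # 2. The service reports "stopped" within seconds, but the science
    #    tasks keep writing checkpoints for minutes, so wait them out.
    time.sleep(SETTLE_SECONDS)
    # 3. Run the backup (hypothetical script), then restart BOINC.
    subprocess.run(["cmd", "/c", "run_backup.cmd"], check=True)
    subprocess.run(["net", "start", BOINC_SERVICE], check=True)

if __name__ == "__main__":
    backup_with_boinc_stopped()
```

Most backup packages can call a script like this as a pre-backup job, which removes the need to remember the delay at all.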

I'll now go and have a coffee after shutting down BOINC and before shutting down the PC! Nice.

Martin



astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 46959 - Posted: 4 Sep 2013, 18:44:52 UTC

Nice write-up, Martin.


Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 46961 - Posted: 5 Sep 2013, 4:53:51 UTC - in response to Message 46956.  
Last modified: 5 Sep 2013, 4:54:22 UTC

Thanks for the write-up, Martin.

I don't recall a list of error codes on this discussion board, but the BOINC FAQ service, http://boincfaq.mundayweb.com/index.php, has a section devoted to them (section 6).

©2024 climateprediction.net