Hardware for new models.

Message boards : Number crunching : Hardware for new models.
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 67129 - Posted: 30 Dec 2022, 9:58:10 UTC - in response to Message 67049.  

On CPU temperature and heat output:
– If core clocks are high, hot-spot temperatures will be high, almost independently of the size of the heatsink and its intake air temperature.
A large heatsink with a large fan and good heatsink paste will reduce all the temperatures inside the chip. If the CPU temperature is high, put your finger on the heatsink: if it's hot, you need a bigger fan and/or a larger heatsink; if it's cold, you need better thermal paste.

– Take a look into the BIOS for power limits. They might be set needlessly high by default.
– There is a trend among consumer desktop mainboard BIOSes to apply too high a voltage by default; I hope that's not the case with workstation BIOSes.
I build my own, and I've never seen a motherboard default to an excessively high voltage. That must be something OEMs do to make the PC look fast. They're shooting themselves in the foot if it makes the machine unreliable and causes service calls.
ID: 67129
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 67130 - Posted: 30 Dec 2022, 10:00:03 UTC - in response to Message 67063.  

In the summertime I have a double window fan blowing outside air inside and windows open elsewhere, but when it is 90F outside, it is tough to keep the computer box cool enough to keep the processor cool enough unless I run the fans so fast as to drive me crazy.
I would die of sweat at 32°C. That's only 5°C below body temperature. Get an AC unit!
ID: 67130
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 67131 - Posted: 30 Dec 2022, 10:02:18 UTC - in response to Message 67120.  

I can see @xii5ku probably has the same unfortunate copper-based "broadband" as I do, where I get pitiful upload bandwidth relative to download. The down:up ratio is something like 50-100:1 here. :-(

With the current oifs3 tasks, my upload link indeed saturates before I run out of memory, though that will likely change for the next resolutions, and that's good news for me.
I wonder if your ISP might adjust the ratio? Presumably the line is capable of a certain total throughput. If you tell them upload matters more to you than to the average user, they might make an adjustment. It could very easily be programmable at their end without anyone leaving their desk. They've just guessed what the average user wants.
ID: 67131
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4342
Credit: 16,499,590
RAC: 5,672
Message 67141 - Posted: 30 Dec 2022, 13:10:17 UTC - in response to Message 67131.  

I wonder if your ISP might adjust the ratio? Presumably the line is capable of a certain total throughput. If you tell them upload matters more to you than to the average user, they might make an adjustment. It could very easily be programmable at their end without anyone leaving their desk. They've just guessed what the average user wants.
Mine won't. I tried. But don't let that put anyone else off.
ID: 67141
Richard Haselgrove

Joined: 1 Jan 07
Posts: 943
Credit: 34,182,995
RAC: 6,606
Message 67142 - Posted: 30 Dec 2022, 13:43:50 UTC - in response to Message 67141.  

I wonder if your ISP might adjust the ratio? Presumably the line is capable of a certain total throughput. If you tell them upload matters more to you than to the average user, they might make an adjustment. It could very easily be programmable at their end without anyone leaving their desk. They've just guessed what the average user wants.
Mine won't. I tried. But don't let that put anyone else off.
Some might, if you swapped to a business account, and paid them business-class money.

i.e., lots
ID: 67142
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4342
Credit: 16,499,590
RAC: 5,672
Message 67149 - Posted: 30 Dec 2022, 16:14:02 UTC - in response to Message 67142.  

Some might, if you swapped to a business account, and paid them business-class money.
My current supplier, who piggybacks off the BT infrastructure, doesn't have anything faster in my street anyway, however much I am willing to pay.
ID: 67149
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1056
Credit: 16,520,943
RAC: 1,212
Message 67161 - Posted: 31 Dec 2022, 2:50:51 UTC - in response to Message 67128.  

If it won't fit, get a bigger case or, as I do, just leave the side off.


I cannot leave the side off. The case is interlocked with the power supply, so the power supply instantly shuts off if you even move the lever that opens it. And I doubt it would help all that much anyway.

These are the temperatures right now with about half the cores idle (no WCG, no Rosetta, only two instead of five CPDN).
Room temperature is 75°F. When the room is twenty degrees (F) hotter six or seven months from now, the box will be that much hotter too. And when the cores reach 88.0°C, I must cut the cores running BOINC from 12 to 8, or even 6 or 7.

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +65.0°C  (high = +88.0°C, crit = +98.0°C)
Core 1:        +65.0°C  (high = +88.0°C, crit = +98.0°C)
Core 2:        +62.0°C  (high = +88.0°C, crit = +98.0°C)
Core 3:        +61.0°C  (high = +88.0°C, crit = +98.0°C)
Core 5:        +60.0°C  (high = +88.0°C, crit = +98.0°C)
Core 8:        +59.0°C  (high = +88.0°C, crit = +98.0°C)
Core 9:        +64.0°C  (high = +88.0°C, crit = +98.0°C)
Core 11:       +61.0°C  (high = +88.0°C, crit = +98.0°C)
Core 12:       +61.0°C  (high = +88.0°C, crit = +98.0°C)

amdgpu-pci-6500
Adapter: PCI adapter
vddgfx:       +0.96 V
fan1:        2055 RPM  (min = 1800 RPM, max = 6000 RPM)
edge:         +44.0°C  (crit = +97.0°C, hyst = -273.1°C)
power1:        9.15 W  (cap =  25.00 W)

dell_smm-virtual-0
Adapter: Virtual device
fan1:        4273 RPM
fan2:        1126 RPM
fan3:        3920 RPM
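
For anyone who wants to track this over time rather than eyeballing it, a small script along these lines logs the package temperature once a minute. This is only a sketch, not part of BOINC or CPDN: it assumes lm-sensors' 'sensors' command is installed and prints a 'Package id 0:' line as in the output above.

#!/usr/bin/env python3
# log_package_temp.py - hypothetical helper; assumes the lm-sensors "sensors"
# command is on PATH and reports a "Package id 0:" line as shown above.
import re
import subprocess
import time

PACKAGE_TEMP = re.compile(r"Package id 0:\s*\+?([\d.]+)")

while True:
    output = subprocess.run(["sensors"], capture_output=True, text=True).stdout
    match = PACKAGE_TEMP.search(output)
    if match:
        print(f"{time.strftime('%H:%M:%S')}  package: {match.group(1)} C", flush=True)
    time.sleep(60)  # sample once a minute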

ID: 67161
wujj123456

Joined: 14 Sep 08
Posts: 87
Credit: 32,917,978
RAC: 14,911
Message 67167 - Posted: 31 Dec 2022, 10:29:03 UTC - in response to Message 67131.  

I wonder if your ISP might adjust the ratio? Presumably the line is capable of a certain total throughput. If you tell them upload matters more to you than to the average user, they might make an adjustment. It could very easily be programmable at their end without anyone leaving their desk. They've just guessed what the average user wants.

Mine won't either, probably because they sell business-class Internet too. I can get a maximum of 35 Mbps upload if I pay them US$250/month, but I suspect the upload configuration might be negotiable there, unlike on the residential plans.
ID: 67167
xii5ku

Joined: 27 Mar 21
Posts: 79
Credit: 78,302,757
RAC: 1,077
Message 67263 - Posted: 3 Jan 2023, 22:04:41 UTC - in response to Message 67119.  

Glenn Carver wrote:
the next resolution configuration up from the one currently being run will need ~12 GB, the one after that ~22 GB. As the resolution goes higher, the model timestep has to decrease, so runtimes get longer.
Is there already an estimate of the result data size too? (The same, proportional to the spatial resolution, or additionally also increasing with the number of time steps?)

Regarding spatial resolution: Does the need for higher resolution concern just the two horizontal dimensions, or also the vertical?

I'm curious how well the problem can be parallelized, i.e. whether or not significant serial parts will remain.

--------
Until several years ago, I sometimes had to run some massive multiphysics simulations in my engineering job; model setup and postprocessing were part of the deal as well. At first I ran this stuff on an Ethernet cluster of office PCs, overnight outside office hours, which was a drag. So I eventually built two dual-socket compute servers for the purpose. The solver was explicit code IIRC, hence computer time was proportional to (1/grid spacing)^4. Parallelization was a headache, as I had to subdivide the model grid for this. A single-threaded process would work on each subgrid, and these processes interacted via MPI to exchange results at the interfaces between subgrids. Computer utilization was best if all subgrids required similar numeric effort, yet certain locations in the model were taboo for subdivisions. Deciding on a grid resolution always required a compromise between what it took to keep errors in check and the available time to get the job done. Also, for certain problems it would have been nice to build very large grids, in order to keep the outer boundaries with their artificial conditions far away from where the action was happening in the model, but again the job deadlines required compromises.
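
(For anyone wondering where the fourth power comes from: assuming a 3-D grid and a CFL-limited explicit scheme, which is what the description above implies, halving the grid spacing doubles the number of points in each of the three dimensions and also forces the time step to halve, so

    compute time ∝ (1/grid spacing)^3 for the points × (1/grid spacing) for the time steps = (1/grid spacing)^4

i.e. roughly 16 times the work for every halving of the grid spacing.)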

The mentioned two dual-socket servers brought a big improvement over the Ethernet cluster, both in throughput and in convenience. What I'm doing in my day job has changed since (no humongous simulations anymore), but I still have the two servers. They now make up a tiny part of the hardware which I accumulated at home just for the Distributed Computing hobby. :-)
ID: 67263
Glenn Carver

Joined: 29 Oct 17
Posts: 804
Credit: 13,568,734
RAC: 7,025
Message 67267 - Posted: 3 Jan 2023, 22:56:47 UTC - in response to Message 67263.  

Glenn Carver wrote:
the next resolution configuration up from the one currently being run will need ~12 GB, the one after that ~22 GB. As the resolution goes higher, the model timestep has to decrease, so runtimes get longer.
Is there already an estimate of the result data size too? (The same, proportional to the spatial resolution, or additionally also increasing with the number of time steps?)

Good question. The GRIB output file format used by the model is a lossy compression format, which means the output file size increases much more slowly as the resolution increases - typically fields are reduced to 12 bits from 64. The frequency of output would stay the same, i.e. every 12 or 24 simulated hours, regardless of the model timestep.
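
(A rough worked example of what the packing buys, counting only the packed field values themselves: a field stored at 12 bits per grid point takes about N × 12/8 bytes for N grid points, versus N × 8 bytes for unpacked 64-bit values - roughly a factor of five smaller at any given resolution. These numbers are illustrative, not measured CPDN file sizes.)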

Regarding spatial resolution: Does the need for higher resolution concern just the two horizontal dimensions, or also the vertical?
Just the horizontal. Increasing vertical as well would mean too much extra memory and it makes scientific sense to keep a constant vertical configuration across different horizontal resolutions.

I'm curious how well the problem can be parallelized, i.e. whether or not significant serial parts will remain.
IFS (& OpenIFS) uses both MPI domain decomposition and OpenMP. The code has algorithms which compute the optimum number of gridpoints per sub-domain in order to reduce the imbalance in compute time across domains. MPI tasks are assigned per compute node (actually per NUMA node), and non-blocking MPI communication is used across the fabric so that the comms run in parallel with the compute. OpenMP is used at the outer loop level to make use of the cores/threads on each node. Hyperthreading isn't always used - it depends on the hardware. Blocking algorithms are used in the inner loops, the aim being to vectorize as much as possible with good cache use.

On the big HPC machines, typically 8-12 OpenMP threads are used. MPI parallelism scales very well as long as there is enough for each task to do. I/O is also parallelized, with a separate parallel I/O server that the model output data is handed off to (OpenIFS doesn't include this complex layer; all its output is serial). I can't remember off the top of my head what the remaining serial percentage of the time is. There are some nice stats on IFS performance in reports from the ECMWF High Performance Computing workshops - the proceedings will be online.

The part of the model that computes the wind and transport of meteorological variables uses 'halos' around the grid areas, that is, extended boundaries which are filled by MPI communication with nearest neighbours, much like your outer boundary example below. The longer the model timestep, the larger the halo needs to be. Rather than filling the whole halo region, the model has a clever 'halo-on-demand' approach, only filling points that are 'blowing into' the region given the wind direction, rather than blowing out.
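
To make the halo idea concrete, here is a minimal 1-D sketch using Python and mpi4py. It is an illustration only - not OpenIFS code - and it fills the whole halo every step rather than using the halo-on-demand trick described above. Each MPI rank owns a slab of the grid plus one ghost cell at each end, and neighbours swap boundary values every step; run it with something like 'mpirun -n 4 python3 halo_demo.py'.

# halo_demo.py - minimal 1-D halo exchange sketch (assumes mpi4py and numpy)
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 8                                    # interior points owned by this rank
u = np.full(n_local + 2, float(rank))          # +2 ghost (halo) cells at the ends

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(10):
    # send my first interior point to the left neighbour; receive the right
    # neighbour's first interior point into my right halo cell
    comm.Sendrecv(sendbuf=u[1:2], dest=left, recvbuf=u[-1:], source=right)
    # send my last interior point to the right neighbour; receive the left
    # neighbour's last interior point into my left halo cell
    comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
    # toy "solver": relax each interior point towards the average of its neighbours
    u[1:-1] = 0.5 * (u[:-2] + u[2:])

print(f"rank {rank}: {u[1:-1]}")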

An awful lot of work has been done by some very smart computational scientists at ECMWF to make IFS as efficient as possible on very large parallel systems.

xii5ku wrote:
--------
Until several years ago, I sometimes had to run some massive multiphysics simulations in my engineering job; model setup and postprocessing were part of the deal as well. At first I ran this stuff on an Ethernet cluster of office PCs, overnight outside office hours, which was a drag. So I eventually built two dual-socket compute servers for the purpose. The solver was explicit code IIRC, hence computer time was proportional to (1/grid spacing)^4. Parallelization was a headache, as I had to subdivide the model grid for this. A single-threaded process would work on each subgrid, and these processes interacted via MPI to exchange results at the interfaces between subgrids. Computer utilization was best if all subgrids required similar numeric effort, yet certain locations in the model were taboo for subdivisions. Deciding on a grid resolution always required a compromise between what it took to keep errors in check and the available time to get the job done. Also, for certain problems it would have been nice to build very large grids, in order to keep the outer boundaries with their artificial conditions far away from where the action was happening in the model, but again the job deadlines required compromises.

The mentioned two dual-socket servers brought a big improvement over the Ethernet cluster, both in throughput and in convenience. What I'm doing in my day job has changed since (no humongous simulations anymore), but I still have the two servers. They now make up a tiny part of the hardware which I accumulated at home just for the Distributed Computing hobby. :-)
ID: 67267
Glenn Carver

Joined: 29 Oct 17
Posts: 804
Credit: 13,568,734
RAC: 7,025
Message 67268 - Posted: 3 Jan 2023, 23:04:01 UTC

xii5ku: I forgot to add that, similar to your Ethernet parallel model, someone put OpenIFS on a small network of Raspberry Pis connected over Ethernet.

See: https://samhatfield.co.uk/2019/05/21/the_raspberry_pi_planet_simulator/ and https://www.ecmwf.int/en/about/media-centre/science-blog/2019/weather-forecasts-openifs-home-made-supercomputer. Sam was a smart lad - ECMWF employed him!
ID: 67268
bernard_ivo

Joined: 18 Jul 13
Posts: 438
Credit: 24,515,124
RAC: 1,691
Message 67279 - Posted: 4 Jan 2023, 8:52:35 UTC

Looking at the RAM requirements for the next OpenIFS models, I wonder what kind of rig/computer I would need to upgrade to. My best machine is an i7-4790 with 16 GB RAM, which collapsed under 4 OIFS WUs running in parallel (I do not have an app_config.xml but plan to add one - see the example below) and trashed a few WUs.
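
For anyone in the same position, an app_config.xml along these lines caps how many OpenIFS tasks run at once. This is only an illustration: the <name> element must match the application name in your client_state.xml (oifs_43r3_ps is a guess for the current OpenIFS app), and the limit of 2 is just an example - pick whatever your RAM comfortably allows.

<app_config>
   <app>
      <name>oifs_43r3_ps</name>
      <max_concurrent>2</max_concurrent>
   </app>
</app_config>

Save the file in the climateprediction.net project folder inside the BOINC data directory, then use Options -> Read config files in the BOINC Manager (or restart the client) for it to take effect.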

Meanwhile, the info on OpenIFS running on a RasPi got me excited that maybe new ARMs could be an efficient way to go, but then I remembered CPDN may not work on ARM for the foreseeable future, so I guess it will be some Ryzen box.

Anyway, any advice from fellow CPDN users on what to upgrade to will be appreciated. Since I plan to run it 24/7 it should not be a power-hungry beast, i.e. a good balance between output and the electricity bill. :)
ID: 67279
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4342
Credit: 16,499,590
RAC: 5,672
Message 67286 - Posted: 4 Jan 2023, 9:59:54 UTC - in response to Message 67279.  

Anyway, any advice from fellow CPDN users on what to upgrade to will be appreciated. Since I plan to run it 24/7 it should not be a power-hungry beast, i.e. a good balance between output and the electricity bill. :)


An AMD Ryzen 7 3700X is what I have, currently with 32 GB RAM. I need to up the RAM to at least 64 GB, possibly more, to future-proof it. I will also get the next speed grade up in memory, as I know that someone with the same spec except for faster memory runs tasks noticeably faster.
ID: 67286
gemini8

Joined: 4 Dec 15
Posts: 52
Credit: 2,182,959
RAC: 836
Message 67289 - Posted: 4 Jan 2023, 10:26:49 UTC - in response to Message 67286.  

AMD Ryzen 7 3700X is what I have, currently with 32GB RAM.

The same choice I made some years ago for one of my machines.
Nice and solid.

Got myself a new machine last week: chose the Ryzen 7 5700X and dropped 64 GB RAM onto the board.
I could add another 64 GB.
As I'm supporting a lot of BOINC projects I went for a reasonable energy footprint (the CPU has a 65 W TDP), as many threads as possible (sixteen), a processor with a decent L3 cache (32 MB) and enough RAM (I chose a board that supports up to 128 GB). I also added a GTX 1660 with six gigs of VRAM to crunch on.
I put all that into an ITX case and had to take the CPU frequency down to 2.2 GHz for cooling reasons. I will soon improve that with another cooler, but not running at full spec doesn't really bother me, as it also reduces energy consumption and it's still way more power than I need for my everyday stuff.
- - - - - - - - - -
Greetings, Jens
ID: 67289
Glenn Carver

Joined: 29 Oct 17
Posts: 804
Credit: 13,568,734
RAC: 7,025
Message 67293 - Posted: 4 Jan 2023, 11:19:44 UTC - in response to Message 67279.  

bernard_ivo wrote:
Looking at the RAM requirements for the next OpenIFS models, I wonder what kind of rig/computer I would need to upgrade to. My best machine is an i7-4790 with 16 GB RAM, which collapsed under 4 OIFS WUs running in parallel (I do not have an app_config.xml but plan to add one) and trashed a few WUs.

Meanwhile, the info on OpenIFS running on a RasPi got me excited that maybe new ARMs could be an efficient way to go, but then I remembered CPDN may not work on ARM for the foreseeable future, so I guess it will be some Ryzen box.

Anyway, any advice from fellow CPDN users on what to upgrade to will be appreciated. Since I plan to run it 24/7 it should not be a power-hungry beast, i.e. a good balance between output and the electricity bill. :)
Great question. I've been reading the release info on the new Intel 13th-gen chips, especially the new lower-power ones, for a possible next build.

Couple of thoughts:
1/ ARM: OpenIFS works fine on ARM. The model is very tunable to different hardware. The list of OpenIFS apps on the CPDN webpage includes an OpenIFS ARM app. It's really a question of what compiler is available for the hardware. As things stand there are no plans to do any more with the ARM OpenIFS app.
2/ An older, slower chip is not a problem, but aim to optimize throughput (more on this below). I have a 3rd-gen Intel and a mini-PC with a Ryzen 3550H; they work fine.
3/ Don't try to run 1 task per thread. OpenIFS moves a lot of memory around, and running too many tasks causes high memory pressure, which slows everything down.
4/ I would say go for the fastest single-core speed and the most, fastest RAM you can afford as a priority. Don't worry too much about core count or cache size; as I said, OIFS moves a lot of data in and out of RAM. It's more about having a balanced machine for a code like OpenIFS.
5/ Slow and steady wins the race. If you can run 24/7 with the boincmgr Computing options set to 'Use 100% of CPU', that will work very well and avoid costly restarts.

I optimize for throughput, to get as many tasks completed per week (or day) as possible. For me that means running about one fewer task than I have cores (not threads), as I know more than that slows the model down (you can see this by looking at the CPU time per step, the 4th column in the slots/?/ifs.stat file). Avoid swapping at all costs; use 'vmstat 1' and check the 'si' & 'so' columns (swap pages in / swap pages out) to make sure no swapping is taking place.
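
If you would rather not eyeball the slot files by hand, a rough script along these lines prints the average seconds per step for each running task. It is only a sketch: it assumes the 4th whitespace-separated column of ifs.stat is the per-step CPU time, as described above, and the BOINC slots path will differ between installs.

#!/usr/bin/env python3
# step_times.py - rough sketch, not an official CPDN tool
import glob
import statistics

SLOTS = "/var/lib/boinc-client/slots"   # assumption: Debian/Ubuntu default path

for path in sorted(glob.glob(f"{SLOTS}/*/ifs.stat")):
    times = []
    with open(path) as fh:
        for line in fh:
            cols = line.split()
            if len(cols) >= 4:
                try:
                    times.append(float(cols[3]))   # 4th column: CPU time per step
                except ValueError:
                    pass                           # skip headers / non-numeric lines
    if times:
        print(f"{path}: {statistics.mean(times):.1f} s/step over {len(times)} steps")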

Performance will depend on every part of the system: motherboard, RAM speed, I/O etc. It's more about having a balanced system than CPU power. IFS is often used as a benchmark by HPC vendors because it exercises the whole system. Hope that's useful.
ID: 67293
bernard_ivo

Joined: 18 Jul 13
Posts: 438
Credit: 24,515,124
RAC: 1,691
Message 67295 - Posted: 4 Jan 2023, 11:34:20 UTC - in response to Message 67293.  

Thanks Glenn,

Much appreciated. I've learnt the hard way to limit the number of OpenIFS WUs running in parallel.

4/ I would say go for the fastest single-core speed and the most, fastest RAM you can afford as a priority. Don't worry too much about core count or cache size; as I said, OIFS moves a lot of data in and out of RAM. It's more about having a balanced machine for a code like OpenIFS.

This seems to rule out the older Xeon v2/v3 chips that are now more available in second-hand workstations and servers. HadAM4 at N216 resolution is heavy on both RAM and L3, so is L3 a valid bottleneck to consider, or not as much as available RAM and CPU speed?
ID: 67295
Conan

Joined: 6 Jul 06
Posts: 141
Credit: 3,511,752
RAC: 144,072
Message 67296 - Posted: 4 Jan 2023, 11:51:17 UTC
Last modified: 4 Jan 2023, 11:52:17 UTC

I saw some test results for the AMD Ryzen 9 5950X, Ryzen 9 7950X, Intel 12900 and Intel 13900 (I think those were the model names).

When all were under full load for whatever test they were doing:

Ryzen 9 5950X used 130 Watts
Ryzen 9 7950X used 270 Watts (or thereabouts)
Intel 12900 used 285-290 Watts (or thereabouts)
Intel 13900 used 315 Watts (or thereabouts)

I can't point you to the tests, but they were on YouTube along with others showing similar results.

So the Ryzen 9 5950X may not be as powerful as the newer models, but for energy efficiency it's hard to beat.

That's of course if you can find one; they are getting harder to find.

I run a Ryzen 9 5900X, which has 12 cores and 24 threads and should use even less power as it has fewer cores than the 5950X.
It has 64 GB of RAM and, along with a full complement of other BOINC projects, easily runs 9 CPDN work units at a time. Memory use only gets to about 42 GB max, depending on what I am running at the time (everything, not just CPDN); it may get higher than 42 GB, but I have the headroom to cover that.

BOINC has not downloaded more than 9 work units at any one time, probably because I am running a lot of other projects at the same time.

Conan
ID: 67296
Glenn Carver

Joined: 29 Oct 17
Posts: 804
Credit: 13,568,734
RAC: 7,025
Message 67299 - Posted: 4 Jan 2023, 12:32:42 UTC - in response to Message 67295.  

This seems to rule out the older Xeon v2/v3 chips that are now more available in second-hand workstations and servers. HadAM4 at N216 resolution is heavy on both RAM and L3, so is L3 a valid bottleneck to consider, or not as much as available RAM and CPU speed?
I'd also been looking at the second-hand market for Xeons. I have little experience with HadAM4, but I don't think cache size is such a big deal for these models. When I was still working at ECMWF and spent time optimizing the code, we would always look to minimise TLB misses and page faults (see: https://stackoverflow.com/questions/37825859/cache-miss-a-tlb-miss-and-page-fault), because they are, relatively speaking, much more expensive than going into L3 cache. Bear in mind that when I tune the model for CPDN, I do it on an Intel chip with a smaller L3 cache than AMD's. So the larger L3 is unlikely to get you much more performance.

It's a very difficult question to answer without trying it, because it really does depend on the all-round balance of the hardware. If you had a good Xeon with a motherboard and memory that could keep up (and maybe swapped an HDD out for an SSD), why not? I'm not sure about the power draw of these older machines - that was one thing that put me off (that and the size of the cases!).

There are some Xeons running these batches. I can have a look through the logs and report back what runtimes they give, if that would help?
ID: 67299
bernard_ivo

Joined: 18 Jul 13
Posts: 438
Credit: 24,515,124
RAC: 1,691
Message 67302 - Posted: 4 Jan 2023, 13:03:04 UTC - in response to Message 67299.  

There are some Xeons running these batches. I can have a look through the logs and report back what runtimes they give, if that would help?

I do not want to divert you from your much more important work. Perhaps such stats could be exported on the CPDN site (a bit more useful than https://www.cpdn.org/cpu_list.php), rather than dug out manually from the logs (as most of us do when looking for sec/TS figures). There is the WUProp@Home project, which does collect some metrics from BOINC machines and allows comparison between different hardware, but I haven't checked it recently and am not sure how many CPDN users are on it.

I realise there is no easy answer on what hardware to use for CPDN, so all the contributions so far are very useful. Thanks
ID: 67302
Vato

Joined: 4 Oct 19
Posts: 13
Credit: 7,300,561
RAC: 14,819
Message 67303 - Posted: 4 Jan 2023, 13:14:24 UTC - in response to Message 67302.  

There is the WUProp@Home project, which does collect some metrics from BOINC machines and allows comparison between different hardware, but I haven't checked it recently and am not sure how many CPDN users are on it.


71 hosts are running OpenIFS according to https://stats.free-dc.org/wupropapp/climateprediction.netOpenIFS%2043r3%20Perturbed%20Surface so there might be some useful data there.
ID: 67303