New work Discussion

Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 65996 - Posted: 25 Aug 2022, 20:50:40 UTC - in response to Message 65994.  
Last modified: 25 Aug 2022, 20:55:40 UTC

My usual problem is dropped connections on short files - they can't handle the concurrency.

I changed these in my cc_config.xml:
 <max_file_xfers>1</max_file_xfers>
 <max_file_xfers_per_project>1</max_file_xfers_per_project>

They are normally set to 8 and 4 respectively.

It fixed my last 7 stuck downloads. Thanks for the tip.

EDIT: "2" seems to work OK also.
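
For anyone who would rather script the change than edit by hand, here is a minimal sketch in Python. It assumes the two settings live under the `<options>` element of cc_config.xml; the function name `set_transfer_limits` and the sample XML string are made up for illustration. The client has to re-read its config files (or be restarted) before new values take effect.

```python
import xml.etree.ElementTree as ET


def set_transfer_limits(xml_text: str, total: int, per_project: int) -> str:
    """Return cc_config.xml text with both transfer limits set (hypothetical helper)."""
    # Start from the existing config, or an empty <cc_config> if there is none.
    root = ET.fromstring(xml_text) if xml_text.strip() else ET.Element("cc_config")
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    for tag, value in (("max_file_xfers", total),
                       ("max_file_xfers_per_project", per_project)):
        node = options.find(tag)
        if node is None:
            node = ET.SubElement(options, tag)
        node.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Sample input only; a real file will usually contain other options too.
example = "<cc_config><options><max_file_xfers>8</max_file_xfers></options></cc_config>"
updated = set_transfer_limits(example, total=1, per_project=1)
print(updated)
```

Other options already in the file are left untouched, since only the two named elements are created or overwritten.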
ID: 65996 · Report as offensive
Mr. P Hucker

Joined: 9 Oct 20
Posts: 669
Credit: 4,391,754
RAC: 6,918
Message 65997 - Posted: 25 Aug 2022, 20:59:25 UTC - in response to Message 65996.  
Last modified: 25 Aug 2022, 21:11:34 UTC

My usual problem is dropped connections on short files - they can't handle the concurrency.

I changed these in my cc_config.xml:
 <max_file_xfers>1</max_file_xfers>
 <max_file_xfers_per_project>1</max_file_xfers_per_project>

They are normally set to 8 and 4 respectively.

It fixed my last 7 stuck downloads. Thanks for the tip.

EDIT: "2" seems to work OK also.
Good idea, I'll reduce mine to 8 and 1. There's no point in me trying to get 8 at once when most projects can give me files almost as fast as my fibre, and those that can't are going to get overloaded by asking for several at once. But I'll leave the first figure at 8 in case one project is slow, so others can get through.
ID: 65997 · Report as offensive
KAMasud

Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 65998 - Posted: 26 Aug 2022, 11:58:55 UTC

OpenIFS 43r3 ARM
There are ITX boards available on which you can mount multiple Raspberry Pi 4B compute modules, and some boards also have a GPU slot available. Each Raspberry Pi 4B has an 8GB RAM flavour available. These Pi4B compute modules are cheap and powerful with four cores. Windows for ARM is a no-go, so Linux it will be.
My question is, can we run OpenIFS 43r3 ARM on a cluster of these?
https://hackaday.com/2021/11/28/this-raspberry-pi-mini-itx-board-has-tons-of-io/
ID: 65998 · Report as offensive
Glenn Carver

Joined: 29 Oct 17
Posts: 774
Credit: 13,433,329
RAC: 7,110
Message 65999 - Posted: 26 Aug 2022, 15:09:40 UTC - in response to Message 65998.  

OpenIFS 43r3 ARM
There are ITX boards available on which you can mount multiple Raspberry Pi 4B compute modules, and some boards also have a GPU slot available. Each Raspberry Pi 4B has an 8GB RAM flavour available. These Pi4B compute modules are cheap and powerful with four cores. Windows for ARM is a no-go, so Linux it will be.
My question is, can we run OpenIFS 43r3 ARM on a cluster of these?
https://hackaday.com/2021/11/28/this-raspberry-pi-mini-itx-board-has-tons-of-io/
OpenIFS has already been run on Pi's. See: https://www.ecmwf.int/en/about/media-centre/science-blog/2019/weather-forecasts-openifs-home-made-supercomputer . I helped Sam set this up, it was a great demonstrator that he took to science fairs. ECMWF gave him a job after he finished at uni. I appreciate you're talking about something different but it demonstrates Pis will run the model.

How do the multiple Pis present themselves? If you had 2 Pis, would the system see 8 cores with a total of 16GB RAM shared between them? Or would it see 2 separate compute nodes with only 8GB addressable by each Pi?

I ask because (a) OpenIFS needs a minimum of 16GB in total to do anything useful; (b) although OpenIFS supports both MPI & OpenMP parallelism, I stripped out MPI to reduce the memory footprint. In the article above, we used MPI to communicate across ethernet between the Pis, as OpenMP needs shared memory. But CPDN only use the shared memory option in OpenIFS.

So, yes, I'm sure the system would run OpenIFS (maybe with a bit of hacking), but not for any useful work in CPDN.
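
To put the memory argument in concrete terms, here is a small Python sketch. It assumes each Pi shows up as its own node with its own RAM; the `Node` class and `can_run_shared_memory` function are invented for illustration and have nothing to do with the OpenIFS code itself. The point is that an OpenMP-only job has to fit inside one node, so adding up RAM across nodes doesn't help.

```python
from dataclasses import dataclass


@dataclass
class Node:
    cores: int
    ram_gb: float


def can_run_shared_memory(nodes, required_gb):
    """True if at least one single node has enough RAM for a shared-memory job."""
    # RAM on other nodes is not addressable without MPI, so only per-node RAM counts.
    return any(n.ram_gb >= required_gb for n in nodes)


pis = [Node(cores=4, ram_gb=8), Node(cores=4, ram_gb=8)]
total = sum(n.ram_gb for n in pis)
print(total, can_run_shared_memory(pis, required_gb=16))
```

Two 8GB Pis add up to 16GB on paper, but the check fails because neither node reaches 16GB on its own.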
ID: 65999 · Report as offensive
SolarSyonyk

Joined: 7 Sep 16
Posts: 233
Credit: 31,062,873
RAC: 30,045
Message 66000 - Posted: 26 Aug 2022, 17:57:51 UTC - in response to Message 65999.  

How do the multiple Pis present themselves? If you had 2 Pis, would the system see 8 cores with a total of 16GB RAM shared between them? Or would it see 2 separate compute nodes with only 8GB addressable by each Pi?


They would appear as two entirely separate Pi4s, each with 4C/8GB.
ID: 66000 · Report as offensive
KAMasud

Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 66001 - Posted: 27 Aug 2022, 7:40:19 UTC

There would be teething troubles, but the CPDN applications page does list OpenIFS for ARM. Which ARM are they talking about? The only ARMs I know of are Single Board Computers.
ID: 66001 · Report as offensive
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4314
Credit: 16,379,331
RAC: 3,596
Message 66002 - Posted: 27 Aug 2022, 8:41:43 UTC - in response to Message 66001.  

With the ITX boards that can take multiple Pi4B boards, my guess is it would work a bit like a render farm. I assume it would need one core from one of the Pis to manage how the work is spread around the rest. In the short term, however, I don't see the advantage: for the same price as one of the boards that mounts several 4Bs, you can currently get more power from a Ryzen or Intel solution.
ID: 66002 · Report as offensive
Glenn Carver

Joined: 29 Oct 17
Posts: 774
Credit: 13,433,329
RAC: 7,110
Message 66003 - Posted: 27 Aug 2022, 18:53:24 UTC - in response to Message 66001.  

There would be teething troubles, but the CPDN applications page does list OpenIFS for ARM. Which ARM are they talking about? The only ARMs I know of are Single Board Computers.
The hardware isn't a problem. OpenIFS has been run on multiple ARM platforms, for example the UK Isambard system (see: https://www.archer.ac.uk/training/virtual/2018-12-12-Isambard/archer-webinar-dec-2018.pdf). The compiler can be more of an issue sometimes if it doesn't support some of the modern Fortran features the model uses, but usually it's just a case of tuning the model code to work efficiently with processor cache sizes etc.

I'm not sure how much I'm allowed to say about the AMD reference on the CPDN Applications page but again, there was no issue applying the model to this hardware.

In order to use the ITX board with multiple Raspberry Pis we'd need the full OpenIFS model code with MPI+OpenMP rather than the OpenMP-only version that CPDN use; it would work. But, as I said, the available memory on the Pis would limit the model to nothing more than a demonstrator, not sufficient for the work that CPDN needs. It's cheap hardware, great for certain applications but not for running weather models.
ID: 66003 · Report as offensive
wateroakley

Joined: 6 Aug 04
Posts: 185
Credit: 27,083,655
RAC: 6,161
Message 66004 - Posted: 27 Aug 2022, 21:01:22 UTC - in response to Message 65991.  

That's for another forum. I've posted about it at WCG.
Off topic ... spent some time today resetting config.xml and manually getting WCG to download Africa rainfall tasks. If that's the necessary workload, I'll give it a miss.
ID: 66004 · Report as offensive
Mr. P Hucker

Joined: 9 Oct 20
Posts: 669
Credit: 4,391,754
RAC: 6,918
Message 66005 - Posted: 27 Aug 2022, 22:00:18 UTC - in response to Message 66004.  
Last modified: 27 Aug 2022, 22:02:32 UTC

That's for another forum. I've posted about it at WCG.
Off topic ... spent some time today resetting config.xml and manually getting WCG to download Africa rainfall tasks. If that's the necessary workload, I'll give it a miss.
I've got 90 CPU cores and 12 GPUs and 3 Android phones running WCG flat out. I find it fun pestering the server. It makes Boinc more involved. The blast of heat as I walk past a garage window is absurd. I'm either going to solve every world problem, or use up all the electricity :-)
ID: 66005 · Report as offensive
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1045
Credit: 16,506,818
RAC: 5,001
Message 66009 - Posted: 28 Aug 2022, 15:22:29 UTC - in response to Message 65995.  
Last modified: 28 Aug 2022, 15:24:14 UTC

Would be better if they were zipped into a smaller number of larger files surely.


I think many of the files are the same between different issuances of work units, where other files may just be the initial conditions that vary more often. So to send them as a zip file would require zipping every work unit, including the initial conditions, before sending. May be too much trouble (and load) on the server.

I would be more interested in getting work units for ClimatePrediction.
ID: 66009 · Report as offensive
Mr. P Hucker

Joined: 9 Oct 20
Posts: 669
Credit: 4,391,754
RAC: 6,918
Message 66010 - Posted: 28 Aug 2022, 20:27:38 UTC - in response to Message 66009.  

I think many of the files are the same between different issuances of work units, where other files may just be the initial conditions that vary more often. So to send them as a zip file would require zipping every work unit, including the initial conditions, before sending. May be too much trouble (and load) on the server.
I'm seeing huge numbers of tiny files for GPU Covid. I've done thousands of those tasks now, and they still send loads of files. So either they are different, or the server isn't acknowledging I already have that file. With most projects, if I come back after a while, or every so often, there's a big dataset that gets downloaded just once, then the task files are smaller.
ID: 66010 · Report as offensive
Richard Haselgrove

Joined: 1 Jan 07
Posts: 925
Credit: 34,100,818
RAC: 11,270
Message 66011 - Posted: 28 Aug 2022, 22:10:33 UTC - in response to Message 66010.  
Last modified: 28 Aug 2022, 22:10:55 UTC

I'm seeing huge numbers of tiny files for GPU Covid. I've done thousands of those tasks now, and they still send loads of files. So either they are different, or the server isn't acknowledging I already have that file. With most projects, if I come back after a while, or every so often, there's a big dataset that gets downloaded just once, then the task files are smaller.
Only one project: Einstein@home. It's called "locality scheduling", and their server was specially enhanced by their boss, Bruce Allen. Everybody else does it their own way.

Technical detail: that only applies to the Gravity Wave search, using data from the LIGO detectors. Those are large, exquisitely detailed, datasets, recorded by the 4 km laser interferometers. Thousands of individual workunits are created to scan them every which way possible. You don't do that with nano-scale protein molecules. They are, indeed, all different.
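
As a rough illustration of the idea only (this is not Einstein@home's actual server code, and every name in it is invented), a toy Python version of "prefer work that uses a file the host already has" could look like this:

```python
def pick_workunit(pending, host_files):
    """Prefer a workunit whose data file the host already holds; else take the first."""
    for wu in pending:
        if wu["data_file"] in host_files:
            return wu
    # Nothing matches what the host has, so it will need a fresh download anyway.
    return pending[0] if pending else None


# Hypothetical workunits and file names, for illustration.
pending = [
    {"name": "wu_a", "data_file": "segment_1.dat"},
    {"name": "wu_b", "data_file": "segment_2.dat"},
]
choice = pick_workunit(pending, host_files={"segment_2.dat"})
print(choice["name"])
```

This only pays off when many workunits share the same large file, which is the Gravity Wave case and not the protein case.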
ID: 66011 · Report as offensive
Mr. P Hucker

Joined: 9 Oct 20
Posts: 669
Credit: 4,391,754
RAC: 6,918
Message 66012 - Posted: 28 Aug 2022, 22:27:13 UTC - in response to Message 66011.  
Last modified: 28 Aug 2022, 22:30:59 UTC

Only one project: Einstein@home. It's called "locality scheduling", and their server was specially enhanced by their boss, Bruce Allen. Everybody else does it their own way.

Technical detail: that only applies to the Gravity Wave search, using data from the LIGO detectors. Those are large, exquisitely detailed, datasets, recorded by the 4 km laser interferometers. Thousands of individual workunits are created to scan them every which way possible. You don't do that with nano-scale protein molecules. They are, indeed, all different.
I'm sure I've seen it elsewhere, LHC for example. But yes, I guess with virus research those files are always different. Zipping might help, but is probably no easier than fixing the duff network switch or whatever the 14 billion dollar company can't afford to replace!

I'm assuming not many people are able to get so many GPU workunits, since I've managed to get from 10,290th to 8,766th place in 1 day. About 12 Tahiti-grade AMD GPUs running it 24/7, 4 TFLOPS rating each.
ID: 66012 · Report as offensive
Glenn Carver

Joined: 29 Oct 17
Posts: 774
Credit: 13,433,329
RAC: 7,110
Message 66013 - Posted: 29 Aug 2022, 8:44:59 UTC - in response to Message 66009.  

Would be better if they were zipped into a smaller number of larger files surely.
I think many of the files are the same between different issuances of work units, where other files may just be the initial conditions that vary more often. So to send them as a zip file would require zipping every work unit, including the initial conditions, before sending. May be too much trouble (and load) on the server.
It helps the server, not hinders it. CPDN works by zipping the files before loading them onto the server. That saves storage on the server, reduces the number of connections from clients, and reduces download time because compression makes the total download smaller. The 'zipping' is done by the scientist. That includes invariant files that are always needed for every experiment, plus files that vary per experiment such as initial conditions. It's a no-brainer really. The client unzips before starting the task.
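
A quick Python sketch of why bundling helps, using only the standard library. The file names and contents are invented and deliberately repetitive, so the compression ratio here is far better than real model input would give; the `bundle` function is illustrative and is not CPDN's tooling.

```python
import io
import zipfile


def bundle(files: dict[str, bytes]) -> bytes:
    """Pack many small files into one compressed zip archive held in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Fifty hypothetical small input files with highly compressible content.
files = {f"input_{i:03d}.txt": b"0.0 " * 2000 for i in range(50)}
archive = bundle(files)
raw_size = sum(len(data) for data in files.values())
print(f"{len(files)} transfers, {raw_size} bytes -> 1 transfer, {len(archive)} bytes")
```

Even when the data barely compresses, the client still makes one connection instead of fifty, which is the part that matters for dropped transfers.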

I would be more interested in getting work units for ClimatePrediction.
They are coming. I'm in Oxford next week to chat to the team about setting up tests for the higher resolution multicore jobs. But since I'm retired and not getting paid, I'll go at my own pace. Though I'm keen to demonstrate the capability, which might get (younger) scientists interested in using the platform. After that I plan to work on implementing OpenIFS in VMs for the Windows & Mac platforms, plus a few other things.
ID: 66013 · Report as offensive
Mr. P Hucker

Joined: 9 Oct 20
Posts: 669
Credit: 4,391,754
RAC: 6,918
Message 66014 - Posted: 29 Aug 2022, 8:48:40 UTC - in response to Message 66013.  

After that I plan to work on implementing OpenIFS in VMs for the Windows & Mac platforms, plus a few other things.
This pleases me. I take it Windows will then run a Virtualbox job much like LHC?
ID: 66014 · Report as offensive
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4314
Credit: 16,379,331
RAC: 3,596
Message 66015 - Posted: 29 Aug 2022, 10:35:25 UTC - in response to Message 66014.  

This pleases me. I take it Windows will then run a Virtualbox job much like LHC?
That is the plan.
ID: 66015 · Report as offensive
Mr. P Hucker

Joined: 9 Oct 20
Posts: 669
Credit: 4,391,754
RAC: 6,918
Message 66016 - Posted: 29 Aug 2022, 20:16:59 UTC - in response to Message 66015.  
Last modified: 29 Aug 2022, 20:18:20 UTC

This pleases me. I take it Windows will then run a Virtualbox job much like LHC?
That is the plan.
And with the much higher resolution, there will be plenty of work for all :-)

Glenn really should get paid for this. When I worked in a university, we found grant money to pay for wages to hire folk like that.
ID: 66016 · Report as offensive
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4314
Credit: 16,379,331
RAC: 3,596
Message 66027 - Posted: 1 Sep 2022, 10:13:03 UTC

More of the HADCM3S in testing. Still no clue as to how long before these mean more work for Macs, or of the time scale for any other new work. :(
ID: 66027 · Report as offensive
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1045
Credit: 16,506,818
RAC: 5,001
Message 66030 - Posted: 1 Sep 2022, 13:45:28 UTC - in response to Message 66027.  

I got no ClimatePrediction tasks for my Linux Machine in August even though it was up the whole time.

top - 09:38:26 up 30 days, 6 min, 1 user, load average: 8.09, 8.18, 8.21
Tasks: 456 total, 9 running, 447 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.6 us, 5.6 sy, 44.2 ni, 49.6 id, 0.0 wa, 0.1 hi, 0.0 si, 0.0 st
MiB Mem : 63772.8 total, 553.5 free, 5004.2 used, 58215.0 buff/cache
MiB Swap: 15992.0 total, 15240.0 free, 752.0 used. 57826.2 avail Mem
ID: 66030 · Report as offensive

©2024 climateprediction.net