Message boards : Number crunching : New work Discussion
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653
My usual problem is dropped connections on short files - they can't handle the concurrency. I changed these in my cc_config.xml:
<max_file_xfers>1</max_file_xfers>
<max_file_xfers_per_project>1</max_file_xfers_per_project>
They are normally set to 8 and 4 respectively. It fixed my last 7 stuck downloads. Thanks for the tip.
EDIT: "2" seems to work OK also.
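For anyone wanting to try the same change, a minimal cc_config.xml might look like the sketch below. The two option names are quoted from the post above; the surrounding <cc_config>/<options> wrapper is the standard BOINC client config structure, and the values simply mirror the post. The file lives in the BOINC data directory and is picked up after telling the client to re-read its config files or restarting it.

<cc_config>
  <options>
    <!-- total simultaneous file transfers across all projects -->
    <max_file_xfers>1</max_file_xfers>
    <!-- simultaneous file transfers per project -->
    <max_file_xfers_per_project>1</max_file_xfers_per_project>
  </options>
</cc_config>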
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918
> My usual problem is dropped connections on short files - they can't handle the concurrency.
Good idea, I'll reduce mine to 8 and 1. There's no point in me trying to get 8 at once when most projects can give me files almost as fast as my fibre, and those that can't are going to get overloaded by being asked for several at once. But I'll leave the first figure at 8 in case one project is slow, so others can still get through.
Joined: 6 Oct 06 Posts: 204 Credit: 7,608,986 RAC: 0
OpenIFS 43r3 ARM
There are ITX boards available on which you can mount multiple Raspberry Pi 4B compute modules, and some boards also have a GPU slot. Each Raspberry Pi 4B is available in an 8GB RAM flavour. These Pi 4B compute modules are cheap and powerful, with four cores. Windows for ARM is a no-go, so Linux it will be. My question is: can we run OpenIFS 43r3 ARM on a cluster of these? https://hackaday.com/2021/11/28/this-raspberry-pi-mini-itx-board-has-tons-of-io/
Joined: 29 Oct 17 Posts: 1036 Credit: 16,134,123 RAC: 12,670
> OpenIFS 43r3 ARM
OpenIFS has already been run on Pis. See: https://www.ecmwf.int/en/about/media-centre/science-blog/2019/weather-forecasts-openifs-home-made-supercomputer . I helped Sam set this up; it was a great demonstrator that he took to science fairs. ECMWF gave him a job after he finished at uni. I appreciate you're talking about something different, but it demonstrates that Pis will run the model.
How do the multiple Pis present themselves? If you had 2 Pis, would the system see 8 cores with a total of 16GB RAM shared between them? Or would it see 2 separate compute nodes with only 8GB addressable by each Pi? I ask because (a) OpenIFS needs a total of 16GB minimum to do anything useful; (b) although OpenIFS supports both MPI & OpenMP parallelism, I stripped out MPI to reduce the memory footprint. In the article above, we used MPI to communicate across Ethernet between the Pis, as OpenMP needs shared memory. But CPDN only use the shared-memory option in OpenIFS. So, yes, I'm sure the system would run OpenIFS (maybe with a bit of hacking), but not for any useful work in CPDN.
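To illustrate the distinction being drawn here, below is a generic hybrid MPI+OpenMP sketch in C (not OpenIFS code, just an assumed minimal example): MPI ranks are separate processes that could sit on different Pis and exchange messages over Ethernet, whereas OpenMP threads must share the memory of a single board. Something like mpicc -fopenmp would build it.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    /* One MPI rank per Pi: distributed memory, communication over the network. */
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* OpenMP threads inside one rank: shared memory, so a single board only. */
    #pragma omp parallel
    {
        printf("rank %d of %d, thread %d of %d\n",
               rank, nranks, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}

An OpenMP-only build, as CPDN distributes, corresponds to running with a single rank: every thread must see the whole model in one address space, which is why the 8GB on an individual Pi becomes the limiting factor.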
Joined: 7 Sep 16 Posts: 262 Credit: 34,765,484 RAC: 11,446
> How do the multiple Pis present themselves? If you had 2 Pis, would the system see 8 cores with a total of 16GB RAM shared between them? Or would it see 2 separate compute nodes with only 8GB addressable by each Pi?
They would appear as two entirely separate Pi 4s, each with 4C/8GB.
Joined: 6 Oct 06 Posts: 204 Credit: 7,608,986 RAC: 0
There would be teething troubles, but there is an OpenIFS for ARM if you look at the applications page of CPDN. Which ARM are they talking about? The only ARMs I know of are single-board computers.
Joined: 15 May 09 Posts: 4523 Credit: 18,535,580 RAC: 7,828
With the ITX boards that can take multiple Pi 4B boards, my guess is it would work a bit like a render farm. I assume it would need one core from one of the Pis to manage how the work is spread around the rest. In the short term, however, I don't see the advantage: for the same price as one of the boards you can mount several 4Bs on, you can currently get more power from a Ryzen or Intel solution.
Joined: 29 Oct 17 Posts: 1036 Credit: 16,134,123 RAC: 12,670
> There would be teething troubles, but there is an OpenIFS for ARM if you look at the applications page of CPDN. Which ARM are they talking about? The only ARMs I know of are single-board computers.
The hardware isn't a problem. OpenIFS has been run on multiple ARM platforms - for example, the UK Isambard system (see: https://www.archer.ac.uk/training/virtual/2018-12-12-Isambard/archer-webinar-dec-2018.pdf). The compiler can sometimes be more of an issue if it doesn't support some of the modern Fortran features the model uses, but usually it's just a case of tuning the model code to work efficiently with processor cache sizes etc. I'm not sure how much I'm allowed to say about the ARM reference on the CPDN Applications page but, again, there was no issue applying the model to this hardware. In order to use the ITX board with multiple Raspberry Pis we'd need the full OpenIFS model code with MPI+OpenMP rather than the OpenMP-only version that CPDN use; it would work. But, as I said, the available memory on the Pis would limit the model to nothing more than a demonstrator, not sufficient for the work that CPDN needs. It's cheap hardware, great for certain applications but not for running weather models.
Joined: 6 Aug 04 Posts: 195 Credit: 28,139,303 RAC: 10,239
> Off topic ... spent some time today resetting config.xml and manually getting WCG to download Africa rainfall tasks
> That's for another forum. I've posted about it at WCG.
If that's the necessary workload, I'll give it a miss.
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918
> Off topic ... spent some time today resetting config.xml and manually getting WCG to download Africa rainfall tasks
> That's for another forum. I've posted about it at WCG.
> If that's the necessary workload, I'll give it a miss.
I've got 90 CPU cores, 12 GPUs and 3 Android phones running WCG flat out. I find it fun pestering the server. It makes BOINC more involved. The blast of heat as I walk past a garage window is absurd. I'm either going to solve every world problem, or use up all the electricity :-)
Joined: 5 Aug 04 Posts: 1118 Credit: 17,177,237 RAC: 2,478
> Would be better if they were zipped into a smaller number of larger files surely.
I think many of the files are the same between different issuances of work units, whereas other files may just be the initial conditions that vary more often. So sending them as a zip file would require zipping every work unit, including the initial conditions, before sending. That may be too much trouble (and load) on the server. I would be more interested in getting work units for ClimatePrediction.
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918
> I think many of the files are the same between different issuances of work units, whereas other files may just be the initial conditions that vary more often. So sending them as a zip file would require zipping every work unit, including the initial conditions, before sending. That may be too much trouble (and load) on the server.
I'm seeing huge numbers of tiny files for GPU Covid. I've done thousands of those tasks now, and they still send loads of files. So either they are different, or the server isn't acknowledging that I already have that file. With most projects, if I come back after a while, or every so often, there's a big dataset that gets downloaded just once, then the task files are smaller.
Joined: 1 Jan 07 Posts: 1058 Credit: 36,475,323 RAC: 12,710
> I'm seeing huge numbers of tiny files for GPU Covid. I've done thousands of those tasks now, and they still send loads of files. So either they are different, or the server isn't acknowledging that I already have that file. With most projects, if I come back after a while, or every so often, there's a big dataset that gets downloaded just once, then the task files are smaller.
Only one project: Einstein@home. It's called "locality scheduling", and their server was specially enhanced by their boss, Bruce Allen. Everybody else does it their own way. Technical detail: that only applies to the Gravity Wave search, using data from the LIGO detectors. Those are large, exquisitely detailed datasets, recorded by the 4 km laser interferometers. Thousands of individual workunits are created to scan them every which way possible. You don't do that with nano-scale protein molecules. They are, indeed, all different.
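As a rough illustration of the idea behind locality scheduling (an assumed toy sketch in C, not Einstein@home's actual server code): the scheduler prefers hosts that already hold the large data file a workunit scans, so the dataset is downloaded once and reused across many tasks - which is exactly what cannot work when every task's input is a different protein file.

#include <stdio.h>
#include <string.h>

struct host { const char *name; const char *cached_file; };
struct workunit { const char *name; const char *data_file; };

/* Return the index of a host that already has the workunit's data file, or -1. */
int pick_host(const struct host *hosts, int n, const struct workunit *wu) {
    for (int i = 0; i < n; i++)
        if (strcmp(hosts[i].cached_file, wu->data_file) == 0)
            return i;                /* reuse the cached dataset: no new download */
    return -1;                       /* no match: any host will do, full download needed */
}

int main(void) {
    /* Hypothetical host and file names, purely for illustration. */
    struct host hosts[] = { {"host-a", "segment_001.dat"}, {"host-b", "segment_002.dat"} };
    struct workunit wu = { "wu_42", "segment_002.dat" };
    int i = pick_host(hosts, 2, &wu);
    printf("send %s to %s\n", wu.name, i >= 0 ? hosts[i].name : "any host (new download)");
    return 0;
}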
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918
> Only one project: Einstein@home. It's called "locality scheduling", and their server was specially enhanced by their boss, Bruce Allen. Everybody else does it their own way.
I'm sure I've seen it elsewhere, LHC for example. But yes, I guess with virus research those files are always different. Zipping might help, but is probably no easier than fixing the duff network switch or whatever it is the 14 billion dollar company can't afford to replace! I'm assuming not many people are able to get so many GPU workunits, since I've managed to get from 10,290th to 8,766th place in 1 day. About 12 Tahiti-grade AMD GPUs running it 24/7, 4 Tflops rating each.
Joined: 29 Oct 17 Posts: 1036 Credit: 16,134,123 RAC: 12,670
> Would be better if they were zipped into a smaller number of larger files surely.
> I think many of the files are the same between different issuances of work units, whereas other files may just be the initial conditions that vary more often. So sending them as a zip file would require zipping every work unit, including the initial conditions, before sending. That may be too much trouble (and load) on the server.
It helps the server, not hinders it. CPDN works by zipping the files before loading them onto the server - it saves storage on the server, reduces the number of connections from clients, and reduces download time because the total download size is smaller due to compression. The 'zipping' is done by the scientist. That includes invariant files that are always needed for every experiment, plus files that vary per experiment, such as initial conditions. It's a no-brainer really. The client unzips before starting the task.
> I would be more interested in getting work units for ClimatePrediction.
They are coming. I'm in Oxford next week to chat to the team about setting up tests for the higher-resolution multicore jobs. But since I'm retired and not getting paid, I'll go at my own pace, though I'm keen to demonstrate the capability that might get (younger) scientists interested in using the platform. After that I plan to work on implementing OpenIFS in VMs for the Windows & Mac platforms, plus a few other things.
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918
> After that I plan to work on implementing OpenIFS in VMs for the Windows & Mac platforms, plus a few other things.
This pleases me. I take it Windows will then run a VirtualBox job much like LHC?
Joined: 15 May 09 Posts: 4523 Credit: 18,535,580 RAC: 7,828
> This pleases me. I take it Windows will then run a VirtualBox job much like LHC?
That is the plan.
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918
> This pleases me. I take it Windows will then run a VirtualBox job much like LHC?
> That is the plan.
And with the much higher resolution, there will be plenty of work for all :-) Glenn really should get paid for this. When I worked in a university, we found grant money to pay wages to hire folk like that.
Joined: 15 May 09 Posts: 4523 Credit: 18,535,580 RAC: 7,828
More of the HADCM3S in testing. Still no clue as to how long before these mean more work for Macs, or of the timescale for any other new work. :(
Joined: 5 Aug 04 Posts: 1118 Credit: 17,177,237 RAC: 2,478
I got no ClimatePrediction tasks for my Linux machine in August even though it was up the whole time.
top - 09:38:26 up 30 days, 6 min, 1 user, load average: 8.09, 8.18, 8.21
Tasks: 456 total, 9 running, 447 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.6 us, 5.6 sy, 44.2 ni, 49.6 id, 0.0 wa, 0.1 hi, 0.0 si, 0.0 st
MiB Mem : 63772.8 total, 553.5 free, 5004.2 used, 58215.0 buff/cache
MiB Swap: 15992.0 total, 15240.0 free, 752.0 used. 57826.2 avail Mem