climateprediction.net (CPDN) home page
Thread 'New work discussion - 2'

Message boards : Number crunching : New work discussion - 2

Conan
Joined: 6 Jul 06
Posts: 147
Credit: 3,615,496
RAC: 420
Message 66234 - Posted: 24 Oct 2022, 13:07:00 UTC

Thanks, that is what I thought, possibly the reason I could not get access before as well.

Thanks
Conan
ID: 66234
AndreyOR
Joined: 12 Apr 21
Posts: 318
Credit: 14,976,910
RAC: 9,985
Message 66235 - Posted: 24 Oct 2022, 20:55:10 UTC

It's also quite oversubscribed; I rarely get dev tasks. There's also no credit, and there's the risk of getting misconfigured workunits that can disrupt the client (e.g. wrong memory settings).

Sounds like the main site in most ways. Except on the main site there's a guarantee of getting a ton of misconfigured machines that disrupt the project by ruining tasks. Seems like it may not be a bad idea for CPDN to start doing some house cleaning, even if little by little. :-)
ID: 66235
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 66236 - Posted: 25 Oct 2022, 6:01:30 UTC

I see that OpenIFS 43r3 Baroclinic Lifecycle has appeared as a third OpenIFS task type.

Seems like it may not be a bad idea for CPDN to start doing some house cleaning, even if little by little. :-)


It used to happen on a regular basis. I suspect the main reason it stopped was it being seen as a lot of extra work for Andy for the amount gained by the project. Pretty sure they wouldn't want to give moderators the power to suspend the guilty machines as it would be difficult to do so without allowing access to so much more.

If OpenIFS becomes the dominant model type, the problem should largely disappear, at least for the missing-libraries issue, which accounts for the majority of dodgy computers on the project.
ID: 66236
Glenn Carver
Joined: 29 Oct 17
Posts: 1051
Credit: 16,636,385
RAC: 11,909
Message 66237 - Posted: 25 Oct 2022, 19:29:14 UTC - in response to Message 66236.  

I see that OpenIFS 43r3 Baroclinic Lifecycle has appeared as a third OpenIFS task type.
Yes, I described this back in message https://www.cpdn.org/forum_thread.php?id=9149&postid=66191 I'm also working with a student at U. Oxford on another customized version of OpenIFS for seasonal forecasts with perturbations to the surface model. The plan is for several batches of ~3000 workunits each, though this will be a while yet as it's still being developed & tested. I'm not sure what the workunit count will be for the lifecycle model.
ID: 66237
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 66239 - Posted: 25 Oct 2022, 21:11:37 UTC - in response to Message 66237.  

I see that OpenIFS 43r3 Baroclinic Lifecycle has appeared as a third OpenIFS task type.
Yes, I described this back in message https://www.cpdn.org/forum_thread.php?id=9149&postid=66191 I'm also working with a student at U. Oxford on another customized version of OpenIFS for seasonal forecasts with perturbations to the surface model. The plan is for several batches of ~3000 workunits each, though this will be a while yet as it's still being developed & tested. I'm not sure what the workunit count will be for the lifecycle model.


I don’t know all that much about computers. Will this OpenIFS stuff be open to Windows users, or is it just more penguin food? I am running Win10 and Win11 on different machines.
ID: 66239
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 66240 - Posted: 26 Oct 2022, 0:34:40 UTC - in response to Message 66239.  

Linux at present, and for some time.

**********************

Unix/Linux is a more natural language for programmers of large computers.
Trying to hammer it into Windows and still have it work can be tricky sometimes.

I followed Microsoft's advice years ago when they were trying to get rid of Windows XP:
"Upgrade to a newer OS, and if necessary, newer hardware that will run it."
So I upgraded the hardware to a new cpu type, and the OS to Linux.

Best advice that I've ever had from them.
ID: 66240
AndreyOR
Joined: 12 Apr 21
Posts: 318
Credit: 14,976,910
RAC: 9,985
Message 66243 - Posted: 26 Oct 2022, 7:48:34 UTC

... or is it just more penguin food.

This is great, I'm going to have to start using it.

I followed Microsoft's advice years ago when they were trying to get rid of Windows XP:
"Upgrade to a newer OS, and if necessary, newer hardware that will run it."
So I upgraded the hardware to a new cpu type, and the OS to Linux.

Best advice that I've ever had from them.

This is pretty good too. :-)

CPDN is definitely almost all penguin food. With a little work, though, you can get yourself a virtual penguin (or a few) to feed via WSL2, which is part of Windows, or VirtualBox (VBox), which needs to be installed and set up separately. Apparently OpenIFS will eventually have a VBox version that'll require a much simpler VBox setup than is currently needed.
ID: 66243
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 66245 - Posted: 26 Oct 2022, 12:41:43 UTC

Talking of penguin food, I'm currently running one of a batch of five OpenIFS tasks and a bunch of HadSM4 tasks from testing, so things are moving again.
ID: 66245
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,748,059
RAC: 5,647
Message 66246 - Posted: 26 Oct 2022, 13:34:05 UTC - in response to Message 66245.  

I went fishing on the dev site and got only HadSM4 tasks. Still, it proves I got the connection right.
ID: 66246
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 66253 - Posted: 26 Oct 2022, 17:24:51 UTC - in response to Message 66246.  

I went fishing on the dev site and got only HadSM4 tasks. Still, it proves I got the connection right.
There were only five of the OpenIFS ones so you needed to get in quick.
ID: 66253
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 66254 - Posted: 26 Oct 2022, 18:26:47 UTC - in response to Message 66245.  

I am running a Penguin (Computer ID 1511241), but the last work unit I got was:

Task: 22222161
Work unit: 12146959
Sent: 28 Jul 2022, 9:43:21 UTC
Reported: 30 Jul 2022, 17:28:51 UTC
Status: Completed
Run time (sec): 190,787.75
CPU time (sec): 188,926.80
Credit: 9,616.92
Application: UK Met Office HadSM4 at N144 resolution v8.02 (i686-pc-linux-gnu)
ID: 66254
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 66255 - Posted: 26 Oct 2022, 18:31:35 UTC - in response to Message 66254.  

Not likely to be more till the current testing branch work completes.
ID: 66255
Glenn Carver
Joined: 29 Oct 17
Posts: 1051
Credit: 16,636,385
RAC: 11,909
Message 66256 - Posted: 26 Oct 2022, 20:45:45 UTC - in response to Message 66255.  

Not likely to be more till the current testing branch work completes.
Yes, and then 1000s are planned ;)
ID: 66256
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 66257 - Posted: 27 Oct 2022, 5:14:50 UTC - in response to Message 66256.  

Yes, and then 1000s are planned ;)
Even batches of 15K or more tasks go quite quickly if they are for Windows. Linux-only tasks, if these arrive before the VM versions for Windows, will last a bit longer.

Sadly, the first six of my HadSM4 tasks have all gone down with negative-theta crashes. Five more are still running.
ID: 66257
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 66258 - Posted: 27 Oct 2022, 23:26:12 UTC - in response to Message 66257.  

I am inclined to think that they will go quickly too, though for most people the limit is probably bandwidth rather than memory.
But once the word gets out, there will be a lot of people willing to at least try.
ID: 66258
Glenn Carver
Joined: 29 Oct 17
Posts: 1051
Credit: 16,636,385
RAC: 11,909
Message 66259 - Posted: 28 Oct 2022, 10:31:40 UTC - in response to Message 66258.  
Last modified: 28 Oct 2022, 10:33:39 UTC

I am inclined to think that they will go quickly too, though for most people the limit is probably bandwidth rather than memory.
But once the word gets out, there will be a lot of people willing to at least try.
According to the host page here https://www.cpdn.org/host_stats.php, there are 800 active Linux hosts at last count. Assuming that also includes Linux in VirtualBox and WSL, when upwards of ~5000 Linux OpenIFS tasks go out, that's plenty of work. Some of my work will require the higher-memory machines.
ID: 66259
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 66260 - Posted: 28 Oct 2022, 13:43:49 UTC - in response to Message 66259.  

Assuming that also includes Linux in VirtualBox and WSL, when upwards of ~5000 Linux OpenIFS tasks go out, that's plenty of work. Some of my work will require the higher-memory machines.

Very good, I can put two Ryzen 3600s on it, one with 64 GB and the other with 128 GB. But they may get into a fight over bandwidth across the Atlantic, which seems to be limited to 10 Mbps for me.
I may have to back off to one machine.
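To put the 10 Mbps figure in perspective, here is a rough transfer-time estimate. It is only a sketch: the 1,000 MB data volume is a hypothetical number, not a real OpenIFS file size, and it assumes the link is split evenly between machines with no protocol overhead.

```python
def transfer_seconds(size_mb: float, link_mbps: float, machines: int = 1) -> float:
    """Rough time to move size_mb megabytes over a link_mbps megabit/s link.

    Assumes the link is shared evenly between `machines` and ignores
    protocol overhead, so real transfers will be somewhat slower.
    """
    per_machine_mbps = link_mbps / machines
    return size_mb * 8 / per_machine_mbps  # 8 bits per byte


if __name__ == "__main__":
    size_mb = 1000  # hypothetical data volume, not a real OpenIFS file size
    for machines in (1, 2):
        minutes = transfer_seconds(size_mb, 10, machines) / 60
        print(f"{machines} machine(s) sharing 10 Mbps: {minutes:.1f} min each")
```

At 10 Mbps, 1,000 MB is 8,000 megabits, or about 800 s (roughly 13 minutes). With two machines sharing the link, each transfer takes about twice as long, although the total data moved per hour stays the same.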
ID: 66260
Glenn Carver
Joined: 29 Oct 17
Posts: 1051
Credit: 16,636,385
RAC: 11,909
Message 66261 - Posted: 28 Oct 2022, 21:19:28 UTC - in response to Message 66260.  

Assuming that also includes Linux in VirtualBox and WSL, when upwards of ~5000 Linux OpenIFS tasks go out, that's plenty of work. Some of my work will require the higher-memory machines.
Very good, I can put two Ryzen 3600s on it, one with 64 GB and the other with 128 GB. But they may get into a fight over bandwidth across the Atlantic, which seems to be limited to 10 Mbps for me. I may have to back off to one machine.
One of the nice things about OpenIFS (or IFS in general) is that the output format (GRIB) was originally designed to be transmitted over unreliable telephone lines, so it's a highly compressed (lossy) format. This means the size of the output files grows only slowly as the amount of output increases, far more slowly than the model's memory requirements grow with model resolution.
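The lossy compression can be illustrated with the idea behind GRIB "simple packing", where each value is stored as a small integer X relative to a reference value R and a binary scale factor E, so that Y ≈ R + X × 2^E. The sketch below is a simplified illustration, not ecCodes or the IFS output code: it assumes a decimal scale factor of zero, keeps the integers in a Python list instead of a packed bit stream, and uses a hypothetical 12 bits per value on made-up temperatures.

```python
import math


def simple_pack(values, bits_per_value):
    """Quantise floats in the style of GRIB simple packing: Y ~ R + X * 2**E.

    Simplified sketch: the decimal scale factor D is taken as 0 and the
    integers X are kept in a list rather than a packed bit stream.
    """
    ref = min(values)                      # R: reference (minimum) value
    span = max(values) - ref
    max_int = 2 ** bits_per_value - 1      # largest integer that fits in the bit width
    # Smallest binary scale factor E such that span / 2**E fits in max_int
    e = math.ceil(math.log2(span / max_int)) if span > 0 else 0
    packed = [round((v - ref) / 2 ** e) for v in values]
    return ref, e, packed


def simple_unpack(ref, e, packed):
    """Reverse the packing; each value comes back within 2**E / 2 of the input."""
    return [ref + x * 2 ** e for x in packed]


if __name__ == "__main__":
    # Made-up field of temperatures in kelvin
    temps = [250.0 + 40.0 * math.sin(i / 10) for i in range(1000)]
    bits = 12
    ref, e, packed = simple_pack(temps, bits)
    restored = simple_unpack(ref, e, packed)
    max_err = max(abs(a - b) for a, b in zip(temps, restored))
    print(f"E = {e}, max error = {max_err:.4f} K, "
          f"size vs 64-bit floats = {bits / 64:.1%}")
```

The trade-off is explicit: fewer bits per value means smaller files but a larger maximum error (half of 2^E). Real GRIB encoders also apply a decimal scale factor and offer other packing schemes.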

We are often reminded by the CPDN team not to overdo the output, so don't worry, we are very aware of bandwidth restrictions.

There is possibly an issue with OpenIFS in BOINC that I've noticed and still need to look into. BOINC starts multiple OpenIFS tasks because there are free CPU slots, even though the total memory for the tasks exceeds what's available. When I asked Andy about this, he said the BOINC client will monitor memory and suspend tasks if memory is exceeded. However, when OpenIFS starts, it immediately allocates memory for itself (you can watch this happen in the process monitor), and the client doesn't seem to be quick enough to catch multiple OpenIFS tasks hitting 8 GB of RAM each; if you haven't got the RAM, the models will crash. As I say, I still need to do more testing to verify this is what's happening. If it is, and it's not possible to control this, we might need to put a health warning on running multiple OpenIFS instances.
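The difference between starting tasks purely on free CPU slots and also checking memory first can be sketched as below. This is a hypothetical illustration, not the BOINC client's actual scheduling code: the host (8 free cores, 24 GB of RAM) is made up, and the ~8 GB per task is the figure from the post above.

```python
from dataclasses import dataclass


@dataclass
class Host:
    free_cores: int
    ram_gb: float


def start_by_cpu_slots(host, queue_gb, cores_per_task=1):
    """Start as many queued tasks as there are free CPU slots, ignoring memory."""
    n = host.free_cores // cores_per_task
    return queue_gb[:n]


def start_memory_aware(host, queue_gb, cores_per_task=1):
    """Start queued tasks in order while both CPU slots and RAM remain (hypothetical policy)."""
    started, used_gb, cores_left = [], 0.0, host.free_cores
    for need_gb in queue_gb:
        if cores_left < cores_per_task or used_gb + need_gb > host.ram_gb:
            break  # next task does not fit; stop rather than skip ahead in the queue
        started.append(need_gb)
        used_gb += need_gb
        cores_left -= cores_per_task
    return started


if __name__ == "__main__":
    host = Host(free_cores=8, ram_gb=24.0)  # made-up host
    queue = [8.0] * 6                       # six OpenIFS tasks at ~8 GB each
    for policy in (start_by_cpu_slots, start_memory_aware):
        started = policy(host, queue)
        total = sum(started)
        status = "over-committed" if total > host.ram_gb else "fits"
        print(f"{policy.__name__}: {len(started)} tasks, {total:.0f} GB ({status})")
```

On this made-up host the CPU-slot policy starts all six tasks (48 GB on a 24 GB machine), while the memory-aware policy stops at three. Because OpenIFS grabs its memory as soon as it starts, checking before launch avoids relying on the client to react after the fact.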
ID: 66261
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,246,235
RAC: 15,489
Message 66262 - Posted: 28 Oct 2022, 22:21:38 UTC - in response to Message 66261.  

"BOINC starts multiple OpenIFS tasks because there are free CPU slots, even though the total memory for the tasks exceeds what's available. "

Can this be overcome by limiting the number of cores available to BOINC before downloading any of the IFS models? Although I have a four-core CPU, the box only has 24 GB of RAM.
ID: 66262
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 66263 - Posted: 28 Oct 2022, 23:41:14 UTC - in response to Message 66262.  

"BOINC starts multiple OpenIFS tasks because there are free CPU slots, even though the total memory for the tasks exceeds what's available. "

Can this be overcome by limiting the number of cores available to BOINC before downloading any of the IFS models? Although I have a four-core CPU, the box only has 24 GB of RAM.

As I understand it, they are going to run two cores per work unit at first, so you will have only two work units running. The memory should be enough.
(But yes, if you limit the number of BOINC CPU cores, then that will limit the number of work units running.)
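Jim's answer can be checked with a little arithmetic for Alan's box: the number of tasks that can run at once is capped both by cores and by RAM. This sketch assumes two cores per task (as Jim understands it) and about 8 GB per task (the figure Glenn mentions); both are assumptions, not confirmed requirements.

```python
def max_concurrent(boinc_cores: int, cores_per_task: int,
                   ram_gb: float, ram_per_task_gb: float) -> int:
    """Tasks that can run at once, limited by whichever runs out first: cores or RAM."""
    by_cores = boinc_cores // cores_per_task
    by_ram = int(ram_gb // ram_per_task_gb)
    return min(by_cores, by_ram)


if __name__ == "__main__":
    # Alan's box: 4 cores, 24 GB RAM; assumed ~8 GB per OpenIFS task
    for cores_per_task in (2, 1):
        n = max_concurrent(4, cores_per_task, 24, 8)
        print(f"{cores_per_task} core(s) per task -> {n} tasks, {n * 8} GB")
```

With two cores per task, cores are the binding limit (2 tasks, 16 GB). With one core per task, RAM becomes the limit at 3 tasks, so a client that starts a fourth task on the free core would over-commit memory; limiting BOINC to three cores would avoid that.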
ID: 66263

©2024 cpdn.org