climateprediction.net home page
Posts by Conan

1) Message boards : Number crunching : New work discussion - 2 (Message 69557)
Posted 2 Sep 2023 by Profile Conan
Post:
Until we get more experience with volunteers running these high memory apps I think it makes sense to restrict it to a single task for now. We can change it later in light of experience.

No other projects I know of run tasks with memory requirements this high, so it's not obvious how they will be received. Let's walk before we run with this.
LHC's ATLAS tasks at 10GB are the biggest I know of. But that's 8 threads, so you don't get people trying to run huge numbers of them. Are yours going to be single threads?


YOYO@home ECM/P2 tasks take at least 11 GB per task, single thread, which is why I stopped running them on my 32 GB machine and limit them to just 3 at a time on my 64 GB machine. They are real memory hogs.
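
As a rough illustration of how limits like these can be worked out, here is a minimal shell sketch. The function name max_tasks and the 30 GB held back for the OS and other projects are assumptions made up for the example, not measured figures.

```shell
# Hypothetical helper: how many tasks of a given size fit in RAM.
# All values are whole GB; "reserve" is memory kept back for the OS
# and for other projects running at the same time.
max_tasks() {
    total_gb=$1
    per_task_gb=$2
    reserve_gb=$3
    usable_gb=$(( total_gb - reserve_gb ))
    if [ "$usable_gb" -le 0 ]; then
        echo 0
        return
    fi
    # Integer division rounds down, which is what we want here.
    echo $(( usable_gb / per_task_gb ))
}

max_tasks 64 11 30   # 64 GB machine, 11 GB tasks, 30 GB held back (assumed)
max_tasks 32 11 30   # 32 GB machine, same assumptions
```

With those assumed figures it prints 3 for the 64 GB machine and 0 for the 32 GB one. BOINC's app_config.xml also has a max_concurrent setting that can be used to enforce a limit like this.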

Conan
2) Message boards : Number crunching : New work discussion - 2 (Message 69537)
Posted 28 Aug 2023 by Profile Conan
Post:
Any new work for 64 bit coming along? I noticed a couple of new entries on the server status page

OpenIFS 43r3
OpenIFS 43r3 Baroclinic Lifecycle
OpenIFS 43r3 Perturbed Surface
OpenIFS 43r3 Cubic Octahedral grid tco95 l91
OpenIFS 43r3 Linear grid tl255 l91


Thanks
Conan
3) Message boards : Number crunching : New work discussion - 2 (Message 68914)
Posted 18 Jun 2023 by Profile Conan
Post:
This is not related to new work, but following on from the last couple of posts:
CMDock uses a wrapper and it shows under Linux.
I believe that YAFU also uses a wrapper and possibly YOYO, SRBase, TNGrid? and a few others. In some cases it is needed due to the type of programme being used or the code it has been written in.

A few other projects also use a "Trickle up" method to keep the Server updated with progress (Primegrid is one) and some of these projects need a wrapper for this purpose.

Conan
4) Message boards : Number crunching : Server Status page questions (Message 68604)
Posted 19 Mar 2023 by Profile Conan
Post:
I have also wondered about the server page.

UK Met Office Coupled Model Full Resolution Ocean has had 927 tasks "in progress" for many months but I have seen no indication that any have been returned and the number never changes.

Weather At Home 2 (wah2) (region independent) has had 4,731 tasks in progress, again for many months, and again I have not seen any activity with this either (maybe 1 came back 4 months ago, but I can't be sure).

What is happening with these work units?

Conan
5) Message boards : Number crunching : Upload server is out of disk space (Message 67724)
Posted 14 Jan 2023 by Profile Conan
Post:
Hi Kali,

The server they go to is in Hobart, NZ. I should have spotted the NZ in the task name and thought of that. Most likely when Andy gets my message he will email the data centre in Tasmania. This has happened before on a number of occasions.

Dave


Actually Dave, Hobart is in Tasmania, Australia. Not NZ (New Zealand).

Conan
6) Message boards : Number crunching : The uploads are stuck (Message 67538)
Posted 11 Jan 2023 by Profile Conan
Post:
Yes I am still seeing "connect(): failed" messages on all upload tries.

But I still have 4 work units running and I am nowhere near filling up any disks, so no problem here.

Conan


It has changed to "transient HTTP error" now so still not working here yet (Australia).

Server Status has not changed yet, still showing nothing.

Conan

PS: Some files are now moving. Possibly due to the load, some fail and must retry later while others go through, at speeds from as low as 17 kB/s to as high as 1,700 kB/s.
7) Message boards : Number crunching : The uploads are stuck (Message 67525)
Posted 10 Jan 2023 by Profile Conan
Post:
Yes I am still seeing "connect(): failed" messages on all upload tries.

But I still have 4 work units running and I am nowhere near filling up any disks, so no problem here.

Conan
8) Message boards : Number crunching : Tasks failing on Ubuntu 22 (Message 67347)
Posted 5 Jan 2023 by Profile Conan
Post:
If you changed the option to "leave tasks in memory" but did not get BOINC to re-read the file, the change may not take effect until the file is read.
Restarting BOINC would also read the file.
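
A minimal sketch of doing that from the command line, assuming the setting was changed in global_prefs_override.xml and that boinccmd is installed and can reach the running client (it may need to be run from the BOINC data directory so it can find the RPC password). The function names and the BOINCCMD variable are hypothetical; the two boinccmd options are standard ones.

```shell
# Name of the BOINC command-line tool; override it (for example with
# a full path) if it is not on your PATH. Hypothetical variable.
BOINCCMD=${BOINCCMD:-boinccmd}

# Ask the running client to re-read global_prefs_override.xml.
reload_prefs() {
    "$BOINCCMD" --read_global_prefs_override
}

# Ask the running client to re-read cc_config.xml.
reload_cc_config() {
    "$BOINCCMD" --read_cc_config
}
```

Calling reload_prefs after editing the file should have the same effect as restarting BOINC, without interrupting running tasks.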

Conan
9) Message boards : Number crunching : Hardware for new models. (Message 67296)
Posted 4 Jan 2023 by Profile Conan
Post:
I saw some test results with the AMD RYZEN 5950X, RYZEN 7950X, INTEL 12900 and INTEL 13900 (I think they were the model names).

When all were under full load for whatever test they were doing:

RYZEN 9 5950X used 130 Watts
RYZEN 9 7950X used 270 Watts (or thereabouts)
INTEL 12900 used 285-290 Watts (or thereabouts)
INTEL 13900 used 315 Watts (or thereabouts)

I can't point you to the tests, but they were on YouTube along with others showing similar results.

So the RYZEN 5950X may not be as powerful as the new models, but for energy efficiency it is hard to beat.

That's of course if you can find them, they are getting harder to find.

I run a RYZEN 9 5900X, which has 12 cores and 24 threads and should use even less power as it has fewer cores than the 5950X.
It has 64 GB of RAM and, along with a full complement of other BOINC projects, easily runs 9 CPDN work units at a time. It only gets to about 42 GB max, depending on what I am running at the time (everything, not just CPDN); it may go higher than 42 GB, but I have the headroom to cover that.

BOINC has not downloaded more than 9 work units at any one time, probably because I am running a lot of other projects at the same time.

Conan
10) Message boards : Number crunching : OpenIFS Discussion (Message 66999)
Posted 22 Dec 2022 by Profile Conan
Post:
All 9 work units that I had running overnight have completed successfully.

Running on an AMD Ryzen 9 5900x, 64GB RAM, all 24 threads used to run BOINC programmes at the same time as the ClimatePrediction models.
All took around 17 hours 10 minutes run time.

Conan
11) Message boards : Number crunching : Late Validation pending (Message 66991)
Posted 21 Dec 2022 by Profile Conan
Post:
Well it seems that these files have finally been validated and I have been awarded credit for them, I think.

I have noticed a clean-up has taken place and a lot of the old work units that I have done over the years have been removed,
those 2 pending jobs among them. I was awarded a small amount of credit this week when I had not done any work, and now it seems that the database has had a bit of a clean-out and fix-up. Good to see.

Conan
12) Message boards : Number crunching : OpenIFS Discussion (Message 66990)
Posted 21 Dec 2022 by Profile Conan
Post:
G'Day Glenn,

You may have misread what I wrote, I think.

The 11.3 GB was not a file size but the amount of disk writes made in that first 2 hours (now, after 5 hours, well over 30 GB).
The 2.7 to 4.6 GB were RAM amounts that each work unit was using.

This was all taken from System Monitor.

I did what you have asked and

% cd slots/26
% du -hs . # note the '.'
1.2G .

This is the same as your example.

% cd projects/climateprediction.net
% du -hs .
1.2G .

This is similar to your example.

du -hs srf*

768 MB srf00370000.0001

So all running fine, so maybe just a bit of a misunderstanding I think with data amounts and RAM usage.

Thanks
Conan
13) Message boards : Number crunching : OpenIFS Discussion (Message 66983)
Posted 21 Dec 2022 by Profile Conan
Post:
These Oifs _ps tasks really test your system out.

Running 9 at once, each using from 2.7 to 4.2 GB of RAM, after 2 hours run time they have written 11.3 GB of data to disk each (101.7 GB), which is huge.
Hitting 50 GB of RAM in use out of 64 GB, but I am also running LODA tasks which each use 1 GB of RAM. All 24 threads are running.
12% in and running fine so far.
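
For anyone wanting to redo the total above with their own numbers, a minimal shell sketch follows. The function name total_writes_gb is made up for the example; the inputs are the 9 tasks and 11.3 GB per task quoted above.

```shell
# Hypothetical helper: total disk writes for N tasks that each
# wrote the same amount (in GB). awk handles the decimal maths.
total_writes_gb() {
    awk -v n="$1" -v per="$2" 'BEGIN { printf "%.1f\n", n * per }'
}

total_writes_gb 9 11.3   # 9 tasks at 11.3 GB each
```

This prints 101.7, matching the figure in the post.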

Conan
14) Message boards : Number crunching : OpenIFS Discussion (Message 66795)
Posted 6 Dec 2022 by Profile Conan
Post:
My resent task 22249228 has been sent out twice before.

Previous Task 22246540 and Task 22248943

Task 22246540 has no Stderr. It failed with a Run Time of 1 Day 5 Hours and a CPU Time of 31 Minutes. It also had an unusual Peak Disk Usage of 23,961.87 MB (about 24 GB), way above the norm from what I have seen.

Task 22248943 has the error "Process exited with code 9" but other than that seemed to have run fine. This one belonged to wateroakley.

I was able to run this WU to completion without error.


Another resent task I have running is Task 22249324

Previous Task 22247025 and Task 22249194

Task 22247025 on computer 1524992 had a Run Time of 42 Minutes with a CPU Time of 20 Seconds and a Peak Disk Usage of just 404.06 MB.
This computer still has work on it but has not completed a successful OpenIFS WU. All the failed work units have the same long run times and short CPU times, and they have different error codes as well: codes 1, 5 and 148 all appear on this computer.

Task 22249194 on computer 1504810 has no Stderr, and has a Run Time of 1 Day 1 Hour and a CPU Time of 7 Hours.
This computer has run 9 OpenIFS work units; all have failed with the long Run Time and short CPU Time.
This computer belongs to happywetter.at

So a few different reasons that some work units have failed or thrown an error.
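
To put numbers on the "long run time, short CPU time" pattern, here is a minimal shell sketch that shows CPU time as a percentage of run time. The function name cpu_ratio_pct is made up for the example, and the times are the ones quoted above converted to seconds. A low figure only suggests the task spent most of its time not computing; it does not say why.

```shell
# Hypothetical helper: CPU time as a percentage of wall-clock run time.
# Both arguments are in seconds.
cpu_ratio_pct() {
    awk -v run="$1" -v cpu="$2" 'BEGIN {
        if (run <= 0) { print "n/a"; exit }
        printf "%.1f\n", 100 * cpu / run
    }'
}

cpu_ratio_pct 104400 1860    # Task 22246540: 1 day 5 hours run, 31 minutes CPU
cpu_ratio_pct 2520 20        # Task 22247025: 42 minutes run, 20 seconds CPU
cpu_ratio_pct 90000 25200    # Task 22249194: 1 day 1 hour run, 7 hours CPU
```

These come out at 1.8, 0.8 and 28.0 percent respectively.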

Conan

I completed Task 22249324 successfully in just under 17 1/2 hours.
15) Message boards : Number crunching : OpenIFS Discussion (Message 66793)
Posted 5 Dec 2022 by Profile Conan
Post:
My resent task 22249228 has been sent out twice before.

Previous Task 22246540 and Task 22248943

Task 22246540 has no Stderr. It failed with a Run Time of 1 Day 5 Hours and a CPU Time of 31 Minutes. It also had an unusual Peak Disk Usage of 23,961.87 MB (about 24 GB), way above the norm from what I have seen.

Task 22248943 has the error "Process exited with code 9" but other than that seemed to have run fine. This one belonged to wateroakley.

I was able to run this WU to completion without error.


Another resent task I have running is Task 22249324

Previous Task 22247025 and Task 22249194

Task 22247025 on computer 1524992 had a Run Time of 42 Minutes with a CPU Time of 20 Seconds and a Peak Disk Usage of just 404.06 MB.
This computer still has work on it but has not completed a successful OpenIFS WU. All the failed work units have the same long run times and short CPU times, and they have different error codes as well: codes 1, 5 and 148 all appear on this computer.

Task 22249194 on computer 1504810 has no Stderr, and has a Run Time of 1 Day 1 Hour and a CPU Time of 7 Hours.
This computer has run 9 OpenIFS work units; all have failed with the long Run Time and short CPU Time.
This computer belongs to happywetter.at

So a few different reasons that some work units have failed or thrown an error.

Conan
16) Message boards : Number crunching : OpenIFS Discussion (Message 66737)
Posted 3 Dec 2022 by Profile Conan
Post:
Just downloaded a resend of a Work Unit that failed due to an error.

This Task 22245903

It failed due to running longer than 5 minutes after the work unit had finished.

The WU was run by mikey and, other than the longer run time after finishing, it seemed to have run successfully after over 2 days of run time.

The run time seems overly long on a Ryzen but did complete.

It is now running as Task 22249047 on my Ryzen computer.

Will see how it runs for me.

Conan


Completed successfully after 16 1/2 hours.

Conan
17) Message boards : Number crunching : OpenIFS Discussion (Message 66718)
Posted 2 Dec 2022 by Profile Conan
Post:
Just downloaded a resend of a Work Unit that failed due to an error.

This Task 22245903

It failed due to running longer than 5 minutes after the work unit had finished.

The WU was run by mikey and, other than the longer run time after finishing, it seemed to have run successfully after over 2 days of run time.

The run time seems overly long on a Ryzen but did complete.

It is now running as Task 22249047 on my Ryzen computer.

Will see how it runs for me.

Conan
18) Message boards : Number crunching : OpenIFS Discussion (Message 66684)
Posted 1 Dec 2022 by Profile Conan
Post:
Experiment successful, work unit completed without error in a shade under 18 hours.

The time may have been due to how loaded up the processor was during this time, but still good.

I don't know about the cache hits as the experiment was done on an older Intel i5. My newer Ryzen, I believe, has a larger cache, but without looking things up I don't know what it is either.

The 2 still on the Ryzen are paused at the moment due to some PrimeGrid work I need to do, they both still have 33% left to run.

Conan
19) Message boards : Number crunching : OpenIFS Discussion (Message 66681)
Posted 30 Nov 2022 by Profile Conan
Post:
As an experiment, I have downloaded a work unit to my 4 core 8 GB Linux computer to see how it would run.

The computer is running other BOINC projects and at the moment is running LODA and PRIVATE GFN SEARCH plus iThena.Measurements and WUProp@Home.
iThena.Measurements and WUProp are Non-CPU intensive. PRIVATE GFN SEARCH uses minimal resources and less than 50 kB of RAM to run, however LODA is different and uses 1 GB per work unit of RAM.

When it started, the Climate model maxed out my 8 GB and used half my SWAP (7.6 GB in total, so about 3 to 4 GB used); this is along with the other BOINC projects.

So the computer slowed to a crawl but kept running.

Once settled down the Climate model is now using from 2 to 4.5 GB and no SWAP even with 3 LODA work units running as well, but does start to lag a lot. With only 2 LODA, 1 PRIVATE GFN SEARCH and 1 Climate Open IFS running it is quite usable.

The Open IFS Climate model is now at 76.425% after 13 hours with about 4 1/2 hours or so to go.

So it can be done on 8 GB of memory, but I would not recommend it if you also want to use the computer, because you can go to sleep waiting for the screens to change.

As an aside to this I have been having no trouble with all the trickles from 5 work units (now 3 as 2 finished) they go as soon as they are ready.
I am using a hybrid Fibre to the Node and copper cable to the house Broadband system with around 15 MB upload and 25+ MB download (both on good days with low usage by others on the ISP network).

I will stick to my RYZEN 5900x with 64 GB RAM. It is much less hassle; even running 4 at a time does not use over 20 GB.

Conan
20) Message boards : Number crunching : Task completed, but not all trickles acknowledged yet. Normal? (Message 66561)
Posted 24 Nov 2022 by Profile Conan
Post:
In a similar vein, I have this WU 22236909 that reported all trickles and seems to have been awarded full credit, but it still says it is on my computer and still running.

It uploaded with the last trickle so does anyone know what has happened to it?

I do not have it on my computer.

(there are 3 failed work units on that same computer reported today but they stem from a power failure which upset them)

Thanks
Conan


©2024 climateprediction.net