Posts by Jim1348

21) Message boards : Number crunching : Hardware requirements for upcoming models (Message 65925)
Posted 21 Aug 2022 by Jim1348
Post:
Quick question, out of interest. How many cores on your machines do you make available to CPDN/BOINC?

I have two Ryzen 3600s (12 virtual cores each) and two Ryzen 3900Xs (24 virtual cores each), each with 64 GB of memory, three on Ubuntu 20.04.4 and one on Win10 (but easily switched).
I have even more Ryzen 5000 series, but usually reserve them for Folding. But if you need them to warn of hurricanes, I can do them too.
22) Message boards : Number crunching : Hardware requirements for upcoming models (Message 65892)
Posted 19 Aug 2022 by Jim1348
Post:
Thank you. We go from knowing almost nothing to quite a bit.

I can do lots of cores, and have a good cable modem, so send them out.
And 64GB is no problem, and I can do 128GB if you need it. It is just sitting in the drawer anyway.
23) Message boards : Number crunching : New work Discussion (Message 65731)
Posted 1 Aug 2022 by Jim1348
Post:
Another possibility is that you're running Linux in a VM on a Windows machine.

I've found it best not to make life too complicated for a computer running climate models.

That is certainly my experience. I lose a lot more CPDN work units running Ubuntu 20.04.4 under WSL/Win10 than I do on a native Ubuntu 20.04 machine.
And there is nothing wrong with BOINC 7.16.6 for CPDN, or with the later versions (7.18.1, 7.20.2) either, as far as I have found.
24) Message boards : Number crunching : Computation Errors (Message 65686)
Posted 25 Jul 2022 by Jim1348
Post:
It looks like those tasks have already experienced 2 restarts. I'm guessing they errored out after a 3rd interruption. This has happened to me (interruptions were unintentional so it was very annoying).

I suspect WSL, but it had not been rebooted all day. So it was something else.
(You have to reboot a lot in quick succession to get it to fail that way. I think it usually happens when notebooks come out of hibernation frequently.)

I have seen strange errors with WSL before, but don't know if they are limited to CPDN or affect all projects. I will try something else.
25) Message boards : Number crunching : Computation Errors (Message 65682)
Posted 25 Jul 2022 by Jim1348
Post:
This is annoying. I was running 8 HadSM4 at N144 on a Ryzen 5700X, using 50% of the virtual cores to provide enough cache.
This was running under WSL on Windows 10, and I had no problems; I could even reboot without errors. They were running fine, and were estimated to complete in 3 days 4 hours.

But just short of 3 days, they all failed.
https://www.cpdn.org/results.php?hostid=1533683

It looks like everyone else who errored out on them did so after a short period of time, so they probably did not have the libraries installed.
So I don't know if it is a problem with the work units, or with WSL.
26) Message boards : Number crunching : New work Discussion (Message 65674)
Posted 22 Jul 2022 by Jim1348
Post:
There are 2048 on the server now.
27) Message boards : Number crunching : New work Discussion (Message 65631)
Posted 12 Jul 2022 by Jim1348
Post:
I am on 7.20.1 and haven't had any problems. Which projects cause errors?

Rosetta/Pythons and QuChemPedIA that I know of.
But you are on Windows. That is an entirely different security arrangement than Linux.
28) Message boards : Number crunching : New work Discussion (Message 65629)
Posted 12 Jul 2022 by Jim1348
Post:
I was happy to see that the N144 tasks run OK on BOINC 7.20.0. That is not true of all projects, due to the new security settings.
https://www.cpdn.org/results.php?hostid=1533264
29) Message boards : climateprediction.net Science : Rare ‘triple’ La Niña climate event looks likely — what does the future hold? (Message 65605)
Posted 30 Jun 2022 by Jim1348
Post:
England has another possible explanation for why the IPCC models could be getting future La Niña-like conditions wrong. As the world warms and the Greenland ice sheet melts, its fresh cold water is expected to slow down a dominant conveyor belt of ocean currents: the Atlantic Meridional Overturning Circulation (AMOC). Scientists mostly agree that the AMOC current has slowed down in recent decades, but don’t agree on why, or how much it will slow in future.

In a study published in Nature Climate Change on 6 June, England and his colleagues model how an AMOC collapse would leave an excess of heat in the tropical South Atlantic, which would trigger a series of air-pressure changes that ultimately strengthen the Pacific trade winds. These winds push warm water to the west, thus creating more La Niña-like conditions. But England says that the current IPCC models don’t reflect this trend because they don’t include the complex interactions between ice-sheet melt, freshwater injections, ocean currents and atmospheric circulation. “We keep adding bells and whistles to these models. But we need to add in the ice sheets,” he says.

Michael Mann, a climatologist at Pennsylvania State University in State College, has also argued that climate change will both slow the AMOC and create more La Niña-like conditions. He says the study shows how these two factors can reinforce each other. Getting the models to better reflect what’s going on in the ocean, says Seager, “remains a very active research topic”.

https://www.nature.com/articles/d41586-022-01668-1
30) Message boards : Number crunching : New work Discussion (Message 65539)
Posted 10 Jun 2022 by Jim1348
Post:
These work units give 4 or 5 trickles, each about 25% apart.

I am just at 25% and see only one trickle too.

Running four at a time, they are taking over 13 1/2 days on a Ryzen 3900X, but that is with all cores loaded.
When I ran them on a Ryzen 3600 with only six cores (50%) in use, they would take a little under 10 days. So I think the ratio is right for virtual cores.
31) Message boards : Cafe CPDN : World Community Grid mostly down for 2 months while transitioning (Message 65471)
Posted 1 Jun 2022 by Jim1348
Post:
I've got a shiny new pair of 3900Xs hanging out ready for real work to chew into, be it 32-bit Linux, 32-bit MacOS, or (dare I hope?) 64-bit!

I guess I could see if Folding@Home needs CPU cycles.

Folding is a great project, and I do a lot of it (both CPU and GPU). But the 3900X has a large cache/core ratio, which is helpful on some projects.

I have one on QuChemPedIA, and it works very well (Ubuntu 20.04.4).
https://quchempedia.univ-angers.fr/athome/
Ignore the large number of invalids. It is part of the science that they don't know which ones will work beforehand.
32) Message boards : Number crunching : No work for Windows OR Linux?! (Message 65376)
Posted 15 Apr 2022 by Jim1348
Post:
All BOINC clients communicate via the same port, 31416. One of them needs to be set to another port, and I found that it's much easier to change the WSL2 client than the Windows one (which requires a Registry edit). I assume you have standard BOINC installations in both WSL2 Ubuntu and Windows.

Also, just to note, you can only view and manage one client at a time in BOINC Manager, not both at the same time. You'll have to switch between the two, but that's easy to do, which I'll describe at the end. You'll have to do some restarts during the process, so you might want to wait until your current CPDN tasks are finished so they don't error out.

* Edit the following 2 files as follows.

1. Edit the configuration file for the boinc-client init script. Go to /etc/default and edit a file called boinc-client. At the bottom of the 7th paragraph, add the following line. 51325 is an example port number; make up your own (I've seen it suggested that it be over 50000).
BOINC_OPTS="--gui_rpc_port 51325"

Thanks for that. I normally put the port on the startup command line (boinc --gui_rpc_port 31420), but I like the permanent setting better.
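
For the permanent route on a stock Debian/Ubuntu boinc-client package, a minimal sketch would be something like this (51325 is just the example number from above; add --passwd if your gui_rpc_auth.cfg sets a password):

sudo nano /etc/default/boinc-client                # add: BOINC_OPTS="--gui_rpc_port 51325"
sudo systemctl restart boinc-client                # restart so the new port takes effect
boinccmd --host localhost:51325 --get_host_info    # confirm the client answers on the new port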

As for managing both the Windows and Linux sides easily, the best way is BoincTasks
https://efmer.com/boinctasks/

I run it on Windows 10, and it manages not only the Windows and WSL sides, but also several other Linux machines on the same LAN.
You just set up one "computer" and direct it to your Windows machine (e.g., port 31416), and set up another computer for the WSL side (e.g., port 31420 or whatever), and any additional machines you want on their ports (e.g., 31421, etc.).

It couldn't be easier to manage.
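
If you want a quick sanity check that each client is listening on its expected port before pointing BoincTasks at them, a boinccmd call from each side should do it (using the example ports above; add --passwd if a GUI RPC password is set):

boinccmd --host localhost:31416 --get_simple_gui_info    # run on the Windows side, checks the Windows client
boinccmd --host localhost:31420 --get_simple_gui_info    # run inside WSL, checks the WSL client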
33) Message boards : Number crunching : No work for Windows OR Linux?! (Message 65347)
Posted 12 Apr 2022 by Jim1348
Post:
FWIW, I am running the HadAM4 N144 fine under WSL now (using Ubuntu 20.04.4).
I installed the 32-bit libraries using "sudo apt install lib32ncurses6 lib32z1 lib32stdc++-8-dev", though that is in addition to "lib32ncurses6 lib32z1 lib32stdc++-7-dev" that I had previously installed.
https://www.cpdn.org/results.php?hostid=1530163

I have not run them for a while on this Windows machine, because the last time I could not reboot without shutting them down with a BOINC command (boinccmd --quit).
https://www.cpdn.org/forum_thread.php?id=9025&postid=63477#63477
Now, I can just reboot and they do not error out, and start up again automatically.

It is probably due to the updates to WSL, or BOINC, or maybe the work units themselves.
At any rate, it is convenient to be able to run them on Win10. This machine has a Ryzen 3600, and running six at a time (50% of the cores) works well.
34) Message boards : Cafe CPDN : World Community Grid mostly down for 2 months while transitioning (Message 65131)
Posted 10 Feb 2022 by Jim1348
Post:
I hope cpdn actually comes up with some new work soon.

Me too. But I am perplexed that with 2163 unsent HadCM3 shorts, they would make them Mac only, or so I understand it.
It slows down the usual glacial speed to continental drift speed.
35) Message boards : Number crunching : New work Discussion (Message 65129)
Posted 9 Feb 2022 by Jim1348
Post:
I run WCG ARP1 and MCM1 and they do clean up after finishing.

That is my experience too. I have been running them from the beginning with no problems.
Currently, I have one machine on each, but sometimes mix them with other projects.
36) Message boards : Number crunching : New work Discussion (Message 65053)
Posted 4 Feb 2022 by Jim1348
Post:
A note from Dave in the getting started area has the answer: these have been amended to be Mac only, so we’re out of luck.

I would never have thought of looking there. It is more the getting stopped area.
Thanks.
37) Message boards : Number crunching : New work Discussion (Message 65048)
Posted 4 Feb 2022 by Jim1348
Post:
I can't get the new HadCM3 shorts. I just get a "no work sent".
It is the same machine where I have been running the HadAM4 at N216.
https://www.cpdn.org/results.php?hostid=1523408

And the "Project status" page shows only 3 users. There must be something wrong.
38) Message boards : Number crunching : New work Discussion (Message 65024)
Posted 31 Jan 2022 by Jim1348
Post:
They don't crash; the last batch I checked were taking 12GB of RAM each, and uploads were about 550MB. Haven't tried to check on CPU cache, but it hasn't been raised as an issue by other testers, so I suspect not as much as the N216 tasks. Some batches have had final uploads of over 1GB, so I have had them uploading while I sleep if on a day when I am doing any Zoom calls. Obviously not an issue for those with real broad as opposed to bored band.

That is good information. The memory and bandwidth requirements are quite large, but a number of us could do a few at a time if that is what it takes.
Of course, that may not be enough to do them much good, but that is another question.
39) Message boards : Number crunching : Dr Lisa Su says up to 192MB L3 on newer Ryzen -- hope it's true (Message 65008)
Posted 26 Jan 2022 by Jim1348
Post:
I do not know what you call a CPU cache. I infer you refer to the part of your RAM that is currently devoted to that purpose.
No, it is the cache on the CPU itself. A Ryzen 3600 has
Total L1 Cache: 384KB
Total L2 Cache: 3MB
Total L3 Cache: 32MB
It is the L3 cache that distinguishes one CPU from another, and largely determines how many work units you should run at a time so that they fit mainly in the cache.
I usually run six of the N216 for that purpose, though running eight may give slightly more output. But beyond a certain point, the total output actually decreases.
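As a rough back-of-envelope, assuming the tasks share the L3 evenly (which real workloads only approximate): 32 MB / 6 tasks ≈ 5.3 MB of L3 per task, versus 32 MB / 8 tasks = 4 MB per task.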


In a normally running modern Linux, (almost) all RAM not used for something else is given over to the disk input cache. Anytime the kernel wants more RAM for a process, it can grab it from the disk input cache. If that is not enough, it can get it from the output buffer, but it would have to write it out first. And I suppose cachestat can tell you about that, but it is deprecated and not available for my distro. It seems to me that by the time you need a tool like that, you have long since passed the point where you seriously should increase the size of your RAM.
I have 64 GB on the Ryzen 3600, so however Linux handles it, that is more than enough.
It is the on-chip CPU cache that I need to monitor. Maybe perf can do it. I will look some more.
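
One thing I may try (untested here, so just a sketch): attach perf to a single running model process and count its cache activity, instead of sampling system-wide.

# <PID> is a placeholder for one model task's process ID (something like
# "pgrep -f hadam4" should find it; adjust the pattern to the actual executable name)
sudo perf stat -e cache-references,cache-misses -p <PID> -- sleep 60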
40) Message boards : Number crunching : Dr Lisa Su says up to 192MB L3 on newer Ryzen -- hope it's true (Message 65006)
Posted 25 Jan 2022 by Jim1348
Post:
And my cache is supplying about half the requested memory references:
# perf stat -aB -e cache-references,cache-misses
^C
 Performance counter stats for 'system wide':

    33,364,888,278      cache-references                                            
    17,805,920,648      cache-misses              #   53.367 % of all cache refs    

      64.185688537 seconds time elapsed


I suppose the instructions are mostly in the cache, and very little of the data are in there.

Thanks, I have been trying to measure my cache in Ubuntu.
I was not able to get that command to fly on my Ryzen 3600 with Ubuntu 20.04.3, but that is not my concern. I probably could with some work.

I have been using the "cachestat" command, but am not quite sure how to interpret the results.
When running the HadAM4 (N216) on 8 cores, plus two Rosetta pythons on 2 cores (85% of the cores), I see:

$ sudo ./cachestat

Counting cache functions... Output every 1 seconds.
    HITS   MISSES  DIRTIES    RATIO   BUFFERS_MB   CACHE_MB
   18658        0       45   100.0%          139      36237
   57334        0       43   100.0%          139      36237
   30930        0       26   100.0%          139      36237
   21124        0       31   100.0%          139      36237
   92343        0      108   100.0%          139      36237
   26557        0       75   100.0%          139      36237
   25485        0       26   100.0%          139      36237
   97719        2       75   100.0%          139      36237
   21042        0       25   100.0%          139      36237
   38118        0       60   100.0%          139      36237
   46525        0       29   100.0%          139      36237
   25127        0       44   100.0%          139      36237
   98529        0       64   100.0%          139      36237
   25745        1       15   100.0%          139      36237
   23106        0       66   100.0%          139      36237
   92583        0       72   100.0%          139      36237
    8580        0       50   100.0%          139      36237
   38967        0       55   100.0%          139      36237
   64163        0       43   100.0%          139      36237
   25698        0       29   100.0%          139      36237
   86728        0       61   100.0%          139      36237
   24077        0       44   100.0%          139      36237
   21742        0       17   100.0%          139      36237
   77441        0       63   100.0%          139      36237
   26411        0       38   100.0%          139      36237
   18575        0       24   100.0%          139      36237
   85779        0       60   100.0%          139      36237
   29630        0       34   100.0%          139      36237
   41840        0       45   100.0%          139      36237
   30779        0       81   100.0%          139      36238
   35510        0       44   100.0%          139      36238
   98186        0      109   100.0%          139      36238
   20706        0       17   100.0%          139      36238
   16524        0        9   100.0%          139      36238
   72171        0       54   100.0%          139      36238
    1469        0       13   100.0%          139      36238
   43854        0       66   100.0%          140      36238
   52263        0       39   100.0%          140      36238

My guess is that my cache hits are not really 100%, but probably more in line with what you see.

But if you want to try it, you can install it as follows:
To install perf, open a terminal and run:
sudo apt-get install linux-tools-common linux-tools-generic

Then, to download the cachestat script (from Brendan Gregg's perf-tools repository), run:
wget https://raw.githubusercontent.com/brendangregg/perf-tools/master/fs/cachestat

To make it executable, run:
chmod +x cachestat

Finally run it:
sudo ./cachestat

It probably is not measuring the CPU cache; the cachestat script tracks the Linux page cache in main memory, so what it is seeing is more likely my large write-cache (12.5 GB) in DDR4.

