21)
Message boards :
Number crunching :
Hardware requirements for upcoming models
(Message 65925)
Posted 21 Aug 2022 by Jim1348 Post: Quick question, out of interest. How many cores on your machines do you make available to CPDN/BOINC? I have two Ryzen 3600's (12 virtual cores each) and two Ryzen 3900X (24 cores each), each with 64 GB of memory, three on Ubuntu 20.04.4 and one on Win10 (but easily switched). I have even more Ryzen 5000 series, but usually reserve them for Folding. But if you need them to warn of hurricanes, I can do them too. |
22)
Message boards :
Number crunching :
Hardware requirements for upcoming models
(Message 65892)
Posted 19 Aug 2022 by Jim1348 Post: Thank you. We go from knowing almost nothing, to quite a bit. I can do lots of cores, and have a good cable modem, so send them out. And 64GB is no problem, and I can do 128GB if you need it. It is just sitting in the drawer anyway. |
23)
Message boards :
Number crunching :
New work Discussion
(Message 65731)
Posted 1 Aug 2022 by Jim1348 Post: Another possibility, is that you're running Linux in a VM on a Windows machine. That is certainly my experience. I lose a lot more CPDN work units running Ubuntu 20.04.4 under WSL/Win10 than I do on a native Ubuntu 20.04 machine. And there is nothing wrong with BOINC 7.16.6 for CPDN, or the later ones either (7.18.1, 7.20.2) that I have found. |
24)
Message boards :
Number crunching :
Computation Errors
(Message 65686)
Posted 25 Jul 2022 by Jim1348 Post: It looks like those tasks have already experienced 2 restarts. I'm guessing they errored out after a 3rd interruption. This has happened to me (interruptions were unintentional so it was very annoying). I suspect WSL, but it had not been rebooted all day. So it was something else. (You have to reboot a lot in quick succession to get it to fail that way. I think it usually happens when notebooks come out of hibernation frequently.) I have seen strange errors with WSL before, but don't know if they are limited to CPDN or affect all projects. I will try something else. |
25)
Message boards :
Number crunching :
Computation Errors
(Message 65682)
Posted 25 Jul 2022 by Jim1348 Post: This is annoying. I was running 8 HadSM4 at N144 on a Ryzen 5700X, using 50% of the virtual cores to provide enough cache. This was running under WSL on Windows 10, and I had no problem, even rebooting without errors. They were running fine, and estimated to complete in 3 days 4 hours. But just short of 3 days, they all failed. https://www.cpdn.org/results.php?hostid=1533683 It looks like everyone else who errored out on them did so after a short period of time, so they probably did not have the libraries installed. So I don't know if it is a problem with the work units, or with WSL. |
26)
Message boards :
Number crunching :
New work Discussion
(Message 65674)
Posted 22 Jul 2022 by Jim1348 Post: There are 2048 on the server now. |
27)
Message boards :
Number crunching :
New work Discussion
(Message 65631)
Posted 12 Jul 2022 by Jim1348 Post: I am on 7.20.1 and haven't had any problems. Which projects cause errors? Rosetta/Pythons and QuChemPedIA that I know of. But you are on Windows. That is an entirely different security arrangement than Linux. |
28)
Message boards :
Number crunching :
New work Discussion
(Message 65629)
Posted 12 Jul 2022 by Jim1348 Post: I was happy to see that the N144 run OK on BOINC 7.20.0. That is not true of all projects, due to the new security settings. https://www.cpdn.org/results.php?hostid=1533264 |
29)
Message boards :
climateprediction.net Science :
Rare ‘triple’ La Niña climate event looks likely — what does the future hold?
(Message 65605)
Posted 30 Jun 2022 by Jim1348 Post: England has another possible explanation for why the IPCC models could be getting future La Niña-like conditions wrong. As the world warms and the Greenland ice sheet melts, its fresh cold water is expected to slow down a dominant conveyor belt of ocean currents: the Atlantic Meridional Overturning Circulation (AMOC). Scientists mostly agree that the AMOC current has slowed down in recent decades, but don’t agree on why, or how much it will slow in future. https://www.nature.com/articles/d41586-022-01668-1 |
30)
Message boards :
Number crunching :
New work Discussion
(Message 65539)
Posted 10 Jun 2022 by Jim1348 Post: These work units give 4 or 5 trickles, each about 25% apart. I am just at 25% and see only one trickle too. Running four at a time, they are taking over 13 1/2 days on a Ryzen 3900X, but that is with all cores loaded. When I ran them on a Ryzen 3600 with only six cores (50%) in use, they would take a little under 10 days. So I think the ratio is right for virtual cores. |
31)
Message boards :
Cafe CPDN :
World Community Grid mostly down for 2 months while transitioning
(Message 65471)
Posted 1 Jun 2022 by Jim1348 Post: I've got a shiny new pair of 3900Xs hanging out ready for real work to chew into, be it 32-bit Linux, 32-bit MacOS, or (dare I hope?) 64-bit! Folding is a great project, and I do a lot of it (both CPU and GPU). But the 3900X has a large cache/core ratio, which is helpful on some projects. I have one on QuChemPedIA, and it works very well (Ubuntu 20.04.4). https://quchempedia.univ-angers.fr/athome/ Ignore the large number of invalids. It is part of the science that they don't know which ones will work beforehand. |
32)
Message boards :
Number crunching :
No work for Windows OR Linux?!
(Message 65376)
Posted 15 Apr 2022 by Jim1348 Post: All BOINC clients communicate via the same port, 31416. One of them needs to be set to another port and I found that it's much easier to change the WSL2 client rather than Windows one (requires Registry edit). I assume you have standard BOINC installations in both WSL2 Ubuntu and Windows. Also just to note, you can only view and manage one client at a time in BOINC Manager not both at the same time. You'll have to switch between the two but that's easy to do which I'll describe at the end. You'll have to do some restarts during the process so you might want to wait until your current CPDN tasks are finished so they don't error out. Thanks for that. I normally put the port on the startup command line (boinc --gui_rpc_port 31420), but I like the permanent setting better. As for managing both the Windows and Linux sides easily, the best way is BoincTasks https://efmer.com/boinctasks/ I run it on Windows 10, and it manages not only the Windows and WSL sides, but also several other Linux machines on the same LAN. You just set up one "computer" and direct it to your Windows machine (e.g., port 31416), and set up another computer for the WSL side (e.g., port 31420 or whatever), and any additional machines you want on their ports (e.g., 31421, etc.). It couldn't be easier to manage. |
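For anyone who prefers the command line to BoincTasks, here is a rough sketch of addressing each client by its port. The labels "windows" and "wsl" are made up for illustration, the ports are just the examples from the post, and the script only prints the boinccmd command rather than running it. It assumes boinccmd accepts --host hostname:port and that you supply the RPC password yourself if your setup needs one.

```shell
#!/bin/sh
# Hypothetical labels mapped to the example ports from the post.
rpc_port_for() {
    case "$1" in
        windows) echo 31416 ;;   # default GUI RPC port
        wsl)     echo 31420 ;;   # client started with: boinc --gui_rpc_port 31420
        *)       return 1 ;;
    esac
}

# Print (not run) the boinccmd call that would list tasks on that client.
# A --passwd option may also be needed, depending on gui_rpc_auth.cfg.
status_cmd() {
    port=$(rpc_port_for "$1") || { echo "unknown client: $1" >&2; return 1; }
    echo "boinccmd --host localhost:$port --get_tasks"
}

status_cmd windows
status_cmd wsl
```

Copy the printed line into a terminal to query that client. Any further machines on the LAN would get their own label and port in the case statement.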
33)
Message boards :
Number crunching :
No work for Windows OR Linux?!
(Message 65347)
Posted 12 Apr 2022 by Jim1348 Post: FWIW, I am running the HadAM4 N144 fine under WSL now (using Ubuntu 20.04.4). I installed the 32-bit libraries using "sudo apt install lib32ncurses6 lib32z1 lib32stdc++-8-dev", though that is in addition to "lib32ncurses6 lib32z1 lib32stdc++-7-dev" that I had previously installed. https://www.cpdn.org/results.php?hostid=1530163 I have not run them for a while on this Windows machine, because the last time I could not reboot without shutting them down with a BOINC command (boinccmd --quit). https://www.cpdn.org/forum_thread.php?id=9025&postid=63477#63477 Now, I can just reboot and they do not error out, and start up again automatically. It is probably due to the updates to WSL, or BOINC, or maybe the work units themselves. At any rate, it is convenient to be able to run them on Win10. This machine has a Ryzen 3600, and running six at a time (50% of the cores) works well. |
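The "50% of the cores" figure is simple arithmetic on the number of virtual cores. A small sketch, assuming one task per virtual core in use; the function name is invented for this example:

```shell
#!/bin/sh
# tasks_for_percent THREADS PERCENT
# Number of single-threaded tasks that fit when only PERCENT of the
# virtual cores are used (integer arithmetic, rounds down).
tasks_for_percent() {
    echo $(( $1 * $2 / 100 ))
}

tasks_for_percent 12 50   # Ryzen 3600, 12 virtual cores: prints 6
tasks_for_percent 16 50   # Ryzen 5700X, 16 virtual cores: prints 8
```

On a Linux box, `tasks_for_percent "$(nproc)" 50` gives the figure for the local machine. In BOINC itself the same limit is normally set through the computing preference for the percentage of CPUs to use.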
34)
Message boards :
Cafe CPDN :
World Community Grid mostly down for 2 months while transitioning
(Message 65131)
Posted 10 Feb 2022 by Jim1348 Post: I hope cpdn actually comes up with some new work soon. Me too. But I am perplexed that with 2163 unsent HadCM3 shorts, they would make them Mac only, or so I understand it. It slows down the usual glacial speed to continental drift speed. |
35)
Message boards :
Number crunching :
New work Discussion
(Message 65129)
Posted 9 Feb 2022 by Jim1348 Post: I run WCG ARP1 and MCM1 and they do clean up after finishing. That is my experience too. I have been running them from the beginning with no problems. Currently, I have one machine on each, but sometimes mix them with other projects. |
36)
Message boards :
Number crunching :
New work Discussion
(Message 65053)
Posted 4 Feb 2022 by Jim1348 Post: A note from Dave in the getting started area has the answer, these have been amended to be Mac only so we’re out of luck. I would never have thought of looking there. It is more the getting stopped area. Thanks. |
37)
Message boards :
Number crunching :
New work Discussion
(Message 65048)
Posted 4 Feb 2022 by Jim1348 Post: I can't get the new HadCM3 shorts. I just get a "no work sent". It is the same machine where I have been running the HadAM4 at N216. https://www.cpdn.org/results.php?hostid=1523408 And the "Project status" page shows only 3 users. There must be something wrong. |
38)
Message boards :
Number crunching :
New work Discussion
(Message 65024)
Posted 31 Jan 2022 by Jim1348 Post: They don't crash, the last batch I checked were taking 12GB of RAM each and uploads were about 550MB. Haven't tried to check on CPU cache but it hasn't been raised as an issue by other testers so I suspect not as much as the N216 tasks. Some batches have had final uploads of over 1GB so I have had them uploading while I sleep if on a day when I am doing any Zoom calls. Obviously not an issue for those with real broad as opposed to bored band. That is good information. The memory and bandwidth requirements are quite large, but a number of us could do a few at a time if that is what it takes. Of course, that may not be enough to do them much good, but that is another question. |
39)
Message boards :
Number crunching :
Dr Lisa Su says up to 192MB L3 on newer Ryzen -- hope it's true
(Message 65008)
Posted 26 Jan 2022 by Jim1348 Post: I do not know what you call a CPU cache. I infer you refer to the part of your RAM that is currently devoted to that purpose. No, it is the cache on the CPU itself. A Ryzen 3600 has Total L1 Cache: 384KB, Total L2 Cache: 3MB, Total L3 Cache: 32MB. It is the L3 cache that distinguishes one CPU from another, and largely determines how many work units you should run at a time so that they fit mainly in the cache. I usually run six of the N216 for that purpose, though running eight may give slightly more output. But beyond a certain point, the total output actually decreases. In a normally running modern Linux, (almost) all RAM not used for something else is given over to the disk input cache. Anytime the kernel wants more RAM for a process, it can grab it from the disk input cache. If that is not enough, it can get it from the output buffer, but it would have to write it out first. And I suppose cachestat can tell you about that, but it is deprecated and not available for my distro. It seems to me that by the time you need a tool like that, you have long since passed the point where you seriously should increase the size of your RAM. I have 64 GB on the Ryzen 3600, so however Linux handles it, that is more than enough. It is the on-chip CPU cache that I need to monitor. Maybe perf can do it. I will look some more. |
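To put rough numbers on the L3 point: a naive even split of the 32MB across the running tasks looks like this. It is only a back-of-envelope sketch (the function name is invented here, and real cache sharing is not an even split), but it shows why six tasks get noticeably more cache each than eight.

```shell
#!/bin/sh
# l3_per_task_mb L3_MB TASKS
# Even-split estimate of L3 cache per running task, in MB.
l3_per_task_mb() {
    awk -v l3="$1" -v n="$2" 'BEGIN { printf "%.1f\n", l3 / n }'
}

l3_per_task_mb 32 6   # six N216 tasks on a Ryzen 3600: prints 5.3
l3_per_task_mb 32 8   # eight tasks: prints 4.0

# For actual measurement, perf can count the generic cache events, e.g.
#   sudo perf stat -e cache-references,cache-misses -a sleep 10
# What those two events count varies by CPU, so treat the ratio as a
# relative indicator, not an exact L3 hit rate.
```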
40)
Message boards :
Number crunching :
Dr Lisa Su says up to 192MB L3 on newer Ryzen -- hope it's true
(Message 65006)
Posted 25 Jan 2022 by Jim1348 Post: And my cache is supplying about half the requested memory references: Thanks, I have been trying to measure my cache in Ubuntu. I was not able to get that command to fly on my Ryzen 3600 with Ubuntu 20.04.3, but that is not my concern. I probably could with some work. I have been using the "cachestat" command, but am not quite sure how to interpret the results. When running the HadAM4 (N216) on 8 cores, plus two Rosetta pythons on 2 cores (85% of the cores), I see:
$ sudo ./cachestat
Counting cache functions... Output every 1 seconds.
    HITS   MISSES  DIRTIES    RATIO BUFFERS_MB CACHE_MB
   18658        0       45   100.0%        139    36237
   57334        0       43   100.0%        139    36237
   30930        0       26   100.0%        139    36237
   21124        0       31   100.0%        139    36237
   92343        0      108   100.0%        139    36237
   26557        0       75   100.0%        139    36237
   25485        0       26   100.0%        139    36237
   97719        2       75   100.0%        139    36237
   21042        0       25   100.0%        139    36237
   38118        0       60   100.0%        139    36237
   46525        0       29   100.0%        139    36237
   25127        0       44   100.0%        139    36237
   98529        0       64   100.0%        139    36237
   25745        1       15   100.0%        139    36237
   23106        0       66   100.0%        139    36237
   92583        0       72   100.0%        139    36237
    8580        0       50   100.0%        139    36237
   38967        0       55   100.0%        139    36237
   64163        0       43   100.0%        139    36237
   25698        0       29   100.0%        139    36237
   86728        0       61   100.0%        139    36237
   24077        0       44   100.0%        139    36237
   21742        0       17   100.0%        139    36237
   77441        0       63   100.0%        139    36237
   26411        0       38   100.0%        139    36237
   18575        0       24   100.0%        139    36237
   85779        0       60   100.0%        139    36237
   29630        0       34   100.0%        139    36237
   41840        0       45   100.0%        139    36237
   30779        0       81   100.0%        139    36238
   35510        0       44   100.0%        139    36238
   98186        0      109   100.0%        139    36238
   20706        0       17   100.0%        139    36238
   16524        0        9   100.0%        139    36238
   72171        0       54   100.0%        139    36238
    1469        0       13   100.0%        139    36238
   43854        0       66   100.0%        140    36238
   52263        0       39   100.0%        140    36238
My guess is that my cache hits are not really 100%, but probably more in line with what you see.
But if you want to try it, you can install it as follows:
To install perf-tools, open terminal and run:
sudo apt-get install linux-tools-common linux-tools-generic
Then, to install cachestat, run:
wget https://raw.githubusercontent.com/brendangregg/perf-tools/master/fs/cachestat
To make it executable, run:
chmod +x cachestat
Finally run it:
sudo ./cachestat
It probably is not measuring the CPU cache. I have a large write-cache (12.5 GB) in main memory (DDR4), and that may be what it is seeing. |
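A quick way to see what the RATIO column is rounding away is to total the HITS and MISSES columns yourself. The helper below is only a sketch (the function name is made up for this example). It reads cachestat-style output on stdin and skips any line whose first two fields are not plain numbers, so the banner and header lines can be left in.

```shell
#!/bin/sh
# Sum the HITS and MISSES columns of cachestat-style output and print
# an overall hit ratio to three decimal places.
summarize_cachestat() {
    awk '
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
            hits += $1; misses += $2
        }
        END {
            total = hits + misses
            if (total > 0)
                printf "hits=%d misses=%d ratio=%.3f%%\n", hits, misses, 100 * hits / total
        }
    '
}

# Three rows taken from the output above, including the two with misses.
printf '%s\n' \
    'HITS MISSES DIRTIES RATIO BUFFERS_MB CACHE_MB' \
    '18658 0 45 100.0% 139 36237' \
    '97719 2 75 100.0% 139 36237' \
    '25745 1 15 100.0% 139 36237' | summarize_cachestat
# prints: hits=142122 misses=3 ratio=99.998%
```

With so few misses the per-second ratio rounds to 100.0%, which fits the suspicion that the tool is counting hits in a cache held in main memory, not the on-chip CPU cache.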