21)
Message boards :
Number crunching :
New work discussion - 2
(Message 68948)
Posted 24 Jun 2023 by geophi Post: To my knowledge, the wah2 application was not compiled with an AVX optimization switch. The wah2 executables were last compiled in November 2016, and the last I knew, SSE2 is the highest level of optimization used for compiling these models. If it were the AVX thing, then every Windows batch since late 2016 would have shown similar behavior, with older PCs throwing errors. However, that has not occurred. In this batch, of the 3 tasks I have been running on my Ryzen 5600 for 17 hours, 2 had 2 previous SEGV (signal 11) errors in their work units and 1 had 1 error of that type. All of the PCs in those work units with failed tasks have AVX capability, as does my Ryzen, which hasn't failed any of those 3 tasks so far. If it were an ancillary file error, I would expect all tasks in the work unit to have failed. This is very frustrating, and it is not at all obvious what the problem might be. Hopefully Sarah, in analyzing the errors, can find some correlation that explains why this is happening. Also frustrating: while my trickles are uploading fine, the first zip file is failing to upload to upload7, saying it can't connect. |
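One way to confirm AVX capability on a Linux x86 host is to read the CPU feature flags from /proc/cpuinfo. This is a minimal sketch, assuming a Linux kernel that exposes a `flags` line (the Windows hosts that run wah2 would need a different tool); `cpu_flags` and `report` are hypothetical helper names:

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Lines look like "flags\t\t: fpu vme ... sse2 ... avx ..."
            _, _, value = line.partition(":")
            return set(value.split())
    return set()  # no flags line (e.g. non-x86 kernel)


def report(flags):
    """Map each instruction-set extension of interest to True/False."""
    return {ext: ext in flags for ext in ("sse2", "avx", "avx2")}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print(report(cpu_flags(path.read_text())))
```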
22)
Message boards :
Number crunching :
East Asia testing.
(Message 68927)
Posted 23 Jun 2023 by geophi Post: Looks like the regional model portion of the task takes 400+ MB of resident memory while the global portion of each task takes about 200 MB, so 600 to 700 MB total resident memory for each task. My Ryzen 5600 is running 3 at a time and it works out to about 8.5 days to complete them. |
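The per-task memory figures above can be turned into a rough sizing check. This sketch assumes the roughly 400 MB regional plus 200 MB global split holds for every task; the 8000 MB of free RAM is a hypothetical figure, and `resident_memory_mb` / `max_concurrent_tasks` are illustrative helpers:

```python
# Per-task figures from the post: regional ~400+ MB, global ~200 MB.
REGIONAL_MB = 400
GLOBAL_MB = 200


def resident_memory_mb(concurrent_tasks, regional_mb=REGIONAL_MB, global_mb=GLOBAL_MB):
    """Rough total resident memory for several tasks running at once."""
    return concurrent_tasks * (regional_mb + global_mb)


def max_concurrent_tasks(free_ram_mb, per_task_mb=700):
    """How many tasks fit in free RAM, using the pessimistic 700 MB per task."""
    return int(free_ram_mb // per_task_mb)


if __name__ == "__main__":
    print(resident_memory_mb(3))                   # lower bound for 3 tasks
    print(resident_memory_mb(3, regional_mb=500))  # upper bound (700 MB/task)
    print(max_concurrent_tasks(8000))              # hypothetical 8000 MB free
```

With these inputs it prints 1800, 2100 and 11, i.e. 3 tasks need roughly 1.8 to 2.1 GB of resident memory.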
23)
Message boards :
Number crunching :
Is a queue of > 5 million workunits waiting for assimilation a bad thing?
(Message 68838)
Posted 4 Jun 2023 by geophi Post: Should we be concerned? No. CPDN doesn't use boinc's validation function. The "Workunits waiting for validation", "Workunits waiting for assimilation" and "Workunits waiting for file deletion" counters exist in the boinc server software but aren't used by CPDN. Those stats are left on the server status page (which can be confusing) since they are not removed every time a server upgrade occurs. |
24)
Message boards :
Number crunching :
Download issues
(Message 68689)
Posted 21 Apr 2023 by geophi Post: Notified Andy of the continuing certificate/download problems with a link to the recent posts in this thread. |
25)
Questions and Answers :
Wish list :
Merge computers despite different OS
(Message 68645)
Posted 11 Apr 2023 by geophi Post: I'd like to be able to delete the old, obsolete computers in my account, some of which were scrapped over ten years ago. AndreyOR is close. You can delete a computer if it has never downloaded a task. But if it has downloaded one or more, even if its credit is 0, you cannot delete that host/computer. If a computer listed on your host list is just a newer identified version of the same PC, you can try the Merge function for the host, which will merge the boinc-identified computers into one. |
26)
Message boards :
Number crunching :
East Asia testing.
(Message 68598)
Posted 17 Mar 2023 by geophi Post: If these Windows work units are long running as you suggest, I hope there will be a mechanism in place on the server to ensure that everyone who wants some will get them, shared fairly and equally, instead of by greedy, selfish users who download dozens of work units, or more, and then can't complete them by the deadline. I have no idea on that. My hope is that those in charge have learned from the effectiveness of the shorter deadlines given to OIFS tasks. Slower machines will, I would guess, take over three months to complete these tasks, so setting the deadline at 3 months would stop some users from downloading them at all because they wouldn't finish in time. If they do it the way they did for the nz25 batches, the development site spinups ran for 113 model months and took about 20 days on my i7-4790K. When they later sent out stash/ancil test nz25 batches to the dev site, they were for 25 model months and took less than a quarter of the time that the spinups did. The nz25 batches sent to the main cpdn site were also 25 model months. Just guessing, but I don't think 119 model month batches will come to the main site. |
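The nz25 numbers above suggest a back-of-the-envelope runtime estimate. This sketch assumes runtime scales linearly with model months on the same machine (113 model months taking about 20 days on the i7-4790K); real runtimes also depend on host load and model configuration, and `estimate_days` is a hypothetical helper:

```python
REF_MONTHS = 113  # nz25 dev-site spinup length quoted in the post
REF_DAYS = 20.0   # approximate runtime on the i7-4790K quoted in the post


def estimate_days(model_months, ref_months=REF_MONTHS, ref_days=REF_DAYS):
    """Linear runtime estimate, assuming runtime scales with model months."""
    return ref_days * model_months / ref_months


if __name__ == "__main__":
    for months in (25, 119):
        print(months, round(estimate_days(months), 1))
```

It prints `25 4.4` and `119 21.1`; 25/113 is about 0.22, consistent with the 25-month test batches taking less than a quarter of the spinup time.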
27)
Message boards :
Number crunching :
no credit awarded?
(Message 68493)
Posted 26 Feb 2023 by geophi Post: Another strange thing: my event log has an entry for Absolutely. I started seeing that behavior occasionally early in the hadam4h era. Over the last 3 or 4 years, several of my completed tasks never received a status because they reported during the credit run. It doesn't always happen, but it does occasionally. Others have posted about this situation as well, but those posts are probably scattered among several threads. |
28)
Message boards :
Number crunching :
OpenIFS Discussion
(Message 68431)
Posted 24 Feb 2023 by geophi Post: Send Personal Message to me if interested rather than reply here. If there is sufficient interest, I'll share the files on dropbox. I'll post answers to PM'd questions here. Click on his name in the Author section for his post. It'll bring up an abbreviated profile page for him and then click on "Send personal message" on the right hand side of the webpage. Or, easier, just click on the "Send Message" button under his name in the Author section. |
29)
Message boards :
Number crunching :
OpenIFS Discussion
(Message 68279)
Posted 12 Feb 2023 by geophi Post: Now running one that has failed once on an intel machine and once on AMD. The AMD is a double corruption and the Intel is free(): invalid next size (fast). Dave, I think you mixed those up. The Intel machine had the double corruption and the AMD was the invalid next size. |
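When comparing failures like these across hosts, it can help to tally error signatures from returned stderr.txt files. This is a minimal sketch; the regular expressions are assumptions based only on the messages quoted in these posts (double free or corruption, free(): invalid next size, SEGV/signal 11), and the category names are hypothetical:

```python
import re
from collections import Counter

# Assumed signatures, taken from messages quoted in this thread.
PATTERNS = {
    "double_free": re.compile(r"double free or corruption"),
    "invalid_next_size": re.compile(r"free\(\): invalid next size"),
    "segv": re.compile(r"SEGV|signal 11"),
}


def classify(stderr_text):
    """Return the sorted list of error categories found in one stderr text."""
    return sorted(name for name, pat in PATTERNS.items() if pat.search(stderr_text))


def tally(stderr_texts):
    """Count categories across many tasks; unmatched texts are 'unclassified'."""
    counts = Counter()
    for text in stderr_texts:
        counts.update(classify(text) or ["unclassified"])
    return counts
```

A per-host or per-CPU-vendor breakdown would just call `tally` on each group separately.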
30)
Message boards :
Number crunching :
Weather at Home still running? Can't send back files.
(Message 68202)
Posted 4 Feb 2023 by geophi Post: I see some Windows tasks have completed in the last couple days. Has anyone in this thread reporting upload problems for WAH2 NZ tasks had their tasks upload? |
31)
Message boards :
Number crunching :
Weather at Home still running? Can't send back files.
(Message 68112)
Posted 30 Jan 2023 by geophi Post: I also have 3 WAH WU's running on a Windows computer and I haven't seen it upload anything in days. Is this the same issue that is affecting OpenIFS? I seem to be having at least some luck with that. Those are uploaded to a different server in Hobart, Tasmania. Over the years, it has been periodically unreliable. I e-mailed Andy so hopefully he can communicate with them and get this resolved. |
32)
Questions and Answers :
Preferences :
CP takes over
(Message 68073)
Posted 27 Jan 2023 by geophi Post: No climate work for months ... is there a checklist to confirm my settings? There hasn't been any Windows work since Aug/Sep of last year. The project depends on climate researchers from various institutions around the world submitting work requests. Recently all of the requests have come for the models that run in Linux. I'm not sure when there might be more work for the models that run in Windows. |
33)
Questions and Answers :
Unix/Linux :
*** Running 32bit CPDN from 64bit Linux - Discussion ***
(Message 67912)
Posted 19 Jan 2023 by geophi Post: For the "had" models, the .so file has a dependency on libnsl.so. The command lines installing the needed 32bit libraries for cpdn install the listed lib32ncurses6 to satisfy the libnsl.so requirement. @AndreyOR Thanks for your explanation of the minimum packages needed for 32bit compatibility with the current "had" linux apps in Ubuntu. Dave updated the instructions for Ubuntu 18.04 and later versions to use lib32stdc++6. We left lib32z1 in the install command line, but with a disclaimer that it would only be needed if the hadcm3s model comes back to linux. |
34)
Questions and Answers :
Unix/Linux :
*** Running 32bit CPDN from 64bit Linux - Discussion ***
(Message 67820)
Posted 17 Jan 2023 by geophi Post: sudo apt install lib32ncurses6 lib32z1 lib32stdc++-9-dev For the "had" models, the .so file has a dependency on libnsl.so. The command lines installing the needed 32bit libraries for cpdn install the listed lib32ncurses6 to satisfy the libnsl.so requirement. Without it, the intermediate and final model results zip files are not created and uploaded. libnsl may well be installed more efficiently through some other command, but that's what was suggested long ago as a method to install it. As for lib32z1, that was included for the hadcm3s models when they ran on linux. It was a requirement of the .so file for that model, so that the zip files got created and transmitted. That model type is now Mac only and may remain so. However, we do not know that for sure. |
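The package choices discussed in these two posts can be summarized as a small command builder. This is a hypothetical sketch, not official CPDN instructions: it assumes lib32ncurses6 and lib32stdc++6 as the base set for the "had" apps on Ubuntu 18.04 and later, and adds lib32z1 only when hadcm3s is requested. `apt_command`, `BASE_PACKAGES` and `EXTRA_PACKAGES` are illustrative names:

```python
import shlex

# Hypothetical mapping drawn from this thread, not official CPDN instructions.
BASE_PACKAGES = ["lib32ncurses6", "lib32stdc++6"]  # libnsl dependency + C++ runtime
EXTRA_PACKAGES = {"hadcm3s": ["lib32z1"]}          # only if hadcm3s returns to linux


def apt_command(models):
    """Build an apt install line for the given model types, without duplicates."""
    packages = list(BASE_PACKAGES)
    for model in models:
        for pkg in EXTRA_PACKAGES.get(model, []):
            if pkg not in packages:
                packages.append(pkg)
    return "sudo apt install " + " ".join(shlex.quote(p) for p in packages)


if __name__ == "__main__":
    print(apt_command([]))
    print(apt_command(["hadcm3s"]))
```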
35)
Message boards :
Number crunching :
Upload server is out of disk space
(Message 67714)
Posted 14 Jan 2023 by geophi Post: Thank You Dave, upload4 is the Hobart server in Tasmania, which periodically has issues. I've alerted Andy with a link to your post. Hopefully the server will be back up in the not too distant future. Edit...looks like Dave might have beat me to it. |
36)
Message boards :
News :
Request to volunteers to please enable: 'Leave non-GPU tasks in memory'
(Message 67625)
Posted 13 Jan 2023 by geophi Post: Finally CPDN has used the BOINC push notification: I thought it was for the third blurb on the front page back in 2019. https://www.cpdn.org/cpdnboinc/index.php |
37)
Message boards :
Number crunching :
OpenIFS Discussion
(Message 66668)
Posted 30 Nov 2022 by geophi Post: I've got a few of these new units. So far two completed ok and two with errors. Ah! Excellent. I've been trying to understand why some tasks are apparently stopping with nothing in the stderr.txt returned to the server to explain why it stopped. I received a "double free or corruption (out)" error on this task https://www.cpdn.org/cpdnboinc/result.php?resultid=22247251 around step 1539. Another problem has occurred on the same PC. This time, the task apparently ran to the end (it reached step 2952, as listed in stderr.txt and ifs.stat) but never completed or reported. The "master.exe" associated with this process is labeled as defunct in ps -ef master, and the task in boinc manager has a progress of 3.256% (stuck) with CPU time continuing to increase. Task: https://www.cpdn.org/cpdnboinc/result.php?resultid=22247938 I'm going to suspend this task since it is blocking others from running. If you need anything from the slots directory, let me know. Four other tasks have run successfully to completion on this same PC: a Ryzen 5 5600 with 32 GB of DDR4 3200 running fully updated Ubuntu 20.04 LTS. |
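A defunct (zombie) process like the "master.exe" above shows state `Z` in /proc/<pid>/stat on Linux. This sketch scans /proc for such processes, assuming the process name reported by the kernel is exactly `master.exe` (the kernel truncates names to 15 characters, which this one fits within); `state_from_stat` and `defunct_pids` are hypothetical helpers:

```python
import os


def state_from_stat(stat_text):
    """Return (command name, state letter) from a /proc/<pid>/stat line."""
    # The name is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    start = stat_text.index("(")
    end = stat_text.rindex(")")
    return stat_text[start + 1:end], stat_text[end + 2]


def defunct_pids(name="master.exe", proc="/proc"):
    """PIDs of zombie processes whose kernel name matches `name`."""
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                comm, state = state_from_stat(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if comm == name and state == "Z":
            pids.append(int(entry))
    return pids


if __name__ == "__main__" and os.path.isdir("/proc"):
    print(defunct_pids())
```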
38)
Message boards :
Number crunching :
New work discussion - 2
(Message 66632)
Posted 29 Nov 2022 by geophi Post: Those times depend on what %cpu boinc is allowed to use. 100%? Perhaps add that info. Machine load affects wall clock time too. One of the two on my i7-4790K crashed at the end with exit status of "194 (0x000000C2) EXIT_ABORTED_BY_CLIENT". In stderr, it has "Process still present 5 min after writing finish file; aborting". https://www.cpdn.org/result.php?resultid=22245298 Both the successful task and the errored task ran through step 2592. Both tasks on my Ryzen 5600X completed successfully in just under 9 hours CPU and wall clock time. |
39)
Message boards :
Number crunching :
New work discussion - 2
(Message 66621)
Posted 29 Nov 2022 by geophi Post: Looks like about 13 hours running two at a time on an i7-4790K and about 9 hours running two at a time on my Ryzen 5 5600X. |
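The runtimes above translate into daily throughput. This sketch assumes steady state, with a new task starting as soon as one finishes; `tasks_per_day` is a hypothetical helper:

```python
def tasks_per_day(concurrent, hours_per_task):
    """Completed tasks per day when `concurrent` tasks each take `hours_per_task`."""
    return concurrent * 24.0 / hours_per_task


if __name__ == "__main__":
    print(round(tasks_per_day(2, 13), 2))  # i7-4790K
    print(round(tasks_per_day(2, 9), 2))   # Ryzen 5 5600X
```

It prints 3.69 for the i7-4790K and 5.33 for the Ryzen 5 5600X.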
40)
Message boards :
Number crunching :
OpenIFS Frequently Asked Questions
(Message 66572)
Posted 25 Nov 2022 by geophi Post: Have moved the discussion that was in this thread to the "OpenIFS Discussion" thread. Please discuss OpenIFS in that thread and not in this FAQ. |
©2024 climateprediction.net