41)
Message boards :
Number crunching :
New Work Announcements 2024
(Message 70270)
Posted 2 Feb 2024 by Jean-David Beyer Post: Apologies for going off track here, but there is never a reason for a CPU to be too hot. Improve the cooling system. 17 W/mK heatsink paste, bigger cooler, faster fan, etc. My fans speed up as the temperatures of the box, the processor heat sink, etc., rise. But they do not speed up fast enough, so I have diddled the BIOS to run the fans faster. But I have them set up so they make so much noise that I can't stand to run them any faster. There is no room in the box for a bigger processor heat sink. This is how my system is running at the moment. Ambient air temperature is 74F.
$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +76.0°C (high = +88.0°C, crit = +98.0°C)
Core 8: +69.0°C (high = +88.0°C, crit = +98.0°C)
Core 2: +67.0°C (high = +88.0°C, crit = +98.0°C)
Core 3: +71.0°C (high = +88.0°C, crit = +98.0°C)
Core 5: +65.0°C (high = +88.0°C, crit = +98.0°C)
Core 1: +67.0°C (high = +88.0°C, crit = +98.0°C)
Core 9: +70.0°C (high = +88.0°C, crit = +98.0°C)
Core 11: +76.0°C (high = +88.0°C, crit = +98.0°C)
Core 12: +65.0°C (high = +88.0°C, crit = +98.0°C)
amdgpu-pci-6500
Adapter: PCI adapter
vddgfx: +0.96 V
fan1: 2086 RPM (min = 1800 RPM, max = 6000 RPM)
edge: +45.0°C (crit = +97.0°C, hyst = -273.1°C)
PPT: 10.04 W (cap = 25.00 W)
dell_smm-virtual-0
Adapter: Virtual device
fan1: 4325 RPM
fan2: 1373 RPM
fan3: 3496 RPM |
42)
New Work Announcements 2024
(Message 70268)
Posted 2 Feb 2024 by Jean-David Beyer Post: I second this, I have machines with up to 128GB RAM. Limiting those to the same number of tasks as 16GB machines is illogical. Me too. My Windows 10 machine has about 16 GBytes of RAM (total), but my Linux machine has 128 GBytes of RAM. The Linux box has a 16-core processor and I am letting up to 13 Boinc tasks run at a time. In warm weather I first cut it down to 12 Boinc tasks and when it is really too hot, I cut it down to 8. I run CPDN, WCG, DENIS, Rosetta, Einstein, Universe in order of decreasing priority. |
43)
Are the relevant people aware www.climateprediction.net is down?
(Message 70251)
Posted 31 Jan 2024 by Jean-David Beyer Post: It may be intermittently available while they're working on it? My test shows the site is up, but when I go there, I get this: Ubuntu Logo Apache2 Default Page It works! This is the default welcome page used to test the correct operation of the Apache2 server after installation on Ubuntu systems. It is based on the equivalent page on Debian, from which the Ubuntu Apache packaging is derived. If you can read this page, it means that the Apache HTTP server installed at this site is working properly. You should replace this file (located at /var/www/html/index.html) before continuing to operate your HTTP server. If you are a normal user of this web site and don't know what this page is about, this probably means that the site is currently unavailable due to maintenance. If the problem persists, please contact the site's administrator. |
44)
Are the relevant people aware www.climateprediction.net is down?
(Message 70249)
Posted 31 Jan 2024 by Jean-David Beyer Post: It is up now... Is Climateprediction.net down? |
45)
New Work Announcements 2024
(Message 70244)
Posted 31 Jan 2024 by Jean-David Beyer Post: An OpenIFS linux batch will be released in the next 7 days. It's about to go into testing. This batch is based on the earlier batch 993 but with reduced model output (and hence smaller upload files). My most recent one of those was this one. It worked, so perhaps this new batch should work too. Right? I must have those compatibility libraries in there although, IIRC, these OIFS programs do not need them.
Task 22318024
Name oifs_43r3_0187_2019110100_123_993_12215029_2
Workunit 12215029
Created 25 Apr 2023, 18:24:32 UTC
Sent 25 Apr 2023, 18:24:40 UTC
Report deadline 24 Jun 2023, 18:24:40 UTC
Received 26 Apr 2023, 10:24:47 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x00000000)
Computer ID 1511241
Run time 15 hours 25 min 7 sec
CPU time 15 hours 14 min 11 sec
Validate state Valid
Credit 14,873.04
Device peak FLOPS 6.06 GFLOPS
Application version OpenIFS 43r3 v1.21 x86_64-pc-linux-gnu
Peak working set size 4,780.11 MB
Peak swap size 4,974.23 MB
Peak disk usage 1,267.49 MB |
46)
Batch 1005 WAH2 NZ region
(Message 70204)
Posted 26 Jan 2024 by Jean-David Beyer Post: Can someone who has one or more of these tasks let me know if zips are going through all right? The ones for that region on the testing site are stuck. I just got two of them. One has uploaded its first trickle. This is on my pipsqueak Windows 10 machine.
Task 22387098
Name wah2_nz25_n31e_201205_25_1005_012258096_0
Workunit 12258096
Created 23 Jan 2024, 10:48:31 UTC
Sent 25 Jan 2024, 19:27:25 UTC
Report deadline 24 May 2024, 19:27:25 UTC |
47)
EAS batches 1001-4
(Message 70177)
Posted 22 Jan 2024 by Jean-David Beyer Post: If any process is quit, it will not completely die until it has closed its open files. I think what Dave might be referring to is flushing any I/O buffers held in memory to a hardware drive. Although code can 'tell' the OS it wants that done, the final decision is still made by the OS. Both the CPDN task & Boinc have a wait-time built into the code to allow any buffers to be flushed, but again it can't be forced. In "modern" versions of Linux, and perhaps other versions of UNIX, you can greatly increase the chances that I/O buffers are actually written to disk (at least to the input buffer of the drive itself) by calling the fsync() system call. https://www.man7.org/linux/man-pages/man2/fsync.2.html Here is part of the description. DESCRIPTION fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) so that all changed information can be retrieved even if the system crashes or is rebooted. This includes writing through or flushing a disk cache if present. The call blocks until the device reports that the transfer has completed. As well as flushing the file data, fsync() also flushes the metadata information associated with the file (see inode(7)). Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed. |
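As a rough illustration of the pattern the man page describes, here is a minimal sketch that flushes a file and then its containing directory. It uses Python's os.fsync(), which wraps the fsync() system call. The helper name durable_write and the file name checkpoint.bin are made up for this example, and nothing here is taken from the CPDN or BOINC code. Opening a directory read-only in order to fsync it is assumed to behave as it does on Linux.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes to path, then ask the OS to flush the file and its directory entry.

    Hypothetical helper for illustration only; assumes a Linux-like system.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's own buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to flush file data and metadata

    # Per fsync(2), the directory entry needs its own fsync() on a
    # file descriptor opened for the containing directory.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "checkpoint.bin")
        durable_write(target, b"restart data")
        print(os.path.getsize(target))
```

The f.flush() step matters because Python keeps its own buffer in front of the kernel's; fsync() can only act on data that has already reached the kernel.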
48)
EAS batches 1001-4
(Message 70175)
Posted 21 Jan 2024 by Jean-David Beyer Post: However I was taught ALGOL I started with assembler for the 704 IBM computer (5000 vacuum tubes, IIRC) and then used the original FORTRAN for it for some things. When Illinois-ALCOR Algol 60 came out, I really liked it for mathematical work, and SNOBOL4 and SPITBOL for text-type problems. I even wrote a compiler for a special-purpose language in SPITBOL. I seldom write programs these days. I am most at home with C and C++ currently. |
49)
New Work Announcements 2024
(Message 70150)
Posted 18 Jan 2024 by Jean-David Beyer Post: Not to mention that internet bandwidth is not a zero-cost resource, in either climate or financial terms. The incremental bandwidth for me took a big step-up recently when Verizon replaced my FiOS hardware. My old hardware was installed in about 2004 and they did not want to support it any more. The new is about 10x faster than the old.
Timestamp Download Upload Latency Jitter Quality Score Test Server
1/18/2024 8:54:7 840.78 Mbps 906.51 Mbps 7 ms 1 ms Excellent newyork02.speedtest.windstream.net
12/1/2023 10:26:27 750.33 Mbps 926.59 Mbps 5 ms 1 ms Excellent speedtest1.nyc1.nitelusa.net.prod.hosts.ooklaserver.net
11/30/2023 21:38:48 836.55 Mbps 846.46 Mbps 5 ms 4 ms Excellent newyork02.speedtest.windstream.net |
50)
New Work Announcements 2024
(Message 70142)
Posted 17 Jan 2024 by Jean-David Beyer Post: I have received three tasks, one at a time, that are all working fine on my Windows10 machine. One is 1002 and has returned three trickles, and two are 1003 and one has returned a trickle and the other not yet. They are all running at once and they are predicted to take almost 10 days.
22370239 12247973 17 Jan 2024, 15:55:24 UTC 16 May 2024, 15:55:24 UTC In progress --- --- --- Weather At Home 2 (wah2) v8.24 windows_intelx86
22369242 12246987 16 Jan 2024, 16:24:45 UTC 15 May 2024, 16:24:45 UTC In progress --- --- 1,678.16 Weather At Home 2 (wah2) v8.24 windows_intelx86
22359236 12238208 16 Jan 2024, 0:51:57 UTC 15 May 2024, 0:51:57 UTC In progress --- --- 2,506.49 Weather At Home 2 (wah2) v8.24 windows_intelx86 |
51)
Batch 996 Weather@Home2 East Asia25
(Message 69957)
Posted 20 Oct 2023 by Jean-David Beyer Post: Do you want to see this? It failed pretty fast on a Windows 10 box.
Task 22347812
Name wah2_eas25_a11q_199112_24_996_012224906_2
Workunit 12224906
Created 16 Oct 2023, 23:44:03 UTC
Sent 16 Oct 2023, 23:44:37 UTC
Report deadline 28 Oct 2024, 5:04:37 UTC
Received 17 Oct 2023, 0:45:18 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x00000000)
Computer ID 1512658
Run time 2 min 41 sec
CPU time 2 min 23 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 4.23 GFLOPS
Application version Weather At Home 2 (wah2) v8.24 windows_intelx86
Peak working set size 166.88 MB
Peak swap size 160.23 MB
Peak disk usage 0.01 MB
Stderr
<core_client_version>7.24.1</core_client_version> <![CDATA[ <stderr_txt>
Signal 11 received: Segment violation
Signal 11 received: Software termination signal from kill
Signal 11 received: Abnormal termination triggered by abort call
Signal 11 received, exiting...
19:47:52 (7736): called boinc_finish(193)
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2932, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7736, selfPID=13976, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_ain::Monitor...
19:47:56 (13976): called boinc_finish(0) </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_1.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_2.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_3.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_4.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_5.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_6.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_7.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_8.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_9.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_10.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_11.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> 
<file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_12.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_13.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_14.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_15.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_16.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_17.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_18.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_19.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_20.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_21.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_22.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_23.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> 
<file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_24.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> <file_xfer_error> <file_name>wah2_eas25_a11q_199112_24_996_012224906_2_r1197333757_restart.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> </message> ]]> |
52)
Batch 996 Weather@Home2 East Asia25
(Message 69905)
Posted 16 Oct 2023 by Jean-David Beyer Post: There's also a 2P Xeon server here in CO that stays here. It runs Linux and didn't seem to be able to get any CPDN tasks. At least any tasks that didn't fail. My main machine runs Linux essentially 24/7. The most recent CPDN task I got was
22318648 12138603 30 May 2023, 3:38:46 UTC 9 Jun 2023, 1:20:39 UTC Completed 852,578.34 843,274.30 33,854.34 UK Met Office HadAM4 at N216 resolution v8.52 i686-pc-linux-gnu
computer 1511241
Task 22318648
Name hadam4h_a015_200011_5_931_012138603_1
up 46 days, 1 min
So you probably would not get any tasks either after that. They are very few and far between. |
53)
Batch 996 Weather@Home2 East Asia25
(Message 69775)
Posted 12 Oct 2023 by Jean-David Beyer Post: Well all three of my tasks crashed after uploading 10 trickles each. My machine got another task and it crashed after uploading a single trickle. I cannot tell what really went wrong with any of them. My machine is Computer ID 1512658, and the tasks were: 22340449 22339081 22339022 22346116 |
54)
Batch 996 Weather@Home2 East Asia25
(Message 69712)
Posted 9 Oct 2023 by Jean-David Beyer Post: FWIW, I have 7 zips that cannot upload. "transient HTTP error" I have three tasks running and have had no trouble uploading zip files. Each has uploaded seven .zip files. Here is one of them:
Task 22340449
Name wah2_eas25_a3fh_200712_24_996_012227993_0
Workunit 12227993
Created 5 Oct 2023, 16:02:19 UTC
Sent 5 Oct 2023, 16:38:36 UTC
Report deadline 16 Oct 2024, 21:58:36 UTC
Received ---
Server state In progress
Outcome ---
Client state New
Exit status 0 (0x00000000)
Computer ID 1512658
Run time
CPU time
Validate state Initial
Credit 5,819.81
Device peak FLOPS 4.23 GFLOPS
Application version Weather At Home 2 (wah2) v8.24 windows_intelx86
Stderr --
Latest Trickles Received
Time Sent (UTC) Host ID Result ID Result Name Timestep CPU Time (sec) Average (sec/TS)
09 Oct 2023 06:04:24 1512658 22340449 wah2_eas25_a3fh_200712_24_996_012227993_0 80,939 306,949 3.7923
08 Oct 2023 17:21:55 1512658 22340449 wah2_eas25_a3fh_200712_24_996_012227993_0 69,419 261,246 3.7633
08 Oct 2023 05:04:43 1512658 22340449 wah2_eas25_a3fh_200712_24_996_012227993_0 57,899 217,100 3.7496
07 Oct 2023 16:52:41 1512658 22340449 wah2_eas25_a3fh_200712_24_996_012227993_0 46,379 173,219 3.7349
07 Oct 2023 04:53:35 1512658 22340449 wah2_eas25_a3fh_200712_24_996_012227993_0 34,859 130,196 3.7349
06 Oct 2023 16:55:50 1512658 22340449 wah2_eas25_a3fh_200712_24_996_012227993_0 23,339 87,178 3.7353
06 Oct 2023 05:00:33 1512658 22340449 wah2_eas25_a3fh_200712_24_996_012227993_0 11,819 44,353 3.7527 |
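The Average (sec/TS) column in the trickle listing appears to be the cumulative CPU time divided by the timestep reached, which can be checked against the figures above. The sketch below redoes that arithmetic and also computes the rate between consecutive trickles, which the listing does not show. The function names are made up for illustration; only the numbers come from the post.

```python
# (timestep, cpu_seconds) pairs copied from the trickle listing, oldest first.
TRICKLES = [
    (11819, 44353),
    (23339, 87178),
    (34859, 130196),
    (46379, 173219),
    (57899, 217100),
    (69419, 261246),
    (80939, 306949),
]


def average_sec_per_ts(timestep, cpu_seconds):
    """Cumulative average: total CPU seconds divided by timesteps completed."""
    return cpu_seconds / timestep


def interval_rates(trickles):
    """CPU seconds per timestep between each pair of consecutive trickles."""
    rates = []
    for (ts0, cpu0), (ts1, cpu1) in zip(trickles, trickles[1:]):
        rates.append((cpu1 - cpu0) / (ts1 - ts0))
    return rates


if __name__ == "__main__":
    for ts, cpu in TRICKLES:
        print(f"{ts:>6} {cpu:>7} {average_sec_per_ts(ts, cpu):.4f}")
    for rate in interval_rates(TRICKLES):
        print(f"{rate:.4f}")
```

The interval figures show whether the task has sped up or slowed down recently, which the cumulative average tends to smooth over.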
55)
Batch 996 Weather@Home2 East Asia25
(Message 69697)
Posted 8 Oct 2023 by Jean-David Beyer Post: Left the PC on overnight. I wonder what your problem is. My three tasks are still running and have now uploaded 5 trickles. Oldest one is: 22340449 12227993 5 Oct 2023, 16:38:36 UTC 16 Oct 2024, 21:58:36 UTC In progress |
56)
Batch 996 Weather@Home2 East Asia25
(Message 69695)
Posted 7 Oct 2023 by Jean-David Beyer Post: All but one of the failures was after shutting down for the night. It's somewhat reassuring that it's not my computer that's got an issue, but it's a bit disappointing that the stop/restart issue hasn't been fully cured yet. That may be why I seem to get fewer crashes than others. I let my machines run 24/7 and reboot them only when installing updates. |
57)
Batch 996 Weather@Home2 East Asia25
(Message 69692)
Posted 7 Oct 2023 by Jean-David Beyer Post: Don't want to dual boot as this is my main machine and non BOINC work all happens in Linux. Good idea. I, too, hate dual booting because I run almost everything in Linux. I need Windows only to run TaxAct each year to do my income taxes (Federal and my state), and four times a year to keep my Garmin GPS unit up to date. I could get a Windows license to run Windows on this machine, but a few years ago I got sick of that so I got a little desktop machine. (It looks just like a monitor, but the computer is inside the monitor.) That little computer runs Windows 10 and has nothing else to do, so I downloaded Boinc onto it. I signed it up for CPDN, WCG, DENIS, Rosetta, Einstein, and Universe. My main machine is ID: 1511241 and has lots of RAM and processor cache. My pipsqueak machine is ID: 1512658 and has much less RAM and a slower processor with only 8 cores. My Linux machine has a pretty fast processor with 16 cores. |
58)
Batch 996 Weather@Home2 East Asia25
(Message 69690)
Posted 7 Oct 2023 by Jean-David Beyer Post: Had hoped that the signal11 failures would be a lot lower with this batch but it seems this might not be the case. This is to do with the batch and not your computer. Just hoping there are enough good tasks between this and the last lot for the researcher to get what she needs. I have only one computer running Windows and I do not run WINE on the other (Linux machine). How do you distinguish between failures due to the machine from those due to the batch? I assume mine are all from the same batch and they show no signs of failure yet. I guess you see results from many other machines so you have more data from which to draw conclusions. My three work units have about two days of work done on each. Each has uploaded 3 zip files. No failures yet. These are on my Windows 10 machine.
Computer 1512658
22339022 12226566 5 Oct 2023, 18:39:55 UTC 16 Oct 2024, 23:59:55 UTC In progress --- --- 2,506.49 Weather At Home 2 (wah2) v8.24 windows_intelx86
22339081 12226625 5 Oct 2023, 17:39:17 UTC 16 Oct 2024, 22:59:17 UTC In progress --- --- 2,506.49 Weather At Home 2 (wah2) v8.24 windows_intelx86
22340449 12227993 5 Oct 2023, 16:38:36 UTC 16 Oct 2024, 21:58:36 UTC In progress --- --- 2,506.49 Weather At Home 2 (wah2) v8.24 windows_intelx86
Task 22339022
Name wah2_eas25_a2bu_200012_24_996_012226566_0
Workunit 12226566
Computer ID 1512658
Credit 2,506.49
Device peak FLOPS 4.23 GFLOPS
Application version Weather At Home 2 (wah2) v8.24 windows_intelx86 |
59)
Batch 996 Weather@Home2 East Asia25
(Message 69675)
Posted 6 Oct 2023 by Jean-David Beyer Post: I don't trust WINE for running the model correctly. We discovered during testing that WINE implementations do not fail the model when it suffers a memory fault unlike on bare metal Windows. I think there is some memory protection in place for WINE. That implies the results from incorrect memory addresses (e.g. maybe zero) are being used by the model, potentially corrupting the results. I think memory faults, WINE or not, are an indication of an incorrect program or a hardware fault. If WINE has some memory protection in it in addition to the hardware, perhaps this is just more proof of my theory. It seems to hide the memory faults. When I first used Windows (Windows 95) it had so many faults that it crashed several times a day even if it was not doing anything. I did not run BOINC then (I do not remember if it existed at that time). Since then Windows has improved some. IIRC Windows 7 was pretty good and I am now running Windows 10 on my other machine. The three current tasks on my Windows machine now have about 18 hours on them with about 9 days to go. |
60)
Batch 996 Weather@Home2 East Asia25
(Message 69665)
Posted 5 Oct 2023 by Jean-David Beyer Post: The task on my Ryzen running Windows natively failed at the usual point with a signal 11 (segmentation fault) during the first model day. Tasks running under Wine appear to be progressing nicely. I have three of those tasks running on my Windows10 machine. They started at about one-hour intervals and have about 1.7, 2.7, and 3.7 hours completed. They need about 9.5 days to complete. It is an 8-core machine running on 7 of the cores. The machine is not doing anything else (except 4 other Boinc tasks). |
©2024 climateprediction.net