Posts by biodoc

1) Message boards : Number crunching : OpenIFS Discussion (Message 68277)
Posted 12 Feb 2023 by biodoc
Post:

As for the other AMD:fail, Intel:Ok, I am wondering whether to turn down the optimization level on the Intel compiler I use for the model.

I'm interested to know why you chose the Intel compiler over GCC. Would GCC offer better compatibility with the hardware and OS heterogeneity on a distributed computing (DC) project?
2) Message boards : Number crunching : Upload server is out of disk space (Message 67589)
Posted 12 Jan 2023 by biodoc
Post:
File uploads were going along quite nicely until this appeared in the boinc log.

Wed 11 Jan 2023 07:27:19 PM EST | climateprediction.net | [error] Error reported by file upload server: Server is out of disk space
3) Message boards : Number crunching : Hardware for new models. (Message 67463)
Posted 9 Jan 2023 by biodoc
Post:


I cannot leave side off. The case is interlocked with the power supply so the power supply instantly shuts off if you even move the lever that opens it. And I doubt that would help all that much.

These are temperatures right now with about half the cores idle (no WCG, no Rosetta, only two instead of five CPDN).
Room temperature 75°F. When room temperature is twenty degrees hotter in six or seven months from now the box will also be that much hotter. And when the cores get to 88.0°C, I must cut the cores running Boinc from 12 to 8 or even 6 or 7.


You could try disabling turbo boost in the BIOS. It should run cooler and draw less power from the wall. It's definitely a good summertime option in my experience.
4) Message boards : Number crunching : The uploads are stuck (Message 67257)
Posted 3 Jan 2023 by biodoc
Post:
I've had a few files upload, so only 39,113 to go. That's from 340 completed tasks.
5) Message boards : Number crunching : The uploads are stuck (Message 67245)
Posted 3 Jan 2023 by biodoc
Post:
traceroute eventually makes its way to the proper destination.

traceroute to upload11.cpdn.org (192.171.169.187), 64 hops max
  1   192.168.1.1 (Fios_Quantum_Gateway.fios-router.home)  0.342ms  0.285ms  0.264ms 
  2   100.0.197.1 (lo0-100.BSTNMA-VFTTP-308.verizon-gni.net)  7.776ms  9.234ms  12.042ms 
  3   100.41.214.178 (B3308.BSTNMA-LCR-21.verizon-gni.net)  11.290ms  7.794ms  14.045ms 
  4   *  *  * 
  5   140.222.236.255 (0.ae2.BR1.BOS30.ALTER.NET)  4.700ms  9.752ms  9.719ms 
  6   62.115.170.72 (bost-b2-link.telia.net)  10.007ms  *  * 
  7   62.115.122.202 (nyk-bb1-link.ip.twelve99.net)  15.936ms  9.395ms  9.848ms 
  8   62.115.112.245 (ldn-bb4-link.ip.twelve99.net)  79.272ms  78.822ms  78.587ms 
  9   62.115.120.239 (ldn-b2-link.ip.twelve99.net)  79.447ms  79.003ms  78.299ms 
 10   62.115.175.131 (jisc-ic345131-ldn-b2.ip.twelve99-cust.net)  76.696ms  78.259ms  78.755ms 
 11   146.97.35.197 (ae24.londhx-sbr1.ja.net)  79.159ms  78.027ms  78.719ms 
 12   146.97.33.2 (ae29.londpg-sbr2.ja.net)  79.844ms  77.670ms  81.317ms 
 13   146.97.33.22 (ae31.erdiss-sbr2.ja.net)  90.225ms  87.908ms  88.663ms 
 14   *  *  * 
 15   146.97.41.34 (ral-r26.ja.net)  88.691ms  88.240ms  88.613ms 
 16   *  *  * 
 17   *  *  * 
 18   *  *  * 
 19   *  *  * 
 20   192.171.169.187 (192.171.169.187)  85.539ms !*  85.660ms !*  88.446ms !* 
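A quick way to confirm that a trace like this one ends at the intended host is to compare the last responding hop with the address in the header line. The following is a minimal sketch, assuming output in the format pasted above; the function name and the shortened sample are illustrative only.

```python
import re

# Destination address from the "traceroute to host (ip)" header line.
HEADER_RE = re.compile(r"traceroute to \S+ \((\d+(?:\.\d+){3})\)")
# Hop lines that answered: hop number followed by an IPv4 address.
HOP_RE = re.compile(r"^\s*(\d+)\s+(\d+(?:\.\d+){3})\s", re.MULTILINE)


def reached_destination(output: str) -> bool:
    """Return True if the last responding hop is the traced destination."""
    header = HEADER_RE.search(output)
    hops = HOP_RE.findall(output)
    if header is None or not hops:
        return False
    return hops[-1][1] == header.group(1)


# Shortened copy of the trace above; silent hops ("* * *") are skipped.
sample = """traceroute to upload11.cpdn.org (192.171.169.187), 64 hops max
  1   192.168.1.1 (Fios_Quantum_Gateway.fios-router.home)  0.342ms  0.285ms  0.264ms
  4   *  *  *
 15   146.97.41.34 (ral-r26.ja.net)  88.691ms  88.240ms  88.613ms
 20   192.171.169.187 (192.171.169.187)  85.539ms !*  85.660ms !*  88.446ms !*
"""
print(reached_destination(sample))
```

Reaching the host only shows that the network path works; it says nothing about whether the upload service on that host is accepting files.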
6) Message boards : Number crunching : OpenIFS Discussion (Message 67057)
Posted 26 Dec 2022 by biodoc
Post:
Will confirm to Andy nothing moving in the morning.


That would be great. I have a total backlog of 13,315 files of 14.5 MB each to upload from 4 computers. That's around 193 GB.
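The backlog estimate is simple arithmetic. Here is a minimal check, assuming decimal units (1 GB = 1000 MB) and that every file is roughly the same size; the function name is illustrative only.

```python
def backlog_gb(file_count: int, mb_per_file: float) -> float:
    """Return the total backlog in decimal gigabytes (1 GB = 1000 MB)."""
    return file_count * mb_per_file / 1000.0


# Figures from the post: 13,315 files of about 14.5 MB each.
total = backlog_gb(13_315, 14.5)
print(f"{total:.1f} GB")
```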
7) Message boards : Number crunching : OpenIFS Discussion (Message 66960)
Posted 18 Dec 2022 by biodoc
Post:
Hi, I can answer some of this. As it's getting technical maybe the moderators might want to move it to a separate thread?

With the help of others here, we know that 'double free' corruption seems to be symptomatic of the model itself failing. The fail with error code 9 happens after the model has finished as you say, and sometimes with the 'free()...' message as well.

What I find interesting is both these only seem to happen on AMD hardware. I didn't do an exhaustive trawl through the logs but I could not find a single intel machine with these fails. My suspicion is that both these errors are memory related. 'double free' corruption obviously is, the error code 9 with the 'free()..' error could also refer to a memory resident file, but quite what I am not sure. Both codes were compiled on Intel with the latest Intel compiler. Whether there's additional compiler options required I don't know. I have not been able to reproduce these errors on my little AMD box.

It's possible AMD chips are triggering memory bugs in the code depending on what else happens to be in memory at the same time (hence the seemingly random nature of the fail). Hard to say exactly at the moment but it could also be something system/hardware related specific to Ryzens. I have never seen the model fail like this before on the processors I've worked with in the past (none of which were AMD unfortunately). I am tempted to turn down the optimization and see what happens....


I did a little bit of searching and found 3 tasks that failed on Intel processors with the errors you described. I think it might be too early to call these errors Ryzen-specific.

https://www.cpdn.org/result.php?resultid=22245369
Exit status 5 (0x00000005) Unknown error code
double free or corruption (out)

https://www.cpdn.org/result.php?resultid=22248281
Exit status 9 (0x00000009) Unknown error code
double free or corruption (out)

https://www.cpdn.org/result.php?resultid=22245144
Exit status 9 (0x00000009) Unknown error code
double free or corruption (out)
8) Message boards : Number crunching : New work discussion - 2 (Message 66933)
Posted 16 Dec 2022 by biodoc
Post:

I've never seen the model fail like this on my machines, nor on the machines attached to CPDN's development test site. I wonder if it's hardware related, as this failed on biodocs's 5950X. I only have small AMD box to test on and develop on intel.

Thanks again.


Actually, that computer is a 3950X, which is also AMD, so your point is taken.
9) Message boards : Number crunching : New work discussion - 2 (Message 66927)
Posted 15 Dec 2022 by biodoc
Post:
I upgraded the BOINC client to 7.20.5 on the computer with the 2 errors to see if it's more reliable.
10) Message boards : Number crunching : New work discussion - 2 (Message 66924)
Posted 15 Dec 2022 by biodoc
Post:
syslog of https://www.cpdn.org/result.php?resultid=22250486. This one crashed in the middle of a run. No useful information.
Dec 14 19:48:21 x32-linux3 boinc[1692]: 14-Dec-2022 19:48:21 [climateprediction.net] Started upload of oifs_43r3_bl_a054_2016092300_15_949_12166578_0_r1730349614_14.zip
Dec 14 19:48:34 x32-linux3 boinc[1692]: 14-Dec-2022 19:48:34 [climateprediction.net] Finished upload of oifs_43r3_bl_a054_2016092300_15_949_12166578_0_r1730349614_14.zip
Dec 14 19:48:37 x32-linux3 boinc[1692]: 14-Dec-2022 19:48:37 [climateprediction.net] Computation for task oifs_43r3_bl_a054_2016092300_15_949_12166578_0 finished

syslog of https://www.cpdn.org/result.php?resultid=22250622. Looks like most of the output files were missing.
Dec 15 07:38:08 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:08 [climateprediction.net] Started upload of oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_42.zip
Dec 15 07:38:16 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:16 [climateprediction.net] Finished upload of oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_42.zip
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Computation for task oifs_43r3_ps_1325_2021050100_123_946_12164414_2 finished
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_43.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_44.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_45.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_46.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_47.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_48.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_49.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_50.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_51.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_52.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_53.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_54.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_55.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_56.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_57.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_58.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_59.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_60.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_61.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_62.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_63.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_64.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_65.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_66.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_67.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_68.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_69.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_70.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_71.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_72.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_73.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_74.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_75.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_76.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_77.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_78.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_79.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_80.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_81.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_82.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_83.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_84.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_85.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_86.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_87.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_88.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_89.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_90.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_91.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_92.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_93.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_94.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_95.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_96.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_97.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_98.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_99.zip for task oifs_43r3_ps
_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_100.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_101.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_102.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_103.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_104.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_105.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_106.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_107.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_108.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_109.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_110.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_111.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_112.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_113.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_114.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_115.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_116.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_117.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_118.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_119.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_120.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_121.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_122.zip for task oifs_43r3_p
s_1325_2021050100_123_946_12164414_2 absent
Dec 15 07:40:11 x32-linux3 boinc[1692]: 15-Dec-2022 07:40:11 [climateprediction.net] Started upload of oifs_43r3_bl_a004_2016092300_15_949_12166398_1_r1266389906_8.zip
Dec 15 07:40:24 x32-linux3 boinc[1692]: 15-Dec-2022 07:40:24 [climateprediction.net] Finished upload of oifs_43r3_bl_a004_2016092300_15_949_12166398_1_r1266389906_8.zip
Dec 15 07:42:08 x32-linux3 systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Dec 15 07:42:08 x32-linux3 systemd[1]: Started Process Core Dump (PID 24512/UID 0).
Dec 15 07:42:10 x32-linux3 systemd-coredump[24513]: Core file was truncated to 2147483648 bytes.
Dec 15 07:42:11 x32-linux3 systemd-coredump[24513]: Process 23225 (oifs_43r3_model) of user 129 dumped core.#012#012Stack trace of thread 23225:#012#0  0x0000000001dc903b n/a (/var/lib/boinc-client/slots/0/oifs_43r3_model.exe (deleted) + 0x19c903b)
Dec 15 07:42:11 x32-linux3 systemd[1]: systemd-coredump@0-24512-0.service: Succeeded.
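Rather than reading a long run of "absent" lines by eye, the missing upload indices can be pulled out with a short script. This is a minimal sketch, assuming each log entry sits on a single line in the format shown above; the function name and the two-line sample are illustrative only.

```python
import re

# Matches "Output file <prefix>_<index>.zip for task <task> absent".
ABSENT_RE = re.compile(r"Output file (\S+)_(\d+)\.zip for task (\S+) absent")


def absent_indices(log_text: str) -> list:
    """Return the sorted upload-file indices reported as absent."""
    return sorted(int(m.group(2)) for m in ABSENT_RE.finditer(log_text))


# Two entries copied from the syslog excerpt above.
sample = (
    "Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] "
    "Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_43.zip "
    "for task oifs_43r3_ps_1325_2021050100_123_946_12164414_2 absent\n"
    "Dec 15 07:38:18 x32-linux3 boinc[1692]: 15-Dec-2022 07:38:18 [climateprediction.net] "
    "Output file oifs_43r3_ps_1325_2021050100_123_946_12164414_2_r1007344185_44.zip "
    "for task oifs_43r3_ps_1325_2021050100_123_946_12164414_2 absent\n"
)
missing = absent_indices(sample)
print(f"{len(missing)} files absent, indices {missing[0]} to {missing[-1]}")
```

Run against the full excerpt, the same function would give the complete range of missing files in one line instead of a screenful of log entries.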
11) Message boards : Number crunching : New work discussion - 2 (Message 66919)
Posted 15 Dec 2022 by biodoc
Post:
I can upgrade one computer to boinc 7.20.5 using this ppa: https://launchpad.net/~costamagnagianfranco/+archive/ubuntu/boinc
I think that version is a development release. I guess it could cause other issues.
12) Message boards : Number crunching : New work discussion - 2 (Message 66918)
Posted 15 Dec 2022 by biodoc
Post:
biodoc. Thanks, that's useful because it pins down the error to a specific part of the code in the controlling wrapper process that runs the model. The model completed successfully, the fail appears right at the end when the final upload is about to be uploaded but at that point it fails. When I was looking for memory leaks I noted that the boinc client functions, which we use, seems to leak memory. I use release/7.20 to link against whereas I note you have 7.16 installed. I wonder if that's a clue to what's happening.

I wonder if Richard might know of memory leak issues with versions of the boinc client?

biodoc: if we wanted to run some tests specifically on your machine would you be willing? (we can force push tasks to specific machines if needed).

I have 2 computers with Linux Mint 20.3 installed. The BOINC version, as you pointed out, is 7.16, which is the one included in the Mint/Ubuntu repository.
My other 2 computers have Mint 21 and Kubuntu 22.04 installed, and the BOINC version is 7.18, which is also the one included in the repository.

Sure, you can push tasks to any one of my computers.
BTW, I just picked up the same error on another task, this one for OpenIFS 43r3 Perturbed Surface v1.05. It's on the same computer as the other task. https://www.cpdn.org/result.php?resultid=22250622
This computer ran the v1.01 tasks error free.
I can also upgrade Boinc on the one you pick to push tasks to.
Let me know.
13) Message boards : Number crunching : New work discussion - 2 (Message 66911)
Posted 15 Dec 2022 by biodoc
Post:
Still seeing some fails but I can't get to the logs at the moment to see what the problem was.

So far I have 22 valid OpenIFS 43r3 Baroclinic Lifecycle v1.07 tasks and one computational error.
https://www.cpdn.org/result.php?resultid=22250486
Exit status	9 (0x00000009) Unknown error code

Zipping up the final file: /var/lib/boinc-client/projects/climateprediction.net/oifs_43r3_bl_a054_2016092300_15_949_12166578_0_r1730349614_14.zip
Uploading the final file: upload_file_14.zip
Uploading trickle at timestep: 1295100
double free or corruption (out)

</stderr_txt>
14) Message boards : Number crunching : OpenIFS Discussion (Message 66894)
Posted 14 Dec 2022 by biodoc
Post:
Something funny has happened, the estimated time for these has gone up to over 4 days on the three resends I have. (actual time is going to be about 12 hours.) And my bored band is keeping up with the three resends I have running that arrived during the night. Wind must be in the right direction. I looked at the success rate which includes those that have succeeded at second or subsequent attempts and the three batches are at 68, 72 and 74% at the moment. May be a fraction higher because I think that stat is just updated once a day at midnight.

A new version (1.05) of the OpenIFS 43r3 Perturbed Surface application was distributed on Dec. 12th. The last 4 resends I received used the new app. version. Three completed successfully and 1 is in progress.
15) Message boards : Number crunching : OpenIFS Discussion (Message 66885)
Posted 13 Dec 2022 by biodoc
Post:
Yes, all my uploads have finished.
16) Message boards : Number crunching : OpenIFS Discussion (Message 66848)
Posted 10 Dec 2022 by biodoc
Post:
Should I just wait it out, or do something? If so, what?
Sit tight, I have messaged Andy. I suspect you are not the only one. I don't have any work from Main site at the moment so can't test for myself.

Edit@ Glen has posted on Trello card and he is getting the same. he has emailed Andy.


I have 3 OpenIFS tasks running and all trickle uploads are failing due to a server issue.

Sat 10 Dec 2022 05:28:49 AM EST | climateprediction.net | Backing off 00:02:48 on upload of oifs_43r3_ps_1850_2021050100_123_946_12164939_1_r1444946898_85.zip
Sat 10 Dec 2022 05:28:51 AM EST |  | Internet access OK - project servers may be temporarily down.
Sat 10 Dec 2022 05:30:46 AM EST |  | Project communication failed: attempting access to reference site
Sat 10 Dec 2022 05:30:46 AM EST | climateprediction.net | Temporarily failed upload of oifs_43r3_ps_1850_2021050100_123_946_12164939_1_r1444946898_86.zip: transient HTTP error
Sat 10 Dec 2022 05:30:46 AM EST | climateprediction.net | Backing off 00:03:00 on upload of oifs_43r3_ps_1850_2021050100_123_946_12164939_1_r1444946898_86.zip
Sat 10 Dec 2022 05:30:47 AM EST |  | Internet access OK - project servers may be temporarily down.
17) Message boards : Number crunching : OpenIFS Discussion (Message 66843)
Posted 9 Dec 2022 by biodoc
Post:
Fair enough. Let me know if you need more help.
18) Message boards : Number crunching : OpenIFS Discussion (Message 66840)
Posted 9 Dec 2022 by biodoc
Post:
I'd like to participate in the development project. How do I join?
19) Message boards : Number crunching : OpenIFS Discussion (Message 66834)
Posted 9 Dec 2022 by biodoc
Post:
I used an app_config.xml file in the CPDN project directory to control the number of OpenIFS tasks running simultaneously on each computer. I started out with 8 tasks on my 16-core Zen 2 and Zen 3 computers (3950X and 5950X), each with 64 GB of RAM, and eventually reduced that to 6 tasks to be absolutely sure there was enough RAM to support them all running simultaneously.

My experience with my old dual Ivy Bridge was poor at best. I ran 10 tasks simultaneously (1 task for each real core). This computer has 96 GB of ECC RAM, so I assumed that would be enough. My internet connection is rated at 110 Mbit/s for both download and upload, so that should be enough to handle the trickle and final uploads of 34-42 total tasks running simultaneously.

As posted by others in the CPDN forum, rebooting the computer is a bad idea since all tasks running at the time will eventually fail. I lost 10 tasks on my dual Ivy Bridge testing this. Another observation made by others is that enabling "leave nonGPU tasks in memory while suspended" in the BOINC options is a necessity for successfully completing tasks that have been temporarily suspended for any reason.
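For anyone who wants to set the same kind of limit, here is a minimal sketch that writes out an app_config.xml capping concurrent tasks per application. The `<app>`, `<name>` and `<max_concurrent>` elements are standard BOINC app_config.xml options; the application names are an assumption inferred from the task-name prefixes (the real names are listed in client_state.xml), and the helper function is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_app_config(app_names, max_concurrent):
    """Return app_config.xml text that caps concurrent tasks per application."""
    lines = ["<app_config>"]
    for name in app_names:
        lines += [
            "  <app>",
            f"    <name>{escape(name)}</name>",
            f"    <max_concurrent>{int(max_concurrent)}</max_concurrent>",
            "  </app>",
        ]
    lines.append("</app_config>")
    return "\n".join(lines) + "\n"


# Assumed app names (not confirmed); 6 is the per-computer limit settled on above.
xml_text = build_app_config(["oifs_43r3_ps", "oifs_43r3_bl"], 6)
ET.fromstring(xml_text)  # raises ParseError if the text is not well-formed
print(xml_text)
```

The printed text would be saved as app_config.xml in the project directory, after which the client needs to re-read its config files or be restarted before the limit applies.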

Summary of OpenIFS results by computer:

3950X, 64 GB RAM, linux mint 20.3. MW running on 4 instances (Radeon VII)
33 tasks completed successfully, no errors.
This was the only computer at the start I had with "leave nonGPU tasks in memory while suspended" enabled in boinc options.

3950X, 64 GB RAM, linux mint 20.3. F@H running on nvidia GPU.
32 successful tasks, 1 error (suspended and restarted without "leave nonGPU tasks in memory while suspended" enabled in boinc options).

5950X, 64 GB ECC RAM, linux mint 21. F@H running on nvidia GPU.
29 successful tasks, 2 errors. One error was a task that was suspended and restarted without "leave nonGPU tasks in memory while suspended" enabled in boinc options. The other error was 194 (0x000000C2) EXIT_ABORTED_BY_CLIENT. The stderr output doesn't make clear what the problem was.

5950X, 64 GB ECC RAM, kubuntu 20.04. F@H running on nvidia GPU.
41 successful tasks. 7 errors.
4 "double free or corruption (out)" at end of stderr output.
2 "free(): invalid pointer" at end of stderr output.
1 "194 (0x000000C2) EXIT_ABORTED_BY_CLIENT"

This computer is my main computer, so I have other crap running from time to time, including chrome, boinctasks-js, discord and boinc manager. 8 simultaneous tasks was not sustainable due to RAM limits (90% available to boinc), 7 simultaneous tasks were borderline, and 6 seemed about right. Also, FAH core 22 tasks reserve 2.5 GB of system RAM. The other Zens are headless dedicated DC rigs.

The dual Ivy Bridge was a comedy of errors, mostly my fault: 6 completed tasks, 16 errors (10 errors due to system reboot and 2 due to "leave nonGPU tasks in memory while suspended" disabled in boinc options).
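To compare the machines at a glance, the counts above can be turned into success rates. This is a minimal sketch using only the figures from this summary; the labels and the helper function are illustrative.

```python
# (successes, errors) per computer, copied from the summary above.
results = {
    "3950X, Mint 20.3, MW": (33, 0),
    "3950X, Mint 20.3, F@H": (32, 1),
    "5950X, Mint 21, F@H": (29, 2),
    "5950X, kubuntu, main computer": (41, 7),
    "Dual Ivy Bridge": (6, 16),
}


def success_rate(ok: int, err: int) -> float:
    """Return the fraction of tasks that completed successfully."""
    return ok / (ok + err)


for name, (ok, err) in results.items():
    print(f"{name}: {ok}/{ok + err} = {success_rate(ok, err):.0%}")

total_ok = sum(ok for ok, _ in results.values())
total_err = sum(err for _, err in results.values())
print(f"Overall: {total_ok}/{total_ok + total_err} = "
      f"{success_rate(total_ok, total_err):.0%}")
```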