Output file absent & Too many errors (may have bug)
Author | Message |
---|---|
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The reason for asking for the file names of faulty models is that the project people want to know which years have the error. And it seems like they're spread over a lot of years. Backups: Here |
Send message Joined: 3 Oct 06 Posts: 43 Credit: 8,017,057 RAC: 0 |
The reason for asking for the file names of faulty models is that the project people want to know which years have the error. In that case, I've got one here: hadam3p_eu_8a9u_2003_1_008057882_1. Note that this one was sent to me on the 18th of July. |
Send message Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
Files _2 to _12 were reported missing and there was indeed a file _13 apparently waiting to be uploaded when network activity resumed. I only remember there being one such _13 file, but I wasn't paying particular attention at the time. Although supposedly several MB in size, it disappeared instantly from the Transfers window when the BOINC client contacted the server. That happens because an error automatically means the BOINC client can report the task to the server. When the scheduler request doing that is acknowledged the BOINC client deletes all references to the task (including any pending or in progress uploads). "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
Send message Joined: 8 Sep 10 Posts: 6 Credit: 1,475,984 RAC: 0 |
This may be related. Certainly, hadam3p_eu tasks are exiting early (some almost instantly after the task's first upload), and as a result of that early exit (which I think is a symptom), the zip files of task results are missing from the uploads: http://climateprediction.net/board/viewtopic.php?f=4&t=10619 |
Send message Joined: 5 Jun 06 Posts: 28 Credit: 2,790,048 RAC: 0 |
Some details from different systems: Task 14973021 Name hadam3p_eu_634j_2009_1_008071304_2 Workunit 8226418 Created 22 Jul 2012 0:43:29 UTC Sent 22 Jul 2012 0:47:15 UTC Received 22 Jul 2012 10:30:11 UTC Server state Over Outcome Client error Client state Compute error Exit status 0 (0x0) Computer ID 1212547 Report deadline 4 Jul 2013 6:07:15 UTC Run time 26,180.15 CPU time 25,922.02 Validate state Invalid Claimed credit 200.38 Granted credit 200.38 application version UK Met Office HADAM3P European Region v6.09 Stderr show hide <core_client_version>7.0.28</core_client_version> <![CDATA[ <stderr_txt> Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048 Leaving CPDN_Main::Monitor... Called boinc_finish </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_2_2.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_2_3.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_2_4.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_2_5.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_2_6.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_2_7.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_2_8.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_2_9.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_2_10.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> 
<file_name>hadam3p_eu_634j_2009_1_008071304_2_11.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_2_12.zip</file_name> <error_code>-161</error_code> </file_xfer_error> </message> ]]> Name hadam3p_eu_2j5d_1987_1_008071308_1 Workunit 8226422 Created 20 Jul 2012 7:01:50 UTC Sent 20 Jul 2012 7:52:01 UTC Received 21 Jul 2012 8:19:10 UTC Server state Over Outcome Client error Client state Compute error Exit status 0 (0x0) Computer ID 1126062 Report deadline 2 Jul 2013 13:12:01 UTC Run time 13,805.54 CPU time 13,678.24 Validate state Invalid Claimed credit 0.00 Granted credit 0.00 application version UK Met Office HADAM3P European Region v6.09 Stderr show hide <core_client_version>6.10.58</core_client_version> <![CDATA[ <stderr_txt> Signal 15 received, exiting... Called boinc_finish Signal 15 received, exiting... Called boinc_finish Signal 15 received, exiting... Called boinc_finish SIGSEGV: segmentation violation Stack trace (14 frames): /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu(boinc_catch_signal+0x6f)[0x836e1cf] [0xf0f87400] /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8136129] /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x813c074] /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8131c87] /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x813d6aa] /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8133fca] /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x8078e6f] /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82d73ae] /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82f8867] 
/home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82f14bb] /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x82f97f6] /lib32/libc.so.6(__libc_start_main+0xe5)[0xf0df342d] /home/aida/BOINC/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-pc-linux-gnu[0x804caf1] Exiting... Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3708, selfPID=3695, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... Called boinc_finish </stderr_txt> <message> <file_xfer_error> <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_1.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_2.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_3.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_4.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_5.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_6.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_7.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_8.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_9.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_10.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_11.zip</file_name> <error_code>-161</error_code> 
</file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_12.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_2j5d_1987_1_008071308_1_13.zip</file_name> <error_code>-161</error_code> </file_xfer_error> </message> ]]> Name hadam3p_eu_60t3_2009_1_008071305_0 Workunit 8226419 Created 20 Jul 2012 5:56:54 UTC Sent 20 Jul 2012 6:02:06 UTC Received 22 Jul 2012 3:45:28 UTC Server state Over Outcome Client error Client state Compute error Exit status 0 (0x0) Computer ID 1192477 Report deadline 2 Jul 2013 11:22:06 UTC Run time 74,050.46 CPU time 72,651.55 Validate state Invalid Claimed credit 200.38 Granted credit 200.38 application version UK Met Office HADAM3P European Region v6.09 Stderr show hide <core_client_version>6.12.34</core_client_version> <![CDATA[ <stderr_txt> CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048 Leaving CPDN_Main::Monitor... 
Called boinc_finish </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>hadam3p_eu_60t3_2009_1_008071305_0_2.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_60t3_2009_1_008071305_0_3.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_60t3_2009_1_008071305_0_4.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_60t3_2009_1_008071305_0_5.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_60t3_2009_1_008071305_0_6.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_60t3_2009_1_008071305_0_7.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_60t3_2009_1_008071305_0_8.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_60t3_2009_1_008071305_0_9.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_60t3_2009_1_008071305_0_10.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_60t3_2009_1_008071305_0_11.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_60t3_2009_1_008071305_0_12.zip</file_name> <error_code>-161</error_code> </file_xfer_error> </message> ]]>
Name hadam3p_eu_634j_2009_1_008071304_1 Workunit 8226418 Created 21 Jul 2012 5:03:17 UTC Sent 21 Jul 2012 5:11:11 UTC Received 22 Jul 2012 0:43:28 UTC Server state Over Outcome Client error Client state Compute error Exit status 0 (0x0) Computer ID 1221572 Report deadline 3 Jul
2013 10:31:11 UTC Run time 54,671.36 CPU time 54,503.55 Validate state Invalid Claimed credit 200.38 Granted credit 200.38 application version UK Met Office HADAM3P European Region v6.09 Stderr show hide <core_client_version>7.0.25</core_client_version> <![CDATA[ <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Suspended CPDN Monitor - Suspend request from BOINC... Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048 Leaving CPDN_Main::Monitor... Called boinc_finish </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_1_2.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_1_3.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_1_4.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_1_5.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_1_6.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_1_7.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_1_8.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_1_9.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_1_10.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_1_11.zip</file_name> <error_code>-161</error_code> </file_xfer_error> 
<file_xfer_error> <file_name>hadam3p_eu_634j_2009_1_008071304_1_12.zip</file_name> <error_code>-161</error_code> </file_xfer_error> </message> ]]> Name hadam3p_eu_6c44_2009_1_008071303_0 Workunit 8226417 Created 20 Jul 2012 5:56:29 UTC Sent 20 Jul 2012 6:01:45 UTC Received 21 Jul 2012 1:04:09 UTC Server state Over Outcome Client error Client state Compute error Exit status 0 (0x0) Computer ID 915051 Report deadline 2 Jul 2013 11:21:45 UTC Run time 47,264.36 CPU time 46,751.77 Validate state Invalid Claimed credit 200.38 Granted credit 200.38 application version UK Met Office HADAM3P European Region v6.09 Stderr show hide <core_client_version>7.0.28</core_client_version> <![CDATA[ <stderr_txt> Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048 Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4524, selfPID=4524, iMonCtr=2 Leaving CPDN_Main::Monitor... Called boinc_finish </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>hadam3p_eu_6c44_2009_1_008071303_0_2.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_6c44_2009_1_008071303_0_3.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_6c44_2009_1_008071303_0_4.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_6c44_2009_1_008071303_0_5.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_6c44_2009_1_008071303_0_6.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_6c44_2009_1_008071303_0_7.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_6c44_2009_1_008071303_0_8.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_6c44_2009_1_008071303_0_9.zip</file_name> 
<error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_6c44_2009_1_008071303_0_10.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_6c44_2009_1_008071303_0_11.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_6c44_2009_1_008071303_0_12.zip</file_name> <error_code>-161</error_code> </file_xfer_error> </message> ]]> |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Thanks for the details, skgiven. I was mistaken in thinking that the REPLANCA batches started on 22 July. There were batches created on 21 and 20 July too. Cpdn news |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,888,554 RAC: 1,481,373 |
Possibly a few more were created more recently: hadam3p_eu_cryy_2004_1_008083704_1, sent 25 Jul 2012 3:03:18 UTC. But this is a small percentage of the WUs from the last few days. Most of what my machines downloaded in the last 3 days has no problems at all. |
Send message Joined: 15 Jan 11 Posts: 175 Credit: 6,242,691 RAC: 699 |
Should we report all instances of REPLANCA failures? I've just had my 1st. Messages :- Fri Jul 27 06:02:13 2012 Started upload of hadam3p_eu_cq3s_2006_1_008082615_2_1.zip Fri Jul 27 06:07:08 2012 Finished upload of hadam3p_eu_cq3s_2006_1_008082615_2_1.zip Fri Jul 27 07:58:26 2012 Started upload of hadam3p_eu_cq3s_2006_1_008082615_2_13.zip Fri Jul 27 07:58:29 2012 Computation for task hadam3p_eu_cq3s_2006_1_008082615_2 finished Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_2.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_3.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_4.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_5.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_6.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_7.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_8.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_9.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_10.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_11.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent Fri Jul 27 07:58:29 2012 Output file hadam3p_eu_cq3s_2006_1_008082615_2_12.zip for task hadam3p_eu_cq3s_2006_1_008082615_2 absent Stderror :- Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048 Leaving CPDN_Main::Monitor... 
Called boinc_finish |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I think we've worked out that it's EU models that have the fault. Set your prefs for only PNW, and you should be OK. Backups: Here |
Send message Joined: 16 Jul 05 Posts: 32 Credit: 10,513,155 RAC: 0 |
I have failing pnw, too: hadam3p_pnw_bdmc_1973_1_008097714_0 hadam3p_pnw_b9zc_1977_1_008097176_0 They failed after 10 s of runtime! stderr shows: <core_client_version>7.0.28</core_client_version> <![CDATA[ <stderr_txt> GCM: BUFFIN : Read Failed: No such file or directory GCM : BUFFIN: C I/O Error feof - Unit 30 - Return code = 16 GCM : BUFFIN: C I/O Error feof - Unit 30 - Return code = 16 Model crashed: REPLANCA :I/O ERROR tmp/xaakm.pipe_dummy 2048 Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=15304, selfPID=15304, iMonCtr=2 Leaving CPDN_Main::Monitor... Regional yearly means requires 12 input files got 0 Called boinc_finish </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_1.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_2.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_3.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_4.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_5.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_6.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_7.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_8.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_9.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_10.zip</file_name> 
<error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_11.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_12.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_pnw_bdmc_1973_1_008097714_0_13.zip</file_name> <error_code>-161</error_code> </file_xfer_error> </message> ]]> |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
It's a waste of time and space posting long strings of "error 161" messages. These aren't about model failures. They just mean that BOINC can't find these files when it tries to upload them. Which is obvious, as they were never created in the first place. The model crashed before getting that far. Backups: Here |
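For anyone who wants to report only the useful part of a failed task, here is a minimal Python sketch that pulls the "Model crashed" reason out of a pasted stderr dump and merely counts the -161 entries instead of listing them. The function name `summarize_stderr` and the regular expressions are illustrative assumptions based on the stderr excerpts posted in this thread; they are not part of BOINC or the CPDN application.

```python
import re

# Assumed layout (taken from the stderr excerpts in this thread): the crash
# reason follows "Model crashed:" and ends just before the "tmp/..." path.
CRASH_RE = re.compile(r"Model crashed:\s*(.+?)\s+tmp/")

# Each missing output file shows up as a <file_name> followed by error code -161.
MISSING_RE = re.compile(
    r"<file_name>([^<]+)</file_name>\s*<error_code>-161</error_code>"
)


def summarize_stderr(stderr_text):
    """Return the crash reason (or None) and the number of -161 entries."""
    crash = CRASH_RE.search(stderr_text)
    missing = MISSING_RE.findall(stderr_text)
    return {
        "crash_reason": crash.group(1) if crash else None,
        "missing_uploads": len(missing),
    }


if __name__ == "__main__":
    sample = (
        "Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH "
        "tmp/xaakm.pipe_dummy 2048 Leaving CPDN_Main::Monitor... "
        "<file_xfer_error> <file_name>example_2.zip</file_name> "
        "<error_code>-161</error_code> </file_xfer_error>"
    )
    print(summarize_stderr(sample))
```

Posting the task name plus the one-line crash reason should carry the same information as the full dump.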
Send message Joined: 8 Sep 10 Posts: 6 Credit: 1,475,984 RAC: 0 |
Not that I necessarily expect an answer, but I'd be curious to know why the European models are failing? |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,888,554 RAC: 1,481,373 |
Only a small fraction are failing. Because the download files are not exactly right. And the problem will be or has been fixed already. So when the problem work units clear the queue this problem will be gone. And then, because this whole project is cutting edge and really complex, there will probably be a few more malformed work units later. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
"REPLANCA" is an error that means a program is expecting X number of values, but only found X-n. It happens when a limited number of values is used to test a program, and then everything is increased to the full range of values, except for one of the ancillary files where the list of values doesn't get increased. So someone in one of the research groups has supplied the Oxford people with a faulty file. The question then becomes: which file? from which research group? and for what range(s) of model dates? *************** I also had one SAF model fail with this error, and Nowi is reporting PNWs failing with it. Backups: Here |
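To help narrow down the "which model dates?" question, here is a rough Python sketch that tallies failed task names by region and year. The field layout it assumes (model, region, four-character run code, year, then three numeric fields) is only inferred from the task names posted in this thread, and `tally_by_region_and_year` is a hypothetical helper, not a project tool.

```python
import re
from collections import Counter

# Assumed layout, inferred from names such as
# hadam3p_eu_8a9u_2003_1_008057882_1:
#   <model>_<region>_<4-char run code>_<year>_<n>_<number>_<n>
TASK_RE = re.compile(
    r"^(?P<model>[a-z0-9]+)_(?P<region>[a-z]+)_(?P<run>[a-z0-9]{4})"
    r"_(?P<year>\d{4})_\d+_\d+_\d+$"
)


def tally_by_region_and_year(task_names):
    """Count task names per (region, year); names that don't match are skipped."""
    counts = Counter()
    for name in task_names:
        match = TASK_RE.match(name.strip())
        if match:
            counts[(match["region"], int(match["year"]))] += 1
    return counts


if __name__ == "__main__":
    # Task names copied from posts in this thread.
    failed = [
        "hadam3p_eu_8a9u_2003_1_008057882_1",
        "hadam3p_eu_634j_2009_1_008071304_2",
        "hadam3p_eu_60t3_2009_1_008071305_0",
        "hadam3p_pnw_bdmc_1973_1_008097714_0",
    ]
    for (region, year), n in sorted(tally_by_region_and_year(failed).items()):
        print(region, year, n)
```

Pasting in the names collected so far gives a quick picture of how widely the affected years are spread.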
Send message Joined: 28 Mar 09 Posts: 126 Credit: 9,825,980 RAC: 0 |
Yes, I got a couple. Mine are all PNW models: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048 Link to work unit here Les, do you want to know about these or do we just ignore them? I see there are 14,000+ PNW work units on the queue so there are bound to be more in there. BOINC blog |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Hi Mark I'm not sure, but I guess we should know about the PNW baddies as well. It's going to be another 24-30 hours before anyone shows up, but I'll pass on the news. Backups: Here |
Send message Joined: 5 May 10 Posts: 69 Credit: 1,169,103 RAC: 2,258 |
Yep. I've had a PNW error overnight too. Same symptoms. A few more points awarded though. :) hadam3p_pnw_bdp4_1993_1_008097733_0 NG |
Send message Joined: 28 Mar 09 Posts: 126 Credit: 9,825,980 RAC: 0 |
Hi Mark Replanca errors: resultid=14901620 resultid=15011909 resultid=14819189 resultid=15021473 Some others complaining about files (no mention of Replanca though). These crash in about 600 seconds elapsed Model crashed: resultid=14819102 resultid=14819127 And another which might just be some weird parameters: Model crashed: INITTIME: Atmosphere basis time mismatch tmp/xaakm.pipe_dummy 2048 resultid=14906965 BOINC blog |
Send message Joined: 15 May 09 Posts: 4347 Credit: 16,541,921 RAC: 6,087 |
Just in case you are still collecting details of tasks with replanca error. http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=14975475 hadam3p_eu_ale0_2000_1_008070909_2 is one. I am suspicious though as this happened after the computer had just been restarted or at least that was when I noticed it and the zip13 uploaded. Dave |
Send message Joined: 28 Mar 09 Posts: 126 Credit: 9,825,980 RAC: 0 |
Some more Replanca errors... resultid=15022759 resultid=15024598 resultid=15033209 resultid=15028563 resultid=15032537 resultid=15035539 resultid=15039466 resultid=15034026 resultid=15034029 resultid=15034537 resultid=15034564 resultid=15034565 Looks to me like they are all stuffed. Perhaps the project would be better served by cancelling the remaining ones on the queue that haven't been sent out and resubmitting them after fixing the replanca issue. What's really annoying is they run for 18-19 hours before they commit suicide, and then to top it off they create the usual 32 MB _13 file to upload. It's probably useless anyway, seeing as the model only has 1 of the 12 input files. BOINC blog |