climateprediction.net home page
Task 18341792

Task 18341792

Name hadam3p_anz_f0ab_2012_1_009776408_0
Workunit 9832372
Created 24 Apr 2015, 14:10:56 UTC
Sent 24 Apr 2015, 14:14:10 UTC
Report deadline 5 Apr 2016, 19:34:10 UTC
Received 2 Sep 2015, 18:27:30 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x00000000)
Computer ID 1075479
Run time 8 days 8 hours 46 min 11 sec
CPU time 5 days 9 hours 57 min 20 sec
Validate state Invalid
Credit 4,484.28
Device peak FLOPS 3.65 GFLOPS
Application version UK Met Office HadAM3P-HadRM3P Australia New Zealand v6.10
windows_intelx86
Stderr
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<stderr_txt>
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6656, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7604, selfPID=7056, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7244, selfPID=7328, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Called boinc_finish
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7792, selfPID=7848, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7668, iMonCtr=2
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7100, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7736, selfPID=7220, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7720, selfPID=5600, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=9896, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6840, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7676, selfPID=6792, iMonCtr=1
Model crash detected, will try to restart...
13:07:42 (1080): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
13:59:02 (8656): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7400, iMonCtr=2
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7516, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7836, selfPID=6752, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7176, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=1080, selfPID=8128, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8076, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6180, selfPID=7412, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7520, iMonCtr=2
GController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7216, selfPID=7420, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5176, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6224, selfPID=7148, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7964, selfPID=7188, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8180, iMonCtr=2
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7976, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8116, selfPID=6856, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8400, selfPID=7352, iMonCtr=1
Model crash detected, will try to restart...
08:50:33 (7420): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7700, selfPID=7220, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7936, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7220, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8160, iMonCtr=2
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6952, iMonCtr=
2
del crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6736, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=4452, selfPID=7924, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7660, selfPID=7064, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7884, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5288, selfPID=7200, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7712, iMonCtr=2
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7852, iMonCtr=2
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7500, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7888, iMonCtr=2Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7720, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6680, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7900, selfPID=6944, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7760, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7828, selfPID=7136, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7220, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7764, iMonCtr=2
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7380, iMonCtr=2
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7364, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7532, selfPID=6464, iMonCtr=1
Model crash detected, will try to restart...
12:58:41 (7616): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
12:58:42 (7616): No heartbeat from core client for 30 sec - exiting
12:58:43 (7616): No heartbeat from core client for 30 sec - exiting
12:58:45 (7616): No heartbeat from core client for 30 sec - exiting
12:58:46 (7616): No heartbeat from core client for 30 sec - exiting
12:58:47 (7616): No heartbeat from core client for 30 sec - exiting
12:58:48 (7616): No heartbeat from core client for 30 sec - exiting
12:59:04 (1548): Can't acquire lockfile (32) - waiting 35s
13:14:06 (1548): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7888, iMonCtr=2
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7632, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6672, selfPID=7496, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
GGlobal Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=8020, iMonCtr=2
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7804, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7108, iMonCtr=2
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8156, selfPID=7512, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7628, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6804, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=6888, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7668, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7612, iMonCtr=2
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=5936, selfPID=6268, iMonCtr=1
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7424, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7612, selfPID=6812, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=8584, selfPID=6924, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=6752, selfPID=7816, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7264, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=7868, iMonCtr=2
Suspended CPDN Monitor - Suspend request from BOINC...
Global Worker:: CPDN process is not running, exiting, bRetVal = 0, checkPID=0, selfPID=6432, iMonCtr=1
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=0, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7444, selfPID=7496, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
Called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
  <file_name>hadam3p_anz_f0ab_2012_1_009776408_0_10.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_f0ab_2012_1_009776408_0_11.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadam3p_anz_f0ab_2012_1_009776408_0_12.zip</file_name>
  <error_code>-161 (not found)</error_code>
</file_xfer_error>

</message>
]]>
Latest Trickles Received
Time Sent (UTC) Host ID Result ID Result Name Timestep CPU Time (sec) Average (sec/TS)
23 Aug 2015 18:54:44 1075479 18341792 hadam3p_anz_f0ab_2012_1_009776408_0 103,979 430,227 4.1376
10 Aug 2015 14:25:20 1075479 18341792 hadam3p_anz_f0ab_2012_1_009776408_0 92,459 382,182 4.1335
29 Jul 2015 18:48:49 1075479 18341792 hadam3p_anz_f0ab_2012_1_009776408_0 80,939 334,440 4.1320
16 Jul 2015 20:37:42 1075479 18341792 hadam3p_anz_f0ab_2012_1_009776408_0 69,419 286,761 4.1309
06 Jul 2015 18:51:04 1075479 18341792 hadam3p_anz_f0ab_2012_1_009776408_0 57,899 239,194 4.1312
24 Jun 2015 17:54:48 1075479 18341792 hadam3p_anz_f0ab_2012_1_009776408_0 46,379 191,925 4.1382
10 Jun 2015 20:04:52 1075479 18341792 hadam3p_anz_f0ab_2012_1_009776408_0 34,859 145,838 4.1837
26 May 2015 13:41:20 1075479 18341792 hadam3p_anz_f0ab_2012_1_009776408_0 23,339 99,331 4.2560
08 May 2015 19:12:10 1075479 18341792 hadam3p_anz_f0ab_2012_1_009776408_0 11,819 49,044 4.1496


©2024 climateprediction.net