Task 15541659

Name	hadcm3n_o0ed_2140_40_008282550_0
Workunit	8433685
Created	14 Jan 2013, 3:02:19 UTC
Sent	14 Jan 2013, 3:02:23 UTC
Report deadline	15 Apr 2013, 10:29:34 UTC
Received	11 Feb 2013, 22:10:25 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	22 (0x00000016) Unknown error code
Computer ID	1255354
Run time	19 days 17 hours 38 min 17 sec
CPU time	16 days 19 hours 29 min 46 sec
Validate state	Invalid
Credit	9,331.20
Device peak FLOPS	2.45 GFLOPS
Application version	UK Met Office Coupled Model Full Resolution Ocean v6.07 windows_intelx86
Stderr	<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
15:54:32 (5640): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4460, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4636, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4692, iMonCtr=1
Model crash detected, will try to restart...
16:05:29 (3008): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5304, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5236, iMonCtr=1
Model crash detected, will try to restart...
12:28:54 (5264): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:31:01 (4592): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
20:59:55 (5704): Can't acquire lockfile (32) - waiting 35s
21:00:16 (6364): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4584, iMonCtr=1
Model crash detected, will try to restart...
12:35:43 (4500): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5520, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4152, iMonCtr=1
Model crash detected, will try to restart...
15:33:18 (4624): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4268, iMonCtr=1
Model crash detected, will try to restart...
00:41:33 (4700): No heartbeat from core client for 30 sec - exiting
CPDN Monitor - No 'heartbeat' from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4936, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
MainError: 08:44:03 AM No files match the supplied pattern.
MainError: 08:44:03 AM No files match the supplied pattern.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4668, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4356, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4600, iMonCtr=1
Model crash detected, will try to restart...
MainError: 02:47:52 AM No files match the supplied pattern.
MainError: 02:47:52 AM No files match the supplied pattern.
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5904, iMonCtr=1
Model crash detected, will try to restart...
MainError: 03:20:45 PM No files match the supplied pattern.
MainError: 03:20:45 PM No files match the supplied pattern.
MainError: 05:11:25 AM No files match the supplied pattern.
MainError: 05:11:25 AM No files match the supplied pattern.
MainError: 09:01:56 PM No files match the supplied pattern.
MainError: 09:01:56 PM No files match the supplied pattern.
MainError: 12:55:46 AM No files match the supplied pattern.
MainError: 12:55:46 AM No files match the supplied pattern.
CPDN Monitor - Quit request from BOINC...
MainError: 04:19:55 AM No files match the supplied pattern.
MainError: 04:19:55 AM No files match the supplied pattern.
MainError: 08:44:09 PM No files match the supplied pattern.
MainError: 08:44:09 PM No files match the supplied pattern.
MainError: 12:14:13 AM No files match the supplied pattern.
MainError: 12:14:13 AM No files match the supplied pattern.
Suspended CPDN Monitor - Suspend request from BOINC...
MainError: 04:28:50 AM No files match the supplied pattern.
MainError: 04:28:50 AM No files match the supplied pattern.
Error converting file to netcdf: dataout/o0edka.ph11c10
Error converting file to netcdf: dataout/o0edka.pg11c10
Error converting file to netcdf: dataout/o0edka.pe11c10
MainError: 07:58:44 PM No files match the supplied pattern.
MainError: 07:58:44 PM No files match the supplied pattern.
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048
BUFFIN: C I/O Error feof - Unit 64 - Return code = 16
BUFFIN: C I/O Error feof - Unit 66 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
BUFFIN: C I/O Error feof - Unit 68 - Return code = 16
BUFFIN: C I/O Error feof - Unit 69 - Return code = 16
BUFFIN: C I/O Error feof - Unit 67 - Return code = 16
Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048
Sorry, too many model crashes! :-(
Called boinc_finish
</stderr_txt>
]]>
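To summarise what the stderr above records, its recurring marker strings can simply be counted. The following is a minimal Python sketch, assuming plain substring matching; the category names in `MARKERS` are hypothetical labels chosen for illustration and are not BOINC or CPDN terminology. Because it uses `str.count`, it works whether the log is broken into lines or flattened onto one line as the web page renders it.

```python
from collections import Counter

# Marker strings copied verbatim from the stderr above. The dictionary keys are
# hypothetical labels for this sketch, not names used by BOINC or CPDN.
MARKERS = {
    "restart_attempt": "Model crash detected, will try to restart",
    "model_crashed": "Model crashed: STWORK",
    "buffin_eof": "BUFFIN: C I/O Error feof",
    "no_heartbeat": "No heartbeat from core client",
    "suspend": "Suspend request from BOINC",
    "quit": "Quit request from BOINC",
    "netcdf_error": "Error converting file to netcdf",
    "gave_up": "Sorry, too many model crashes!",
}


def tally(stderr_text: str) -> Counter:
    """Count non-overlapping occurrences of each marker in the stderr text."""
    return Counter({name: stderr_text.count(marker) for name, marker in MARKERS.items()})


# A short excerpt from the end of the log above, kept flattened as on the page.
sample = (
    "BUFFIN: C I/O Error feof - Unit 67 - Return code = 16 "
    "Model crashed: STWORK : I/O error - PP fixed length header tmp/pipe_dummy 2048 "
    "Sorry, too many model crashes! :-( Called boinc_finish"
)

counts = tally(sample)
print(counts["model_crashed"], counts["gave_up"], counts["no_heartbeat"])  # 1 1 0
```

Fixed substrings are used rather than regular expressions because every marker is a literal phrase. Applied to the full stderr, a manual count of the page text gives six `Model crashed: STWORK` messages before the final "Sorry, too many model crashes!".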

Latest Trickles Received
Time Sent (UTC)	Host ID	Result ID	Result Name	Timestep	CPU Time (sec)	Average (sec/TS)
11 Feb 2013 20:09:13	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	777,600	1,455,213	1.8714
11 Feb 2013 05:15:25	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	751,680	1,405,175	1.8694
10 Feb 2013 13:05:10	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	725,760	1,355,326	1.8675
09 Feb 2013 20:47:27	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	699,840	1,305,691	1.8657
09 Feb 2013 04:20:47	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	673,920	1,256,376	1.8643
08 Feb 2013 19:50:48	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	648,000	1,208,456	1.8649
07 Feb 2013 21:06:09	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	622,080	1,159,630	1.8641
07 Feb 2013 05:11:47	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	596,160	1,111,045	1.8637
06 Feb 2013 15:30:48	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	570,240	1,065,364	1.8683
04 Feb 2013 17:16:18	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	544,320	1,017,309	1.8690
03 Feb 2013 08:47:49	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	518,400	967,768	1.8668
01 Feb 2013 22:33:08	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	492,480	918,226	1.8645
01 Feb 2013 06:51:27	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	466,560	868,779	1.8621
30 Jan 2013 21:33:18	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	440,640	819,400	1.8596
29 Jan 2013 14:25:23	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	414,720	772,057	1.8616
29 Jan 2013 14:25:23	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	388,800	725,862	1.8669
29 Jan 2013 14:25:23	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	362,880	678,789	1.8706
27 Jan 2013 15:47:23	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	336,960	630,651	1.8716
26 Jan 2013 22:48:49	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	311,040	581,049	1.8681
26 Jan 2013 06:44:34	1255354	15541659	hadcm3n_o0ed_2140_40_008282550_0	285,120	532,747	1.8685
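The `Average (sec/TS)` column appears to equal the cumulative `CPU Time (sec)` divided by `Timestep`, and consecutive trickles are 25,920 timesteps apart. That definition is inferred from the numbers, not stated on this page. A small Python check, using the first five rows of the table as data, recomputes the ratio and the spacing under that assumption:

```python
# First five rows copied from the trickle table above (newest first):
# (timestep, cumulative CPU time in seconds, reported average sec/TS)
ROWS = [
    (777_600, 1_455_213, 1.8714),
    (751_680, 1_405_175, 1.8694),
    (725_760, 1_355_326, 1.8675),
    (699_840, 1_305_691, 1.8657),
    (673_920, 1_256_376, 1.8643),
]


def sec_per_timestep(timestep: int, cpu_seconds: int) -> float:
    """Assumed definition: cumulative CPU seconds divided by timesteps completed."""
    return cpu_seconds / timestep


def matches_reported(rows, tol=5e-5):
    """True if each recomputed ratio agrees with the 4-decimal column to half a unit."""
    return all(abs(sec_per_timestep(ts, cpu) - avg) <= tol for ts, cpu, avg in rows)


def trickle_spacing(rows):
    """Set of timestep gaps between consecutive rows."""
    return {newer[0] - older[0] for newer, older in zip(rows, rows[1:])}


print(matches_reported(ROWS))  # True
print(trickle_spacing(ROWS))   # {25920}
```

The tolerance of 5e-5 is half of the last displayed digit, so it checks agreement after rounding to four decimal places without depending on how the page itself rounded.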