climateprediction.net home page
Posts by Cartoonman

Posts by Cartoonman

1) Message boards : Number crunching : Crashed WU (Message 44286)
Posted 2 Jun 2012 by Cartoonman
Post:
I have a WU that crashed just short of the 50% mark (about 49.8). Forunately, I made a backup merely 10 minutes before the crash. However, everytime I load the backup of the BOINC folder, the WU appears to run fine (gaining progress and making data), but then 10 minutes into the WU it crashes, apparently at the same spot.

I also noticed that the error code on this one was significantly different from the usual ones I had:
 <core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
 - exit code 193 (0xc1)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
03:11:14 (1500): No heartbeat from core client for 30 sec - exiting
Suspended CPDN Monitor - No 'heartbeat' from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5396, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2148, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3060, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Quit request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1668, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=4060, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3492, iMonCtr=1
Model crash detected, will try to restart...
CPDN Monitor - Quit request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=2116, iMonCtr=1
Model crash detected, will try to restart...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3988, iMonCtr=1
Model crash detected, will try to restart...
CController:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3412, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=3956, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Quit request from BOINC...


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x778E5EAB read attempt to address 0x00000000

Engaging BOINC Windows Runtime Debugger...

Cannot serialize file C:\ProgramData\BOINC/projects/climateprediction.net/hadcm3n_o7n8_2060_40_007998308/dataout/shmem_restart.day
Signal 11 received, exiting...
Called boinc_finish

</stderr_txt>
]]>


I have an older backup, but it's quite old, and will take a rather long time to reach it's previous state. Is it worth doing it, or is this WU actually broken?
2) Message boards : Number crunching : Upload Failure (Message 44100)
Posted 26 Apr 2012 by Cartoonman
Post:
I honestly have no issue keeping a few tasks in the transfer dock. Just dont abort the transfer. Wait till the upload server is online, thats all. Given that these WU's take a very long time to complete anyhow, it shouldn't affect your WU flow. Most likely, by the time the WU's in the queue are done, the upload server will be online. If not, just temporarily switch to another project. That doesn't require any effort.




©2024 climateprediction.net