HadCM3N - Error Messages on Completion

old_user596405

Joined: 4 Oct 09
Posts: 73
Credit: 7,242,427
RAC: 0
Message 42003 - Posted: 22 Apr 2011, 10:53:14 UTC

Four of my five HadCM3N full-resolution ocean models apparently "completed" earlier today: task IDs 12758267, 12740235, 12740230 and 12740229.

Each uploaded its final trickle at time step 1,036,800. The first three have their status marked as "completed", but the fourth (12740229) is marked as "error while computing".
All four tasks ran continuously, 24/7. None of them failed and was then restored from backup, which is the usual reason for an error status that does not get corrected when the model eventually finishes.

All have identical stderr messages.

Link to the host's summary page - http://climateapps2.oucs.ox.ac.uk/cpdnboinc/results.php?hostid=1057969

My assumption is that all four are OK and the scientists will get clean results; otherwise this has been a waste of resources and time (the tasks started 21 days ago)!

Even if they are irrelevant, all those stderr messages are confusing, to say the least.


ID: 42003
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42006 - Posted: 22 Apr 2011, 15:30:12 UTC - in response to Message 42003.  

If you're talking about the "can't delete ..." parts, then that was/is just a permissions problem near the start of the installation of the models, or possibly during the backup restore.
They're all font files for the graphics displays.

As for "a waste of resource and time", the messages to look at are right near the top of each page:
Over
Success
Done


ID: 42006
Greg van Paassen

Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 42007 - Posted: 22 Apr 2011, 20:30:07 UTC - in response to Message 42003.  

12740229, the one that failed, has
Signal 11 received, exiting...
Called boinc_finish

</stderr_txt>
]]>
at the end of its stderr. The others don't.

The BOINC FAQ doesn't seem very helpful. Other pages that Google brought up suggest it could be a permissions problem, a buffer overflow bug, or even overheating! Signal 11 seems to be a catch-all.
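
For anyone comparing several tasks by hand, here is a rough Python sketch of the check described above: it looks through saved stderr text for a "Signal N received" line. It assumes the stderr text has already been copied out of each result page into a string; the function names `find_signals` and `crashed_tasks` are made up for illustration and are not part of BOINC or the CPDN software. The pattern only matches the wording quoted above, so other crash messages would be missed. For what it's worth, on Linux signal 11 is SIGSEGV (a segmentation fault), which fits the "catch-all" impression.

```python
import re

# Matches the wording quoted in this thread ("Signal 11 received, exiting...").
# Assumption: other stderr formats may word this differently and are not covered.
SIGNAL_LINE = re.compile(r"Signal (\d+) received", re.IGNORECASE)


def find_signals(stderr_text):
    """Return the signal numbers reported in one task's stderr text, in order."""
    return [int(m.group(1)) for m in SIGNAL_LINE.finditer(stderr_text)]


def crashed_tasks(stderr_by_task):
    """Map task ID -> signal numbers, keeping only tasks that reported a signal."""
    report = {}
    for task_id, text in stderr_by_task.items():
        signals = find_signals(text)
        if signals:
            report[task_id] = signals
    return report


if __name__ == "__main__":
    # The first entry uses the lines quoted above; the second is a placeholder
    # standing in for a stderr without any signal line.
    samples = {
        12740229: "Signal 11 received, exiting...\nCalled boinc_finish\n",
        12740230: "Called boinc_finish\n",
    }
    for task_id, signals in crashed_tasks(samples).items():
        print(task_id, "reported signal(s):", signals)
```

This only flags which tasks logged a signal; it says nothing about why the signal was raised or whether the uploaded results are usable.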
ID: 42007

©2024 climateprediction.net