More exit code -5 and status messages

Profile geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2168
Credit: 64,538,512
RAC: 6,619
Message 1389 - Posted: 21 Aug 2004, 14:56:58 UTC
Last modified: 21 Aug 2004, 14:58:13 UTC

Well, halfway through the last trickle of phase 2 (while I was asleep), my

AMD64 3200+
512 MB DDR400
WinXP Home

system uploaded its model with exit code -5, which, according to this thread
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=187
is the catch-all error for "model crashed".

This surprised me no end, because before moving this computer to BOINC I had run Prime95 and memtest86+ for 24 hours straight each without errors (as well as several classic CPDN runs). The room temperature at the time of the crash was about 24 °C. The status messages that the user can recover are very limited; see below.

It would be nice to actually see status messages detailing the problem, the attempts to rewind (and to what point they rewound), etc. The status messages viewable by the user in the "classic" model were more verbose.

2004-08-21 01:37:26 - Unrecoverable error for result 03qe_000029823_0 ( - exit code -5 (0xfffffffb))
2004-08-21 01:37:26 - Deferring communication with project for 1 minutes and 0 seconds
2004-08-21 01:37:26 - Computation for result 03qe_000029823 finished
2004-08-21 01:37:26 - Started upload of 03qe_000029823_0_1.zip
...more uploading messages...
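
For readers wondering how "-5" and "0xfffffffb" in the log line relate: they are the same value, with the hex form being the 32-bit two's-complement bit pattern of the signed exit code. A minimal Python sketch of the conversion follows; the helper names are made up for illustration and are not part of BOINC.

```python
def to_unsigned32(code: int) -> int:
    """Return the 32-bit two's-complement bit pattern of a signed exit code."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret a 32-bit unsigned value as a signed integer."""
    return value - 0x100000000 if value & 0x80000000 else value


print(hex(to_unsigned32(-5)))   # 0xfffffffb
print(to_signed32(0xFFFFFFFB))  # -5
```

So the two numbers in the "Unrecoverable error" line carry no extra information beyond the exit code itself.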
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 1393 - Posted: 21 Aug 2004, 16:47:39 UTC
Last modified: 21 Aug 2004, 17:01:19 UTC

That's too bad. It can't be an AMD64 thing, because mine just went into phase 3 today while I was out on a hike. The "fun error messages" are in yabsd.out, which is part of the upload on a crash; I checked yours on the server, and it looks like what honza got:

NEGATIVE PRESSURE AT POINT 2780
NEGATIVE PRESSURE AT POINT 2781
NEGATIVE PRESSURE AT POINT 2782
NEGATIVE PRESSURE AT POINT 2783
*********************************************************************************
Model aborted with error code - 1 Routine and message:-
P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED.
*********************************************************************************
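
If you do get hold of a yabsd.out, an excerpt like the one above can be summarised with a few lines of Python. This is only a sketch: the line patterns are inferred from this single excerpt, the function name is hypothetical, and other model versions may format their output differently.

```python
import re

# Patterns inferred from the excerpt quoted in this thread (an assumption).
POINT_RE = re.compile(r"NEGATIVE PRESSURE AT POINT\s+(\d+)")
ABORT_RE = re.compile(r"Model aborted with error code\s*-\s*(\d+)")


def summarize_model_log(text: str) -> dict:
    """Collect the affected grid points and the abort routine/message."""
    points = [int(m.group(1)) for m in POINT_RE.finditer(text)]
    summary = {"points": points, "code": None, "routine": None, "message": None}
    abort = ABORT_RE.search(text)
    if abort:
        # Digits as printed; whether "- 1" means -1 or 1 is not clear from the log.
        summary["code"] = abort.group(1)
        # The "ROUTINE : MESSAGE" pair is on a line after the abort line.
        for line in text[abort.end():].splitlines()[1:]:
            if ":" in line:
                routine, _, message = line.partition(":")
                summary["routine"] = routine.strip()
                summary["message"] = message.strip()
                break
    return summary


sample = """\
NEGATIVE PRESSURE AT POINT 2780
NEGATIVE PRESSURE AT POINT 2781
NEGATIVE PRESSURE AT POINT 2782
NEGATIVE PRESSURE AT POINT 2783
*****
Model aborted with error code - 1 Routine and message:-
P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED.
*****
"""

result = summarize_model_log(sample)
print(result["points"])   # [2780, 2781, 2782, 2783]
print(result["routine"])  # P_TH_ADJ
```

A run of consecutive point numbers, as here, suggests the bad values sit in neighbouring grid cells rather than being scattered.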

I'm going to cross-reference to see if I can validate this is from a parameter set that may have caused the crash, or something else.

The bad thing is it doesn't seem to have rewound first the day, then month, then year as it should (and I don't believe honza's run did either). I suppose you wouldn't have a backup of this run before it crashed & uploaded do you?
Profile Honza
Volunteer moderator

Joined: 5 Aug 04
Posts: 390
Credit: 2,475,242
RAC: 0
Message 1399 - Posted: 21 Aug 2004, 18:18:21 UTC - in response to Message 1393.  
Last modified: 21 Aug 2004, 18:22:48 UTC

> I'm going to cross-reference to see if I can validate this is from a parameter set that may have caused the crash, or something else.
>
> The bad thing is it doesn't seem to have rewound first the day, then month, then year as it should (and I don't believe honza's run did either). I suppose you wouldn't have a backup of this run before it crashed & uploaded do you?
>

Hi all,
I'm trying another BOINC model on my main machine so I can better monitor its progress and behaviour. I also don't think that my recent 'exit code -5' models performed any rewind. I wish Martin's CPFarmView worked under BOINC; it would have made clear whether there was rewinding (or extreme climate).
So far so good (only the first trickle), but this WU 8681 has strange cold cells over Africa.
I'm going back to a regular backup routine, like in the early classic beta last year. Thanks, Carl, for the reminder.
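
For anyone wanting to script such a regular backup, a timestamped directory copy is one simple approach. The sketch below uses only the Python standard library; the function name and the paths in the usage comment are hypothetical, and which directory to copy (and whether the BOINC client must be stopped first) is not stated in this thread, so treat those as assumptions to check yourself.

```python
import shutil
import time
from pathlib import Path


def backup_directory(source: Path, backup_root: Path) -> Path:
    """Copy `source` into `backup_root` under a timestamped name and return the copy."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{source.name}-{stamp}"
    shutil.copytree(source, target)  # creates missing parent directories
    return target


# Hypothetical usage; both paths are placeholders, not real BOINC locations:
# backup_directory(Path(r"C:\path\to\boinc"), Path(r"D:\boinc-backups"))
```

Each call leaves a separate dated copy, so an older state stays available if a later backup already contains the crashed run.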


[Image: http://cpdn.tuxie.org/honzacholt/CPDN_BOINC/BOINC_8681_Cold.png]

old_user355

Joined: 7 Aug 04
Posts: 187
Credit: 44,163
RAC: 0
Message 1405 - Posted: 21 Aug 2004, 18:56:24 UTC - in response to Message 1393.  

> that's too bad, it can't be an AMD64 thing because mine just went into phase 3
> today when I was out on a hike.

Another AMD64 verification. My machine is 11 trickles into phase 3. (64 hours to completion)

Profile geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2168
Credit: 64,538,512
RAC: 6,619
Message 1411 - Posted: 21 Aug 2004, 20:05:24 UTC - in response to Message 1393.  

> today when I was out on a hike. The "fun error messages" are in the yabsd.out
> which is part of the upload on a crash; I checked yours out on the server and

Thanks Carl. Could the yabsd.out be part of the archived data on the user's PC? Maybe it's cryptic and most people wouldn't want to see it, but I'm sure there are a few that would. Perhaps (not likely) something could be figured out by the users looking at these things.

> The bad thing is it doesn't seem to have rewound first the day, then month,
> then year as it should (and I don't believe honza's run did either).

That's what I figured, but I wasn't sure.

> suppose you wouldn't have a backup of this run before it crashed &
> uploaded do you?
>
Unfortunately no. You get used to stability and take it for granted. Time for a reality check. ;)
Profile geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2168
Credit: 64,538,512
RAC: 6,619
Message 1412 - Posted: 21 Aug 2004, 20:08:24 UTC - in response to Message 1405.  

> > that's too bad, it can't be an AMD64 thing because mine just went into phase 3
> > today when I was out on a hike.
>
> Another AMD64 verification. My machine is 11 trickles into phase 3. (64 hours
> to completion)

I didn't think it was, because in the thread I linked, the victim/offending PCs were P4s.
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 1414 - Posted: 21 Aug 2004, 20:43:40 UTC - in response to Message 1412.  
Last modified: 21 Aug 2004, 20:44:30 UTC

OK, I discovered the silly error I made that stopped you guys from rewinding, so that shouldn't happen again. It will do the model-day/month/year rewind on a crash, provided you were far enough along for the month/year rewind, of course. My apologies for that; it should have given you "another chance", although I'm not sure it wouldn't have just hit that timestep with the "negative pressure" and crashed anyway.
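
To make the rewind order concrete, here is a rough Python sketch of a day, then month, then year fallback. It is not the CPDN code: the function, the `run_from` callback and the list of available restart points are all hypothetical stand-ins for whatever the model actually does.

```python
from typing import Callable, Optional, Sequence

# Restart points are tried from the most recent to the oldest.
REWIND_ORDER = ("day", "month", "year")


def rerun_with_rewind(
    run_from: Callable[[str], bool], available: Sequence[str]
) -> Optional[str]:
    """Return the first restart point from which the run gets past the crash, else None."""
    for level in REWIND_ORDER:
        if level not in available:
            continue  # the run was not far enough along to have this restart point
        if run_from(level):
            return level
    return None


# A one-off glitch: the retry from the most recent restart point succeeds.
print(rerun_with_rewind(lambda level: True, ["day", "month"]))           # day
# A reproducible failure: every retry reaches the same bad timestep.
print(rerun_with_rewind(lambda level: False, ["day", "month", "year"]))  # None
```

The second call illustrates the doubt raised above: if the crash comes from the model state rather than the machine, no amount of rewinding helps and the run still ends in an error.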

I'm a little mixed up though because in your original post it was the AMD64 that crashed, right? And I think that's what Honza is using.
Profile geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2168
Credit: 64,538,512
RAC: 6,619
Message 1415 - Posted: 21 Aug 2004, 20:54:04 UTC - in response to Message 1414.  
Last modified: 21 Aug 2004, 20:55:00 UTC

> OK, I discovered the silly error I made that stopped you guys from rewinding,
> so that shouldn't happen again. It will do the model-day/month/year rewind on
> a crash provided you were far enough for month/year of course. So my
> apologies for that, it should have given "another chance" although I'm not
> sure if it wouldn't have just hit that timestep with the "negative pressure"
> and crashed anyway.
>
Good to hear that the PC will be given another chance if errors occur. If it was a machine error, the rerun might have gone through and continued. If it was a model parameter instability problem, would it likely have just repeated the crash at the same point?

> I'm a little mixed up though because in your original post it was the AMD64
> that crashed, right? And I think that's what Honza is using.

It wasn't clear from Honza's first post what PC it was (since he has both P4s and an AMD64), but in his post in that thread from 17 Aug 2004 7:25:00 UTC it had to be a P4 since he was talking about downclocking to 3 GHz.

Profile Honza
Volunteer moderator

Joined: 5 Aug 04
Posts: 390
Credit: 2,475,242
RAC: 0
Message 1452 - Posted: 22 Aug 2004, 8:52:35 UTC - in response to Message 1415.  

> > I'm a little mixed up though because in your original post it was the AMD64
> > that crashed, right? And I think that's what Honza is using.
>
> It wasn't clear from Honza's first post what PC it was (since he has both P4s
> and an AMD64), but in his post in that thread from 17 Aug 2004 7:25:00 UTC
> it had to be a P4 since he was talking about downclocking to 3 GHz.
>

Hi guys,

Both machines running BOINC were P4s, 3 GHz, Win XP.
My AMD64 is crunching classic THC - now phase 4, during winter under 5C average, ETA final upload in 24 hours. I guess I will start another classic model there.


©2024 climateprediction.net