exit code -5 (0xfffffffb)

old_user12929

Send message
Joined: 5 Sep 04
Posts: 8
Credit: 374,329
RAC: 0
Message 5478 - Posted: 19 Oct 2004, 11:26:00 UTC

After 956 hrs. CPU time I get this:

2004-10-19 02:13:40 - Unrecoverable error for result 1rn7_000103326_0 ( - exit code -5 (0xfffffffb))
2004-10-19 02:13:40 - Deferring communication with project for 1 minutes and 0 seconds
2004-10-19 02:13:40 - Computation for result 1rn7_000103326 finished
2004-10-19 02:13:43 - Started upload of 1rn7_000103326_0_1.zip
(all zip files uploaded)
New model downloaded and started.

AMD Athlon MP-1800+ (dual) (no overclock)
512M RAM
No screensaver used.
Boinc Ver. 4.09
2 models running 100%
Win 2000 Pro



Any clue as to why?




ID: 5478 · Report as offensive     Reply Quote
crandles
Volunteer moderator

Send message
Joined: 16 Oct 04
Posts: 692
Credit: 277,679
RAC: 0
Message 5487 - Posted: 19 Oct 2004, 17:49:34 UTC
Last modified: 19 Oct 2004, 17:51:52 UTC

Thyme Lawn is frequently making posts like this:

Exit code -5 is a catch all error code for computation errors. CPDN stresses your hardware more than anything else that you're likely to run on your system, and the most frequent causes of these errors are overclocking, overheating and flakey hardware.

You might like to check out UK_Nick's hardware maintenance and hardware tests and checks stickies on the phpBB forums.
http://www.climateprediction.net/board/viewtopic.php?t=2124

http://www.climateprediction.net/board/viewtopic.php?t=2126

I have seen that you wrote no overclock, but just because there is no overclock, this does not guarantee 100% stability.
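For anyone puzzled by the two forms in the log line: 0xfffffffb is simply -5 written as an unsigned 32-bit value (two's complement), so both refer to the same exit code. A minimal Python sketch of the conversion follows; the helper names are made up for illustration and are not part of BOINC or CPDN.

```python
def to_unsigned32(code: int) -> int:
    """Return the 32-bit two's-complement bit pattern of a signed exit code."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret a 32-bit unsigned value as a signed integer."""
    return value - 0x100000000 if value & 0x80000000 else value


print(hex(to_unsigned32(-5)))   # 0xfffffffb
print(to_signed32(0xFFFFFFFB))  # -5
```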
ID: 5487 · Report as offensive     Reply Quote
Profile adrianxw
Avatar

Send message
Joined: 31 Aug 04
Posts: 145
Credit: 2,021,020
RAC: 816
Message 5490 - Posted: 19 Oct 2004, 18:41:33 UTC - in response to Message 5487.  
Last modified: 19 Oct 2004, 18:51:56 UTC

> Thyme Lawn is frequently making posts like this:
>
> Exit code -5 is a catch all error code for computation errors. CPDN stresses
> your hardware more than anything else that you're likely to run on your
> system, and the most frequent causes of these errors are overclocking,
> overheating and flakey hardware.
>
> You might like to check out UK_Nick's hardware maintenance and hardware tests
> and checks stickies on the phpBB forums.
> http://www.climateprediction.net/board/viewtopic.php?t=2124
>
> http://www.climateprediction.net/board/viewtopic.php?t=2126
>
> I have seen that you wrote no overclock, but just because there is no
> overclock, this does not guarantee 100% stability.
>

I am running a non-overclocked 2.53GHz P-IV. It ran my first model fine for about 200 hours before it crashed because of a file error - nothing to do with -5. The second model I got ran for 414 hours before crashing with a -5. Since then, I have been sent 7 models; all of them have failed with a -5, and all within 10 hours, sometimes a lot less than 10 hours.

During the same period, my machine has successfully crunched Seti@Home units without error, and LHC@Home units without error. Many of both in fact. These are also CPU intensive tasks. My own work on the system runs extremely CPU intense neural models - again without error.

I don't know if it is significant, but my BOINC software was upgraded to 4.13 a few days back, not sure exactly when.

It is always easy to blame people's hardware, but there could be more to it than that, remember. Why, if my h/w is flaky, was it not flaky before, and why is it flaky only for this project?

As advised in http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=1081 I have downloaded and am running the Super PI torture software.
ID: 5490 · Report as offensive     Reply Quote
Profile Tony Wilson

Send message
Joined: 31 Aug 04
Posts: 5
Credit: 241,338
RAC: 0
Message 5504 - Posted: 20 Oct 2004, 6:52:00 UTC - in response to Message 5490.  
Last modified: 20 Oct 2004, 7:14:18 UTC

I was also having this problem last week. As suggested in
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=1092,
I have set the BOINC 4.13 preferences to leave the swapped out processes in memory.

This might not be the cure but it has not crashed since.

Tony.
ID: 5504 · Report as offensive     Reply Quote
Profile adrianxw
Avatar

Send message
Joined: 31 Aug 04
Posts: 145
Credit: 2,021,020
RAC: 816
Message 5507 - Posted: 20 Oct 2004, 7:59:03 UTC
Last modified: 20 Oct 2004, 9:24:00 UTC

I have now run all versions of the Super PI program, results here...

+ 000h 00m 01s [ 16K]
+ 000h 00m 01s [ 32K]
+ 000h 00m 02s [ 64K]
+ 000h 00m 05s [ 128K]
+ 000h 00m 14s [ 256K]
+ 000h 00m 33s [ 512K]
+ 000h 01m 18s [ 1M]
+ 000h 03m 04s [ 2M]
+ 000h 07m 21s [ 4M]
+ 000h 16m 06s [ 8M]
+ 000h 37m 06s [ 16M]
+ 002h 53m 34s [ 32M]

... as you can see, my "flaky" hardware has no difficulty with BOINC's suggested stability test program. During the night, I have had another model from here go over with -5. At the same time it has happily crunched SETI units without fuss, (no work from LHC).

I think I have the faint aroma of software staff saying "it must be hardware" in my nostrils. As a professional software engineer, I learnt to spot that many years ago!!!

*** EDIT ***

Noticed another oddity in the log...

climateprediction.net - 2004-10-19 23:12:11 - Result 2xpm_100158384_0 exited with zero status but no 'finished' file
climateprediction.net - 2004-10-19 23:12:11 - If this happens repeatedly you may need to reset the project.
climateprediction.net - 2004-10-19 23:12:13 - Restarting result 2xpm_100158384_0 using hadsm3 version 4.04

... don't recall seeing that before.
ID: 5507 · Report as offensive     Reply Quote
Profile adrianxw
Avatar

Send message
Joined: 31 Aug 04
Posts: 145
Credit: 2,021,020
RAC: 816
Message 5511 - Posted: 20 Oct 2004, 13:09:13 UTC

Okay, in this dialogue, I was setting my client prefs to "leave in memory"...

climateprediction.net - 2004-10-20 10:38:21 - Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
climateprediction.net - 2004-10-20 10:38:24 - Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
climateprediction.net - 2004-10-20 10:38:24 - General preferences have been updated
--- - 2004-10-20 10:38:24 - General prefs: from climateprediction.net (last modified 2004-10-20 10:37:16)
--- - 2004-10-20 10:38:24 - General prefs: using your defaults

... when the wu was next swapped, it shows...

climateprediction.net - 2004-10-20 11:31:19 - Pausing result 2xpm_100158384_0 (left in memory)
LHC@home - 2004-10-20 11:31:20 - Starting result v64lhc1000prothree11s8_1051.62_1_sixvf_18_3 using sixtrack version 4.46

... climateprediction staying in memory and an LHC wu starting. Later...

SETI@home - 2004-10-20 12:59:20 - Pausing result 26ap04aa.4387.5521.959636.44_4 (left in memory)
climateprediction.net - 2004-10-20 12:59:20 - Resuming result 2xpm_100158384_0 using hadsm3 version 4.04
climateprediction.net - 2004-10-20 13:12:17 - Unrecoverable error for result 2xpm_100158384_0 ( - exit code -5 (0xfffffffb))

... a SETI wu is paused (left in memory) and the climateprediction model resumes; about 13 minutes later it fails with the -5 error.
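To put a number on "how long after a resume does it fall over", the log lines can be scanned mechanically. This is only a rough sketch that assumes the "project - timestamp - message" layout shown in the excerpts above; the regular expressions and the function name are my own and have not been checked against other BOINC client versions.

```python
import re
from datetime import datetime

LOG = """\
climateprediction.net - 2004-10-20 12:59:20 - Resuming result 2xpm_100158384_0 using hadsm3 version 4.04
climateprediction.net - 2004-10-20 13:12:17 - Unrecoverable error for result 2xpm_100158384_0 ( - exit code -5 (0xfffffffb))
"""

# Assumed layout: "<project> - <YYYY-MM-DD HH:MM:SS> - <message>"
LINE = re.compile(r"^(?P<project>.+?) - (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - (?P<msg>.*)$")
START = re.compile(r"^(?:Resuming|Restarting|Starting) result (\S+)")
ERROR = re.compile(r"^Unrecoverable error for result (\S+) \( - exit code (-?\d+)")


def time_to_failure(log_text):
    """Return (result, exit_code, elapsed) for each failure that follows a start."""
    last_start = {}
    failures = []
    for raw in log_text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # not a client log line
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        msg = m["msg"]
        if (s := START.match(msg)):
            last_start[s.group(1)] = ts
        elif (e := ERROR.match(msg)):
            name, code = e.group(1), int(e.group(2))
            if name in last_start:
                failures.append((name, code, ts - last_start[name]))
    return failures


for name, code, elapsed in time_to_failure(LOG):
    print(name, code, elapsed)  # 2xpm_100158384_0 -5 0:12:57
```

Note that this measures wall-clock time between log entries, not CPU time.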

Since I seem to be achieving nothing here at the moment, I have detached from the project. I will monitor the board and re-attach as soon as it is fixed, or if anyone wants me to try anything, feel free to contact me. I have subscribed to this thread so will get an e-mail.
ID: 5511 · Report as offensive     Reply Quote
old_user1
Avatar

Send message
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 5512 - Posted: 20 Oct 2004, 13:39:03 UTC - in response to Message 5511.  

It could be a hard drive with errors. If one (out of 200) files that CPDN tries to open has an error, and an error on a retry, then you will get a -5. I don't think there is any program out there (other than "scandisk" or "defrag") that will test your hard drive at the level that CPDN needs.
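The open-then-retry behaviour described here can be pictured with a small sketch. This is purely illustrative Python, not the CPDN source: the function names, the retry count and the returned -5 are assumptions chosen to mirror the description.

```python
import time


def open_with_retry(path, mode="rb", retries=1, delay=0.5):
    """Try to open a file, retrying a limited number of times before giving up."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return open(path, mode)
        except OSError as err:
            last_error = err
            if attempt < retries:
                time.sleep(delay)  # brief pause before the retry
    raise last_error


def check_all_readable(paths, delay=0.5):
    """Open every required file; one persistent failure fails the whole run."""
    handles = []
    try:
        for path in paths:
            handles.append(open_with_retry(path, delay=delay))
    except OSError:
        return -5  # hypothetical: stands in for the catch-all exit code
    finally:
        for handle in handles:
            handle.close()
    return 0
```

With a couple of hundred files per model, a single file that fails twice in a row is enough to produce the failure, which is why a disk that looks fine in everyday use could still trip it.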
ID: 5512 · Report as offensive     Reply Quote
Profile adrianxw
Avatar

Send message
Joined: 31 Aug 04
Posts: 145
Credit: 2,021,020
RAC: 816
Message 5513 - Posted: 20 Oct 2004, 16:07:51 UTC - in response to Message 5512.  
Last modified: 20 Oct 2004, 18:59:11 UTC

> It could be a hard drive with errors, if one (out of 200) files that CPDN
> tries to open has an error, and an error on a retry, then you will get a -5.
> I don't think there is any program out there (other than "scandisk" or
> "defrag") that will test your hard drive at the level that CPDN needs.
>

Hi Carl,

I use Maxtor disks, and have a standalone bootable floppy with their own "PowerMax" diagnostics. Most disk manufacturers have similar tools. These are often more exhaustive than the OS tools. They are downloadable from their web sites. Suffice to say, my disks are fine.

My problems /seemed/ to start when I upgraded to the 4.13 client, although I can't be sure. I happened to notice my 400+ hour unit had crashed and I had a new one. It wasn't until yesterday that I looked at it and thought, "hmmm, that hasn't got very far", I was about to tweak the project settings to give CPDN more CPU when I saw all the failed units.

If the wu supply from SAH and LHC dries up, I'll reload 4.09 and see if it makes any difference.

PowerMax is here...

http://www.maxtor.com/en/support/downloads/powermax.html

... works with Quantum drives as well. There is a separate version called SCSIMax if you have SCSI disks.
ID: 5513 · Report as offensive     Reply Quote
Profile old_user156
Avatar

Send message
Joined: 5 Aug 04
Posts: 186
Credit: 1,612,182
RAC: 0
Message 5514 - Posted: 20 Oct 2004, 17:36:45 UTC

Tracy has done two good runs and then dumped her third run half way through with an 'error code -5'. I restarted that run (Result ID 276662) from my backup, twice, and it failed again with the same error code both times after running for some hours, even though I slowed her down some.

The really weird thing is she didn't run the same length of time since the backup each time. :? The original run crashed 10.5 hours after the backup was made, second run 8.25 hours and third run 11.4 hours. Only thing I can think of that might do that is a dud sector on the hard disk that wasn't accessed at quite the same time each run - so I ran checkdisk with 'scan for and attempt recovery of bad sectors' checked but I didn't see any errors. :?

ID: 5514 · Report as offensive     Reply Quote
Profile adrianxw
Avatar

Send message
Joined: 31 Aug 04
Posts: 145
Credit: 2,021,020
RAC: 816
Message 5515 - Posted: 20 Oct 2004, 19:01:06 UTC

What client version are you using Nick?
ID: 5515 · Report as offensive     Reply Quote
Profile old_user156
Avatar

Send message
Joined: 5 Aug 04
Posts: 186
Credit: 1,612,182
RAC: 0
Message 5521 - Posted: 20 Oct 2004, 22:36:08 UTC

BOINC v4.13, CP hadsm3 v4.04

ID: 5521 · Report as offensive     Reply Quote
Profile adrianxw
Avatar

Send message
Joined: 31 Aug 04
Posts: 145
Credit: 2,021,020
RAC: 816
Message 5529 - Posted: 21 Oct 2004, 7:49:16 UTC

Okay, that is the same BOINC and CPDN client I have. I'm running XP SP2.

On the BOINC board, they have suggested simply detaching and re-attaching with the same set-up rather than trying the 4.09 experiment. I'll try that first, but I really don't want to commit useless hours upon hours of CPU time just to generate error messages, (if I wanted that, I'd just leave my neural network trainer running 24/7!!!).
ID: 5529 · Report as offensive     Reply Quote
Profile adrianxw
Avatar

Send message
Joined: 31 Aug 04
Posts: 145
Credit: 2,021,020
RAC: 816
Message 5540 - Posted: 21 Oct 2004, 19:02:16 UTC

Okay, so I re-attached to the project as suggested and got a new wu, still running the 4.13 core. It ran for about 3 CPU hours, then fell over with the -5 error. Before re-attaching, I ran another set of system diags, a full disk scan and SiSoft Sandra to look for any oddities. The system was clean.

I've detached again.
ID: 5540 · Report as offensive     Reply Quote
Profile david gunnells

Send message
Joined: 1 Sep 04
Posts: 9
Credit: 549,543
RAC: 0
Message 5771 - Posted: 30 Oct 2004, 0:29:31 UTC

I just saw this tonight:

climateprediction.net - 2004-10-29 19:45:41 - Unrecoverable error for result 2tyf_100153469_0 ( - exit code -5 (0xfffffffb))

~150 hours into it.

Even though I don't recall seeing this error before, I'm going to run PowerMax on my Maxtor HD and Super PI and get back to this forum...

david
ID: 5771 · Report as offensive     Reply Quote
old_user2101

Send message
Joined: 27 Aug 04
Posts: 3
Credit: 118,160
RAC: 0
Message 7694 - Posted: 27 Jan 2005, 5:47:02 UTC

I also had this problem. But since I set the "write to disk" time interval in my global preferences to 50 seconds, I haven't had this error again.
ID: 7694 · Report as offensive     Reply Quote
old_user5480

Send message
Joined: 31 Aug 04
Posts: 3
Credit: 318,314
RAC: 0
Message 8359 - Posted: 1 Feb 2005, 19:24:34 UTC - in response to Message 5487.  

I also got this problem. My machine is an Athlon (Barton core) at 2.5, 1 GB RAM, Asus motherboard. I'm running Win XP SP2 and BOINC 4.19. Climate prediction crashed after around 190 hours of running the model.
Here is my log:

climateprediction.net - 2005-02-01 20:53:14 - Deferring communication with project for 47 minutes and 10 seconds
climateprediction.net - 2005-02-01 21:30:01 - Unrecoverable error for result 3dzh_100179683_0 ( - exit code -5 (0xfffffffb))
climateprediction.net - 2005-02-01 21:30:01 - Deferring communication with project for 1 hours, 20 minutes, and 50 seconds
climateprediction.net - 2005-02-01 21:30:01 - Computation for result 3dzh_100179683 finished
climateprediction.net - 2005-02-01 21:30:01 - Started upload of 3dzh_100179683_0_1.zip
climateprediction.net - 2005-02-01 21:30:01 - Started upload of 3dzh_100179683_0_2.zip
climateprediction.net - 2005-02-01 21:30:07 - Finished upload of 3dzh_100179683_0_1.zip
climateprediction.net - 2005-02-01 21:30:07 - Throughput 256 bytes/sec
climateprediction.net - 2005-02-01 21:30:07 - Finished upload of 3dzh_100179683_0_2.zip
climateprediction.net - 2005-02-01 21:30:07 - Throughput 8050 bytes/sec
climateprediction.net - 2005-02-01 21:30:07 - Started upload of 3dzh_100179683_0_3.zip
climateprediction.net - 2005-02-01 21:30:07 - Started upload of 3dzh_100179683_0_4.zip
climateprediction.net - 2005-02-01 21:30:13 - Finished upload of 3dzh_100179683_0_3.zip
climateprediction.net - 2005-02-01 21:30:13 - Throughput 258 bytes/sec
climateprediction.net - 2005-02-01 21:30:13 - Finished upload of 3dzh_100179683_0_4.zip
climateprediction.net - 2005-02-01 21:30:13 - Throughput 258 bytes/sec
climateprediction.net - 2005-02-01 21:30:13 - Started upload of 3dzh_100179683_0_5.zip
climateprediction.net - 2005-02-01 21:30:19 - Finished upload of 3dzh_100179683_0_5.zip
climateprediction.net - 2005-02-01 21:30:19 - Throughput 8050 bytes/sec

ID: 8359 · Report as offensive     Reply Quote
Profile Friedrich S.

Send message
Joined: 22 Jan 05
Posts: 38
Credit: 3,914,110
RAC: 4,770
Message 9054 - Posted: 10 Feb 2005, 2:07:17 UTC

Hello,

I can now join the "Team of -5", too. On my Pentium 4, 2.8 GHz HT with BOINC 4.13 & CPDN I lost a model just after the 6th trickle.

Isn't there a way to deal with it by the way the files are written in CPDN?
A strategy like:
1) Write data to temp file.
2) Verify
3) Rename if verify successful, otherwise go back to 1).
4) Build model slowly in incremental files (e.g. every trickle).

And on load:
1) Load file.
2) Verify.
3) Reload if unsuccessful.
4) If still unsuccessful, load earlier time steps (of incremental files mentioned above) until you reach a stable point and restart from there.

That way you would lose a trickle rather than the whole model, and it would be easier to recover, e.g. by playing back an earlier backup.
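To make the two lists above concrete, here is a rough Python sketch of the idea: write to a temp file, verify by reading back, rename into place, and on load fall back to the newest file that still verifies. It is only an illustration of the proposal, not how CPDN actually stores its data; the file naming, the SHA-256 header and all function names are assumptions of mine.

```python
import hashlib
import os


def _digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _parse(blob: bytes):
    """Return the payload if its stored digest matches, else None."""
    header, sep, payload = blob.partition(b"\n")
    if sep and header.decode("ascii", "replace") == _digest(payload):
        return payload
    return None


def save_checkpoint(directory, step, data, attempts=3):
    """Write to a temp file, verify by reading it back, then rename into place."""
    final = os.path.join(directory, f"ckpt_{step:06d}.bin")
    tmp = final + ".tmp"
    for _ in range(attempts):
        with open(tmp, "wb") as fh:
            fh.write(_digest(data).encode("ascii") + b"\n" + data)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before verifying
        with open(tmp, "rb") as fh:
            if _parse(fh.read()) == data:
                os.replace(tmp, final)  # atomic rename: old file or new, never half
                return final
    raise OSError(f"could not write a verifiable checkpoint for step {step}")


def load_latest(directory, retries=1):
    """Load the newest checkpoint that verifies, falling back to older ones."""
    # Zero-padded step numbers make lexicographic order match numeric order.
    names = sorted(
        (n for n in os.listdir(directory)
         if n.startswith("ckpt_") and n.endswith(".bin")),
        reverse=True,
    )
    for name in names:
        for _ in range(retries + 1):
            try:
                with open(os.path.join(directory, name), "rb") as fh:
                    payload = _parse(fh.read())
            except OSError:
                payload = None
            if payload is not None:
                step = int(name[len("ckpt_"):-len(".bin")])
                return step, payload
    return None  # nothing usable: the run would have to start over
```

Keeping one file per step costs disk space, so a real implementation would presumably prune all but the last few checkpoints.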

Friedrich


I love CPDN!
--
ID: 9054 · Report as offensive     Reply Quote
Les Bayliss
Volunteer moderator

Send message
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 9063 - Posted: 10 Feb 2005, 7:47:43 UTC

The data gets written to the files lots of times per trickle, the default is 60 seconds.
And there ARE some built-in safeguards, but the programmers couldn't cover everything.

I suspect that a lot of crashes, (of all types), are caused by a hardware hiccup.
People running Linux, for instance, often get file errors because they use a network drive for data,
and their system can't cope with the frequent data bursts.

Also, -5 is a "catch all" error message, so it isn't necessarily a file write.
Sometimes it's caused by a negative pressure in one of the cells.

I had a -5 crash on my 1st model, and all the other 7 were successful. Plus 1 with which I had an accident and couldn't recover. (mumble, mutter).
But there have been well over 23,0000 BOINC runs completed successfully.

There are several threads about success rates on the phpBB, (which is down), and one of the admins said the ratio
is about 1 in 7 successful, so don't get too discouraged.

Les

ID: 9063 · Report as offensive     Reply Quote
Profile adrianxw
Avatar

Send message
Joined: 31 Aug 04
Posts: 145
Credit: 2,021,020
RAC: 816
Message 9066 - Posted: 10 Feb 2005, 8:16:01 UTC
Last modified: 10 Feb 2005, 8:22:44 UTC

>>> the ratio is about 1 in 7 successful, so don't get too discouraged.

I understand the sentiment, but would point out, that in my case at least, hundreds of hours have been consumed by CPDN models which have failed to finish. That same hundreds of hours could have been used for very many successful SETI, LHC and Predictor units.

If CPDN is the "flaky" element, then it needs to resolve that to keep people on board.

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 9066 · Report as offensive     Reply Quote
Les Bayliss
Volunteer moderator

Send message
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 9076 - Posted: 10 Feb 2005, 11:11:21 UTC

adrianxw

Personally, I think BOINC is the flaky part, especially when used to switch between multiple projects.
I don't think Berkeley has gotten the switching part quite right. As well as all the known bugs still to be fixed.
They have come up with versions to fix problems with some of the other projects when used with CPDN,
so maybe they need to look at CPDN's requirements. Something like: I'm going to switch now, but this is CPDN,
so I need to wait until just after a save point, and THEN switch.
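As a toy illustration of that "wait for a save point, then switch" idea (nothing to do with the real BOINC scheduler code; the class, function and parameter names here are invented for the example):

```python
class Task:
    """Toy model of a long-running job that checkpoints every few steps."""

    def __init__(self, name, checkpoint_every):
        self.name = name
        self.checkpoint_every = checkpoint_every
        self.steps_done = 0
        self.steps_saved = 0

    def step(self):
        self.steps_done += 1
        if self.steps_done % self.checkpoint_every == 0:
            self.steps_saved = self.steps_done  # save point reached

    def at_save_point(self):
        return self.steps_done == self.steps_saved


def run_slice(task, min_steps, max_extra_steps):
    """Run at least min_steps, then keep going until the next save point.

    max_extra_steps bounds the wait so one task cannot hold the CPU forever.
    Returns the number of steps that would be lost if the task were killed now.
    """
    for _ in range(min_steps):
        task.step()
    extra = 0
    while not task.at_save_point() and extra < max_extra_steps:
        task.step()
        extra += 1
    return task.steps_done - task.steps_saved


cpdn = Task("cpdn", checkpoint_every=10)
print(run_slice(cpdn, min_steps=25, max_extra_steps=20))  # 0
```

In this toy run the slice is stretched from 25 to 30 steps so that the switch happens right after a save, and nothing is lost if the paused task later dies.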

Les
ID: 9076 · Report as offensive     Reply Quote