climateprediction.net home page
Crashing CPDN client

Message boards : Number crunching : Crashing CPDN client

old_user909
Joined: 17 Aug 04
Posts: 56
Credit: 63,814
RAC: 0
Message 1556 - Posted: 24 Aug 2004, 7:33:28 UTC
Last modified: 24 Aug 2004, 7:36:22 UTC

I just started crunching for CPDN today. Love the graphics with the globe you can spin :) Simple mind, simple pleasure I guess...

Anyway, I have had 2 work units fail on me so far and I was wondering if anyone else has had units fail. Error message:
climateprediction.net - 2004-08-23 23:49:44 - Unrecoverable error for result 001a_200025031_1 ( - exit code -5 (0xfffffffb))

We had one helluva thunderstorm move through the area tonight (complete with tornado sirens and flash floods) and my computer locked up rather strangely during it so I'm kind of wondering if I got hit by some kind of surge despite my UPS.
------------------------------
A Member of The Knights Who Say Ni!
Yet another stats page: http://boinc-kwsn.no-ip.info

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2168
Credit: 64,548,452
RAC: 6,748
Message 1562 - Posted: 24 Aug 2004, 13:25:47 UTC - in response to Message 1556.  

> Anyway, I have had 2 work units fail on me so far and I was wondering if anyone
> else has had units fail. Error message:
> climateprediction.net - 2004-08-23 23:49:44 - Unrecoverable error for result
> 001a_200025031_1 ( - exit code -5 (0xfffffffb))
>
> We had one helluva thunderstorm move through the area tonight (complete with
> tornado sirens and flash floods) and my computer locked up rather strangely
> during it so I'm kind of wondering if I got hit by some kind of surge despite
> my UPS.
>
Hi Toby,

You wouldn't happen to be from eastern Kansas, would you? That's the same weather we got last night, resulting in both my CPDN computers going down.

Exit code -5 is a "catchall" error code. There was considerable discussion about it in the "Windows" forum of the "Questions and Problems" help forums.

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_forum.php?id=4
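
A side note on the error line quoted above: the two numbers in "exit code -5 (0xfffffffb)" are the same value, since 0xfffffffb is simply -5 read as an unsigned 32-bit integer (two's complement). A minimal Python sketch of that conversion follows; the helper names are made up for illustration and are not part of BOINC or CPDN.

```python
def to_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as unsigned (two's complement)."""
    return value & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit integer as signed (two's complement)."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value represents a negative number.
    return value - 0x100000000 if value & 0x80000000 else value


print(hex(to_unsigned32(-5)))   # 0xfffffffb
print(to_signed32(0xFFFFFFFB))  # -5
```

This only explains the notation in the log line; it says nothing about what actually caused the -5 error.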

old_user909
Joined: 17 Aug 04
Posts: 56
Credit: 63,814
RAC: 0
Message 1567 - Posted: 24 Aug 2004, 17:08:10 UTC

Yep, Kansas it is. Manhattan to be exact. Just graduated from and am now working for KSU :) Those storms sure had their way with us though. I heard we got over 10 inches of rain within 8 hours. And the campus sprinkler system was going when I left work while there were flash flood warnings out all over the place. Hooray for wisely spent tax dollars! I never heard a thing since I work in the basement of a very large building with many hundreds of servers with thousands of fans, not to mention line printers from the 70s that actually have a warning sticker that says 'close the lids while printing or you may very well go deaf'.

Anyway, my computer locked up again last night. When I first upgraded my CPU I had some stability issues like this but I tweaked some settings and got it stable running seti@home. I'm wondering if CPDN uses some secret circuit in my CPU that seti@home never touched which is still not stable. This time, CPDN didn't exit with an error but computation started over from 0% while total CPU time was NOT reset and continued counting up from 2:40 or so. Very strange...

I'm guessing either my CPU isn't quite stable yet or that storm really fsked up my computer. I dropped the FSB slightly. See if that makes a difference. No, I'm not overclocking. Can't even get the damn thing stable at stock speed. Any other suggestions out there?
Yet another stats page: http://boinc-kwsn.no-ip.info

ML1
Joined: 5 Aug 04
Posts: 21
Credit: 9,084,238
RAC: 0
Message 1568 - Posted: 24 Aug 2004, 17:24:38 UTC - in response to Message 1567.  

[...]
> Anyway, my computer locked up again last night. When I first upgraded my CPU
> I had some stability issues like this but I tweaked some settings and got it
> stable running seti@home. I'm wondering if CPDN uses some secret circuit in
> my CPU that seti@home never touched which is still not stable. This time,
[...]

Test your system with:
memtest86
GIMPS mprime 'torture test'
and just to be thorough, check your HDD with the manufacturer's disk check utility.

That gives you some reassurance that the machine is OK under whatever ambient conditions you ran the tests in. If your machine later gets hotter or just dirtier, then it can still fail.

A UPS might be a good idea if your mains is flakey.

Good luck,
Martin

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2168
Credit: 64,548,452
RAC: 6,748
Message 1569 - Posted: 24 Aug 2004, 17:27:38 UTC - in response to Message 1567.  

> Yep, Kansas it is. Manhattan to be exact. Just graduated from and am now
> working for KSU :) Those storms sure had their way with us though. I heard
> we got over 10 inches of rain within 8 hours. And the campus sprinkler system
> was going when I left work while there were flash flood warnings out all over
> the place.

Yep, I was the one issuing the flash flood warnings for Riley County last night. What a night. I hope we don't have a repeat tonight. I worked way too many hours yesterday.
>
> Anyway, my computer locked up again last night. When I first upgraded my CPU
> I had some stability issues like this but I tweaked some settings and got it
> stable running seti@home. I'm wondering if CPDN uses some secret circuit in
> my CPU that seti@home never touched which is still not stable. This time,
> CPDN didn't exit with an error but computation started over from 0% while
> total CPU time was NOT reset and continued counting up from 2:40 or so. Very
> strange...
>
This is exactly what happened to my BOINC model. Restarted at 0%, but kept the CPU time. Must have been a corruption of some status file when the power went out. I think I'll keep running it though, as there is no easy way to get a new model. The power outage did not in any way hurt a "classic" CPDN model running on a different PC. It just started right back up where it left off.

CPDN is more demanding than most, if not all, other distributed projects. It may be the huge size of the code, or the fact that a true work unit (a model run) is so long and depends on good data from each previous timestep.

old_user909
Joined: 17 Aug 04
Posts: 56
Credit: 63,814
RAC: 0
Message 1576 - Posted: 24 Aug 2004, 19:18:16 UTC
Last modified: 24 Aug 2004, 19:20:13 UTC

Martin:
I did run memtest86 when I was first trying to get the thing to even boot. Turns out my motherboard was set for the old processor and not supplying enough voltage to the chip. I fixed that in the BIOS. Memtest86 checked out fine after I got that running. I figured seti/predictor was a good torture test but I guess the major problem with that is that you don't immediately know if there are errors in the FPU. I had been running for close to a week without a lockup when I switched to CPDN. I might give mprime a shot if it locks up again. But I know for sure nothing is overheating. I have sensors and alarms set up all over the place. If anything goes over 60C, the BIOS is set to cut power immediately.

And I do have a UPS although this house appears to not have any ground wires which seems to mess with its sensors. It has reported overloads and completely shut down during slight brownouts several times since moving in here. I am not drawing anywhere near its capacity.

geophi:
Just looked at your profile. Good work last night. We actually had a small flood in the machine room when water came in through an outside door. We had everyone from above us come hang out in the basement hallways during that tornado warning. Except for the stupid frat boys who decided they had to leave because they had some important meeting to go to.

Anyway... Thanks for the suggestions/information. Guess I'll just have to see what happens and possibly start some in-depth troubleshooting. I just hope CPDN gets to the first trickle this time so I get some credit and get listed in the team roster :) TS 3067 now. Knock on wood!
--------------------------
A Member of The Knights Who Say Ni!
Yet another stats page: http://boinc-kwsn.no-ip.info

old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 1578 - Posted: 24 Aug 2004, 19:30:50 UTC - in response to Message 1576.  

toby, you may want to wait a day, hopefully the launch version fixes some of the "-5 errors" people have been getting.

old_user909
Joined: 17 Aug 04
Posts: 56
Credit: 63,814
RAC: 0
Message 1598 - Posted: 25 Aug 2004, 8:37:05 UTC

Grrrr... just as I was writing a response to this thread announcing that I had been up for 15 hours and everything appeared stable, the damn thing crashed again. :( At least CPDN didn't do anything weird this time. Picked up from the last save point at 2.6%. I finally got my trickle in and got some credit so I guess that is a good thing. I tweaked another couple of settings this time. See how long it lasts. Damn computers! Oh wait... I studied them for 5 years and am now supposed to make a living off of them. Computers are good! Now just keep saying that to yourself... There we go.
----------------------------
A member of The Knights Who Say Ni!
Yet another stats page: http://boinc-kwsn.no-ip.info

old_user10
Joined: 5 Aug 04
Posts: 55
Credit: 87,392
RAC: 0
Message 1599 - Posted: 25 Aug 2004, 9:03:51 UTC - in response to Message 1578.  

> toby, you may want to wait a day, hopefully the launch version fixes some of
> the "-5 errors" people have been getting.
>

That's a worrying statement. Does that mean that the launch version is new, i.e. hasn't been tested?


http://www.users.globalnet.co.uk/~sykesm/cpdn.html

old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 1634 - Posted: 25 Aug 2004, 16:50:08 UTC - in response to Message 1599.  

> That's a worrying statement. Does that mean that the launch version is new,
> i.e. hasn't been tested?

well it's always tested, the question is by how many and for how long! :-)

old_user760
Joined: 10 Aug 04
Posts: 94
Credit: 309,849
RAC: 0
Message 1645 - Posted: 25 Aug 2004, 17:05:14 UTC
Last modified: 25 Aug 2004, 17:41:47 UTC

If it's BOINC 4.03, it's only been alpha tested by a relatively small group. I was an alpha tester until the last general meltdown, then gave up in disgust. It hasn't been beta tested to my knowledge. I believe they skipped V4.04 and are now testing V4.05.


Team Site Link: http://mysite.wanadoo-members.co.uk/thefinalfrontear/index.html

old_user909
Joined: 17 Aug 04
Posts: 56
Credit: 63,814
RAC: 0
Message 1710 - Posted: 26 Aug 2004, 9:19:48 UTC

Well, FYI... after I rebooted last night, I took my FSB down by quite a bit using a program supplied by NVIDIA that lets you change such settings from within Windows. This let me boot up & start BOINC while my computer thought it was running on a 2400+ so it wouldn't assign me a new hostid and invalidate my current work unit. Now I'm running at 2 GHz and have been up for over 24 hours. If I'm still up tomorrow morning I'll try to slowly crank it back up to full speed and see what happens. One comforting thing about the work units that errored out: They were reported as client errors by the other person processing them as well so at least I know that wasn't due to my CPU failing in a big way.
----------------------------
A member of The Knights Who Say Ni!
Yet another stats page: http://boinc-kwsn.no-ip.info
