climateprediction.net home page
Unrecoverable error in BBC CCE

Profile old_user280873
Joined: 18 Feb 06
Posts: 17
Credit: 1,769,142
RAC: 0
Message 27218 - Posted: 8 Mar 2007, 8:25:38 UTC

At 57% of the work completed, an unrecoverable error was reported and the progress jumped to 100%.
Every week I make a complete backup of the BOINC directory. Is it worthwhile to restore the backup? In the meantime I have started a new project from climateprediction.net.

Thanks for any advice.

LeendertvM


Profile astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 27220 - Posted: 8 Mar 2007, 18:10:23 UTC

It depends on the error. Please post the first 20 lines from the top of the Messages tab, so we can see the Computer ID. (Nice machine, by the way.)

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
Profile old_user280873
Joined: 18 Feb 06
Posts: 17
Credit: 1,769,142
RAC: 0
Message 27240 - Posted: 9 Mar 2007, 9:47:56 UTC - in response to Message 27220.  

Herewith the first lines:
7-3-2007 8:19:40||Starting BOINC client version 5.8.15 for windows_intelx86
7-3-2007 8:19:40||log flags: task, file_xfer, sched_ops
7-3-2007 8:19:40||Libraries: libcurl/7.16.0 OpenSSL/0.9.8a zlib/1.2.3
7-3-2007 8:19:40||Data directory: C:\Program Files\BOINC
7-3-2007 8:19:41||Processor: 2 GenuineIntel Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz [x86 Family 6 Model 15 Stepping 6] [fpu tsc pae sse sse2 mmx]
7-3-2007 8:19:41||Memory: 1.98 GB physical, 3.83 GB virtual
7-3-2007 8:19:41||Disk: 232.88 GB total, 204.96 GB free
7-3-2007 8:19:41|BBC Climate Change Experiment|URL: http://bbc.cpdn.org/; Computer ID: 259632; location: home; project prefs: default
7-3-2007 8:19:41|rosetta@home|URL: http://boinc.bakerlab.org/rosetta/; Computer ID: 380169; location: (none); project prefs: default
7-3-2007 8:19:41|SETI@home|URL: http://setiathome.berkeley.edu/; Computer ID: 2948681; location: (none); project prefs: default
7-3-2007 8:19:41|World Community Grid|URL: http://www.worldcommunitygrid.org/; Computer ID: 90506; location: (none); project prefs: default
7-3-2007 8:19:41||General prefs: from BBC Climate Change Experiment (last modified 2006-12-13 22:56:05)
7-3-2007 8:19:41||Host location: home
7-3-2007 8:19:41||General prefs: no separate prefs for home; using your defaults
7-3-2007 8:19:41||Reading preferences override file
7-3-2007 8:19:41|BBC Climate Change Experiment|Restarting task hadcm3ohf_aojo_00898592_0 using hadcm3 version 515
7-3-2007 8:19:42|rosetta@home|Restarting task s036__BOINC_ABRELAX_truncate_hom001__1601_2697_0 using rosetta version 548
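
For anyone who wants to pull the Computer ID out of a log like the one above without reading it by eye, here is a minimal sketch. It assumes each line has the shape `date time|project|message`, as in the lines shown; the function name `computer_ids` is made up for this example and is not part of BOINC.

```python
import re

# Sample lines copied from the log above; the format is "date time|project|message".
LOG = """\
7-3-2007 8:19:41|BBC Climate Change Experiment|URL: http://bbc.cpdn.org/; Computer ID: 259632; location: home; project prefs: default
7-3-2007 8:19:41|rosetta@home|URL: http://boinc.bakerlab.org/rosetta/; Computer ID: 380169; location: (none); project prefs: default
7-3-2007 8:19:41||Host location: home
"""


def computer_ids(log_text):
    """Return {project name: computer ID} for every line that reports one.

    Hypothetical helper for illustration only; it relies on the
    "Computer ID: <digits>" wording seen in the log above.
    """
    ids = {}
    for line in log_text.splitlines():
        parts = line.split("|", 2)  # timestamp, project, message
        if len(parts) != 3:
            continue  # not a log line in the expected shape
        _, project, message = parts
        match = re.search(r"Computer ID: (\d+)", message)
        if project and match:
            ids[project] = int(match.group(1))
    return ids


print(computer_ids(LOG))
```

Lines with an empty project field (the client's own messages) are skipped, so only the per-project IDs are returned.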


Profile astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 27249 - Posted: 9 Mar 2007, 17:59:04 UTC

Thanks. This is the error:
Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA
Fatal crash! :-(

There's a remote possibility that it was caused by an unstable parameter mix. More likely, I think, is a transient floating-point calculation error -- especially if the machine is over-clocked. That makes it a good candidate for restart from your backup. (If it fails at that point a second time, there'd be no use making a third attempt.)

Note: Repeated Trickles will be ignored by the Server and there will be no credit until a "new" Trickle is received.

There's something else strange because there are four other failures; except for the Model you aborted, three had MD5 checksum errors in download. An occasional MD5 failure wouldn't be too unusual but three seems out of the ordinary. (Perhaps someone else can shed some light on that.)
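
As a rough illustration of what an MD5 checksum failure on download means: the receiver hashes the file it got and compares the result with the checksum published for that file. The sketch below shows that general idea using Python's standard `hashlib`. It is not BOINC's own code, and the names `md5_of_file` and `download_is_intact` are invented for this example.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_is_intact(path, expected_md5):
    """True if the file's MD5 matches the expected hex digest (hypothetical helper)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

If the expected checksum itself was generated wrongly on the server side, every download of that file would fail this comparison no matter how clean the transfer was.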

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
Profile MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 27253 - Posted: 9 Mar 2007, 20:09:30 UTC


I do remember that there was a bunch of work units generated with MD5 errors. They were fixed within a couple of days, I think. Enough time to get 3 bad downloads, anyway.

It was the third batch of 5.15s?
I'm a volunteer and my views are my own.
Profile old_user280873
Joined: 18 Feb 06
Posts: 17
Credit: 1,769,142
RAC: 0
Message 27264 - Posted: 10 Mar 2007, 9:32:29 UTC - in response to Message 27253.  

Thanks for the info.
The PC is not overclocked. This dual-core 6600 machine continuously runs heavy Elliott Wave stock market calculations. Running them alongside BOINC might cause floating-point calculation errors; very unlikely, of course, but I just want to mention it.
I am curious whether these errors will happen again (you probably are too). Because of that I'll restore the backup from Feb 17th, with an older version of the BOINC Manager. So it will take 18 days without credits on one core. No problem with that; I don't participate for credits!

Profile old_user280873
Avatar

Send message
Joined: 18 Feb 06
Posts: 17
Credit: 1,769,142
RAC: 0
Message 27786 - Posted: 8 Apr 2007, 15:33:30 UTC - in response to Message 27264.  

Well, the model failed a second time at the same point.
So I'll let this run rest in peace.
Profile astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 27791 - Posted: 8 Apr 2007, 18:19:51 UTC
Last modified: 8 Apr 2007, 18:20:40 UTC

Pity. Thanks for verifying the "Negative Pressure" failure.

It wasn't long after my post that we (or, at least, I) stopped recommending reruns for that condition, as it seems there were more than a few of the unstable parameter mixes sent out. That's no consolation to you, though.

Again, thanks for the rerun and for posting the result.

Happy Easter!

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.