Computational Error

Profile old_user71
Joined: 5 Aug 04
Posts: 19
Credit: 16,547
RAC: 0
Message 16645 - Posted: 17 Oct 2005, 17:33:53 UTC

A little help please. CPDN had been happily crunching over the last few weeks and had clocked up 150 hrs when, the other day, it failed for no apparent reason.

Any idea why?

16/10/2005 06:02:11|climateprediction.net|Restarting result 35ci_000168378_0 using hadsm3 version 4.13
16/10/2005 06:02:11|SETI@home|Pausing result 13oc03aa.3322.15313.1003390.138_0 (removed from memory)
16/10/2005 06:02:12||request_reschedule_cpus: process exited
16/10/2005 06:03:05|climateprediction.net|Unrecoverable error for result 35ci_000168378_0 ( - exit code -5 (0xfffffffb))
16/10/2005 06:03:05||request_reschedule_cpus: process exited
16/10/2005 06:03:05|climateprediction.net|Computation for result 35ci_000168378_0 finished
16/10/2005 06:03:05|SETI@home|Restarting result 13oc03aa.3322.15313.1003390.138_0 using setiathome version 4.18
16/10/2005 08:22:26||request_reschedule_cpus: process exited
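The two numbers in the "Unrecoverable error" line are the same exit code written two ways: 0xfffffffb is -5 when read as a signed 32-bit value. A minimal sketch of pulling the code out of such a line follows; the regular expression and the function names are assumptions made for illustration, not part of BOINC, and other client versions may word the message differently.

```python
import re

# One line copied from the log above.
LOG_LINE = (
    "16/10/2005 06:03:05|climateprediction.net|Unrecoverable error for result "
    "35ci_000168378_0 ( - exit code -5 (0xfffffffb))"
)

# Hypothetical pattern for this particular message format.
EXIT_CODE_RE = re.compile(r"exit code (-?\d+) \(0x([0-9a-fA-F]+)\)")


def parse_exit_code(line):
    """Return (signed_code, hex_value) from a log line, or None if no code is present."""
    match = EXIT_CODE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2), 16)


def to_signed32(value):
    """Interpret an unsigned 32-bit value as a signed two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


signed_code, hex_value = parse_exit_code(LOG_LINE)
print(signed_code, to_signed32(hex_value))
```

Both printed values agree, which confirms that the hexadecimal form carries no extra information beyond the -5.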
ID: 16645
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 16646 - Posted: 17 Oct 2005, 17:53:56 UTC
Last modified: 17 Oct 2005, 18:01:50 UTC

There is a bug in BOINC 4.45, described in this thread (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=2855), which is linked to this thread (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=2921).
This contains a link to an unofficial fix which appears to work, if you want to try it.
The problem is apparently fixed in the 5.x version, due for release Real Soon Now.

As for error -5, this is a general-purpose label for several problems.
It may be that your computer isn't stable enough for CPDN.

There are some ideas for a fix here (http://www.climateprediction.net/board/viewtopic.php?t=2126) and here (http://www.climateprediction.net/board/viewtopic.php?t=2124).

It's also possible that your power supply is not up to it.

ID: 16646
old_user9685

Joined: 2 Sep 04
Posts: 44
Credit: 372,682
RAC: 0
Message 16672 - Posted: 19 Oct 2005, 9:42:27 UTC - in response to Message 16645.  
Last modified: 19 Oct 2005, 9:44:12 UTC

Regarding preferences:
In your preferences settings, set "Leave applications in memory while preempted?" to yes. This prevents the model from being unloaded each time BOINC switches between projects, so there is less chance of a -5 error: the model resumes instead of restarting.

Regarding BOINC clients:
You have two choices for clients.

v5.2.x of BOINC will not unload the model from memory when benchmarking (unless "Leave applications in memory while preempted?" is no), so using this client is preferable to avoid the problem described in your initial post.

v4.45 of BOINC will unload the model when benchmarking, but only waits 10 secs for the model to terminate. Should the model take more than 10 secs to terminate, BOINC will abort the benchmarks and you're likely to have your system running idle until you notice it.

The "unofficial" v4.45b that Les has indicated waits 30 secs (avoiding the idle state), but the model is still unloaded and you still run the risk of it dying on restart. If you're determined to stay with a v4 client, then this is the suggested one for the reasons explained above.
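The difference between the 10-second and 30-second waits comes down to how long the client gives a process to exit before giving up. The sketch below illustrates that idea only; it is not BOINC's code. The function name `exits_within` is hypothetical, and the times are scaled down so the example runs quickly.

```python
import subprocess
import sys


def exits_within(cmd, grace_seconds):
    """Start cmd and wait up to grace_seconds for it to exit on its own.

    Returns True if the process finished within the grace period,
    False if the wait timed out (the process is then killed).
    """
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=grace_seconds)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # reap the killed process
        return False


# Stand-in for a model that needs about one second to shut down.
slow_exit = [sys.executable, "-c", "import time; time.sleep(1.0)"]

print(exits_within(slow_exit, grace_seconds=0.2))  # grace period too short
print(exits_within(slow_exit, grace_seconds=10))   # grace period long enough
```

With the short grace period the wait times out, which corresponds to the aborted benchmarks described for v4.45; the longer one lets the process finish, as with the longer wait in v4.45b.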
ID: 16672
Profile Conan
Joined: 6 Jul 06
Posts: 141
Credit: 3,511,752
RAC: 144,072
Message 25150 - Posted: 17 Nov 2006, 14:37:48 UTC

Hello to the Climate team.
I have resurrected this thread as it has the correct title.
I noticed today that one of my CP WUs had disappeared from my computer.
Being a bit naive I thought "you beauty, I have completed my first WU".
On checking the result I was informed that the WU had failed with a "computational error".
My next words uttered were 'bullshit, I don't believe it, what happened there?'.
It would appear that after 6,434,209.302419 seconds (1,787.28 hours) the workunit decided to die on its sword for no real reason that I could see.
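As a quick check of the figures quoted above, the seconds can be converted to hours and days directly. This is plain arithmetic; the variable names are chosen only for illustration.

```python
elapsed_seconds = 6_434_209.302419  # figure quoted in the post above

hours = elapsed_seconds / 3600
days = hours / 24

print(f"{hours:.2f} hours, {days:.1f} days")
```

The result matches the 1,787.28 hours reported, which is roughly two and a half months of continuous running.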


http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=5217487


I got the error "exit code 1" and the following:

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/2jkgfo.pjo2c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/2jkgfo.pio2c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/2jkgfo.pfo2c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/2jkgfa.pho2c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/2jkgfa.pgo2c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/2jkgfa.peo2c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/2jkgfa.pdo2c10 to netcdf format.

</stderr_txt>

Validate state: OK
Claimed credit: 32,549.14
Granted credit: 31,622.40
Application version: 5.08

Has all my time been worth it, or wasted? Is the WU now of any use, or have all the trickles I sent in given the data that the scientists needed?
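When a result page carries a long stderr listing like the one quoted above, it can help to extract just the names of the files that failed to convert. The following is a minimal sketch that assumes the message wording shown in this post; the pattern and function name are illustrative and not part of any CPDN tool.

```python
import re

# Excerpt of the stderr text quoted above (first three entries only).
STDERR_TEXT = """\
pp2netcdf crashed: Error in getting file type
Error in converting file dataout/2jkgfo.pjo2c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/2jkgfo.pio2c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/2jkgfo.pfo2c10 to netcdf format.
"""

# Hypothetical pattern matching the "Error in converting file ..." lines.
FAILED_RE = re.compile(r"Error in converting file (\S+) to netcdf format\.")


def failed_conversions(stderr_text):
    """Return the paths of all files reported as failing conversion, in order."""
    return FAILED_RE.findall(stderr_text)


failed = failed_conversions(STDERR_TEXT)
print(len(failed), failed)
```

Listing the paths this way makes it easier to see that every failure comes from the same conversion step in the dataout folder.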
ID: 25150
Profile astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 25151 - Posted: 17 Nov 2006, 19:18:31 UTC

Useful information is returned at the end of each Model Year, on 04Dec. A lot more information is returned every 10 Model Years and a full Restart Dump every 40 Model Years. Your effort isn't wasted.

The Run could be restarted from a backup if you have one.

Les' comments for Exit Code -1 and -107... here:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=4710#23372

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 25151
Profile Conan
Joined: 6 Jul 06
Posts: 141
Credit: 3,511,752
RAC: 144,072
Message 25175 - Posted: 19 Nov 2006, 3:36:16 UTC

Thanks astroWX,
I might try from a backup, which is 1 or 2 weeks old, so have not lost much.
The 'exit code 1' that I am getting must be from a different source, as I am running Linux, not Windows, and that thread from Les is about Windows machines.

Do I just locate the old file in the backup project folder and copy it back into the current working project folder? Will BOINC detect this?
ID: 25175
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25176 - Posted: 19 Nov 2006, 4:59:00 UTC

Not the old FILE.
The entire BOINC FOLDER, along with all of the sub-folders.

The data needed to restart is scattered over several folders, starting in the main BOINC folder, and extending down to a sub-folder of the models folder.

The error code 1 may well be because of shutting down your computer without first exiting from BOINC, whatever operating system you use.
Or because of an older version of the graphics software.
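Because the restart data is spread across the main BOINC folder and its sub-folders, a backup has to copy the whole tree, not a single project folder. The sketch below shows one way to do that. The function name, the timestamped naming scheme and the demo folder layout are assumptions made for illustration, and it assumes BOINC has been exited before the copy is taken.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_boinc_folder(boinc_dir, backup_root):
    """Copy the whole BOINC folder, sub-folders included, into a timestamped folder."""
    boinc_dir = Path(boinc_dir)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{boinc_dir.name}-{stamp}"
    shutil.copytree(boinc_dir, target)  # creates target and any missing parents
    return target


# Demonstration on a throwaway tree; this layout is made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    fake_boinc = Path(tmp) / "BOINC"
    model_dir = fake_boinc / "projects" / "climateprediction.net" / "model"
    model_dir.mkdir(parents=True)
    (fake_boinc / "client_state.xml").write_text("<client_state/>")
    (model_dir / "restart.dat").write_text("dummy")

    copy = backup_boinc_folder(fake_boinc, Path(tmp) / "backups")
    print(sorted(p.relative_to(copy).as_posix() for p in copy.rglob("*") if p.is_file()))
```

Restoring would mean putting the entire copied folder back in place of the live one, again with BOINC stopped, not copying individual files into the running installation.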

ID: 25176
Profile Conan
Joined: 6 Jul 06
Posts: 141
Credit: 3,511,752
RAC: 144,072
Message 25184 - Posted: 19 Nov 2006, 14:35:30 UTC - in response to Message 25176.  

Thanks Les. It looks like I have lost that WU then, as I have only been backing up the climateprediction.net subfolder under the project subfolder in the BOINC folder.
This would explain why removing the climate subfolder and replacing it with my backup did not change anything: the client kept on doing what it was doing before. I have not been backing up the whole BOINC folder, and as I run 6 other projects on the same computer, I believe any restart from a backed-up folder would create errors and problems with the other projects, so I will just forget about that WU.

ID: 25184
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25188 - Posted: 19 Nov 2006, 19:16:03 UTC

It is possible to restore a backup made while running multiple projects.
There is a section in the BOINC Wiki explaining it, but this site is unreachable at the moment.
When it's up, search for: Backup_BOINC

ID: 25188