Job crashing without resuming

old_user178710

Joined: 27 Mar 06
Posts: 4
Credit: 72,449
RAC: 0
Message 21930 - Posted: 7 Apr 2006, 8:30:24 UTC

I currently use BOINC 5.2.13 to run two CPDN jobs, and it's very unstable on my Opteron 275 workstation (WinXP Pro). Sometimes it crashes every few hours with the same problem: a Visual Fortran runtime error. After the first few crashes it resumed both jobs, but yesterday one of them (hadcm3lb_5gz2_05043258) resumed while the other (hadcm3lb_5gz3_05043259) simply disappeared. Its directory is still there, untouched, but the XML file referring to it within BOINC\projects\climateprediction.net\ disappeared too. When BOINC connected to the CPDN server it requested a new job (hadcm3lb_592q_05033022) and tried to run it. I aborted the new job because I want to resume the previous one. What should I do to resume it?

I completed two CPDN runs using the good old .NET client and everything was almost perfect. But your brand-new-shiny-state-of-the-art BOINC seems to be a piece of crap like everything else from Berkeley :(
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 21934 - Posted: 7 Apr 2006, 9:28:13 UTC

If you want hadcm3lb_5gz3_05043259 back, then start praying for a miracle, because it's crashed, and gone for good.

result page for hadcm3lb_5gz3_05043259.

old_user178710

Joined: 27 Mar 06
Posts: 4
Credit: 72,449
RAC: 0
Message 21935 - Posted: 7 Apr 2006, 9:46:16 UTC - in response to Message 21934.  

Very sad. The error code seems to correspond to this:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=4231

Anyway, for the future: would it help prevent such situations if I made daily automated backups of the BOINC directory and restored it from the backup after a similar crash?
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 21938 - Posted: 7 Apr 2006, 10:53:19 UTC

Yes, backups are a great idea. At one point last month I was making a backup every 12 hours, until I got past whatever the problem was. For a while, I didn't know if my middle name was "Backup", or "Paranoid".

The only problem with an automated backup is that BOINC MUST be exited before the backup, or else the files will get out of sync. And there's at least one 'lockfile'.

old_user178710

Joined: 27 Mar 06
Posts: 4
Credit: 72,449
RAC: 0
Message 21951 - Posted: 8 Apr 2006, 11:49:30 UTC - in response to Message 21938.  

Thanks a lot. I've already found the lockfiles and automated the backup process. All we need are two scheduled tasks: 'boinccmd.exe --quit' before the backup time and 'boinc.exe' after it.
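
For anyone who would rather run the stop, copy and restart steps as one job, here is a minimal Python sketch of the same idea. The executable names come from the post above. Everything else is an assumption for illustration: the helper names (`snapshot`, `prune`, `run_backup`), the 30-second wait for the client to shut down, the `boinc-<timestamp>` folder naming, and the premise that the executables and the data live in the same BOINC directory. The backup folder must sit outside the BOINC directory.

```python
import shutil
import subprocess
import time
from datetime import datetime
from pathlib import Path


def snapshot(boinc_dir: Path, backup_root: Path, stamp: str) -> Path:
    """Copy the whole BOINC directory into backup_root/boinc-<stamp>."""
    target = backup_root / f"boinc-{stamp}"
    # copytree creates backup_root if needed and fails if target already exists.
    shutil.copytree(boinc_dir, target)
    return target


def prune(backup_root: Path, keep: int) -> list:
    """Delete all but the newest `keep` snapshots and return the removed paths."""
    # Timestamped names sort chronologically, so a plain sort is enough.
    snapshots = sorted(p for p in backup_root.glob("boinc-*") if p.is_dir())
    stale = snapshots[:-keep] if keep > 0 else snapshots
    for path in stale:
        shutil.rmtree(path)
    return stale


def run_backup(boinc_dir: Path, backup_root: Path, keep: int = 7) -> Path:
    """Stop BOINC, copy its directory, then start BOINC again."""
    # Ask the running client to exit so the files are not copied mid-write.
    subprocess.run([str(boinc_dir / "boinccmd.exe"), "--quit"], check=True)
    time.sleep(30)  # assumed grace period; adjust for your machine
    try:
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        target = snapshot(boinc_dir, backup_root, stamp)
        prune(backup_root, keep)
    finally:
        # Restart the client even if the copy failed.
        subprocess.Popen([str(boinc_dir / "boinc.exe")], cwd=str(boinc_dir))
    return target
```

A single scheduled task could then call something like `run_backup(Path(r"D:\Programs\Boinc"), Path(r"E:\boinc-backups"))`, with both paths replaced by your own. To restore, exit BOINC, copy a snapshot back over the BOINC directory, and reboot before restarting.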

But it shows the greatest problem of BOINC: casual people with no serious motivation (and no wish to deal with permanent bugs) would quit the project after a couple of similar crashes. Actually I had trouble simply getting it to run, because one of the BOINC processes locked all the downloaded data files and wouldn't release them so that computing could start; it took about an hour to solve this problem and make BOINC work. So I wouldn't be surprised to hear that CPDN loses participants because of BOINC.
old_user17289

Joined: 13 Sep 04
Posts: 228
Credit: 354,979
RAC: 0
Message 21967 - Posted: 10 Apr 2006, 5:42:39 UTC - in response to Message 21951.  

But it shows the greatest problem of BOINC: casual people with no serious motivation (and no wish to deal with permanent bugs) would quit the project after a couple of similar crashes. Actually I had trouble simply getting it to run, because one of the BOINC processes locked all the downloaded data files and wouldn't release them so that computing could start; it took about an hour to solve this problem and make BOINC work. So I wouldn't be surprised to hear that CPDN loses participants because of BOINC.

Never heard of such a problem; how did you solve it?
old_user178710

Joined: 27 Mar 06
Posts: 4
Credit: 72,449
RAC: 0
Message 22010 - Posted: 13 Apr 2006, 6:23:46 UTC - in response to Message 21967.  

Never heard of such a problem; how did you solve it?


I didn't find any proper way to solve it. Fortunately it was solved by the last and most stupid thing I could do: I uninstalled BOINC and installed it again. And then everything started.
old_user81594

Joined: 11 Jun 05
Posts: 67
Credit: 1,222,916
RAC: 0
Message 22139 - Posted: 17 Apr 2006, 13:26:44 UTC - in response to Message 21938.  


How do you back up successfully?
My CPDN model has just crashed.

[Second time. The first one got up to 168 hours. This one has just crashed after 125 hours.....huh?!?
I might give up - it's a waste of electricity if the program is so unstable that it loses all data when it crashes.]

Re-booted PC and it has gone from the BOINC Manager.

Do you just copy the ClimatePrediction.net folder from the D:\Programs\Boinc\projects\climateprediction.net directory?

simple as that?


Neil.




Yes, backups are a great idea. At one point last month I was making a backup every 12 hours, until I got past whatever the problem was. For a while, I didn't know if my middle name was "Backup", or "Paranoid".

The only problem with an automated backup is that BOINC MUST be exited before the backup, or else the files will get out of sync. And there's at least one 'lockfile'.



Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 22154 - Posted: 17 Apr 2006, 20:02:13 UTC

Do you just copy the ClimatePrediction.net folder from the D:\Programs\Boinc\projects\climateprediction.net directory?

simple as that?


Yes.
But only after you menu/Exit from BOINC. Otherwise the many files get out of sync, and the backup is useless.
After a restore, do a re-boot, to clear out of memory any leftovers from the crash.

Your models are failing for a variety of reasons, including exit code -1073741819,
which is a graphics problem, and a couple with an exit code of 1. (I forget what that is.)
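
As a side note on reading such codes: large negative exit codes on Windows are usually 32-bit status values printed as signed integers. Here is a minimal Python sketch of the conversion. The helper name `to_ntstatus` is made up for illustration, and it only reformats the number; it says nothing about the underlying cause.

```python
def to_ntstatus(exit_code: int) -> str:
    """Show a signed 32-bit exit code as the unsigned hex value Windows documents."""
    # Masking with 0xFFFFFFFF reinterprets the two's-complement value as unsigned.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_ntstatus(-1073741819))  # prints 0xC0000005
```

0xC0000005 is the Windows access violation status (STATUS_ACCESS_VIOLATION), which is a more useful term to search for than the decimal number. It is consistent with a graphics driver fault but does not prove one.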

It may help to read the sticky right at the top of this "Windows" help board.
For the graphics, an update of the driver from the card maker's web site sometimes fixes it.
But for graphics-heavy programs, you either need more RAM, or to menu/Suspend BOINC before using them.

