climateprediction.net home page
Why the crash?

Questions and Answers : Windows : Why the crash?

old_user5681
Joined: 31 Aug 04
Posts: 42
Credit: 547,031
RAC: 0
Message 35710 - Posted: 19 Dec 2008, 19:32:23 UTC
Last modified: 19 Dec 2008, 19:39:47 UTC

Hi

Today, I came back from work to find the hadcm3istd model I was crunching had crashed :(

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=8099696

The long list of "stderr out" error messages doesn't mean a great deal to me, although the last few lines say:

"Model crashed: umshell1.f: TRANSO2A: Missing data in ocean UV fields"

I haven't changed anything on my PC (software etc.). The only thing I can think of is that I've been "locking" my PC when leaving the house (Start -> Lock).

It also crashed at roughly the time the next trickle was due...

Was there anything I could have done to prevent the crash?

Thanks for any insights.
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 35711 - Posted: 19 Dec 2008, 22:43:04 UTC

Chances are that it is terminal. However, if you have a recent backup, and given that the run progressed to 2007, you might try restoring the backup on the chance that the problem was transient and the model might get through next time. Sometimes it works.

As to Start/Lock, I don't recall the issue coming up before. If that's long-standing behavior, and processing and communication with the CPDN servers have worked successfully over time, it isn't likely to have affected the model. Perhaps someone who uses that feature will weigh in.
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 35712 - Posted: 19 Dec 2008, 23:53:28 UTC - in response to Message 35711.  

Perhaps someone who uses that feature will weigh in.

All of my systems are regularly locked in three different ways: manually, automatically when a remote access session finishes, and automatically on resume from screen blanking. If locking were a problem, I'd have had a lot more failed models.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
old_user5681
Joined: 31 Aug 04
Posts: 42
Credit: 547,031
RAC: 0
Message 35725 - Posted: 21 Dec 2008, 20:59:29 UTC
Last modified: 21 Dec 2008, 21:04:59 UTC

Thanks for the fast replies. I do have a recent backup, but Boinc downloaded and started another model. Not sure if killing one model to try and save another is worth it.

I think I'll leave the "allow new tasks" option on as the default. It would be nice if Boinc gave you the option, after a crash, to try a backup file before starting another model.

Btw, it's a funny way of thinking when an option button stating "allow new tasks" means exactly the opposite of what it says. To quote from an old comedy show... "Confused? You will be..."
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 35726 - Posted: 21 Dec 2008, 21:21:56 UTC
Last modified: 21 Dec 2008, 21:22:03 UTC

As the model was so well advanced, I still think it would be worth trying your backup. But before restoring it, you could back up the BOINC folder (or the BOINC Data folder if you have version 6) with your new model in it. Then, if the crashed model fails again at the same point, you'd just abandon it and restore the new one. That way you wouldn't have to start a third model, and you'd keep the progress you've made on the new one. Just name the backup folders so that you'll know which backup is which.
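The backup-then-restore sequence described above can be sketched in Python. This is a minimal illustration, not a CPDN or BOINC tool: the helper names `labelled_backup` and `restore_backup` and any folder paths are hypothetical, and it assumes BOINC has already been suspended and shut down so nothing in the data folder changes during the copy.

```python
import shutil
from datetime import datetime
from pathlib import Path


def labelled_backup(data_dir: Path, backup_root: Path, label: str) -> Path:
    """Copy the whole BOINC data folder to backup_root/<label>_<timestamp>.

    Assumes BOINC is suspended and shut down, so no files change mid-copy.
    The label plus timestamp makes it obvious which backup is which.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{label}_{stamp}"
    shutil.copytree(data_dir, target)  # raises if target already exists
    return target


def restore_backup(backup_dir: Path, data_dir: Path) -> None:
    """Replace the live data folder with an earlier backup."""
    shutil.rmtree(data_dir)                # remove the current contents
    shutil.copytree(backup_dir, data_dir)  # put the older copy in place
```

For example, one would first call `labelled_backup` on the live data folder with a label such as `"new_model"`, and only then call `restore_backup` with the older backup of the crashed model; substitute wherever your own BOINC data folder and backups actually live. Copying the whole folder, rather than picking out individual model files, keeps the client's state files and the model's working files consistent with each other.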

Regarding allowing new tasks, it's what is listed in the project status column that matters: that's the current situation. The button has to say the opposite to allow you to change the project's status.
Cpdn news
old_user5681
Posts: 42
Credit: 547,031
RAC: 0
Message 35727 - Posted: 21 Dec 2008, 21:26:46 UTC
Last modified: 21 Dec 2008, 21:34:09 UTC

Thanks for the reply.

So do I just stop Boinc in the normal way and then, after backing up the current model, apply the previous model's backup, or do I have to "suspend" the model first?

I'll definitely give it a go...
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 35728 - Posted: 21 Dec 2008, 23:32:17 UTC

Suspend BOINC first (from the menu), and then WAIT until it SAYS Suspended in the Tasks tab.
That way, when you restart that model later, BOINC will wait for you to start it. This will prevent any problems if the restore doesn't work. The less the server knows about your fiddling the better, error-label wise.
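The "suspend, then wait until it says Suspended" advice can also be scripted. This is a sketch under stated assumptions: it assumes the `boinccmd` command-line tool that ships with BOINC is on the PATH, and that `--set_run_mode never` corresponds to the Manager's Suspend menu item. How to confirm that every task really shows Suspended is left to a caller-supplied check, which is hypothetical here.

```python
import subprocess
import time
from typing import Callable


def suspend_boinc() -> None:
    """Ask the local BOINC client to stop computing.

    Assumes boinccmd is on the PATH; raises CalledProcessError if it fails.
    """
    subprocess.run(["boinccmd", "--set_run_mode", "never"], check=True)


def wait_until(condition: Callable[[], bool], timeout: float = 120.0,
               interval: float = 2.0) -> bool:
    """Poll condition() until it returns True or the timeout expires.

    Returns True as soon as the condition holds, False if it never did.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()  # one last check at the deadline
```

A caller would run `suspend_boinc()` and then `wait_until(all_tasks_suspended)`, where `all_tasks_suspended` is a hypothetical function that inspects the task states, and only touch the data folder once it returns True.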

As for the button label, think of it this way:
There is a big machine which is started and stopped by a single push button in a different room. (Because of the noise.)
So that you can tell if pushing the button will start or stop the machine, there are 2 message lights, each of which tells you what will happen when you push the button. Pushing the button also switches the lights ready for next time.
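The single-button behaviour in the analogy above, where the status shows the current state and the button names the action it will perform, can be modelled as a tiny state machine. This is an illustrative sketch; the class name and the exact label strings are assumptions, not BOINC Manager's code.

```python
class NewTaskToggle:
    """One button that flips a project between allowing and refusing new tasks."""

    def __init__(self, allow_new_tasks: bool = True) -> None:
        self.allow_new_tasks = allow_new_tasks

    @property
    def status(self) -> str:
        # The status column reports the *current* state.
        return "New tasks allowed" if self.allow_new_tasks else "No new tasks"

    @property
    def button_label(self) -> str:
        # The button names what pressing it *will do*: the opposite of the state.
        return "No new tasks" if self.allow_new_tasks else "Allow new tasks"

    def press(self) -> None:
        # Pressing flips the state, which also swaps both texts for next time.
        self.allow_new_tasks = not self.allow_new_tasks
```

So a project showing "No new tasks" in the status column offers an "Allow new tasks" button; after one press, both texts swap.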


Backups: Here
old_user5681
Joined: 31 Aug 04
Posts: 42
Credit: 547,031
RAC: 0
Message 35729 - Posted: 22 Dec 2008, 0:21:21 UTC

Job done successfully. I'll see what happens in about 24 hours...

Thanks again for all your help.
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 35730 - Posted: 22 Dec 2008, 0:32:58 UTC

Well done! Let us know please whether the model gets past the earlier crash point.
Cpdn news
old_user5681
Posts: 42
Credit: 547,031
RAC: 0
Message 35748 - Posted: 23 Dec 2008, 9:33:27 UTC
Last modified: 23 Dec 2008, 9:34:25 UTC

Success:)

Boinc successfully crunched past the previous sticking/crash point.

Thank goodness I had backed up just the day before. I used to let it run for 3 or 4 days between backups...not any more. It'll be every other day from now on.

It's a good feeling being able to fix something successfully - your help was invaluable. Thank you.

A Merry Christmas to all.
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2168
Credit: 64,545,139
RAC: 6,708
Message 35751 - Posted: 23 Dec 2008, 14:52:45 UTC

Glad you got it progressing again. Merry Christmas to you.


©2024 climateprediction.net