hadSM3 Slab Model 6.07 goes from 59% to 0% complete??

Chris

Joined: 4 Dec 07
Posts: 3
Credit: 14,477
RAC: 0
Message 36903 - Posted: 10 May 2009, 15:55:07 UTC

I'm running two models on my dual core computer. After my computer froze, I cold booted the machine and found that one of the models had gone from the upper 50s to 0% finished.

I'm pretty ignorant of how this software works; I was just hoping to install it and let it do its thing. I'm a little computer savvy. Can somebody tell me how to correct the problem?
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,089,004
RAC: 2,537
Message 36904 - Posted: 10 May 2009, 17:43:36 UTC - in response to Message 36903.  



Hi Chris,

Are you sure the WU that went from 59% to 0% is the same WU that you had before the reboot? A cold reboot without first shutting down the model can crash one or more WUs. If you have not clicked “no new tasks” in the Projects tab and one of your WUs crashed, then your computer would just download another WU to take its place. This new WU would start out at 0%.

Chris

Joined: 4 Dec 07
Posts: 3
Credit: 14,477
RAC: 0
Message 36905 - Posted: 10 May 2009, 18:17:16 UTC - in response to Message 36904.  



Not exactly. Again, I admit I'm a little ignorant, but the two models are named hadsm3fub_k68w_005972883_4 and hadsm3fub_k68x_005972884_5. I got them at the same time (last December, I think), and the first one is the model which I think crashed.

What bothers me is that the data (>400 MB) is still on my hard drive. Is there any way for me to restart from some previously saved point?
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2168
Credit: 64,543,482
RAC: 6,686
Message 36906 - Posted: 10 May 2009, 18:27:36 UTC

Looking at your two models, one last trickled on May 1st and the other on May 3rd. Which one went back to 0%?

Occasionally, when a PC hangs or an unclean shutdown of BOINC occurs, a model will lose its place and start over at the beginning. It may still remember its CPU time (probably several hundred hours in this case), but it starts over at timestep 1 of phase 1 and goes from there. If you look at the globe graphic of the model that went back to 0%, what does it say in terms of timestep, phase and s/TS?

If this is indeed what happened, the only way you could get your previous work back on that model is to have made a backup prior to it crashing. See this thread for information on making and restoring backups.
Chris

Joined: 4 Dec 07
Posts: 3
Credit: 14,477
RAC: 0
Message 36907 - Posted: 10 May 2009, 18:36:09 UTC - in response to Message 36906.  



That's exactly what's happened. The report deadline is the same for each, and the CPU time is still at 229+ hours, but the time to completion has changed back to 525+ hours with progress back at the beginning. The model which crashed is probably the one with the later trickle date; that model was a couple of percentage points behind the one that was further along.

I wasn't aware of backing up data until today. I've made a backup copy of the "BOINC" folder in my "Program Files" directory as well as of the "BOINC" folder in my "ProgramData" directory, by copying each folder to a folder in my "My Documents" directory. This is what I did while I was waiting for a reply. Is this all I need to do? (I'm going to try to prevent my other model from getting lost as well.)

Also, I've read the model I've lost sent back data to the climate prediction servers and could be restarted where a certain test point had been made (every 10 model years?). If this is correct, is there a chance the model could be restarted but just at a previous date (if that makes sense)?
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2168
Credit: 64,543,482
RAC: 6,686
Message 36908 - Posted: 10 May 2009, 19:02:10 UTC - in response to Message 36907.  

If you are running BOINC version 6.x, all you need to do is back up the BOINC data directory, wherever it is installed. Usually it's a subfolder in the user's Application Data folder unless you changed the default upon install.

Without that backup, there really is no way to restart your crashed model from some other progress point throughout the model runtime.

Make sure you follow the instructions in the link provided for backup. You have to exit BOINC completely before making the backup in order to ensure it can be used to restart from that point.
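
As a rough illustration of the backup step described here, the following is a minimal Python sketch, not the procedure from the linked instructions. The function names `backup_boinc_data` and `count_files` and the backup folder name are made up for this example, and the script does not check whether BOINC is running, so you still have to exit BOINC completely yourself before using it.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_data(data_dir, backup_root):
    """Copy the whole BOINC data directory into a new timestamped folder.

    Exit BOINC completely before calling this. A copy taken while the
    models are still writing files may not be usable for a restart.
    """
    src = Path(data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"BOINC data directory not found: {src}")

    # Hypothetical naming scheme: one folder per backup, named by time.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"BOINC-backup-{stamp}"

    # copytree creates dest (and missing parents) and raises if dest exists.
    shutil.copytree(src, dest)
    return dest


def count_files(folder):
    """Count regular files under a folder, as a quick sanity check."""
    return sum(1 for p in Path(folder).rglob("*") if p.is_file())
```

For example, `backup_boinc_data(r"C:\ProgramData\BOINC", r"C:\boinc-backups")` would copy the data directory, assuming that is where yours lives; both paths are assumptions to adjust. Comparing `count_files` on the source and on the returned folder is a cheap way to confirm nothing was skipped.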
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 36909 - Posted: 10 May 2009, 21:05:25 UTC

"Also, I've read the model I've lost sent back data to the climate prediction servers and could be restarted where a certain test point had been made (every 10 model years?)."

That only applies to the Coupled Ocean models, which have names starting with HADCM3 and are a LOT longer than the ones you're working on, which are "slab ocean" models.
Also, the "restarting" on the HADCM3s only applies to a new model being downloaded by someone, which had been partially completed (to a 10 year point) by a different person.


Backups: Here