DAMN. Another 'Unresolvable Error' at 80%!!!

Questions and Answers : Windows : DAMN. Another 'Unresolvable Error' at 80%!!!

Pete McCann
Joined: 8 Mar 06
Posts: 24
Credit: 12,791,616
RAC: 0
Message 28989 - Posted: 26 May 2007, 11:25:06 UTC

Well, it is certainly not third time lucky. I've just uploaded another model failure. Damn and blast!!!

It is on the same 4 core machine as last time. Computer ID 532553. Is it terminal Doc?

At least I did get a BBC model to completion yesterday, and its pair should complete today, so it's not all doom and gloom.

Let me know about this one, as my last backup is about a week old. I'm running a bit behind my normal regime.

Cheers guys.

Pete McCann.
ID: 28989
Pete McCann
Joined: 8 Mar 06
Posts: 24
Credit: 12,791,616
RAC: 0
Message 28991 - Posted: 26 May 2007, 11:37:49 UTC - in response to Message 28989.  

I've just tracked down the right page for this model. It looks like it's 'Negative pressure' again. Boo, hiss.

Did the model manage to make it to 2050? It will be fairly close.

Are my other 2 models from this batch also likely to crash? All 4 were downloaded at the same time. Do I have a bad batch?

Pete
ID: 28991
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 28998 - Posted: 26 May 2007, 17:40:12 UTC

Commiserations, Peter.

I seem to remember that a BBC member restored a backup of a model that had crashed with this error (they're the same type of model). On the restored rerun, the model got through and continued successfully. This must mean that the error can occasionally be generated by a calculation/processing glitch on the computer.

If you have a backup, you might like to try it?

The trouble is that you have to restore all the models running on the machine, so the restore causes them all to repeat some computing time. Restoring a single model on a multi-core computer can be done, but the procedure is said to be a hassle, big-time.
Cpdn news
ID: 28998
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 29007 - Posted: 27 May 2007, 9:54:15 UTC


Astro found that the odds of getting a model going again after a NEGATIVE THETA/PRESSURE error are low (it will already have retried 3 times in any case, i.e. the day, month and year restarts). The only case where a restore would work is if the error was due to a computer glitch before the start of the current model year.

I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 29007
Pete McCann
Joined: 8 Mar 06
Posts: 24
Credit: 12,791,616
RAC: 0
Message 29011 - Posted: 27 May 2007, 11:30:33 UTC - in response to Message 29007.  

As my backup is about a week old, I have just continued with the two remaining models, hoping that they too don't die on me. 2 out of 4 is already unlucky; 4 out of 4 would be a disaster! I have put a copy of the backup to one side, to run it again on a single-core machine at some point. It seems a bit of a waste of time to do this now on a quad-core machine.

Did the one that just failed make it to 2050, by the way? I wasn't sure how to check.

On a brighter note, I've just got a pair of models on another machine to completion, over on the BBC side.

Pete
ID: 29011
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 29017 - Posted: 27 May 2007, 20:33:06 UTC


Just barely to 2050. The link to the result is below; you'll find the graph off the right-hand edge of the screen, near the bottom.

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6278224

ID: 29017
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 29019 - Posted: 27 May 2007, 21:21:50 UTC
Last modified: 27 May 2007, 21:23:57 UTC

Are my other 2 models from this batch also likely to crash? All 4 were downloaded at the same time. Do I have a bad batch?


"Bad batch" isn't quite the right phrase.
All datasets in a batch are similar but slightly different, and are exploring different areas of "climate space". Whether or not any or all will fail to complete depends on the exact combination of values for all of the starting parameters. If it were possible to know the outcome before a model completes, then it wouldn't be necessary to run the models in the first place.

Think of it as a ruler, partially positioned over the edge of a table. Now slowly tap the 'on table' end until it falls off. How close can you get the center of the ruler to the edge of the table? This depends on the ruler having the same mass along its full length, the sides being exactly parallel, how small a tap you can give it, etc.

The slightly different datasets may ALL result in failure before the full run, or some may make it and others not.
But at which 'end' of the values are the short runners, and which the long runners?
And at what point do they start failing?

I've always felt that a model that fails should be left that way, so that the researchers can tell that it HAS failed. If the failure was due to an unstable computer, then making it more stable (e.g. by not overclocking as much) is OK, but doing everything possible to make it continue, such as moving it to a different brand of processor with slightly different maths routines, is, I feel, cheating.
A bit like starting the Indianapolis 500 in a Ferrari, and finishing it in a Lamborghini.
Others feel differently about this.

ID: 29019
Pete McCann
Joined: 8 Mar 06
Posts: 24
Credit: 12,791,616
RAC: 0
Message 29027 - Posted: 28 May 2007, 10:49:45 UTC - in response to Message 29019.  

If the failure was due to an unstable computer, then making it more stable (e.g. by not overclocking as much) is OK, but doing everything possible to make it continue, such as moving it to a different brand of processor with slightly different maths routines, is, I feel, cheating.


Hi, thanks for the replies.

That's good news that the model just made it to 2050. That's another 'completed' one for the headline stats, anyway.

I'm fairly sure this will not be a computer error. This model was running on a server board with Opterons and registered ECC memory. It is not overclocked at all, so it should be pretty damn stable. I'll probably restore a backup at some point just to make sure it crashes at the same place.

What timeslice figure corresponds to the 2050 model year, i.e. the point at which a model is deemed completed for the headline stats?

Cheers.
ID: 29027

©2024 climateprediction.net