Output file .... absent

McPuppa

Joined: 9 Aug 05
Posts: 4
Credit: 1,702,553
RAC: 5,369
Message 42152 - Posted: 11 May 2011, 8:05:10 UTC
Last modified: 11 May 2011, 8:05:36 UTC

Hi everybody, I hope this is the correct forum.

It seems that one of my two work units completed without packing any results to send.
I found these messages in the log:

08/05/2011 03:04:17 climateprediction.net Computation for task hadcm3n_p4eo_1900_40_007223296_1 finished
08/05/2011 03:04:17 climateprediction.net Output file hadcm3n_p4eo_1900_40_007223296_1_1.zip for task hadcm3n_p4eo_1900_40_007223296_1 absent
08/05/2011 03:04:17 climateprediction.net Output file hadcm3n_p4eo_1900_40_007223296_1_2.zip for task hadcm3n_p4eo_1900_40_007223296_1 absent
08/05/2011 03:04:17 climateprediction.net Output file hadcm3n_p4eo_1900_40_007223296_1_3.zip for task hadcm3n_p4eo_1900_40_007223296_1 absent
08/05/2011 03:04:17 climateprediction.net Output file hadcm3n_p4eo_1900_40_007223296_1_4.zip for task hadcm3n_p4eo_1900_40_007223296_1 absent


Is there a way to produce the results files and not throw away all the work done?
ID: 42152
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1081
Credit: 7,071,544
RAC: 5,901
Message 42153 - Posted: 11 May 2011, 9:22:30 UTC

Unfortunately, the model crashed before creating the first Zip file. The list of absent files is just BOINC recording that the model has finished before generating all the files it was expecting to send back to the project. (There is one file per decade, so four files in total for a 40-year model.)
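For anyone who wants to check their own log for the same symptom, here is a minimal sketch that groups the "absent" messages by task. It assumes the message wording shown in the log excerpt earlier in this thread; the function name `absent_files_by_task` is made up for illustration and is not part of BOINC.

```python
import re
from collections import defaultdict

# Assumed message shape, copied from the log excerpt in this thread:
#   <date> <time> <project> Output file <name> for task <task> absent
ABSENT_RE = re.compile(r"Output file (\S+) for task (\S+) absent")


def absent_files_by_task(log_lines):
    """Group the names of absent output files by task name."""
    absent = defaultdict(list)
    for line in log_lines:
        match = ABSENT_RE.search(line)
        if match:
            file_name, task = match.groups()
            absent[task].append(file_name)
    return dict(absent)


# Three lines taken from the log quoted above.
log = [
    "08/05/2011 03:04:17 climateprediction.net Computation for task hadcm3n_p4eo_1900_40_007223296_1 finished",
    "08/05/2011 03:04:17 climateprediction.net Output file hadcm3n_p4eo_1900_40_007223296_1_1.zip for task hadcm3n_p4eo_1900_40_007223296_1 absent",
    "08/05/2011 03:04:17 climateprediction.net Output file hadcm3n_p4eo_1900_40_007223296_1_2.zip for task hadcm3n_p4eo_1900_40_007223296_1 absent",
]

for task, files in absent_files_by_task(log).items():
    print(f"{task}: {len(files)} output file(s) absent")
```

A task whose count of absent files equals the number of files it was expected to return (four for a 40-year model) most likely crashed before the first Zip file was written.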

The stderr log on the task page shows a lot of BOINC quit requests. Perhaps one of the shutdowns caused the crash. It is a good idea to close down BOINC manually, particularly with two large HADCM3N models running.
ID: 42153
McPuppa
Joined: 9 Aug 05
Posts: 4
Credit: 1,702,553
RAC: 5,369
Message 42161 - Posted: 12 May 2011, 8:56:13 UTC - in response to Message 42153.  

OK for future work, but in other words that work unit is lost?
ID: 42161
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,113,853
RAC: 2,603
Message 42164 - Posted: 12 May 2011, 14:57:16 UTC - in response to Message 42161.  

Yes, I'm afraid the Work Unit is lost. In future you might try making backups every few days. This will allow you to restore a crashed WU and go on with only minimal loss of time.

Info on how to make a backup and do a restore can be found at the top of the “Number Crunching” section in the “information about running the climate models” thread. Unfortunately, the backups have to be made before the WU crashes, so there is no way to fix the one you just lost.
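As a rough illustration of the idea only (the thread mentioned above remains the place to look for the actual procedure), a backup can be as simple as copying the whole BOINC data directory to a dated folder. The sketch below assumes BOINC has been shut down completely first; the function name and the example paths are hypothetical.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_data(data_dir, backup_root):
    """Copy the whole data directory into a new time-stamped folder.

    Assumption: BOINC is fully shut down, so no model is writing
    files while the copy runs.
    """
    data_dir = Path(data_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{data_dir.name}-{stamp}"
    # copytree raises an error if the target already exists,
    # so an older backup is never overwritten by accident.
    shutil.copytree(data_dir, target)
    return target


# Example call with hypothetical paths; adjust to your own installation:
# backup_boinc_data("/var/lib/boinc-client", "/mnt/backups")
```

Restoring would be the reverse copy, again with BOINC stopped.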



ID: 42164
tullio
Joined: 6 Aug 04
Posts: 264
Credit: 965,476
RAC: 0
Message 42352 - Posted: 7 Jun 2011, 16:05:39 UTC

After finishing a hadam3p WU which took more than 200 hours on my Linux box, I downloaded a hadcm3n WU which lasted less than a minute and ended with an "output file absent" message. Was it a corrupt WU?
Tullio
ID: 42352
Urglab
Joined: 27 Feb 08
Posts: 4
Credit: 960,510
RAC: 0
Message 42353 - Posted: 7 Jun 2011, 16:08:53 UTC

This WU (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=12945697) crashed for me after just 1 min too. I wasn't doing anything special at the time.
ID: 42353
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 42354 - Posted: 7 Jun 2011, 19:08:09 UTC

Urglab & tullio,

Both tasks terminated shortly after start-up. Stderr error report: INVALID THETA. That error indicates model instability and is not unusual with FAMOUS tasks, but this is the first time I've seen it with HadCM3n. I have nothing to suggest except that it looks like a bad batch of work, and to hope the next batch is better.

Thanks for reporting the problem.
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 42354
Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 42355 - Posted: 7 Jun 2011, 21:45:48 UTC

Yep, same for me with hadcm3n_q7ar_1940_40_007280157_0 . Invalid Theta in stderr.
ID: 42355
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42356 - Posted: 7 Jun 2011, 21:53:02 UTC

The programmers know about this. It's an over-enthusiastic use of CO2 forcing, which has caused the models to turn into a "Venus world" in a few seconds.
We're now waiting for the RAPIT/RAPID people to decide what values to use instead.

No need to report more failures, thanks. :)


Backups: Here
ID: 42356
old_user633787
Joined: 14 Sep 10
Posts: 11
Credit: 1,812,972
RAC: 0
Message 42358 - Posted: 7 Jun 2011, 22:09:37 UTC - in response to Message 42356.  

"Venus world"? You mean so much water vapor feedback that the oceans evaporate? What CO2 forcing are they using?
ID: 42358
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,113,853
RAC: 2,603
Message 42359 - Posted: 8 Jun 2011, 0:04:43 UTC

I think that I snagged another of those bad CM3n WUs. HadCm3n_8_1940_007280448 crashed after running only 1 min 2 sec. No telling how many other people downloaded these extreme CO2 forcing WUs from the same batch and just haven't started them yet. At least it didn't waste a lot of computer time before the crash.

ID: 42359
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4352
Credit: 16,582,509
RAC: 5,992
Message 42364 - Posted: 8 Jun 2011, 15:26:41 UTC - in response to Message 42359.  

I have just had another of these models crash: UK Met Office Coupled Model Full Resolution Ocean v6.07. Interestingly, both crashes came just after restarting the computer and restarting BOINC. Yes, I had suspended the model and shut BOINC down before shutting down the box. I don't know whether this is of any use to those who put the models together.

Dave
ID: 42364
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1081
Credit: 7,071,544
RAC: 5,901
Message 42368 - Posted: 8 Jun 2011, 16:44:05 UTC - in response to Message 42364.  
Last modified: 8 Jun 2011, 16:45:59 UTC

[Dave wrote:] I have just had another of these models crash. UK Met Office Coupled Model Full Resolution Ocean v6.07 ...
The HADCM3N models in the queue are a mix of valid (unforced) models and invalid (forced) models. Just keep downloading them: if there are any valid ones left and you get one then let it run, otherwise let the invalid ones crash and don't attempt to rescue them. It's an odd way to sort the wheat from the chaff, but it works ...
ID: 42368
DJStarfox
Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 42370 - Posted: 8 Jun 2011, 17:42:40 UTC - in response to Message 42368.  

[Dave wrote:] I have just had another of these models crash. UK Met Office Coupled Model Full Resolution Ocean v6.07 ...
The HADCM3N models in the queue are a mix of valid (unforced) models and invalid (forced) models. Just keep downloading them: if there are any valid ones left and you get one then let it run, otherwise let the invalid ones crash and don't attempt to rescue them. It's an odd way to sort the wheat from the chaff, but it works ...


The downsides are large, wasteful downloads and a small amount of wasted CPU time. For people with limited connection speeds, this can be a pain. Isn't there a way to send kill packets to clients running these models with bad parameters?
ID: 42370
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42374 - Posted: 8 Jun 2011, 20:50:27 UTC - in response to Message 42370.  

Isn't there a way to send kill packets to clients running these models with bad parameters?
No.


And what you say doesn't make sense anyway, as the models crash so fast. The 2 that I tried ran for 1 minute and 4 seconds, and for 4 seconds repeated 5 times. These are probably the same thing, as my 2 computers are running slightly different versions of BOINC, which report things slightly differently.

There have been posts about this in the News thread, to which everyone should subscribe.

As for large downloads, that's life. Long-time crunchers can just keep up with the news, and manually stop downloading.
Keep in mind another post, which said that the models that are available are being grabbed before they can fall from the end of the conveyor belt into the storage bin.

40,000 computers, a couple of thousand models, slowly being prepared.
Not good for people wanting lots of work. :(


ID: 42374