climateprediction.net
Is anyone else getting multiple runtime errors?

Message boards : Number crunching : Is anyone else getting multiple runtime errors?
old_user626663
Joined: 25 Jun 10
Posts: 4
Credit: 188,122
RAC: 0
Message 42624 - Posted: 16 Jul 2011, 1:45:53 UTC

Two climate models have now failed due to runtime errors. Is this problem occurring frequently for others? I hope the other models with 300+ hours on them will be able to finish.
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1079
Credit: 6,902,393
RAC: 6,787
Message 42628 - Posted: 16 Jul 2011, 17:51:54 UTC - in response to Message 42624.  

The HADCM3N model hadcm3n_ym0x_1900_40_007361403_0 failed on your quad today, but I can't see the other one.

The other Windows/Intel computer running a model from the work unit has produced two trickles, whereas the model on your computer failed after one trickle. That argues that the problem does not lie with the model itself, but is most likely some local and possibly temporary problem.

Was anything happening on the computer when it failed? Virus scan? Microsoft Update?
old_user626663
Joined: 25 Jun 10
Posts: 4
Credit: 188,122
RAC: 0
Message 42630 - Posted: 16 Jul 2011, 20:48:39 UTC - in response to Message 42628.  

I might have been running too many programs at once. I can usually get away with playing video games while running 3 or 4 climate models on a four-core CPU without noticeable lag, but that load might have caused one of the models to crash. Thank you for your help.
old_user658015
Joined: 16 Jul 11
Posts: 2
Credit: 351,861
RAC: 0
Message 42647 - Posted: 20 Jul 2011, 4:00:24 UTC - in response to Message 42624.  

My first attempt at a climate run failed with a runtime error on Windows XP. It had an Oct. 15 deadline and had completed 4%. When I opened my mail client, the task jumped to 100% complete but stayed in the running state rather than 'Ready to report', and Windows errors started appearing. I have suspended it for now and immediately got another segment.
old_user658015
Joined: 16 Jul 11
Posts: 2
Credit: 351,861
RAC: 0
Message 42648 - Posted: 20 Jul 2011, 4:42:21 UTC - in response to Message 42647.  

When the unit cycled back and tried to run again, the error occurred once more and the unit went to Computation Error. Should I just abort it?
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,053,321
RAC: 4,417
Message 42649 - Posted: 20 Jul 2011, 8:40:27 UTC

Unless you have a recent backup to restore from, there is nothing to do. The WU is dead. When it reports, it will disappear from your machine. Starting your email client seems too trivial a thing to cause a crash. Usually, the programs that do this are very resource-intensive ones, such as video editors, that consume tons of RAM and CPU cycles.
ChrisD
Joined: 8 Aug 04
Posts: 69
Credit: 1,561,341
RAC: 0
Message 43109 - Posted: 30 Sep 2011, 12:23:01 UTC

You are not the only one, if that is any comfort to you. :)

I had 4 models running, 60% done, but this morning an error message said that my Catalyst driver had problems and needed to close.

The screen was unresponsive, so I had to shut Windows down and reboot.

When I got the machine back up, BOINC showed no tasks running.

I have been running MemTest86+ for the last 4 hours, just to make sure the hardware is OK. No errors.

On my account page, all tasks show a computation error.

The Catalyst driver has been removed.

I have downloaded 4 new tasks and disabled network traffic. This time I am going to make backups twice a day.

Best of luck to you.

ChrisD
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 43115 - Posted: 30 Sep 2011, 23:50:45 UTC

Even experienced and reliable crunchers can have a catastrophic model crash occasionally!
Cpdn news
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4314
Credit: 16,376,846
RAC: 3,590
Message 43120 - Posted: 1 Oct 2011, 6:10:57 UTC - in response to Message 43109.  

Chris,
This may be teaching grandmother to suck eggs but make sure you suspend computation and stop BOINC before doing your backups.
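For anyone who wants to script the copy step, here is a minimal Python sketch. It assumes BOINC has already been suspended and shut down as described above, and the function name `backup_boinc_data` and the paths you pass it are hypothetical placeholders; point them at your own BOINC data directory and backup drive.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_data(data_dir: Path, backup_root: Path) -> Path:
    """Copy the whole BOINC data directory into a new timestamped folder.

    Assumes (hypothetically) that BOINC is already suspended and stopped,
    so no model files change while the copy is in progress.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = backup_root / f"boinc-backup-{stamp}"
    # copytree creates dest (and any missing parent folders) and raises
    # FileExistsError if dest already exists, so an earlier backup is
    # never silently overwritten.
    shutil.copytree(data_dir, dest)
    return dest
```

Using a new timestamped folder for each run keeps older backups intact, which matters if the most recent copy turns out to contain an already-crashed model.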

Dave
Ananas
Volunteer moderator
Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 43130 - Posted: 2 Oct 2011, 23:09:33 UTC - in response to Message 43115.  

Even experienced and reliable crunchers can have a catastrophic model crash occasionally!

On current machines, those crashes have lost a lot of their nastiness.

Somehow I considered the earlier models on P3 Tualatin and Athlon XP Thoroughbred to be way more valuable ;-)
Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 43131 - Posted: 3 Oct 2011, 1:01:15 UTC - in response to Message 43130.  
Last modified: 3 Oct 2011, 1:04:40 UTC

This is specifically about model crashes with the hadcm3n models. None are available right now, but if more appear soon, keep it in mind.

First, do backups!
Second, keep crunching.

Backups won't help if you get the evil 193 error at the first upload (or at the last; that happened to me once, and to a few others).
But they will help if you get a disk read error, an "out of space" error, any driver or forced-reboot error, a SIGSEGV on your PC, or a mains power failure.

If you look at the "Top computers" tab on this site, only 1 in 5 of them has completed any hadcm3n model. The stats for us midrange crunchers are much better: at least 70% complete (not counting misconfigured or overclocked machines).

If you have a backup you can just restore and keep on crunching.


PS -- maybe this discussion should be over on the number crunching board.
Ananas
Volunteer moderator
Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 43133 - Posted: 3 Oct 2011, 2:01:05 UTC - in response to Message 43131.  

...
PS -- maybe this discussion should be over on the number crunching board.

You're right ... done

©2024 climateprediction.net