RAPIT tasks failing after few seconds

Dave Roberts

Joined: 15 Jan 11
Posts: 167
Credit: 5,744,442
RAC: 0
Message 49044 - Posted: 5 May 2014, 15:10:55 UTC

I've just had two consecutive RAPIT tasks fail after only a few seconds with 'Compute error' (16585904 & 16585822). I've been running RAPIT tasks for some time with no problems and have other models running OK at the moment.
Just wondering whether the problem is on my side or with the WUs. (No changes on my system.)
Iain Inglis
Volunteer moderator

Joined: 16 Jan 10
Posts: 1019
Credit: 5,511,676
RAC: 2,196
Message 49045 - Posted: 5 May 2014, 17:01:10 UTC

The message (from an EU rather than RAPIT model):

execl(/Library/Application Support/BOINC Data/projects/climateprediction.net/hadam3p_eu_um_6.09_i686-apple-darwin, 159230) failed!


... is usually produced by the Mac permissions problem. There's no evidence of a BOINC Manager upgrade or restoration from backup that I can see, so if you can't get any models to start after a reboot, then the next thing would be a project reset (or remove) to clear out all the downloaded files. Don't do that until the running models finish, otherwise they'll be aborted.
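
For anyone who wants to look before resetting, here is a minimal Python sketch of one way to inspect the model executables. It assumes, and this is only an assumption, that the permissions problem shows up as the model binary having no execute bit set; ownership problems would not be caught by it. The directory is the one from the error message above, and the function name find_non_executable is made up for this example. Reading the BOINC Data directory on a Mac may itself need elevated rights.

```python
import stat
from pathlib import Path

# Directory taken from the error message quoted above; adjust if BOINC lives elsewhere.
PROJECT_DIR = Path("/Library/Application Support/BOINC Data/projects/climateprediction.net")


def find_non_executable(project_dir, name_part="apple-darwin"):
    """Return (filename, mode string) for matching files that have no execute bit.

    Hypothetical helper: it only inspects permission bits, not file ownership.
    """
    problems = []
    for entry in sorted(Path(project_dir).iterdir()):
        # Only look at regular files whose name contains the platform suffix.
        if not entry.is_file() or name_part not in entry.name:
            continue
        mode = entry.stat().st_mode
        if not mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
            problems.append((entry.name, stat.filemode(mode)))
    return problems


if __name__ == "__main__":
    try:
        for name, mode in find_non_executable(PROJECT_DIR):
            print(f"{mode}  {name}")
    except FileNotFoundError:
        print(f"Not found: {PROJECT_DIR}")
    except PermissionError:
        print(f"Cannot read {PROJECT_DIR}; try again with elevated rights")
```

An empty result only means the execute bits look sane; it doesn't rule out the other forms the permissions problem can take.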
Dave Roberts

Joined: 15 Jan 11
Posts: 167
Credit: 5,744,442
RAC: 0
Message 49046 - Posted: 5 May 2014, 19:46:44 UTC - in response to Message 49044.  
Last modified: 5 May 2014, 19:56:13 UTC

Professor Desty Nova

Joined: 19 Sep 04
Posts: 92
Credit: 1,789,453
RAC: 0
Message 49059 - Posted: 7 May 2014, 6:40:17 UTC

Another bad batch...? :-(


Professor Desty Nova
Researching Karma the Hard Way
Thyme Lawn
Volunteer moderator

Joined: 5 Aug 04
Posts: 1278
Credit: 15,779,438
RAC: 0
Message 49062 - Posted: 7 May 2014, 10:21:09 UTC - in response to Message 49059.  

Another bad batch...? :-(

No. It's the same permissions problem that Iain suggested, this time on the HadCM3N worker process:

execl(/Library/Application Support/BOINC Data/projects/climateprediction.net/hadcm3n_um_6.07_i686-apple-darwin, 131270) failed!
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
Professor Desty Nova

Joined: 19 Sep 04
Posts: 92
Credit: 1,789,453
RAC: 0
Message 49067 - Posted: 7 May 2014, 20:58:37 UTC
Last modified: 7 May 2014, 20:58:54 UTC

I was kind of asking/stating because I had just downloaded a RAPIT unit, and it crashed after a few seconds (on Windows).

Stderr:

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
The device does not recognize the command.
(0x16) - exit code 22 (0x16)
</message>
<stderr_txt>

Model crashed: INITTIME: Atmosphere basis time mismatch


Professor Desty Nova
Researching Karma the Hard Way
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7233
Credit: 23,154,247
RAC: 170
Message 49068 - Posted: 7 May 2014, 21:31:54 UTC - in response to Message 49067.  
Last modified: 7 May 2014, 21:38:42 UTC

There are several reasons for crashes, some of which are to do with the computer in question, and others which indicate a problem with the data set.
So it's necessary to see/quote the error to know which is which.

An INITTIME error is one that means there's a problem with the data files.

PS
The 3 that I'm running DON'T have a problem, but they're all re-sends, so the other computers that tried to run them must have had problems of their own.
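
The split described above, between crashes caused by the computer and crashes caused by the data set, can be sketched as a small lookup on the stderr text. This is only an illustration: the two signatures are the ones quoted in this thread, the list is nowhere near complete, and the names SIGNATURES and classify_stderr are made up for the example.

```python
# Signatures taken only from the errors quoted in this thread; not a complete list.
SIGNATURES = [
    ("INITTIME", "data"),  # e.g. "Model crashed: INITTIME: Atmosphere basis time mismatch"
    ("execl(", "host"),    # e.g. "execl(...) failed!" from the Mac permissions problem
]


def classify_stderr(text):
    """Return 'data', 'host', or 'unknown' for a task's stderr text."""
    for needle, cause in SIGNATURES:
        if needle in text:
            return cause
    # Anything not in the table still needs a person to read the quoted error.
    return "unknown"


if __name__ == "__main__":
    sample = "Model crashed: INITTIME: Atmosphere basis time mismatch"
    print(classify_stderr(sample))
```

Anything that comes back as 'unknown' is exactly the case where the error needs to be quoted on the board.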
rbpeake

Joined: 27 Feb 08
Posts: 38
Credit: 1,362,760
RAC: 0
Message 49071 - Posted: 8 May 2014, 19:10:15 UTC

It appears that approximately 3,000 of these RAPIT units were taken off the server in the past few days. Are the ones I am crunching still valid?

Thanks!
Regards,
Bob P.
Ananas
Volunteer moderator

Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 49072 - Posted: 8 May 2014, 19:32:21 UTC - in response to Message 49071.  

Have a look at this thread. I checked your active ones and didn't find that message, so yours are still needed.


©2020 climateprediction.net