Error 22 on machine that successfully ran same WU type in April.

marmot

Joined: 12 May 05
Posts: 34
Credit: 1,357,324
RAC: 875
Message 60186 - Posted: 22 May 2019, 11:14:50 UTC

The only changes to the machine are an added RX 550 and a downclock from 24x to 8x due to warming weather.
No OS changes other than the AMD driver.
Memory commit is at 31.5 GB of 32 GB RAM, with 28.7 GB occupied (private/working set).
All 32 threads in use.
Plenty of free disk space.
Currently also running, and turning in valid results for, Amicable Numbers(GPU+cores), Sixtrack (LHC), RakeSearch and SRBase long.

Did some requirement change for this WU type?

From machine: https://www.cpdn.org/cpdnboinc/results.php?hostid=1347460
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
The device does not recognize the command.
(0x16) - exit code 22 (0x16)</message>
<stderr_txt>

Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048

Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048

Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048

Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048

Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048

Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Sorry, too many model crashes! :-(
02:12:54 (1396): called boinc_finish(22)

</stderr_txt>
]]>

ID: 60186
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 60187 - Posted: 22 May 2019, 12:46:42 UTC - in response to Message 60186.  

Model crashed: ATM_DYN : INVALID THETA DETECTED


That's a "not allowable physics value" error. (The CO2 levels jumped to 100 times normal, or the atmosphere disappeared, etc.)

So it's not an error as far as the research is concerned.
They now know that the starting values used in that model run lead to an instability.
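
To illustrate what a "not allowable physics value" check can look like: in atmospheric dynamics, theta conventionally denotes potential temperature, theta = T * (p0 / p) ** (R / cp). The sketch below is not the model's code. The function names and the 100 K to 10000 K bounds are assumptions picked only for illustration.

```python
import math

P0 = 100000.0   # reference pressure in Pa (the conventional 1000 hPa)
KAPPA = 0.2857  # R / cp for dry air, approximately 2/7

def potential_temperature(temp_k, pressure_pa):
    """Potential temperature: theta = T * (p0 / p) ** (R / cp)."""
    return temp_k * (P0 / pressure_pa) ** KAPPA

def is_valid_theta(theta, lower=100.0, upper=10000.0):
    """Hypothetical sanity check: finite and inside assumed bounds (kelvin)."""
    return math.isfinite(theta) and lower < theta < upper

# Surface air at the reference pressure: theta equals the temperature.
theta_ok = potential_temperature(288.0, 100000.0)
# A value corrupted by a numerical instability, represented here as NaN.
theta_bad = potential_temperature(float("nan"), 50000.0)

print(is_valid_theta(theta_ok), is_valid_theta(theta_bad))  # True False
```

A run whose fields drift into NaN, infinite, or wildly out-of-range values fails a check of this kind no matter which machine computes it, which is why the crash points at the starting values and not at the host.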
ID: 60187
marmot

Joined: 12 May 05
Posts: 34
Credit: 1,357,324
RAC: 875
Message 60196 - Posted: 22 May 2019, 21:57:05 UTC - in response to Message 60186.  

Model crashed: ATM_DYN : INVALID THETA DETECTED


They now know that the starting values used in that model run lead to an instability.

So it's an error arising from the data set's variable starting values.

The message below looked like a configuration error, but if it can arise from data set conditions, then all is fine and I'll just keep crunching.
<![CDATA[
<message>
The device does not recognize the command.
(0x16) - exit code 22 (0x16)</message>
<stderr_txt>



This particular WU is hard to get and it's disappointing that it ended so quickly. Thanks for the response.
ID: 60196
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 60197 - Posted: 23 May 2019, 1:15:51 UTC

ATM_DYN = Atmospheric dynamics

A quick search brings up these:

The University of Edinburgh - Atmospheric-Dynamics

Columbia University - Atmospheric Forces, Balances, and Weather Systems

I don't know how relevant they are to our work, as my mind rapidly goes blank when hit with this sort of stuff.

But the info is out there for those that want to know more.
ID: 60197
marmot

Joined: 12 May 05
Posts: 34
Credit: 1,357,324
RAC: 875
Message 60214 - Posted: 27 May 2019, 4:47:33 UTC - in response to Message 60197.  

Just have to add that there was no error on the user-client side. Nothing the BOINC user has any control over.

The work unit is at worst invalid because of a failure in the data set. And even the failure of the model because of initial conditions is something learned. A failed experiment can still teach the scientist something about their research.

As such, these work units should complete as invalid WITH credit given.

The WU did take up at least 30 minutes of a slot that another project could have used.

I've worked on 170+ different work units now and can't remember another WU ending as a 0-credit error because the calculation ended in a null result.
This would be akin to assigning 0 credit because we didn't find a prime number in an SRBase search.
ID: 60214
astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 60223 - Posted: 27 May 2019, 21:12:19 UTC - in response to Message 60214.  

Credits are awarded each time data is received from a task (unlike other projects, which require completed tasks). Your task apparently failed to report the first reporting point. Sorry about that - we all lose some minutes of processing that way...
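
The difference between the two crediting schemes can be sketched roughly as follows. The function names and the per-point credit value are hypothetical; this is not the project's actual accounting code.

```python
def credit_per_reporting_point(points_received, credit_per_point):
    """Credit accrues for every reporting point the server receives."""
    return points_received * credit_per_point

def credit_on_completion(completed, full_credit):
    """Scheme used by many other projects: all or nothing at the end."""
    return full_credit if completed else 0.0

# A task that crashes before its first reporting point has sent nothing,
# so it earns zero under either scheme.
print(credit_per_reporting_point(0, 100.0))   # 0.0
print(credit_on_completion(False, 400.0))     # 0.0

# A task that crashes after three reporting points keeps what it reported.
print(credit_per_reporting_point(3, 100.0))   # 300.0
```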
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 60223
marmot

Joined: 12 May 05
Posts: 34
Credit: 1,357,324
RAC: 875
Message 60233 - Posted: 28 May 2019, 8:51:08 UTC - in response to Message 60223.  

Credits are awarded each time data is received from a task (unlike other projects, which require completed tasks). Your task apparently failed to report the first reporting point. Sorry about that - we all lose some minutes of processing that way...



I understand.

How hard would it be to write a script that recognizes "Model crashed: ATM_DYN : INVALID THETA DETECTED", awards a base 100 credits for the failed model, and then lists these WUs as invalid?

I guess the researchers are getting their invalid-theta percentages, and scrutinizing various other failures, from a separate script that gathers statistics on all failed and invalid WUs.

It's just a thought: people who get this error wouldn't waste time on helpdesk diagnostics trying to find some issue with their machines. With minimal credit and the WU marked invalid, people might just say "huh, that's odd" and not bother the helpdesk staff (like I did).
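
A minimal sketch of the script idea, assuming the task's stderr text and exit code are available as plain values. The function name, the returned dictionary, and the "invalid"/"error" labels are hypothetical and are not part of the BOINC server or the project's validator; only the marker string, exit code 22, and the suggested 100 credits come from this thread.

```python
CRASH_MARKER = "Model crashed: ATM_DYN : INVALID THETA DETECTED"
MODEL_CRASH_EXIT_CODE = 22   # exit code reported for this task (0x16)
BASE_CREDIT = 100            # base credit suggested above

def classify_result(stderr_text, exit_code):
    """Label a failed task and decide its credit (hypothetical policy)."""
    crashes = stderr_text.count(CRASH_MARKER)
    if exit_code == MODEL_CRASH_EXIT_CODE and crashes > 0:
        # The model itself became unstable: not a host problem.
        return {"outcome": "invalid", "credit": BASE_CREDIT, "crashes": crashes}
    # Anything else stays an ordinary zero-credit error.
    return {"outcome": "error", "credit": 0, "crashes": crashes}

sample = """Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Sorry, too many model crashes! :-(
02:12:54 (1396): called boinc_finish(22)
"""

print(classify_result(sample, 22))
```

Matching on both the exit code and the marker keeps unrelated exit-22 failures from being credited by accident.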
ID: 60233


©2024 climateprediction.net