climateprediction.net home page
Sudden Computation error

old_user195747

Joined: 16 Aug 06
Posts: 6
Credit: 4,627
RAC: 0
Message 29094 - Posted: 1 Jun 2007, 5:45:37 UTC

I was busy doing something else, and when I came back to my computer to check the status of climateprediction.net, the program had stopped due to a "Computation error". I have the full BOINC log for today saved to a text file, so I can post other lines if needed, but for now I'll just post the more interesting ones:

5/31/2007 11:16:33 PM|climateprediction.net|Computation for task hadcm3pbb_c85t_05844524_0 finished
5/31/2007 11:16:33 PM|climateprediction.net|Output file hadcm3pbb_c85t_05844524_0_2.zip for task hadcm3pbb_c85t_05844524_0 absent
5/31/2007 11:16:33 PM|climateprediction.net|Output file hadcm3pbb_c85t_05844524_0_3.zip for task hadcm3pbb_c85t_05844524_0 absent
5/31/2007 11:16:33 PM|climateprediction.net|Output file hadcm3pbb_c85t_05844524_0_4.zip for task hadcm3pbb_c85t_05844524_0 absent
5/31/2007 11:16:33 PM|climateprediction.net|Output file hadcm3pbb_c85t_05844524_0_5.zip for task hadcm3pbb_c85t_05844524_0 absent
5/31/2007 11:16:33 PM|climateprediction.net|Output file hadcm3pbb_c85t_05844524_0_6.zip for task hadcm3pbb_c85t_05844524_0 absent
5/31/2007 11:16:33 PM|climateprediction.net|Output file hadcm3pbb_c85t_05844524_0_7.zip for task hadcm3pbb_c85t_05844524_0 absent
5/31/2007 11:16:33 PM|climateprediction.net|Output file hadcm3pbb_c85t_05844524_0_8.zip for task hadcm3pbb_c85t_05844524_0 absent
5/31/2007 11:16:33 PM|climateprediction.net|Output file hadcm3pbb_c85t_05844524_0_9.zip for task hadcm3pbb_c85t_05844524_0 absent
5/31/2007 11:16:33 PM|climateprediction.net|Output file hadcm3pbb_c85t_05844524_0_10.zip for task hadcm3pbb_c85t_05844524_0 absent
5/31/2007 11:16:33 PM|climateprediction.net|Output file hadcm3pbb_c85t_05844524_0_11.zip for task hadcm3pbb_c85t_05844524_0 absent
5/31/2007 11:16:33 PM|climateprediction.net|Output file hadcm3pbb_c85t_05844524_0_12.zip for task hadcm3pbb_c85t_05844524_0 absent
5/31/2007 11:16:33 PM|climateprediction.net|Output file hadcm3pbb_c85t_05844524_0_13.zip for task hadcm3pbb_c85t_05844524_0 absent
5/31/2007 11:16:33 PM|climateprediction.net|Output file hadcm3pbb_c85t_05844524_0_14.zip for task hadcm3pbb_c85t_05844524_0 absent
5/31/2007 11:16:33 PM|climateprediction.net|Output file hadcm3pbb_c85t_05844524_0_15.zip for task hadcm3pbb_c85t_05844524_0 absent
5/31/2007 11:16:33 PM|climateprediction.net|Output file hadcm3pbb_c85t_05844524_0_16.zip for task hadcm3pbb_c85t_05844524_0 absent
5/31/2007 11:16:34 PM|climateprediction.net|Deferring communication for 1 min 0 sec
5/31/2007 11:16:34 PM|climateprediction.net|Reason: Unrecoverable error for result hadcm3pbb_c85t_05844524_0 (<file_xfer_error> <file_name>hadcm3pbb_c85t_05844524_0_2.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c85t_05844524_0_3.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c85t_05844524_0_4.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c85t_05844524_0_5.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c85t_05844524_0_6.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c85t_05844524_0_7.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c85t_05844524_0_8.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3pbb_c85t_05844524_0_9.zip</file_name>

By the way, I am running Windows XP and have AVG Anti-Virus always running.
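
For anyone sifting through a similar log, here is a minimal Python sketch that counts the "absent" output files and tallies the error codes. It assumes the `timestamp|project|message` layout seen in the excerpt above; the function name `summarize` and the shortened sample are illustrative only, not part of BOINC.

```python
import re
from collections import Counter

# A shortened sample copied from the log excerpt above.
# Assumed layout: timestamp|project|message
LOG = """\
5/31/2007 11:16:33 PM|climateprediction.net|Computation for task hadcm3pbb_c85t_05844524_0 finished
5/31/2007 11:16:33 PM|climateprediction.net|Output file hadcm3pbb_c85t_05844524_0_2.zip for task hadcm3pbb_c85t_05844524_0 absent
5/31/2007 11:16:33 PM|climateprediction.net|Output file hadcm3pbb_c85t_05844524_0_3.zip for task hadcm3pbb_c85t_05844524_0 absent
5/31/2007 11:16:34 PM|climateprediction.net|Reason: Unrecoverable error for result hadcm3pbb_c85t_05844524_0 (<file_xfer_error> <file_name>hadcm3pbb_c85t_05844524_0_2.zip</file_name> <error_code>-161</error_code></file_xfer_error>
"""

ABSENT_RE = re.compile(r"Output file (\S+) for task (\S+) absent")
ERROR_CODE_RE = re.compile(r"<error_code>(-?\d+)</error_code>")


def summarize(log_text):
    """Return (list of absent output files, Counter of error codes)."""
    absent = []
    codes = Counter()
    for line in log_text.splitlines():
        parts = line.split("|", 2)  # split into timestamp, project, message
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed layout
        message = parts[2]
        match = ABSENT_RE.search(message)
        if match:
            absent.append(match.group(1))
        codes.update(int(code) for code in ERROR_CODE_RE.findall(message))
    return absent, codes


absent_files, error_codes = summarize(LOG)
print(len(absent_files), "absent output files")
print(dict(error_codes))
```

If every transfer error carries the same code, as in the excerpt (-161 throughout), that suggests a single underlying cause rather than several unrelated ones.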

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 29096 - Posted: 1 Jun 2007, 7:53:41 UTC


The interesting bit is in the error text uploaded with the model:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6361409

Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA

Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA

Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA

Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA
Fatal crash! :-(



NEGATIVE PRESSURE (or sometimes NEGATIVE THETA) is when the model becomes physically impossible, and shuts itself down. It is very unusual for this to happen so early in its life (normally happens in the 2000 - 2080 area), so I suspect a problem with the PC rather than with the model itself.

You can confirm your PC's CPU and memory are OK by running Prime95's torture test for 24 hours or so.

If the test fails, then something is introducing errors into the floating point calculations - perhaps there is too much dust on the heatsink, one of the fans has failed, the power supply is erratic, there might be an iffy stick of memory, or it's overclocked too far.
I'm a volunteer and my views are my own.

old_user195747
Message 29112 - Posted: 2 Jun 2007, 2:47:52 UTC - in response to Message 29096.  


Well, I do seem to have odd noises (screeching noises, probably coming from the hard drive), but I run a different distributed computing program every day and have never had any real errors with any of them, except maybe one time previously with a climate model program (BBC Climate Change Experiment, climateprediction.net, or CPDN Seasonal Attribution Project). I doubt that any of these items could be the problem, because I've never had a problem with another distributed project (I run a new program every day and run it all day) that couldn't be solved; the only one I can recall is troubleshooting how to set up a firewall to work with World Community Grid.

I've also run memory tests on this computer for a 24-hour period, and they didn't come up with any errors. It couldn't be overclocked too high either, because I haven't done any overclocking with this PC. If there were a hardware problem, it would also affect these other projects and would probably cause obvious performance problems, which I don't encounter.

old_user195747
Message 29115 - Posted: 2 Jun 2007, 6:48:02 UTC

When I came back and checked on the status, Pirates@Home was also running (I had left it able to receive work because I am often on the computer and was going to suspend the climateprediction.net task so it could finish). The only catch is that Pirates@Home only started around 55 minutes after the error with climateprediction.net occurred. I'll stress test with Prime95 just to be sure that the PC isn't the problem (by the way, one of the distributed projects I run is called "GIMPS" (Great Internet Mersenne Prime Search), which uses the Prime95 interface to do its calculations).

In the "Solutions to models crashing" thread on this message board, one of the items listed says: "Windows 'time sync' messages have been mentioned recently as causing 'process exited with zero status' crashes. Although these are relatively benign, it may be worth trying to reduce their frequency." When looking at the Event Viewer in Windows, I noticed a warning logged at about 10:58 P.M., 18 minutes before the model crash. Here's the entry in the Event Viewer (the computer name has been removed for security):

"Event Type: Warning
Event Source: W32Time
Event Category: None
Event ID: 36
Date: 5/31/2007
Time: 10:58:44 PM
User: N/A

Description:
The time service has not been able to synchronize the system time for 49152 seconds because none of the time providers has been able to provide a usable time stamp. The system clock is unsynchronized.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp."
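
The timing can be double-checked with a few lines of Python. This is only a sketch: the two timestamps are taken from the Event Viewer entry and from the BOINC log line "Computation for task ... finished", and the variable names are illustrative.

```python
from datetime import datetime, timedelta

# Timestamp layout used by both the BOINC log and the Event Viewer entry.
FMT = "%m/%d/%Y %I:%M:%S %p"

w32time_warning = datetime.strptime("5/31/2007 10:58:44 PM", FMT)
model_finished = datetime.strptime("5/31/2007 11:16:33 PM", FMT)

# Gap between the W32Time warning and the failed task.
gap = model_finished - w32time_warning

# The warning reports the unsynchronized period in seconds.
unsynced = timedelta(seconds=49152)

print("warning -> crash:", gap)                 # 0:17:49
print("clock unsynchronized for:", unsynced)    # 13:39:12
```

So the warning came just under 18 minutes before the task failed, and the clock had by then been unsynchronized for roughly 13 hours 39 minutes.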

MikeMarsUK
Volunteer moderator
Message 29117 - Posted: 2 Jun 2007, 9:02:45 UTC
Last modified: 2 Jun 2007, 9:03:29 UTC


I don't think the time problem caused your crash; at worst it just wastes some CPU time.

The noise from your computer could be from a worn-out fan (they get very noisy when they start to fail). A replacement fan costs around £5 - £10 (look for a 'silent ball-bearing fan' - these last a long time and are much better than the ones supplied in most new computers).

Most BOINC projects aren't as sensitive as CPDN to transient calculation errors (in CPDN the errors are cumulative because it's an iterative process - one error will be repeatedly compounded until it exceeds the limits). On most projects a single error will generally have no effect. But Prime95 will pinpoint whether or not there is a problem.
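
To illustrate how one transient error compounds in an iterative calculation, here is a toy Python sketch. It is not the CPDN model: it uses the logistic map as a stand-in for any sensitive iterative process, and the function name `run`, the parameter values and the size of the injected "glitch" are all assumptions chosen for the demonstration.

```python
def run(steps, glitch_step=None, glitch=1e-12, r=3.9, x0=0.4):
    """Iterate the logistic map x -> r*x*(1-x), optionally injecting one tiny error."""
    x = x0
    history = []
    for step in range(steps):
        x = r * x * (1.0 - x)
        if step == glitch_step:
            x += glitch  # a single transient calculation error
        history.append(x)
    return history


clean = run(200)
faulty = run(200, glitch_step=10)

# Absolute difference between the two runs at every step.
diffs = [abs(a - b) for a, b in zip(clean, faulty)]

for step in (9, 10, 50, 100, 199):
    print(step, diffs[step])
```

The two runs are identical until the glitch, differ by about one part in a trillion immediately after it, and end up completely unrelated a few dozen steps later. A project whose work units are short, independent calculations would simply have returned one slightly wrong number.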

old_user195747
Message 29120 - Posted: 2 Jun 2007, 20:34:05 UTC

Well, I was going to run the Torture Test with Prime95 for 24 hours, but I'm a bit confused about what I'm supposed to do. There are four options for the Torture Test: (1) Small FFTs, (2) In-place large FFTs, (3) Blend, and (4) Custom. Which option do I use? Also, do I need to have all other programs closed when running this test, or does it not matter?

MikeMarsUK
Volunteer moderator
Message 29121 - Posted: 2 Jun 2007, 22:01:22 UTC


Run Blend if you're running one test, or Blend+Small if you're running two. If you've got a dual-core PC you should be running two rather than one.

You'll need to exit BOINC before running the tests, otherwise they'll be starved of CPU time. Other programmes don't matter so much.

old_user195747
Message 29138 - Posted: 4 Jun 2007, 1:05:16 UTC - in response to Message 29121.  


I have an Intel Pentium D processor in this computer (which means it's a dual-core PC), so I ran Blend+Small. Both tests were stopped after 24 hours with the message: "Torture test ran 25 hours, 48 minutes - 0 errors, 0 warnings."
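
If the summary message needs to be checked automatically (for example when testing several machines), a minimal Python sketch could look like this. It assumes the exact wording quoted in this post; other Prime95 versions may phrase the message differently, and `parse_summary` is a hypothetical helper, not part of Prime95.

```python
import re

# Pattern based only on the message quoted in this post.
SUMMARY_RE = re.compile(
    r"Torture test ran (\d+) hours?, (\d+) minutes? - (\d+) errors?, (\d+) warnings?"
)


def parse_summary(line):
    """Return run time and error counts, or None if the line does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    hours, minutes, errors, warnings = map(int, match.groups())
    return {
        "minutes_run": hours * 60 + minutes,
        "errors": errors,
        "warnings": warnings,
    }


result = parse_summary("Torture test ran 25 hours, 48 minutes - 0 errors, 0 warnings.")
print(result)

# A run counts as clean only if it reported no errors and no warnings.
passed = result is not None and result["errors"] == 0 and result["warnings"] == 0
print("passed:", passed)
```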

MikeMarsUK
Volunteer moderator
Message 29141 - Posted: 4 Jun 2007, 7:10:47 UTC


That's good, and shows that the CPU, chipset, and memory are all working well, which points the finger back at the model rather than your PC.

It is unusual for a model to finish with that error code that early, but I guess yours is 'the exception that proves the rule'.


old_user195747
Message 29151 - Posted: 4 Jun 2007, 11:38:34 UTC - in response to Message 29141.  


OK, so what's going to happen now?

MikeMarsUK
Volunteer moderator
Message 29153 - Posted: 4 Jun 2007, 12:43:54 UTC



You can pick up a new model to process (make sure 'no more work' is not displayed against the CPDN project in the project list).

You have a lot of other projects, so you might find that BOINC won't ask for a new model for a while (due to long-term debt between projects etc).