Models are crashing.

solskinn

Joined: 6 Sep 05
Posts: 24
Credit: 21,529
RAC: 0
Message 30449 - Posted: 9 Sep 2007, 1:37:11 UTC

It has been a little while since I last ran the ClimatePrediction model, because of technical problems with my PC. However, I have returned a number of results in the past.

I notice from the results page on the server that most of my models have crashed before finishing.

Since I am running a PC with an Intel 3.4 GHz processor, I should certainly be able to contribute to the project.

I would be happy if someone could tell me whether my results have been of any help. If not, my PC may be running software that interferes, or may otherwise not be fit to run the ClimatePrediction models.

Thanks in advance!

Regards,
solskinn, Norway
Account number 96823

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 30451 - Posted: 9 Sep 2007, 3:53:47 UTC
Last modified: 9 Sep 2007, 3:55:37 UTC

Hi Solskinn, welcome to the forum.

Your computer specifications look good. Lots of memory. Here are your models:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/hosts_user.php?userid=96823&show_all=0&sort=rpc_time
Click on All hosts to see all the models (results).

Nearly all the models have crashed with error codes -107, 1 and -161. These are caused by problems on your computer, and you can avoid most of them.

In my signature there's a link to the project READMEs. In the README about Crashes and other problems, item #5 by MikeMars gives advice on how to avoid these problems. At the moment the moderators are editing and adding to this item and it isn't publicly available yet. The edited version is better and more complete. I think it will help you. In a second post I'm going to copy the version we are editing. Some parts are in italics because of the edits.

It will be a long post and you need to read all of it.

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 30452 - Posted: 9 Sep 2007, 3:56:07 UTC
Last modified: 9 Sep 2007, 3:59:02 UTC

This is MikeMarsUK's text:

There are a number of common errors which cause many people problems. The first is the Windows Stop message (appears as a Microsoft Send / Don't Send dialogue, and -1073741819 in the log), the second is a -161 error in the log, and finally there is an error code 22.

Unfortunately the '-161' and 22 errors mask the underlying error (-161 simply means that the model ended without results to upload, and the error code 22 seems to be something to do with how the work unit deals with other errors).
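As an aside on the -1073741819 code: the log shows the Windows exit status as a signed 32-bit number. Reinterpreted as unsigned it is 0xC0000005, the value Windows uses for an access violation. The sketch below shows that conversion in Python. The function names are purely illustrative, and the descriptions for -161 and 22 only paraphrase the text above; this is not an official BOINC table.

```python
def to_windows_status(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned hex status."""
    return "0x{:08X}".format(exit_code & 0xFFFFFFFF)


# Descriptions paraphrased from this post (assumption: not an official list).
KNOWN_CODES = {
    -1073741819: "Windows Stop message (Send / Don't Send dialogue)",
    -161: "model ended without results to upload; masks the real error",
    22: "seems related to how the work unit deals with other errors",
}


def describe(exit_code):
    """Return a one-line summary for an exit code seen in the log."""
    note = KNOWN_CODES.get(exit_code, "not covered in this post")
    return "{} ({}): {}".format(exit_code, to_windows_status(exit_code), note)


if __name__ == "__main__":
    for code in (-1073741819, -161, 22):
        print(describe(code))
```

Searching for the hex form is often more productive than searching for the negative decimal number, since Windows documentation lists status codes in hex.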

When you get one of these 'generic' errors, the first thing to do is to take a look at the model's server web page. To find this, click 'your account', 'results', and then select the result which crashed.

The reason for the crash sometimes appears near the end of the section 'stderr out', prior to any -161 errors. For example, NEGATIVE PRESSURE VALUE CREATED indicates that the model reached an impossible climate and shut itself down. This can be caused by overclocking, bad memory, or in many cases simply because the initial starting parameters for the model will never lead to a viable climate.
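If the 'stderr out' section is long, a small script can pull out the lines just before the first -161 mention. This is only a sketch: it assumes you have pasted the section into a string, and the sample text is a made-up layout, not real model output. Only the NEGATIVE PRESSURE VALUE CREATED message comes from the text above.

```python
def lines_before_161(stderr_text, context=5):
    """Return up to `context` non-empty lines preceding the first '-161' line.

    If no line mentions -161, return the last `context` lines instead.
    """
    lines = [ln.strip() for ln in stderr_text.splitlines() if ln.strip()]
    for i, ln in enumerate(lines):
        if "-161" in ln:
            return lines[max(0, i - context):i]
    return lines[-context:]


if __name__ == "__main__":
    # Hypothetical sample; real stderr output will look different.
    sample = """
    some earlier model message
    NEGATIVE PRESSURE VALUE CREATED
    process exited with code -161
    """
    for line in lines_before_161(sample):
        print(line)
```

Whatever appears in those lines is what to quote when asking for help on the forum.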

In the absence of a clear reason for the model failing, we can only make the following general suggestions:


* Firstly you need to realise that a crash is not a disaster, even if you have no backup. The coupled model (HadCM3) uploads climate data at intervals:

- A summary every year
- A more detailed summary every 10 years
- A 'restart dump' every 40 years (1960, 2000, and 2040).

The scientists will have the data so far, and if a 'restart dump' was uploaded, then someone else may be able to continue running the model from that point.

The Slab model (HadSM3) uploads climate data at the end of each of its three phases.


* However, since it is far more satisfying completing your own model, we advise everyone to back up their climate models. The HadCM models take upwards of 4 months to complete (over a year on some computers), running 24/7. It is fairly likely that something will go wrong on the computer during such a long period.

Make backups at least once per week; it only takes a few moments. See http://www.climateprediction.net/board/viewtopic.php?t=5895 for information about backups.

In some cases restoring from backup will work (where the crash was caused by transient problems on the PC, i.e., code -107... errors, error code 0, and error code -1), but in others the restored model is doomed to fail. If you're not sure whether restoring the model is a good idea, then ask on the forum.
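For anyone who prefers to script the weekly backup, here is a minimal Python sketch. It is not the procedure from the linked thread; it simply copies a folder to a timestamped location. Both paths in the usage comment are hypothetical and must be replaced with your own, and it assumes you have exited from boinc first so that no files change during the copy.

```python
import shutil
import time
from pathlib import Path


def backup_boinc(boinc_dir, backup_root):
    """Copy the whole BOINC folder into a new timestamped folder.

    Assumes boinc has been exited first. Returns the path of the new copy.
    """
    boinc_dir = Path(boinc_dir)
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / ("boinc-backup-" + stamp)
    shutil.copytree(boinc_dir, target)  # fails if target already exists
    return target


# Example call (hypothetical paths, adjust to your own installation):
# backup_boinc(r"C:\Program Files\BOINC", r"D:\Backups")
```

Each run creates a separate copy, so old backups need deleting by hand once in a while to free disk space.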


* If you see a Microsoft Send/Don't Send dialogue, don't select anything until you have gone into boinc and selected 'exit' from the menu. Hopefully the model will restart from the previous checkpoint rather than giving up and crashing.


* If you use Norton or Sophos antivirus, exclude the boinc project directory from the automated scan. Norton is the cause of many models crashing, because it locks files aggressively, whereas Sophos incorrectly identifies one of the key files as a worm (known as a 'false positive' in the trade). See http://www.climateprediction.net/board/viewtopic.php?t=2895 for more information.


* Before playing games or other heavy duty applications (high CPU or memory usage), set 'no more work' against the project and 'suspend' the model - that way they won't tread on each other's toes. Sometimes simultaneous use of graphics drivers from two different programs seems to cause problems.


* Before carrying out antivirus scans, exit from boinc. Do not just suspend the model. Exit by right-clicking on the system tray icon (lower right of screen) and selecting Exit. You may need to disable automatic scheduled AV scans if these would run without exiting from boinc.


* Windows updates have occasionally caused problems. It is wise to take a backup before downloading them or at least to exit from boinc first.


* Never end the model process or the model globe process or boinc manager using the End Task or End Process buttons in Task Manager. If you have a frozen screen, first exit from boinc via the boinc icon in the system tray. Then deal with the frozen screen.


* Before shutting down or restarting the computer, first suspend the model and then exit from boinc by right-clicking on the system-tray icon and selecting Exit. Wait until the icon disappears before going into the Start menu.


* If you are running your model 24/7, reboot the computer at least weekly.



* Similarly, turning off the screensaver will reduce the chance of crashes, and will also save a lot of CPU time. On a computer with integrated motherboard graphics, displaying the screensaver can take up to 50% of CPU time.

To disable the screensaver:

Right-click on the desktop
Click Properties
Select Screensaver
Select None

Anyone finding that the model interferes with normal use of the computer should first disable the screensaver - it's easy to do and often helps. View your globe instead using the View graphics button in boinc manager. If you have previously suffered a -107 error, avoid maximising this globe graphics window.


* If you have suffered a -107 or -1 error code you should update your graphics card driver. This is a free update from the card manufacturer. Even a new computer may need this. For further details and instructions see http://bbc.cpdn.org/forum_thread.php?id=1038.



* Overheating can cause instability and shorten the life of your computer. Cleaning out dust from the motherboard and fans often helps if this is the case. Make sure all fans are working OK. Machines are often supplied with noisy, unreliable fans without ball bearings. If you need to replace one, it's quite an easy job, but make sure you buy an 'ultraquiet ballbearing' fan. There is a program called 'Everest' which can tell you your CPU and motherboard temperatures on a lot of systems. 50°C is the recommended maximum for AMDs, and 60°C for Intel. For more information, see http://www.climateprediction.net/board/viewtopic.php?t=2124.


* Run a stability test on your machine; I recommend Prime95's torture test. Run it for about 24 hours, one copy per CPU core. If this runs without error, then it indicates that your PC's hardware is very stable, and any problems are more likely to be to do with software or the model. For more information see http://www.climateprediction.net/board/viewtopic.php?t=2126.

Regarding overclocking, some do it and have stable machines, others do it with disastrous results (literally, by cooking something). Whenever overclocking, or changing timings on memory, a torture test should be considered mandatory. Note that Seti and so forth don't stress the machine enough to be a useful stability check.

People who are getting errors after overclocking their machines usually find that relaxing memory timings, reducing CPU MHz or improving system cooling will help.



* Watch out for firewall messages appearing at the same time as the model crashed. If your network connection or firewall is unreliable, it may be best to select 'suspend network' from the boinc manager, and then manually allow it about once per week, to let trickles upload to the servers. This is also a good approach for people using dialup connections rather than broadband.


* Windows 'time sync' messages have been mentioned recently as causing 'process exited with zero status' crashes. Although these are relatively benign, it may be worth trying to reduce their frequency.


* The benchmark that boinc runs every 5 days can cause the model to fail (see http://www.climateprediction.net/board/viewtopic.php?t=3965).


* The memory requirement for XP machines is now 256MB per HadSM model, 512MB per HadCM and 1GB per HadAM (Seasonal) model, or 1.5GB for two Seasonal models running in tandem. Vista needs an extra 512MB. (See http://www.climateprediction.net/board/viewtopic.php?t=3888). Running with insufficient memory may cause slowness and excessive work for the hard disk.

Some people have managed to run a coupled model (HadCM3) with only 256MB RAM, but it's pushing things to the very limit. In this situation I'd advise frequent backups.

Try running a different type of model instead - for example, the Slab model (HadSM3) uses a lot less memory. This is set via Your Account / CPDN preferences / View / Edit / tick HadSM3 and untick HadCM3.
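The memory figures above can be turned into a quick check. This is a sketch based only on the numbers quoted in this post. The function names are illustrative, and it assumes the extra 512MB for Vista is added once per machine, which the text does not spell out.

```python
# Figures quoted in this post for XP machines, in MB.
XP_REQUIREMENT_MB = {"HadSM": 256, "HadCM": 512, "HadAM": 1024}
TWO_HADAM_MB = 1536    # two Seasonal (HadAM) models running in tandem
VISTA_EXTRA_MB = 512   # assumption: added once per machine


def required_mb(model, vista=False, two_seasonal=False):
    """Return the quoted memory requirement in MB for one model type."""
    if two_seasonal:
        if model != "HadAM":
            raise ValueError("the two-model figure is only quoted for HadAM")
        base = TWO_HADAM_MB
    else:
        base = XP_REQUIREMENT_MB[model]
    return base + (VISTA_EXTRA_MB if vista else 0)


def has_enough(installed_mb, model, vista=False, two_seasonal=False):
    """True if the installed RAM meets the quoted requirement."""
    return installed_mb >= required_mb(model, vista, two_seasonal)


if __name__ == "__main__":
    print(required_mb("HadCM"))              # 512
    print(required_mb("HadCM", vista=True))  # 1024
    print(has_enough(256, "HadCM"))          # False
```

A result of False does not mean the model cannot run, as the 256MB example above shows, only that slowness and heavy disk activity are more likely.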


It's worthwhile scanning through the various README files for more information.

