Posts by Dave

1) Questions and Answers : Windows : model time resets (Message 27922)
Posted 16 Apr 2007 by Dave
Post:
The models all have unique names.
And so do the data sets, and workunits.
You can see the Result ID and Work unit ID here.

Then, if you click on each of the Result IDs, you can see the names of the models as they will/would appear in the Tasks tab of your Manager.

The first one was: hadcm3ohe_1w5e_05736660_0
and the second is: hadcm3inct_cn4s_1920_160_05865021_0

This new one is one of the just-released models, which write to the HD a lot less than the earlier models, and has a different name structure, which can be 'decoded' as per a post in this thread, 4th post from the bottom.

All of the messages are stored as an archive in the file: stdoutdae.txt, which is in the BOINC folder.
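For anyone who would rather search that file than skim it, here is a minimal Python sketch that lists every line containing a phrase. It assumes only what is said above, that the messages sit in a plain-text file called stdoutdae.txt in the BOINC folder; the function name and the search phrase are illustrative, and the path in the usage comment is a placeholder to adjust.

```python
def find_lines(log_path, phrase):
    """Return (line_number, text) pairs for lines containing phrase (case-insensitive)."""
    needle = phrase.lower()
    matches = []
    # errors="replace" keeps the scan going if the log contains odd bytes
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if needle in line.lower():
                matches.append((number, line.rstrip("\n")))
    return matches

# Example use (the path is an assumption; point it at your own BOINC folder):
# for number, text in find_lines("stdoutdae.txt", "negative pressure"):
#     print(f"{number}: {text}")
```

Matching is case-insensitive, so it finds the phrase whether the log writes it in capitals or not.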






Thank you for your response. I am amazed at the data that has been computed for this model. When I went back following your links, I noticed the last trickle was sent on 13 April and my model crashed on 14 April, so not too much data was lost. Another post mentioned the "Negative Pressure Value Created" and I noticed that as I was skimming down the file. So I can probably safely assume that if it crashes again for some reason, I probably won't lose much data then either. At least I now know where I can review the messages.

Thanks for your assistance,

Dave R.
2) Questions and Answers : Windows : model time resets (Message 27911)
Posted 15 Apr 2007 by Dave
Post:
The reason it crashed was due to 'negative pressure value created'. This is when the climate of the model ceases to be physically possible.

This can happen for two reasons:

* The initial setup of the model doesn't lead to a viable climate, which causes the model to drift off course. One of the major goals of the project is to find out which parameters are viable, and which aren't - borderline models are therefore very interesting to the researchers. I note that the temperature of your model is on the high side.
* Alternatively, the PC might be a little unstable and has been introducing corrupt floating point calculations into the model. If you want to check your CPU's stability and confirm it's OK, then try running Prime95's torture test for 24 hours or so.

In general it's not worth restoring 'negative pressure' models from backup, because they've usually modelled as far as they can (i.e., they're already complete). If it turns out that you have an instability problem, and you are able to fix it (for example, by reducing an overclock slightly, or improving airflow in the system), then it may possibly be worth it.

My feeling is that in this case, you have a stable PC but the model's parameter set means that it can only model so far.

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6212942

Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA

Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA

Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA

Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA
Fatal crash! :-(



Thank you for the reply,

If I understand you correctly, the program may have terminated either because it was modeling with parameters that were empirically impossible, or because the parameters used gave results that were empirically impossible. Is this a fair general understanding of the "negative pressure value created" condition? If so, then would the data I am now working on be a reset of the initial conditions, computing the same model over again, only to crash later? Or would the project manager download a new set of data so that I would be working on a different set of parameters?

There was a lot of communication yesterday when I looked at the messages. They are gone now, overwritten or just dumped, but they seem to start with an error 161? Then it seemed there was a lot of contacting the server, and it seemed that the last parts were requesting data. I gather that the model ceased operation because of this error (don't know though), then either tried to restart, reset, or retrieve new data. I had just skimmed it but didn't get to look at it closely.

I had backed up the previous night, but this happened at about 11:30 last night and my daily auto backup occurred just 30 minutes later at midnight. I close the program and everything else, allow the backup to run, and then restart all the programs after the backup concludes (about 3 - 5 minutes). So I do have the backup through Thursday night; Friday's computation is after the program crashed. I can restore it up to that point, but if the other commenter on my dilemma is correct, there are times when data sets are uploaded while a program is still computing data. If that is true, then it is a good safeguard against complete data loss and I don't have too much to make up.

I have 4 other programs running and had reset my program workload for the Climate prediction to be double any of the other programs, reducing their run time, as I could see my model would not complete by the deadline of Dec 21, 2007. The new deadline is March 26, 2008. By doubling the amount of time spent on this model, I would have made the deadline. Leaving the current workload at double the others (200 vs 100), this new deadline for the 2288 hr model will be manageable. My only question is, is this new data or the same set I was working on before? If it's the same set, won't the crash happen again at the same point, making this computation a waste of time? How do I know if it is the same data set or if it's a new set of parameters?
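To put rough numbers on the 200 vs 100 workload, here is a small Python sketch. It assumes each project's long-run slice of processing time is its share divided by the sum of all shares, that the 2288 hours is CPU time on a single core, and that the PC runs around the clock. The four other project names are placeholders, and the actual scheduling may deviate from these averages.

```python
def share_fractions(shares):
    """Map each project to its share divided by the total of all shares."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("total share must be positive")
    return {name: value / total for name, value in shares.items()}

def days_to_finish(cpu_hours, fraction, hours_on_per_day=24.0):
    """Days needed if the model gets `fraction` of the hours the PC is on each day."""
    return cpu_hours / (hours_on_per_day * fraction)

# Hypothetical setup: climateprediction.net at 200, four other projects at 100 each.
shares = {"climateprediction.net": 200, "project_b": 100, "project_c": 100,
          "project_d": 100, "project_e": 100}
fraction = share_fractions(shares)["climateprediction.net"]
print(f"share of CPU time: {fraction:.1%}")
print(f"days for a 2288 hr model: {days_to_finish(2288, fraction):.0f}")
```

Under those assumptions the model gets a third of the CPU time, about 8 hours a day, so 2288 hours takes roughly 286 days. That is a little less than the time between mid-April 2007 and March 26, 2008.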

Thank you for your assistance.

DS Russell
3) Questions and Answers : Windows : model time resets (Message 27910)
Posted 15 Apr 2007 by Dave
Post:
Dave, I hope you have your daily auto backups set up in such a way that you have exited from boinc before the backup begins? Because backups done with boinc running can't be successfully restored. If your auto backup system can't exit from boinc first, it would be worth doing a separate manual backup of the complete contents of the boinc folder once or twice a week.
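A manual backup of that kind can be as simple as copying the whole folder to a dated location. The Python sketch below is one way to do it, not the project's recommended procedure: the function name and the example paths are hypothetical, and it assumes boinc has already been exited before it runs.

```python
import shutil
from datetime import datetime
from pathlib import Path

def backup_folder(source, backup_root, now=None):
    """Copy `source` into a new timestamped folder under `backup_root`; return its path."""
    source = Path(source)
    if not source.is_dir():
        raise FileNotFoundError(f"{source} is not a folder")
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    destination = Path(backup_root) / f"{source.name}-{stamp}"
    # copytree creates the destination and raises if it already exists,
    # so an earlier backup is never silently overwritten
    shutil.copytree(source, destination)
    return destination

# Example use (both paths are placeholders; exit boinc first):
# backup_folder(r"C:\Program Files\BOINC", r"D:\backups")
```

Each run produces a separate dated copy, so an older state is still available if the most recent copy was taken after a crash.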

Through my sig you can get to the READMEs where there's lots of info on backups.

Your crashed model will have sent trickles to Oxford once every model year, plus bigger file uploads at the end of each decade, so even if the model can't be saved, the data sent will be used by the researchers. I.e., it was computer time well spent.



Thank you for your response. Yes, my auto backups are done after BOINC has been closed. In fact, since I need to have an active Internet connection, I also close that down as well as everything else that I need to back up. I believe that the backup will skip any data that is open. I just shut down everything that is to be backed up a couple of minutes before the backup is to begin, then 5 minutes later, when the restore points show up in the backup log, I restart BOINC and the Internet connection.

I don't know if the climate model restarted from the beginning or if it closed the crashed model and requested a new set of data points. In any event, the timer was reset to 2288 hrs. In other words, am I recomputing the same data I did for the last 2 months, which would then crash at the same point again? Or am I working on a fresh data set, in which case I needn't do anything but just let it plod along? If I did restore, I would also be overwriting the other 4 programs that are interwoven with the project manager.

If there are intermediate points that the project manager uploads, then I probably won't have too much loss. There is a lot of communication that went on at the time of the reset. It looks like it started with an error 161?, but the communication has since been lost or overwritten. I kind of skimmed through it and it seemed like it was sending data and retrieving new data points. But not really knowing how this process works and relying on the project manager to keep things neat and tidy and timely, I don't know which it was doing.

What are the chances that I am redoing the climate prediction data sets I had already completed?

thanks for your input,

DS Russell
4) Questions and Answers : Windows : model time resets (Message 27884)
Posted 14 Apr 2007 by Dave
Post:
I have been running climateprediction.net since January on a long-term basis. After more than 2 months, and 400 plus hours of the 2288 hours, the program suddenly decided to reset itself back to 2288 hours with none completed. I have backups every night so it would be recoverable back to last night. As of today, 20% had been computed.

The messages tab has many lines of dialog, starting with an unrecoverable error for result at 11:33 tonight (30 minutes before my daily auto backup begins) and ending at 11:37pm. It was downloading several sets of data.

Please advise: should I restore from last night's backup, and what may have happened? I could probably send the messages that were made at this time. Is there some interim time when completed data segments are uploaded?

And who should I be talking to?

Please advise.

Thank you,

DS Russell
5) Questions and Answers : Windows : Phase determination (Message 27205)
Posted 8 Mar 2007 by Dave
Post:
Go to Manager's Tasks tab, highlight a Model, click on Show Graphics. That brings up another window.
Hit the "Z" key to get rid of the overlay and you'll see what you want. (There is now only one Phase, one BIG Phase.)
"X" the graphics window off when finished. (It's expensive to run.)



Thank you for your info. I shall try this when the model is running again (it is currently preempted).
6) Questions and Answers : Windows : Percent project allocation adjustment (Message 27201)
Posted 7 Mar 2007 by Dave
Post:
How can I adjust the percent allocation for the projects? Currently, I am handling 5 projects simultaneously, all at 20% time allocation. I would like to be able to manually override the default percentages to allow more time to be given to the climate prediction model. I need to be able to complete at least 10% per month in order to end by the due date of December 21, 2007.
7) Questions and Answers : Windows : Phase determination (Message 27200)
Posted 7 Mar 2007 by Dave
Post:
How can I determine what phase or project step I am currently performing calculations for? The information pages describe them but do not indicate how we can determine which one we are doing.




©2024 climateprediction.net