1)
Questions and Answers :
Windows :
Comments for 'Generic solutions to models' sticky
(Message 25513)
Posted 8 Dec 2006 by old_user212146 Post: Sorry about this - my final post of the night, I promise :-) Does the checkpoint handle completely unexpected failures, i.e. if I suddenly had a power cut, would it recover from the last checkpoint (my PC is a similar spec to yours, so from about the previous 30 minutes' worth of processing)? This is as opposed to an out-of-bounds value turning up in the data. If it does recover this way, why would it ever restart back at 1920? If it restarts because it has reached a data dead end (i.e. the model has realised it is spiralling unrealistically out of control and decides to abort), is the data produced up to that point useful? If the problems are being caused by PC hardware failure, you would expect the checkpoint system to be able to rewind to the most recent checkpoint and just carry on regardless - hardware problems are extremely unlikely to occur at the same point in the processing, so you would always steadily increase your progress (albeit with the odd slight rewind). Thanks for your patience in answering my questions - I am running the Prime95 Torture Test at the moment to make sure I have no hardware issues that would explain my resets and, as I say, I will be attempting to do backups at regular intervals. It is way past my bedtime (1:45am) and I need sleep now.
2)
Questions and Answers :
Windows :
Comments for 'Generic solutions to models' sticky
(Message 25511)
Posted 7 Dec 2006 by old_user212146 Post: "smaller chunks of work" will hopefully become possible sometime next year, once some code is written and tested. This is what the "restart dumps" every 40 years are for. Of course - this is understood - business as usual tends to take precedence over this sort of thing.
Agreed - but not everybody - and, as was posted here, many people who have had problems have simply not bothered to post. I myself have had crashes, as I say, and my PC is pretty stable: 3.4 GHz dual CPU with 1 GB RAM + 1 terabyte of disk storage. It may not have been your intention, but the responses came across as "well, it can't be us, it must be your fault", which is going to get people's backs up a bit.
the joy of users
I will get around to this at some point soon
mmmm - well - my definition of regular would be at least once per day. If the last known checkpoint can get destroyed, then it is not an effective checkpoint and how it is done needs to be rethought. I work in the telecoms industry now, where we process real-time telephone data (tens of millions of items of data every day) - every time a process crashes it HAS to know how to recover, otherwise people lose revenue (lots of revenue), so I know these procedures are difficult, but not impossible, to accomplish.
I understand this - but you have to realise human nature - probably 95% of the people who attach to this project did so thinking - "ooohhh, that's a pretty screensaver and I'm doing a bit for the environment as well". It is also a screensaver, which by definition is hands-off: you are simply not there when it is running. People are not going to remember to do backups, they just aren't going to remember. By insisting on this policy (and, with it being as unstable as it is, it does become mandatory) you are restricting yourself to the IT-literate amongst us. I am not having a go at the project - I appreciate what it is trying to achieve, but you have to accept that most people are just running this for fun, with the hope that it helps understanding of climate change. If it becomes "difficult" they are just gonna quit. This is climateprediction.net's loss, as one more person detaches from the project to run one that is more stable. I will try to stick with the project and try to remember to do regular backups, but it is the most frustrating screensaver I have ever used :-)
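To make the point about a checkpoint that cannot be destroyed a bit more concrete, here is a minimal sketch of one common approach: write the new checkpoint to a temporary file, flush it to disk, keep the previous generation, and only then swap the new file into place. This is an illustration only and says nothing about how the climateprediction.net model actually writes its checkpoints; the function names (`save_checkpoint`, `load_checkpoint`), the `.prev` suffix and the JSON layout are all assumptions made up for the example.

```python
import hashlib
import json
import os
import tempfile


def save_checkpoint(state: dict, path: str) -> None:
    """Write `state` to `path` without ever leaving a half-written file there.

    Hypothetical helper: the file layout (JSON plus SHA-256) is an assumption.
    """
    payload = json.dumps(state, sort_keys=True)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    record = json.dumps({"sha256": digest, "payload": payload})

    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(record)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        if os.path.exists(path):
            os.replace(path, path + ".prev")  # keep one older generation
        os.replace(tmp_path, path)  # rename the finished file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path: str):
    """Return the newest checkpoint that passes its checksum, else None."""
    for candidate in (path, path + ".prev"):
        try:
            with open(candidate) as f:
                record = json.load(f)
            payload = record["payload"]
            if hashlib.sha256(payload.encode()).hexdigest() == record["sha256"]:
                return json.loads(payload)
        except (OSError, ValueError, KeyError, TypeError, AttributeError):
            continue  # missing or damaged file: try the older generation
    return None
```

With this layout, a power cut during a save can at worst lose the checkpoint being written; the loader falls back to the `.prev` copy, so the run rewinds by one checkpoint interval rather than to the start.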
3)
Questions and Answers :
Windows :
Comments for 'Generic solutions to models' sticky
(Message 25506)
Posted 7 Dec 2006 by old_user212146 Post: I'm afraid that I have to agree with the above post. Having been a software engineer for 20 years (5 years of which were at the Met Office in Bracknell), I know that you cannot write code perfectly. If you are talking about 1 million lines of Fortran code (god, I hated using Fortran when I worked there), then there are bound to be errors in the code - it is simply not feasible for the code to be 100% perfect, so blaming people's PCs for the problem is a little harsh. It is more likely that a certain combination of data is forcing the code down an unexpected path and causing a crash. The work packages are enormous - I used to run the project but gave up after having it restart itself back at 1920 three times. I appreciate that it may be difficult to do, but reorganising the code to allow for smaller chunks of work - or at least to fix a "last known good state" at regular intervals - would greatly increase the number of results you receive.
©2024 climateprediction.net