Posts by Andy Lee Robinson

1) Message boards : Number crunching : Error code 22 / Missing Data In Ocean UV Field (Message 31690)
Posted 13 Dec 2007 by Andy Lee Robinson
Post:
I think anyone taking on these tasks should be committed to seeing them through, and dedicate whole machines to them.

1000 day limit? I hate arbitrary limits because they will always bite someone somewhere! M$ is famous for that crime!

Yes, I don't anticipate having a 'wrong' deadline will cause any problems. The machine is dedicated and CPDN runs completely unobtrusively, so I can largely forget it's even there.

Hopefully this thread will help others with similar questions.

Cheers,
Andy.
2) Message boards : Number crunching : Error code 22 / Missing Data In Ocean UV Field (Message 31684)
Posted 13 Dec 2007 by Andy Lee Robinson
Post:
Andy, the deadline for your new model says 2012 on your computer's results page:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/results.php?hostid=551091

Could you please tell us again what the deadline says in the Tasks window of BOINC manager?

(We know there's a problem about deadlines of some new models.)


Ah, thanks for the pointer - yes, looks fine there...

Here is my CPDN task list as shown via BoincView via an ssh tunnel to the webserver running Crunch3r's client 5.5.



The task is highlighted in red because it thinks it has expired, and note also the erroneous 1.18 s/TS - the machine is a 4200+, not an overclocked Core2!
These errors could arise from the workunit, the client or BoincView, and are therefore probably not easy to pin down.
Having said that, the other task looks to be reporting OK even though it is a particularly demanding one.

Cheers,
Andy.
3) Message boards : Number crunching : Error code 22 / Missing Data In Ocean UV Field (Message 31646)
Posted 10 Dec 2007 by Andy Lee Robinson
Post:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6910804

I just had the same problem on a WU that managed to get from 2000 to 2065. No problems with the machine - it is a production webserver!

I take daily backups using rsync, but I don't think it is worth restoring because it'll probably just do the same thing again, and the machine has also just downloaded a new WU. Better to make sure that all the models have all the data they will need during their lifespans!

Curiously, the deadlines for that one and the new job are backdated to 2002, surely a mistake? I think the validator ignores missed deadlines because the data is too valuable to just throw away, but backdating WUs in this way pushes BOINC into EDF mode. It isn't a real problem, but perhaps it should be looked at to avoid more chatter!
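
To illustrate why a backdated deadline matters, here is a toy sketch of earliest-deadline-first (EDF) ordering. It is not BOINC's actual scheduler; the `Task` type, the task names and the dates are all hypothetical, chosen only to show that a deadline in the past always sorts to the front and is flagged as overdue.

```python
from datetime import datetime
from typing import List, NamedTuple


class Task(NamedTuple):
    name: str            # hypothetical task label
    deadline: datetime   # report deadline as seen by the client


def edf_order(tasks: List[Task]) -> List[Task]:
    """Return tasks sorted so the earliest deadline runs first."""
    return sorted(tasks, key=lambda t: t.deadline)


def overdue(tasks: List[Task], now: datetime) -> List[str]:
    """Names of tasks whose deadline has already passed."""
    return [t.name for t in tasks if t.deadline < now]


if __name__ == "__main__":
    now = datetime(2007, 12, 10)
    tasks = [
        Task("short_job", datetime(2008, 1, 15)),
        Task("backdated_model", datetime(2002, 1, 1)),  # deadline in the past
    ]
    print([t.name for t in edf_order(tasks)])
    print(overdue(tasks, now))
```

With a deadline already in the past, the backdated task is treated as the most urgent one on every scheduling pass, which fits the behaviour described above.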

Cheers,
Andy.
4) Questions and Answers : Unix/Linux : Fatal error in last minute of WU, but still reports success. Admin, please examine! (Message 29125)
Posted 3 Jun 2007 by Andy Lee Robinson
Post:
Nice idea but it can't be done. CPDN awards credit as the Run progresses. It is intended that all boinc Projects award the same amount of credit for equal amounts of work. So...

Well, in principle yes, but in practice I suspect a little more generosity wouldn't go amiss as these WUs are about 1000x longer than any others and require a lot of patience, commitment and stamina to see through!


If a CPDN Run bombs somewhere along the way, the participant still gets par-value credit for work done.

Yes, but this is also a negative thing which doesn't give so much incentive to take care of the task!


If a Run finishes, full credit will have been given. If CPDN then tossed-in a bonus, the theoretical balance among Projects would be skewed. It might draw additional participants to CPDN but I doubt it would please leaders of other Projects.

Well, perhaps avoiding churn and keeping the existing participants interested may be more significant than bringing in new ones that then just drop out after a while!


Congratulations on your success. Your effort contributed significantly to the science. Thanks for participating and I hope we see you around for more. (Note: New options are being tested, shorter-running than the current Coupled Model. Other Models are in various stages of planning and development, so it will be an interesting place to be for quite awhile.)


Thanks - I have a nice warm fuzzy feeling now at actually having got all the way through two of these monsters... :-) the sulphur runs last year were much shorter, but I still had difficulty keeping a machine stable enough to run continuously while trying to survive occasional power outages, developing applications which could do all sorts of unpredictable things, and rendering animations etc.

I think a greater degree of granularity would help overall, say distributing 10 year pieces - you can combine them as they come in, though there isn't the same magnitude of satisfaction on completion! ;-)
Also worth considering: optimisation for the significant numbers of SSEn+ enabled processors (of course without losing sight of accuracy) and maybe even a PS3 version, which I think would be a major feat! Conceivably they could do a WU in about a week, if single precision could be fudged to produce acceptable results, though it would still be useful in double precision mode. I guess the next version of the Cell will do DP just as quickly as current SP anyway, so worth a thought!
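
As a rough illustration of the single versus double precision trade-off, the sketch below adds the same small step many times in both precisions. It has nothing to do with the climate model code itself; `to_f32` and `accumulate` are hypothetical helpers that emulate single precision by rounding through a 4-byte float with Python's `struct` module.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, n: int, single: bool) -> float:
    """Add `step` to a running total `n` times, optionally in single precision."""
    if single:
        step = to_f32(step)
    total = 0.0
    for _ in range(n):
        total += step
        if single:
            # Emulate a float32 accumulator by rounding after every addition.
            total = to_f32(total)
    return total


if __name__ == "__main__":
    exact = 10000.0
    for single in (False, True):
        total = accumulate(0.1, 100_000, single)
        label = "single" if single else "double"
        print(label, total, "error:", abs(total - exact))
```

The single-precision total drifts much further from the exact sum than the double-precision one, which is the kind of accumulated error a long-running model would have to tolerate or work around.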

I'm very concerned about climate change, and look forward to learning about your developments and of any improvements in model capability and code optimisation.
5) Questions and Answers : Unix/Linux : Fatal error in last minute of WU, but still reports success. Admin, please examine! (Message 29097)
Posted 1 Jun 2007 by Andy Lee Robinson
Post:
Yes Andy, it did finish, well done - result and graph here. Version 5.15 of the climate software shocks everyone by reporting every single error message since the beginning of the model, when the model completes! Looks as if you restored it from a backup at some point? (If so, well done for that too!)

Now that I look at it again, you haven't actually been granted the usual amount of credit for it, so maybe there's a missing trickle or something, but your graph is certainly showing a complete run. ;-)


Thanks very much for your reassurance - I have another one on the other core to finish in 5 hours time, so looking forward to that too!
I'm surprised that it didn't seem to upload everything on completion.

Yes, it is quite an achievement to actually complete a WU. I tried a few times on my overclocked Core2, but after a few weeks a crash would happen, something would get corrupted and the WU would abort :-(
This time I ran it on my Linux production webserver, which is quite lightly loaded and stable (as it has to be!), and the WUs survived. I tried to just leave it alone as much as possible, and not even sneeze in the general vicinity!
It might be a good idea to award a substantial credit prize on successful completion.

I hadn't restored a backup on the machine, but I did upgrade the kernel a few times, each of which required a reboot.

Once the last 5.15 WU has completed, should I detach and reattach to clean out the folder and prepare for the new app?

Cheers,
Andy.
6) Questions and Answers : Unix/Linux : Fatal error in last minute of WU, but still reports success. Admin, please examine! (Message 29079)
Posted 31 May 2007 by Andy Lee Robinson
Post:
I've just finished one which reported success but the details in the results file suggest otherwise, and I didn't see anything uploaded.

3 months processing and all this in the last minute...
I'd like to know if it really is OK, and if the files can be salvaged and uploaded somehow.

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6312426

<core_client_version>5.5.0</core_client_version>
<stderr_txt>
(null): cannot open input file dataout/atmos_restart.day
(null): cannot open input file dataout/ocean_restart.day
... [deleted] ...
pp2netcdf crashed: Error in getting file type
Error in converting file dataout/b6hcfo.pjk6c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/b6hcfo.pik6c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/b6hcfo.pfk6c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/b6hcfa.phk6c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/b6hcfa.pgk6c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/b6hcfa.pek6c10 to netcdf format.

pp2netcdf crashed: Error in getting file type
Error in converting file dataout/b6hcfa.pdk6c10 to netcdf format.

(null): cannot open input file dataout/ocean_restart.day

Model crashed: umshell1.f: READ_FLH: I/O error
(null): cannot open input file dataout/ocean_restart.day

Model crashed: umshell1.f: READ_FLH: I/O error
(null): cannot open input file dataout/ocean_restart.day

Model crashed: umshell1.f: READ_FLH: I/O error
(null): cannot open input file dataout/ocean_restart.day

Model crashed: umshell1.f: READ_FLH: I/O error
Fatal crash! :-(

</stderr_txt>
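
A long stderr dump like the one above is easier to read once the repeated messages are tallied. The following is a minimal sketch, assuming the log has been saved as plain text; `summarise_stderr` is a hypothetical helper, not part of BOINC or CPDN, and it only recognises the three message patterns visible in this log.

```python
from collections import Counter


def summarise_stderr(text: str) -> Counter:
    """Count the distinct failure kinds in a stderr_txt dump."""
    counts: Counter = Counter()
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("pp2netcdf crashed:"):
            counts["pp2netcdf crash"] += 1
        elif line.startswith("Model crashed:"):
            counts["model crash"] += 1
        elif "cannot open input file" in line:
            # Key by the missing file so each restart file is listed separately.
            counts["missing " + line.rsplit(" ", 1)[-1]] += 1
    return counts


if __name__ == "__main__":
    sample = """(null): cannot open input file dataout/atmos_restart.day
(null): cannot open input file dataout/ocean_restart.day
pp2netcdf crashed: Error in getting file type
Error in converting file dataout/b6hcfo.pjk6c10 to netcdf format.
Model crashed: umshell1.f: READ_FLH: I/O error
"""
    for kind, n in summarise_stderr(sample).items():
        print(n, kind)
```

Since version 5.15 reportedly replays every error message since the start of the model, a tally like this says how often each failure occurred over the whole run, not necessarily what went wrong in the last minute.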




