Posts by old_user249784

1) Message boards : Number crunching : Iceworld (HadSM and HadSM MH) discussion (Message 41063)
Posted 17 Nov 2010 by old_user249784
Post:
OK, thanks! It seems to have run backwards in the last hour!
2) Message boards : Number crunching : Iceworld (HadSM and HadSM MH) discussion (Message 41061)
Posted 17 Nov 2010 by old_user249784
Post:
My project hadsm3dhet2_k2xl_006613451 has developed an Iceworld, but it has reached over 97% (237238/259248), so should I let it run to completion?
3) Questions and Answers : Getting started : 'Client detached' task (Message 35916)
Posted 14 Jan 2009 by old_user249784
Post:
These are Jeremy's tasks.

I think the significant thing is that Jeremy has upgraded today from BOINC 5.10.45 to BOINC v.6 and may not have let BOINC itself migrate the model data correctly. So BOINC has lost sight of two models and is calling them 'detached'.

Jeremy, can you tell us whether the following sequence of events is correct?

On Monday 12 Jan you had model 7674417 (half completed) and 7679952 (1/3 completed) running with BOINC 5.10.45. 7679952 crashed and the server immediately sent you 7725474 to replace it.

This morning Tuesday just before midday you upgraded to BOINC 6. BOINC 6 didn't find the two previous running models 7674417 and 7725474, so the CPDN server sent you two new models (the two at the top of your task list).

Sorry, I got one of those model numbers wrong earlier so I've edited my post.

Is that time sequence correct?

We need to know

* Have you still got a backup of the complete BOINC 5.10.45 folder made after exiting from BOINC and before 7679952 crashed, or after it crashed but before you upgraded?

* Later edit: Could you please also tell us what backup method you used?

(I think it may be possible to rescue one or both of those part-completed models, but only if you made a backup before you upgraded BOINC.)


Unfortunately I had not backed up since late November (I use the copy and paste method). Of course I should have backed up before installing the new version of the BOINC program. This version saves the application data in a different place, unlike the old program. I think in trying to sort the folder problem out I lost the unfinished task. Although I still have the folder for hadsm3fub_k4dk_005970459 I now realise that it's impossible to restart it with the folder alone. This will teach me to back up more often...
4) Questions and Answers : Getting started : 'Client detached' task (Message 35909)
Posted 13 Jan 2009 by old_user249784
Post:
My computer crashed yesterday, and I lost two projects. One has ended with 'Compute error' (which possibly caused the crash) but the other one says 'Client detached' (Task ID 7674417). This was nearly 50% complete. Is there any way I can reinstate it? I still have the data folder on my computer. At the same time I have upgraded the BOINC software. But now I am running two completely new tasks... Is there any way of getting back to the uncompleted one??
5) Questions and Answers : Windows : No trickles being sent (Message 34328)
Posted 20 Jul 2008 by old_user249784
Post:
Very useful info, thanks. This means the model's progressed 2575 timesteps in more than 7 days which means about 360 timesteps per calendar day. You're going to have to put this defective model out of its misery by aborting it. Better luck with your next model and thanks for reporting the problem. I wish more members would post to report anomalies and abnormalities.
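As a rough check of that estimate, the arithmetic can be sketched as below. The 2575 timesteps and 7 days come from the post above; the current timestep (164605) and total (259248) are the figures reported elsewhere in this thread. Since the period was "more than 7 days", the true rate is a little lower than computed here, which is consistent with "about 360".

```python
# Figures quoted in this thread
timesteps_done = 2575       # progress made over the period
days_elapsed = 7            # "more than 7 days", so the real rate is slightly lower
total_timesteps = 259248    # length of the model
current_timestep = 164605   # timestep reported by the user

rate_per_day = timesteps_done / days_elapsed
remaining = total_timesteps - current_timestep
days_to_finish = remaining / rate_per_day

print(f"{rate_per_day:.0f} timesteps per calendar day")
print(f"{remaining} timesteps left, roughly {days_to_finish:.0f} days at this rate")
```

At that pace the remaining third of the model would need the better part of a year, which is why aborting is the practical choice.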

Before aborting it you can if you want in the CPDN preferences section of your account select the type of model you download next.

I'll send cbamber a private message.

Reminder to everyone. In your CPDN accounts please enable email notification of private messages so if we have to contact you about this sort of problem you're more likely to notice the message.


So the problem is in the model? There's no point in restarting it from the last backup?
6) Questions and Answers : Windows : No trickles being sent (Message 34320)
Posted 20 Jul 2008 by old_user249784
Post:
Hi again Jeremy

If the graphics globe looks uniformly blue for temps and other uniform colours when you press R for precipitation or P for pressure, the prognosis for this model is dire.

One of the other crunchers you mentioned aborted it. Cbamber is battling on but his/her model hasn't trickled for two weeks.

In my experience these slow-processing models can't be rescued and have to be aborted. One member, John Hunt, had this happen to a model near the end of a phase and he very kindly let it crunch on for ages so it would upload its end-of-phase zip file and we'd see what sort of results it produced. From the moment of the slowdown the results suddenly became rubbish, unusable by the researchers. Look at John's phase 3 precipitation graph.

These model worlds don't really turn to ice, but the graphics look as if they do. It's really the processing that freezes up.

Just a couple of things for you to check please before you abort this model and one of us sends a private message to cbamber.

* Is your computer overclocked?

* What sec/timestep does your graphics window now say? (This figure won't have slowed to its true current snail's pace because it's a cumulative average, but if this really is an iceworld the sec/ts should already indicate this.)
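
To illustrate why a cumulative average lags behind a sudden slowdown, here is a minimal sketch. The timings are invented for illustration (1000 timesteps at 2 s each, then 10 timesteps at 200 s each) and are not taken from any real model; the function names are likewise hypothetical.

```python
def cumulative_sec_per_ts(durations):
    """Average seconds per timestep over the whole run so far."""
    return sum(durations) / len(durations)


def recent_sec_per_ts(durations, window):
    """Average seconds per timestep over only the last `window` timesteps."""
    recent = durations[-window:]
    return sum(recent) / len(recent)


# Hypothetical run: a long healthy stretch, then a severe slowdown
durations = [2.0] * 1000 + [200.0] * 10

cumulative = cumulative_sec_per_ts(durations)
recent = recent_sec_per_ts(durations, 10)

print(f"cumulative average: {cumulative:.2f} s/TS")
print(f"last 10 timesteps:  {recent:.2f} s/TS")
```

The cumulative figure roughly doubles while the actual pace is a hundred times slower, so even a modest rise in the displayed sec/ts can signal a serious stall.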


No, the computer is not overclocked.

2.66 s/TS

Timestep: 164605

7) Questions and Answers : Windows : No trickles being sent (Message 34317)
Posted 20 Jul 2008 by old_user249784
Post:
Two possibilities:
1) It's turned into an "ice world"
2) You interrupted the model at the phase change and before it reached the 1st savepoint in the next phase. (No real help thread, just replies in various places, and warnings in News and Announcements.)

My guess is 2) (No more trickles will be registered until the model catches up to where it was.)



The graphics certainly look like an ice world! I've taken a screenshot but can't see a way of attaching it. Details are:
Model date: 10/06/2060
hours elapsed: 505:03:50 2.66s/TS
timestep: 164597 out of 259248
Progress: 87.83%

I've looked at the workunit (615042) and notice that two other tasks (7418482 and 7418481) seem to have stalled at the same point.

The last backup I have is dated 15 June but this includes another model running normally and 95% complete.


8) Questions and Answers : Windows : No trickles being sent (Message 34313)
Posted 19 Jul 2008 by old_user249784
Post:
I am running two Slab models. One is progressing fine and has nearly finished. The other, which was started the same day, has started running backwards!! The last trickle was sent on 12 July, and in the last few days the time to completion has been going UP and the progress has been stationary - or actually going DOWN! The task ID is 7418486.
9) Questions and Answers : Windows : Outcome: Client Error??? (Message 33545)
Posted 24 Apr 2008 by old_user249784
Post:
Jeremy, you were surprised that your completed slab model disappeared from the Tasks window of your BOINC manager. This disappearance is normal when a task is reported. But it's still on your CPDN web pages. Here it is, with lovely graphs:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6977062

If you select the type of model you want in future in your CPDN project preferences in your account, you won't be offered any more unsuitable models.


Thanks for your help. As my computer usually runs for only 10-14 hours a day, I'm going to concentrate on slab models and make sure that they complete successfully. One last question: when I back up, is it good practice always to keep the last two backups in case the last one doesn't work, as happened to me?
10) Questions and Answers : Windows : Outcome: Client Error??? (Message 33535)
Posted 23 Apr 2008 by old_user249784
Post:
Let us know what you eventually decide and how you get on.


The crashed project wouldn't restart. I decided to abort the coupled project and run two slab models. When BOINC fetched them, I got this message (on both occasions):
Your computer has 1024MB of memory, and a job requires 1464.84MB, although in my preferences I only checked slab models. Presumably an error from the server?

Anyway, the two models are now running.
11) Questions and Answers : Windows : Outcome: Client Error??? (Message 33533)
Posted 23 Apr 2008 by old_user249784
Post:
Let us know what you eventually decide and how you get on.


The slab model has now finished (result ID 6977062) but although the zipped result was transferred last night and progress shows as 100%, the project is still in BOINC Manager as 'in progress' and 'Ready to Report'. Why doesn't it appear as successfully completed??

Sorry, when I pressed update it reported and disappeared!
12) Questions and Answers : Windows : Outcome: Client Error??? (Message 33437)
Posted 19 Apr 2008 by old_user249784
Post:
Yes I did create a new folder when I tried to restore.


Normally you wouldn't need to create a new BOINC folder when you restore a backup. Before restoring the backup you'd double-click on the BOINC folder to display its contents, then select all the contents, then delete everything. Going back a page, the BOINC folder would still be there, but empty and ready to receive the contents of the backup folder.

But if you do for any reason delete the BOINC folder itself, making a new one shouldn't be a problem as long as you make it in the same location as before.

But if you created a new BOINC folder without deleting the contents of the old one, I don't think the restore would work. A computer should never have more than one instance of BOINC on it.
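
The manual restore described above (empty the BOINC folder, keep the folder itself, then copy the backup's contents in) could be sketched in Python roughly as follows. This is only an illustration using the standard library, not a CPDN or BOINC tool; the function name `restore_backup` and both folder arguments are hypothetical, and BOINC is assumed to have been exited completely before it runs.

```python
import shutil
from pathlib import Path


def restore_backup(backup_dir, boinc_dir):
    """Empty boinc_dir (keeping the folder itself) and copy the backup into it.

    Assumes BOINC has been exited first. Both paths are supplied by the caller.
    """
    backup = Path(backup_dir).resolve()
    target = Path(boinc_dir).resolve()

    if not backup.is_dir():
        raise FileNotFoundError(f"backup folder not found: {backup}")
    # Refuse to run if emptying the target would also destroy the backup
    if backup == target or target in backup.parents:
        raise ValueError("backup folder must not be inside the BOINC folder")

    # Recreate the folder in the same location if it was deleted
    target.mkdir(parents=True, exist_ok=True)

    # Delete everything inside the BOINC folder, leaving the folder in place
    for entry in target.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()

    # Copy the backup's contents into the now-empty folder (Python 3.8+)
    shutil.copytree(backup, target, dirs_exist_ok=True)
```

Emptying the folder first is the important step: it prevents files from a newer, incompatible state being mixed with the restored set.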



I renamed the old folder with a 0 at the end, as advised somewhere on the message boards.
13) Questions and Answers : Windows : Outcome: Client Error??? (Message 33436)
Posted 19 Apr 2008 by old_user249784
Post:
It was probably the "save while BOINC was running" that caused the problem.
Xcopy takes a long time to copy everything, by which time some more files would have been written, making the "set" incompatible, and useless.



My slab project (hadsm3fub_e095_005911418) should complete early next week. I am uncertain what to do next. I could try to restore hadcm3inct_ckmt_1920_160_75866945 (but it didn\'t work when I tried before). No one else on the work unit seems to be working on it.

Or I could abort that and resume hadcm3istd_0c3p_1920_160_15936539 - but several other computers seem to be working on that.

Or I could start 2 new slab models.

Any suggestions as to my best course of action?
14) Questions and Answers : Windows : Outcome: Client Error??? (Message 33388)
Posted 17 Apr 2008 by old_user249784
Post:
I expect you know that when you restore a BOINC folder backup, you have to paste it into an empty BOINC folder? You can't add the backup to stuff already in the BOINC folder.

I think that if I had that backup of a 160-year model which had already completed more than 20 model years, I'd save the backup and at some point I'd try to restore it again. Maybe even in 2009 after the current tasks have completed. But I'd first check the workunit that the backed-up task belongs to to see whether anybody else's computer had run it to completion. If yes, I wouldn't spend time replicating another computer's efforts. If no, I'd have a final shot at restoring it.

When I use Les's manual backup method, I find that if I've ever forgotten to exit from BOINC first, some of the files in the BOINC folder won't copy to the backup folder and I have to start again.

Basically though, as I don't think you're letting the computer process models 24/7, you'd be better off only running shorter HADSM slab models when you need new work. You can select what models you get in future in the CPDN preferences of your account.



OK, I have deselected HADCM3 projects. Also, I think running two projects simultaneously did make my (not very new) processor work overtime, judging from the noise it made.

Yes I did create a new folder when I tried to restore. I might try once more when my current project completes.
15) Questions and Answers : Windows : Outcome: Client Error??? (Message 33370)
Posted 16 Apr 2008 by old_user249784
Post:
Oops, my mistake there - I'd forgotten what it was called. I agree that the simple manual backup method is very useful as one sees exactly what one is doing.



For the record, all that CCEbu.bat does is this (configured for my system):
XCOPY "d:\Program Files\Climate Change Experiment\*.*" "F:\CCE-bu\CCE-backup" /C /M /E /Y
It is not automatic; it has to be started manually.

It is possible that I didn't exit from BOINC before running it - I guess that's the only explanation. I did try restoring the 14 March backup, but that didn't restart the crashed model.
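
For comparison, a dated full-copy backup that keeps only the most recent copies might look like the Python sketch below. It is not what CCEbu.bat does: XCOPY's /M switch copies only files whose archive attribute is set, whereas this copies the whole folder every time. The function name `backup_folder`, the `backup-<timestamp>` folder naming, and the default of keeping two copies are all assumptions made for illustration, and as with any backup method BOINC should be exited first so the copied files form a consistent set.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_folder(source_dir, backup_root, keep=2):
    """Copy source_dir into a new timestamped folder under backup_root.

    Afterwards only the `keep` most recent backups are retained.
    Assumes BOINC is not running while the copy is made.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")

    source = Path(source_dir)
    root = Path(backup_root)
    root.mkdir(parents=True, exist_ok=True)

    # Timestamped names sort chronologically when sorted as text
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    dest = root / f"backup-{stamp}"
    shutil.copytree(source, dest)

    # Remove the oldest backups beyond the number we want to keep
    backups = sorted(
        p for p in root.iterdir()
        if p.is_dir() and p.name.startswith("backup-")
    )
    for old in backups[:-keep]:
        shutil.rmtree(old)

    return dest
```

Keeping two copies means that if the latest backup turns out to be unusable, the previous one is still available.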
16) Questions and Answers : Windows : Outcome: Client Error??? (Message 33361)
Posted 15 Apr 2008 by old_user249784
Post:
...Is there any way I can retrieve the lost data?

You appear to have two models running: 7340333 (HADCM3-coupled) and 6977062 (HADSM3-slab). The slab model is almost finished, so it may be a good idea to let that run to completion. Make sure the 'no new tasks' button is pressed in BOINC Manager, so that you don't get any new models. The coupled model has only just started - you're in for a long haul with that one, since it's a 160-year model running (hyperthreaded) with the slab.

The other model you mention, 6685813, is also a coupled model but has made a bit more progress. It's reported as crashed, but there is a temporary problem on the Web site that is hiding the error messages - so we can't tell what caused the crash. It should be possible to restore it from a backup and continue.

If I were you, I would:

1) Press 'no new tasks'.

2) Suspend the coupled model.

3) Let the slab model finish.

4) Resume the coupled model.

5) Forget about the crashed model.

However, that's just my opinion!

PS If the two computer records on your computer page are actually one physical computer, then it's a good idea to 'merge' them - see the 'merge this computer' link at the bottom of the computer summary page for each computer.


Many thanks. I've done all that (I had actually already merged the computers). If you find out what the error message was, let me know if there's anything I can do to retrieve any lost data or restart the project. I won't use the backup facility in the meantime.
17) Questions and Answers : Windows : Outcome: Client Error??? (Message 33357)
Posted 15 Apr 2008 by old_user249784
Post:
I've also had 'Client error' messages, the latest on 14 March for a project which had been running for about 8 months (result ID 6685813). I back up fairly regularly using CCEbu.bat, but it seems to have been this which caused the model to crash, as the backup file of the project in question had the same timestamp as the result. Is there any way I can retrieve the lost data?

I've now stopped running the screensaver and will exit before shutting down.




©2024 climateprediction.net