"Time remaining" keeps counting up

Roel
Joined: 24 Feb 08
Posts: 9
Credit: 876,602
RAC: 0
Message 32795 - Posted: 29 Feb 2008, 14:36:11 UTC

I have two models running on the same PC. One behaves normally: "elapsed time" increases while "time remaining" decreases by about the same amount. In the other model, "time remaining" increases by about the same amount as "elapsed time" does.
Am I the victim of an endless loop, or is there a solution?

Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 32796 - Posted: 29 Feb 2008, 14:49:21 UTC

This appears to be the model with the problem: hadsm3fub_00d0_005931581_8.

The last trickle shows that the model used about ten times the normal amount of CPU to complete the trickle (145,400 vs 12,989 seconds). This does sometimes happen to slab models (i.e. hadsm3) and is usually accompanied by a blue temperature display. These ice worlds usually happen much later in the run - and don't recover.
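As a rough illustration of the comparison above, the sketch below divides the CPU time of the latest trickle by that of a normal one. The function name and the 5x cut-off are assumptions made up for this example; neither is part of BOINC or CPDN.

```python
def trickle_slowdown(last_cpu_s: float, typical_cpu_s: float) -> float:
    """Return how many times longer the last trickle took than a typical one."""
    if typical_cpu_s <= 0:
        raise ValueError("typical_cpu_s must be positive")
    return last_cpu_s / typical_cpu_s


# Figures quoted in the post: 145,400 s for the last trickle vs 12,989 s normally.
ratio = trickle_slowdown(145_400, 12_989)

# Hypothetical cut-off, chosen only for this example.
SUSPECT_THRESHOLD = 5.0

print(f"slowdown: {ratio:.1f}x, suspect: {ratio > SUSPECT_THRESHOLD}")
```

With the quoted figures the ratio comes out a little above eleven, in line with the "about ten times" estimate.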

I would abort that model.

Roel
Joined: 24 Feb 08
Posts: 9
Credit: 876,602
RAC: 0
Message 32797 - Posted: 29 Feb 2008, 17:57:48 UTC

Hi Iain,

You were right, it was that model with a blue temperature display, and I am going to abort it. Thanks for your comment.

old_user193386
Joined: 20 Jul 06
Posts: 4
Credit: 336,140
RAC: 0
Message 33139 - Posted: 29 Mar 2008, 23:56:23 UTC

I also have one of these models where "time remaining" is increasing, though somewhat more slowly than the CPU time usage. The temperature display is blue. The model is hadsm3fub_024m_005933871_0.
I presume that I should abort it.

Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 33140 - Posted: 30 Mar 2008, 0:00:44 UTC - in response to Message 33139.  

I have never got a restored slab model to get past the point at which it "went blue", so aborting it is the only option as far as I can see. Bad luck for you, as that model was well into phase 3.

old_user193386
Joined: 20 Jul 06
Posts: 4
Credit: 336,140
RAC: 0
Message 33141 - Posted: 30 Mar 2008, 1:38:19 UTC - in response to Message 33140.  

Thanks Iain - I do regret having to do this. I'll check for a few days longer to see if the progress percentage is increasing.

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 33143 - Posted: 30 Mar 2008, 10:49:48 UTC
Last modified: 30 Mar 2008, 10:50:14 UTC

Hi Gabriel

Your computer has a good record of model completion so you're obviously doing the right things to look after your models. This problematic slab looks as if it's been trying for 5 days to get to the next trickle. It was previously doing about 1.8 sec/TS. The current sec/TS that you see in the graphics window is a cumulative average, so I wouldn't be surprised if it's doing 18 sec/TS now. You can note down its timestep and the wall clock time, close the graphics window, then look back 10 minutes later to see where it's got and calculate its current speed. If it really is so slow and the graphics have gone, I'd abort it now. As Iain says, we've never seen one of these slow processing models recover.
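The measurement described above can be sketched as follows. This is only an illustration: the function names and all of the numbers are hypothetical, and it assumes the model has a CPU core to itself so that wall clock time is a fair stand-in for CPU time.

```python
def current_sec_per_ts(ts_before: int, ts_after: int, wall_seconds: float) -> float:
    """Seconds per timestep over one observation window."""
    steps = ts_after - ts_before
    if steps <= 0:
        raise ValueError("the model did not advance during the window")
    return wall_seconds / steps


def cumulative_sec_per_ts(total_seconds: float, total_timesteps: int) -> float:
    """Run-wide average, which is what the graphics window is described as showing."""
    return total_seconds / total_timesteps


# Hypothetical run: 100,000 timesteps at 1.8 s/TS, then 2,000 timesteps at 18 s/TS.
average = cumulative_sec_per_ts(100_000 * 1.8 + 2_000 * 18.0, 102_000)

# Hypothetical observation: the timestep counter moved from 102,000 to 102,033
# while 10 minutes (600 s) passed on the wall clock.
current = current_sec_per_ts(102_000, 102_033, 600)

print(f"cumulative average: {average:.2f} s/TS")
print(f"current speed:      {current:.2f} s/TS")
```

In this made-up case the cumulative average has only crept up to roughly 2.1 s/TS while the model is actually running at about 18 s/TS, which is why the two-reading check is more telling than the displayed figure.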

We've seen the graphics of one of these slow 'iceworlds' that did complete because its owner battled on to the end. From where the slowdown occurred the results were abnormal and unusable.

It's just bad luck to get one of these.
Cpdn news

astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 33152 - Posted: 30 Mar 2008, 20:20:38 UTC

It seems a bitter pill, but we can take consolation in the knowledge that these failures provide valuable information to the researchers. Given that the Project tests parameter combinations, knowing what doesn't work is useful in establishing boundary conditions.

My dimming memory tells me that one of the researchers (Dave Frame?) posted several years ago that some failures can be more valuable than some successful Runs.

That said, however, it's more satisfying to complete a Model; for me, the longer the Model, the greater the satisfaction.

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

Pooh Bear 27
Joined: 5 Feb 05
Posts: 465
Credit: 1,914,189
RAC: 0
Message 33157 - Posted: 30 Mar 2008, 21:03:44 UTC - in response to Message 33152.  

Science is all about the trials, failures, errors, mistakes and successes. How would we even know success without some of the others?

Having done less than you, but a significant amount myself, I still feel that sense of accomplishment whenever a task finishes. I am now dabbling in the Beta project, with 200-year models. Talk about a long crunching time. My C2D T7200 says it will take upwards of 145 days running nonstop. The 2.4G Xeon will take around 260 days. Lots of patience needed there, because they have already crashed a bunch with bad parameters, etc.

All in a cruncher's day's work.

old_user193386
Joined: 20 Jul 06
Posts: 4
Credit: 336,140
RAC: 0
Message 33167 - Posted: 31 Mar 2008, 19:52:49 UTC

The progress percentage remains static. I'll abort the model.
<sigh>
Thank you all for the background and encouragement.

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 33170 - Posted: 1 Apr 2008, 0:55:46 UTC

You'll get credit for all the trickles the model sent. Better luck with the next model. By the way, if you only want a HADSM slab next time, you may have to wait until tomorrow as the HADSM work queue is currently empty.
Cpdn news

old_user219190
Joined: 14 Jan 07
Posts: 52
Credit: 284,001
RAC: 0
Message 33176 - Posted: 1 Apr 2008, 11:36:48 UTC

Recently had to abort this ice ball. Poor thing had been going less than 48 hours.
It was (I thought) happily crunching away, running at 1.30 s/TS and reporting every 4 hours, but after 10 trickles it turned blue, and 7 hours later it had completed only 6,000 of the 10,800 timesteps needed for the next trickle; the s/TS was up to 1.39 by then. I suspended it at 144 on the countdown (to reboot BOINC, worth a try!) and it fell back to the previous checkpoint, knocking 0.02% off the progress meter, and the s/TS fell back to 1.33. So a rapid time increase in the last countdown.
Crashed a slab over at beta a while ago, also suspending at 144 on the countdown (to take a backup). Immediately after restart it crashed, as expected, did the backup.
So now I always treat the start of any countdown as 1 below, e.g. 143 for a slab, and have had no problems with work loss or crashes since.
Had 2 ice balls out of 11. The first WU was completed successfully by another user running an AMD machine, and the last is being run by another Intel user, so I am interested in how they get on.
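The figures in this post can be cross-checked with a little arithmetic. The sketch below is an illustration only: the function names are invented, and it assumes that the quoted speed is in seconds per timestep and that the whole 7 hours went into the 6,000 completed timesteps.

```python
TIMESTEPS_PER_TRICKLE = 10_800  # figure quoted in the post


def hours_per_trickle(sec_per_ts: float, steps: int = TIMESTEPS_PER_TRICKLE) -> float:
    """Hours needed to reach the next trickle at a steady speed."""
    return sec_per_ts * steps / 3600


def observed_sec_per_ts(hours: float, steps_done: int) -> float:
    """Average seconds per timestep over an observed stretch of the run."""
    return hours * 3600 / steps_done


healthy_hours = hours_per_trickle(1.30)        # speed before the model turned blue
slow_speed = observed_sec_per_ts(7, 6_000)     # 6,000 timesteps in 7 hours

print(f"healthy trickle interval: {healthy_hours:.1f} h")
print(f"speed after turning blue: {slow_speed:.1f} s/TS")
print(f"slowdown: {slow_speed / 1.30:.1f}x")
```

At 1.30 s/TS a trickle takes just under four hours, which matches "reporting every 4 hours". Under the stated assumption, the speed after the model turned blue works out to roughly three times slower, even though the displayed cumulative figure had only moved from 1.30 to 1.39.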


old_user475756
Joined: 4 Oct 07
Posts: 1
Credit: 14,817
RAC: 0
Message 33219 - Posted: 6 Apr 2008, 10:22:53 UTC

Aargh, and I wondered why nothing was happening. The globe is frozen at 68,240%.
Strangely, the percentage increases sometimes, but then resets to 68,240%.
And the time display shows 2051 - later than the planned end point.

I will abort this model then.

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 33220 - Posted: 6 Apr 2008, 12:03:05 UTC
Last modified: 6 Apr 2008, 12:04:52 UTC

What does the temperature display show (press 'T' when the globe is displayed), and what is the model speed (press 'Z' when the graphics are showing to remove the grey sidebar; the speed figure is marked 's/ts')?

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6914668

Note that Phase three *starts* at 2050 and continues to 2065 (see the 'running your model' readme, 'information' section, first two posts describing types of model - link in my signature).


I'm a volunteer and my views are my own.
News and Announcements and FAQ

Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 33221 - Posted: 6 Apr 2008, 12:13:02 UTC
Last modified: 6 Apr 2008, 12:13:53 UTC

Chieron,

It's a genuine ice world, as this unfortunate run of trickles shows for another Intel/Windows host in the same work unit: here.

Intel/Linux and AMD/Windows results have completed in that work unit.

Abort it.

Iain

[Oops, sorry Mike.]

©2024 climateprediction.net