Iceworlds & Slowdowns hadsm3/mh - Closed - Discussion

zdespi

Joined: 5 Jan 05
Posts: 4
Credit: 1,544,444
RAC: 0
Message 35879 - Posted: 9 Jan 2009, 12:50:21 UTC

Hi again,
another unit has turned into an iceball, I suppose. After 19 days it is only 9.8% finished and the s/TS keeps growing.
8233928
ID: 35879 · Report as offensive
Profile geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2167
Credit: 64,403,322
RAC: 5,085
Message 35881 - Posted: 9 Jan 2009, 14:06:18 UTC

Two others in that WU, both Intel Pentium 4s or later with Windows, are also in trouble at the same point. A Pentium 3 in Windows has gone past that point and seems to be doing well (as well as a Pentium 3 can do at 6 s/TS).
ID: 35881 · Report as offensive
Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 35886 - Posted: 9 Jan 2009, 22:12:24 UTC

I'll send private messages to the people with that workunit who need to be warned. (Not that I've ever received a response to this sort of message, of which I must have sent dozens.)
Cpdn news
ID: 35886 · Report as offensive
Profile geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2167
Credit: 64,403,322
RAC: 5,085
Message 35915 - Posted: 14 Jan 2009, 16:58:11 UTC
Last modified: 14 Jan 2009, 17:49:01 UTC

Another frozen earth here, which the user has aborted. Two other Intel/Windows computers in that work unit are nearing the freeze point. Unfortunately, I am unable to PM one of them...not that PMs do much good in these situations since, by default, no one is ever notified of a PM via e-mail.
ID: 35915 · Report as offensive
Virtual Boss*
Joined: 14 May 08
Posts: 29
Credit: 776,852
RAC: 0
Message 35926 - Posted: 15 Jan 2009, 14:48:47 UTC

Another Ice Age

Could not get it restarted and I had not made a backup, so I aborted it.

Also, a 'dirty' power interruption caused another host to crash 3 models: #1, #2, #3.

All three reported immediately on computer restart.

Unfortunately I had been doing some rearranging and accidentally left that host off the UPS. That is now fixed, but too late for those 3 and some SETI WUs.
ID: 35926 · Report as offensive
Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 35945 - Posted: 16 Jan 2009, 20:55:47 UTC

Unless you have suspicions that an iceworld was caused by an instability in the computer, it isn't worth restoring even if you have a backup. They almost invariably crash again at the same point.

There's another Windows machine in the same WU which will probably hit iceworld conditions within the next trickle or two, so I'll send its owner a PM.


Cpdn news
ID: 35945 · Report as offensive
old_user451620

Joined: 16 May 07
Posts: 2
Credit: 0
RAC: 0
Message 35989 - Posted: 23 Jan 2009, 8:33:26 UTC

I am posting to report that hadsm3mh_kl60_006003796_8 went iceworld at 97.843% completion.

Other information provided because another post said it was providing this requested information:

1. A link to the model/ResultID webpage

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=8132659


2. A current timestep of that model (on the globe graphic)

236881 of 259248 Date 16/08/2064 00:30

Sorry, not sure when last trickle was.

3. The s/TS value (on the globe graphic. Remember, you can hit the Z key while viewing the globe and it will give you this additional text/status information.)

Hours Elapsed: 0866:51:21 (3.08 s/TS)

4. Whether the temperature display of the globe graphic is blue.

Yes. Entirely.

5. What your processor/CPU is (i.e. Intel, AMD)

GenuineIntel
Intel(R) Pentium(R) 4 CPU 3.00GHz [x86 Family 15 Model 4 Stepping 1] [fpu tsc pae nx sse sse2 mmx]

6. Whether you are overclocking.

No.


Question: About how long should I expect the remaining 2.2% to take to complete?

Thanks
ID: 35989 · Report as offensive
Profile Iain Inglis

Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 35991 - Posted: 23 Jan 2009, 10:09:36 UTC - in response to Message 35989.  
Last modified: 23 Jan 2009, 10:11:24 UTC

... Question: About how long should I expect the remaining 2.2% to take to complete?
The last trickle was timestep 226,842 in phase 4, so there are three trickles to go. The machine was processing at about 2 seconds/timestep (after the other model had finished).

Based on another HADSM3MH model that went iceworld, it might take 10-11 days for each trickle to complete. So, that would be about a month for the final three trickles! That's why most people who are aware of an iceworld just abort it, since they could finish a number of entire models in the time it takes to finish one iceworld. However, it will eventually get to the end if you let it run.
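
For anyone who wants to redo that estimate for their own model, here is a rough sketch of the arithmetic. The trickle interval of 10,802 timesteps is an assumption inferred from the numbers in this thread (226,842 is exactly 21 x 10,802 and 259,248 is exactly 24 x 10,802); the 10-11 days per trickle is just the figure quoted above from one other iceworld, not a rule.

```python
# Figures taken from this thread; TRICKLE_INTERVAL is an inferred assumption.
TOTAL_TIMESTEPS = 259_248   # from the globe graphic: "236881 of 259248"
LAST_TRICKLE_TS = 226_842   # timestep of the last trickle received
TRICKLE_INTERVAL = 10_802   # assumed timesteps per trickle (226842 / 21)


def trickles_remaining(total_ts: int, last_trickle_ts: int, interval: int) -> int:
    """Number of whole trickles still to be sent before the phase ends."""
    return (total_ts - last_trickle_ts) // interval


def days_remaining(n_trickles: int, days_per_trickle: float) -> float:
    """Wall-clock days left if every trickle takes days_per_trickle."""
    return n_trickles * days_per_trickle


n = trickles_remaining(TOTAL_TIMESTEPS, LAST_TRICKLE_TS, TRICKLE_INTERVAL)
low = days_remaining(n, 10)    # optimistic: 10 days per trickle
high = days_remaining(n, 11)   # pessimistic: 11 days per trickle
print(f"{n} trickles left, roughly {low:.0f}-{high:.0f} days")
```

With these inputs it comes out at three trickles and 30 to 33 days, which is where "about a month" comes from.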
ID: 35991 · Report as offensive
Profile mo.v
Volunteer moderator
Avatar

Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 35992 - Posted: 23 Jan 2009, 11:32:30 UTC

I think you should abort it. When an iceworld develops, as far as we know the model always stops producing some of the data. Usually iceworlds stop producing the precipitation graphs. So the researchers can no longer use your model.
Cpdn news
ID: 35992 · Report as offensive
old_user451620

Joined: 16 May 07
Posts: 2
Credit: 0
RAC: 0
Message 35993 - Posted: 23 Jan 2009, 14:36:43 UTC - in response to Message 35992.  

The last trickle was timestep 226,842 in phase 4, so there are three trickles to go. The machine was processing at about 2 seconds/timestep (after the other model had finished).

Based on another HADSM3MH model that went iceworld, it might take 10-11 days for each trickle to complete. So, that would be about a month for the final three trickles! That's why most people who are aware of an iceworld just abort it, since they could finish a number of entire models in the time it takes to finish one iceworld. However, it will eventually get to the end if you let it run.


I think you should abort it. When an iceworld develops, as far as we know the model always stops producing some of the data. Usually iceworlds stop producing the precipitation graphs. So the researchers can no longer use your model.


Though it seems quite a shame to lose all that computing time, if the results aren't usable when they are finished then there isn't much sense in waiting around for them.

Thank you both,
-Mike
ID: 35993 · Report as offensive
Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 35997 - Posted: 23 Jan 2009, 19:08:45 UTC

Mike, if you look at the model's web page and inspect the graphs, you'll see that they were excellent for the first three phases. It's only from the point when the iceworld develops that the model stops processing its data correctly. So only your crunching for the last phase was lost.

My apologies for not making that clear before.
Cpdn news
ID: 35997 · Report as offensive
Profile Thyme Lawn
Volunteer moderator

Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 35998 - Posted: 23 Jan 2009, 21:39:33 UTC

The phase 4 trickle history for Mike's task exhibits a significant speed-up before it slowed down. I've not spotted that pattern before.

After the 8th trickle the average sec/TS started falling, from 3.1156 to 2.9757. That works out at 2.1150 sec/TS over the last 13 trickles, approximately a third faster than before. The 12th trickle took an average of 1.9839 sec/TS and the 13th 2.3502 sec/TS, suggesting that the slowdown started a bit before that trickle.
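
One way to reproduce that figure, as a sketch only: treat the two sec/TS values as cumulative averages over the whole task and difference them. The trickle counts below are assumptions, not something stated in the post: 24 trickles per phase (259,248 timesteps at an assumed 10,802 per trickle), three completed phases, and the 8th and 21st trickles of phase 4 as the two reference points.

```python
def interval_average(avg_before: float, n_before: int,
                     avg_after: float, n_after: int) -> float:
    """Mean s/TS over trickles n_before+1..n_after, given two cumulative
    averages. Assumes every trickle covers the same number of timesteps."""
    total_before = avg_before * n_before
    total_after = avg_after * n_after
    return (total_after - total_before) / (n_after - n_before)


TRICKLES_PER_PHASE = 24                  # assumption: 259,248 / 10,802
n_before = 3 * TRICKLES_PER_PHASE + 8    # up to the 8th trickle of phase 4
n_after = n_before + 13                  # up to the 21st trickle of phase 4

recent = interval_average(3.1156, n_before, 2.9757, n_after)
reduction = 1 - recent / 3.1156          # fraction less time per timestep
print(f"{recent:.4f} s/TS over the last 13 trickles "
      f"({reduction:.0%} less time per timestep)")
```

Under those assumptions the result lands within rounding of the 2.1150 sec/TS quoted above, and the reduction is a little under a third.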
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 35998 · Report as offensive
Profile Iain Inglis

Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 35999 - Posted: 23 Jan 2009, 21:52:00 UTC
Last modified: 23 Jan 2009, 21:57:19 UTC

... that's just because the other task on the hyperthreaded 3 GHz P4 finished at that point.

There appear to be two 'user' accounts: the relevant one is here.
ID: 35999 · Report as offensive
Lockleys

Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 36024 - Posted: 26 Jan 2009, 10:22:18 UTC

I believe my task (25/01/2009 22:34:28 hadsm3fub_k8n4_005975987_5, using hadsm3 version 607) has gone to iceworld. It's reached 6/8/1824 and the globe is now wholly blue. My other model (coupled) is happily displaying the usual sunshine and clouds. I'm planning to abort, but not today!
ID: 36024 · Report as offensive
Profile Iain Inglis

Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 36025 - Posted: 26 Jan 2009, 10:37:51 UTC

Yes, there's one model further on and it's slowed down at the same point. I'll send a message to the other people in that unit, advising them to abort.

Thanks for reporting that.
ID: 36025 · Report as offensive
old_user551646

Joined: 3 Jan 09
Posts: 9
Credit: 633,446
RAC: 0
Message 36096 - Posted: 7 Feb 2009, 21:08:59 UTC - in response to Message 35989.  
Last modified: 7 Feb 2009, 21:14:01 UTC

I am posting to report that hadsm3mh_kj7q_006010388 went iceworld at 99.371% completion. It is still running, but extremely slowly. BOINC Mgr is estimating 5 hrs to completion, but by my calculations it'll be closer to 6 days.

Using a previous example of reporting an Ice World:

1. A link to the model/ResultID webpage
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/trickle.php?resultid=8284151

2. A current timestep of that model (on the globe graphic)

252734 of 259248 Date 16/07/2065 07:30

3. The s/TS value

Hours Elapsed: 0530:27:16 (1.85 s/TS) - per the last trickle the s/TS had been 1.7036 on 2/5/09 at 00:44:46 UTC.

4. Whether the temperature display of the globe graphic is blue.

Yes. Entirely.

5. What your processor/CPU is (i.e. Intel, AMD)

GenuineIntel
Intel(R) Core(TM)2 Quad CPU @ 2.40GHz [x86 Family 6 Model 15 Stepping 7]

6. Whether you are overclocking.

Yes to 2.70GHz.

---------------

Thanks,
Jack
ID: 36096 · Report as offensive
Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 36103 - Posted: 8 Feb 2009, 9:08:27 UTC
Last modified: 8 Feb 2009, 9:16:00 UTC

Hi Jack

Thanks for reporting this iceworld.

That's very frustrating, particularly so near the end. The graphs for the first 3 phases all look good. I see you've aborted it now, which was the right thing to do. A Mac has completed a model from the same WU but we know that iceworlds are far more common on Intel/Windows computers. So comparing your model with the one on the Mac tells us nothing useful.

The only model in that WU that might give you a clue about whether your model was inherently unstable or the iceworld could have been caused by an instability in your computer is this one, also running on Intel/Windows. But that computer's sending in trickles so infrequently that you won't get an early answer.

If your computer produces several iceworlds you could consider testing it for stability at that level of O/C if you haven't already done so. Or you could select non-HADSM model types for it.
Cpdn news
ID: 36103 · Report as offensive
Virtual Boss*
Joined: 14 May 08
Posts: 29
Credit: 776,852
RAC: 0
Message 36108 - Posted: 8 Feb 2009, 11:55:27 UTC - in response to Message 36103.  


The only model in that WU that might give you a clue about whether your model was inherently unstable or the iceworld could have been caused by an instability in your computer is this one, also running on Intel/Windows. But that computer's sending in trickles so infrequently that you won't get an early answer.


I wouldn't hold my breath; all previous models from that host ended in compute errors.
ID: 36108 · Report as offensive
old_user186450

Joined: 11 May 06
Posts: 4
Credit: 1,008,514
RAC: 0
Message 36137 - Posted: 14 Feb 2009, 11:28:19 UTC

I've not had much luck with these models...
First off, it seems I had some dodgy memory that corrupted the first few models I crunched (replaced 19/1/09 with ECC).

Now I have a possible 'ICE' planet :(

Task ID: 7736409
Name: hadsm3fub_k95r_005976658_2
Workunit 6188835

Should I abandon this one?



ID: 36137 · Report as offensive
wateroakley

Joined: 6 Aug 04
Posts: 185
Credit: 27,083,655
RAC: 6,161
Message 36140 - Posted: 14 Feb 2009, 15:16:29 UTC - in response to Message 36137.  

Should I abandon this one?
7736409 Your model seems to have sped up by about 20% on 10th Feb between the phase 1 trickles at TS 216040 and TS 226842. It went from about 1.09 s/TS to 0.823 s/TS, then to 0.7 s/TS at your last trickle (phase 2, TS 140,246) today at 11:01. One other user has run the WU to completion, under Darwin, with consistent timing. Three others also seem to have errored, though none got as far as yours. Unless you have a backup from before the 10th, I'd suggest you abort it.
Good luck with the next model.
ID: 36140 · Report as offensive

©2024 climateprediction.net