climateprediction.net home page
Iceworlds & Slowdowns hadsm3/mh - Closed - Discussion

wedgef5

Joined: 19 Jun 08
Posts: 2
Credit: 739,082
RAC: 0
Message 35065 - Posted: 21 Sep 2008, 12:43:10 UTC - in response to Message 35063.  

If it were mine, I'd pull the plug. (They can sometimes be saved by transferring a backup to another machine type, Intel to AMD or vice versa, but there's no guarantee it will work.)

Irritating to get so close and then see it fail, but still, the work done will be of use to the researchers.

Welcome to the Boards.


Thanks. I'm going to bail on it. I don't really care that much about the credits, so it's time to move on to a new simulation.
ID: 35065 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 35070 - Posted: 21 Sep 2008, 14:20:14 UTC

With this project, you get credits all the way through a model, and retain them even if a model fails or is aborted.
The only thing lost is the data from the point of failure forwards.

Because the point of the models is to find this failure point, and NOT to force a model to the end, the failure is actually a success. Now the researchers have another set of parameter values that they know aren't stable for a long period of time.


Backups: Here
ID: 35070 · Report as offensive
Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 35106 - Posted: 26 Sep 2008, 13:18:35 UTC

I've looked at the workunit that wedgef5's model belonged to. Three computers did complete it, but one's an AMD and the other two are Macs.

There's also an American cruncher with an Intel whose model may have been stuck at the same point as wedgef5's for about a month. I'll send him a private message to let him know.
Cpdn news
ID: 35106 · Report as offensive
Profile astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 35199 - Posted: 8 Oct 2008, 20:34:56 UTC

Background, following change to boinc 6.2.19:
10/8/2008 12:27:38 PM||Running CPU benchmarks
10/8/2008 12:27:39 PM|Suspending network activity - user request
10/8/2008 12:28:10 PM||[error] Integer benchmark ran only 0.936006 sec; ignoring
10/8/2008 12:28:10 PM||[error] CPU benchmarks error
BOINC error? Gee, there's a surprise. (5.10.nn had no problems with the exercise, nor does Prime95 Torture Test/four copies have a problem with the machine.)

The failure: Frozen globe. Installed boinc 6.2.19, against my better judgment, to see the graphics on a HadSM3-MH Model I suspected had turned blue. (It had; then, I committed an abortion/mercy killing.) It fell 13+ percent behind its wombmate on C2Q 9300 running stock, Vista Home Premium x64, 8GB DDR2 RAM, formerly under boinc 5.10.13.
10/8/2008 12:30:41 PM|climateprediction.net|Computation for task hadsm3mh_km6e_006000584_0 finished
10/8/2008 12:30:48 PM|climateprediction.net|Restarting task hadsm3mh_km6c_006000582_0 using hadsm3mh version 602
Note that boinc doesn't log a Message as to why the Run "finished".

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=8093024
The other crashed Model in the Work Unit is inconclusive (-107) re. 'frozen earth'.

(With one exception, my v.6 Spinups show graphics in boinc 5.10.13 after a boinc restart; HadSM3-MH Models do not, hence my reluctant excursion into boinc's latest foray into instability.)
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 35199 · Report as offensive
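For anyone who wants to scan their own message log for the truncated-benchmark errors shown in the post above, here is a minimal Python sketch. It assumes the `timestamp|project|message` layout seen in the log excerpts in this thread; the function names are made up for illustration and are not part of BOINC.

```python
def parse_boinc_line(line):
    """Split one message-log line into (timestamp, project, message)."""
    timestamp, _, rest = line.partition("|")
    project, sep, message = rest.partition("|")
    if not sep:
        # Only one pipe on the line: treat everything after it as the message.
        project, message = "", rest
    return timestamp.strip(), project.strip(), message.strip()


def benchmark_errors(lines):
    """Return messages tagged [error] that mention a benchmark."""
    found = []
    for line in lines:
        _, _, message = parse_boinc_line(line)
        if message.startswith("[error]") and "benchmark" in message.lower():
            found.append(message)
    return found


# Sample lines copied from the log excerpts in this thread.
log = [
    "10/8/2008 12:27:38 PM||Running CPU benchmarks",
    "10/8/2008 12:28:10 PM||[error] Integer benchmark ran only 0.936006 sec; ignoring",
    "10/8/2008 12:28:10 PM||[error] CPU benchmarks error",
    "10/8/2008 12:30:41 PM|climateprediction.net|Computation for task hadsm3mh_km6e_006000584_0 finished",
]

errors = benchmark_errors(log)
for message in errors:
    print(message)
```

A benchmark error of this kind says nothing about whether a model has frozen; it only shows that the benchmark run was cut short.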
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 35201 - Posted: 8 Oct 2008, 21:21:14 UTC

There was talk about truncated benchmarks on BOINC/dev.
I think that it's something to do with a higher level program running at the time the benchmarks are run.

ID: 35201 · Report as offensive
Profile astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 35203 - Posted: 8 Oct 2008, 22:14:30 UTC

Thanks, Les. It's a useless exercise for CPDN anyway.

I wonder what the higher level program might be, a Windows Service? Nothing else was active --> that's a CPDN-only box (except when Firefox is activated to report a problem or updates are made).

Jim
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 35203 · Report as offensive
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 35204 - Posted: 8 Oct 2008, 22:29:38 UTC

My memory was right for once. Here.

ID: 35204 · Report as offensive
old_user532563

Joined: 15 Aug 08
Posts: 2
Credit: 751,934
RAC: 0
Message 35240 - Posted: 14 Oct 2008, 22:36:28 UTC
Last modified: 14 Oct 2008, 22:36:50 UTC

I appear to have this issue myself. Last trickle was 4 days ago at TimeStamp 129,624 of phase 4 (9 days into the run); right now it's at 139,973 of phase 4. It is progressing, but time to completion is going up instead of down. Temperature is blue.

Results here: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/trickle.php?resultid=8090588

CPU is not overclocked at all.

Wonder should I let it crawl to the finish line or put it down now?



13/10/2008 9:12:59 AM||Starting BOINC client version 6.2.19 for windows_intelx86
13/10/2008 9:12:59 AM||log flags: task, file_xfer, sched_ops
13/10/2008 9:12:59 AM||Libraries: libcurl/7.18.0 OpenSSL/0.9.8e zlib/1.2.3
13/10/2008 9:12:59 AM||Running as a daemon
13/10/2008 9:12:59 AM||Data directory: C:\ProgramData\BOINC
13/10/2008 9:12:59 AM||Running under account boinc_master
13/10/2008 9:12:59 AM||Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz [Intel64 Family 6 Model 23 Stepping 6]
13/10/2008 9:12:59 AM||Processor features: fpu tsc pae nx sse sse2 pni mmx
13/10/2008 9:12:59 AM||OS: Microsoft Windows Vista: Ultimate x64 Editon, Service Pack 1, (06.00.6001.00)
13/10/2008 9:12:59 AM||Memory: 8.00 GB physical, 32.17 GB virtual
13/10/2008 9:12:59 AM||Disk: 97.66 GB total, 54.45 GB free
13/10/2008 9:12:59 AM||Local time is UTC -4 hours
13/10/2008 9:13:00 AM||Version change (6.2.18 -> 6.2.19)
13/10/2008 9:13:00 AM|climateprediction.net|URL: http://climateprediction.net/; Computer ID: 906743; location: (none); project prefs: default
13/10/2008 9:13:00 AM||No general preferences found - using BOINC defaults
13/10/2008 9:13:00 AM||Reading preferences override file
13/10/2008 9:13:00 AM||Preferences limit memory usage when active to 4095.06MB
13/10/2008 9:13:00 AM||Preferences limit memory usage when idle to 7371.11MB
13/10/2008 9:13:00 AM||Preferences limit disk usage to 9.31GB
13/10/2008 9:13:00 AM||Running CPU benchmarks
13/10/2008 9:13:31 AM||Benchmark results:
13/10/2008 9:13:31 AM|| Number of CPUs: 2
13/10/2008 9:13:31 AM|| 3345 floating point MIPS (Whetstone) per CPU
13/10/2008 9:13:31 AM|| 7026 integer MIPS (Dhrystone) per CPU
13/10/2008 9:13:32 AM|climateprediction.net|Restarting task hadsm3mh_kl8v_006000313_3 using hadsm3mh version 602
13/10/2008 9:13:32 AM|climateprediction.net|Restarting task hadcm3ivolc_l2b9_2000_80_06001703_2 using hadcm3i version 602
ID: 35240 · Report as offensive
Profile Iain Inglis

Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 35241 - Posted: 14 Oct 2008, 23:06:02 UTC - in response to Message 35240.  

... Wonder should I let it crawl to the finish line or put it down now? ...

It has 12 trickles to go before the end of phase 4. It could take up to a week per iceworld trickle, though yours is a very fast machine - so 2-3 months to finish. I would put it out of its misery. For interest's sake you might let it run to the next trickle: the jump in sec/TS would then serve as a warning to others in that work unit ...
ID: 35241 · Report as offensive
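The 2-3 month estimate in the reply above is simple arithmetic: trickles remaining times days per trickle. A small Python sketch of that calculation follows. The 12 trickles and the one-week upper bound come from the post; the 5-days-per-trickle figure for a fast machine is only an assumed value for illustration, and the function name is hypothetical.

```python
def days_to_finish(trickles_left, days_per_trickle):
    """Rough remaining run time, assuming every trickle takes the same time."""
    return trickles_left * days_per_trickle


TRICKLES_LEFT = 12  # from the post: trickles before the end of phase 4

worst_case_days = days_to_finish(TRICKLES_LEFT, 7)    # up to a week per trickle
fast_machine_days = days_to_finish(TRICKLES_LEFT, 5)  # assumed figure, not from the post

for label, days in (("worst case", worst_case_days), ("fast machine", fast_machine_days)):
    # Use a 30-day month for a rough conversion.
    print(f"{label}: {days} days, about {days / 30:.1f} months")
```

Under these assumptions the range works out to roughly two to three months, which matches the estimate given in the post.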
old_user532563

Joined: 15 Aug 08
Posts: 2
Credit: 751,934
RAC: 0
Message 35244 - Posted: 15 Oct 2008, 0:59:10 UTC

Thanks very much for the thoughtful reply. I shall, as you suggest, allow it to trickle once more, then bury it in the permafrost.
ID: 35244 · Report as offensive
Profile Iain Inglis

Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 35255 - Posted: 15 Oct 2008, 14:05:32 UTC
Last modified: 15 Oct 2008, 14:10:18 UTC

I've got a problematic slab model of my own now: 6173911. It's not a slow-processing iceworld as I've had many times previously, but is in a seemingly infinite loop, submitting one trickle at each checkpoint - but making no progress at all.

It's been aborted, and the other stalled cruncher (non-anonymous, Wintel) in that work unit informed.
ID: 35255 · Report as offensive
Profile astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 35262 - Posted: 15 Oct 2008, 23:04:04 UTC

One of my slabs turned blue a couple days ago, in late Phase 3, if I recall correctly. Grabbed one of my Samurai swords and dispatched it with ease -- so to speak.

(A couple days ago, I reran a crashed Beta Spinup Run, only to have it crash in the same way, in the same place. Main site Slabs don't get that consideration, not any more.)
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 35262 · Report as offensive
old_user452941

Joined: 22 May 07
Posts: 35
Credit: 1,065,741
RAC: 0
Message 35265 - Posted: 16 Oct 2008, 0:23:17 UTC

I'm aborting this model as it has turned into a slow ice-world at 12%: 645222.
ID: 35265 · Report as offensive
Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 35266 - Posted: 16 Oct 2008, 1:23:37 UTC

Ed, I think two other members sharing your workunit have already hit the same problem. One of them only joined CPDN on Sunday so it's a baptism of fire for him.

It's too late now, but tomorrow I'll send PMs to everybody sharing your WU and also to a couple of the people sharing Fundin's. I hope some of them have email notification of PMs enabled...
Cpdn news
ID: 35266 · Report as offensive
zdespi

Joined: 5 Jan 05
Posts: 4
Credit: 1,544,444
RAC: 0
Message 35579 - Posted: 23 Nov 2008, 9:25:45 UTC

Here are two of mine:
8083034
8082988

The first one did not even finish phase one; it got stuck around 12.9%.
The second I aborted around 69%.

CPU is Intel(R) Xeon(R) CPU E5320 @ 1.86GHz, no OC

This machine has already finished some models OK.


ID: 35579 · Report as offensive
Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 35594 - Posted: 25 Nov 2008, 0:02:49 UTC
Last modified: 25 Nov 2008, 0:05:26 UTC

Thanks for reporting that. These models were very badly stuck in their iceworlds!

Your links don't work (there may have been an erroneous / at the end of the addresses) so here they are again:

Task 8083034
Task 8082988

I'll send private messages to the other crunchers to warn them.
Cpdn news
ID: 35594 · Report as offensive
Bernard

Joined: 14 Mar 06
Posts: 1
Credit: 372,031
RAC: 0
Message 35631 - Posted: 3 Dec 2008, 0:35:53 UTC

I believe my model has changed to an Ice World.
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=8162104

28/11/2008 13:50:55||Starting BOINC client version 6.2.19 for windows_intelx86
28/11/2008 13:50:55||log flags: task, file_xfer, sched_ops
28/11/2008 13:50:55||Libraries: libcurl/7.18.0 OpenSSL/0.9.8e zlib/1.2.3
28/11/2008 13:50:55||Data directory: D:\BoincData
28/11/2008 13:50:55||Running under account Bernard
28/11/2008 13:50:55||Processor: 1 GenuineIntel Intel(R) Pentium(R) M processor 1.73GHz [x86 Family 6 Model 13 Stepping 8]
28/11/2008 13:50:55||Processor features: fpu tsc pae nx sse sse2 mmx
28/11/2008 13:50:55||OS: Microsoft Windows XP: Home x86 Editon, Service Pack 3, (05.01.2600.00)
28/11/2008 13:50:55||Memory: 1022.42 MB physical, 2.40 GB virtual
28/11/2008 13:50:55||Disk: 62.89 GB total, 49.88 GB free
28/11/2008 13:50:55||Local time is UTC +0 hours
28/11/2008 13:50:55|climateprediction.net|URL: http://climateprediction.net/; Computer ID: 568448; location: home; project prefs: default
28/11/2008 13:50:55||General prefs: from climateprediction.net (last modified 15-Mar-2007 01:23:29)
28/11/2008 13:50:55||Computer location: home
28/11/2008 13:50:55||General prefs: no separate prefs for home; using your defaults
28/11/2008 13:50:55||Reading preferences override file
28/11/2008 13:50:55||Preferences limit memory usage when active to 511.21MB
28/11/2008 13:50:55||Preferences limit memory usage when idle to 920.18MB
28/11/2008 13:50:55||Preferences limit disk usage to 49.79GB
28/11/2008 13:50:56|climateprediction.net|Restarting task hadsm3mh_kl4j_006005547_3 using hadsm3mh version 602
01/12/2008 00:24:48||Running CPU benchmarks
01/12/2008 00:24:48||Suspending computation - running CPU benchmarks
01/12/2008 00:25:20||[error] FP benchmark ran only 1.171875 sec; ignoring
01/12/2008 00:25:20||[error] CPU benchmarks error
01/12/2008 00:25:22||Resuming computation
01/12/2008 22:40:01||Running CPU benchmarks
01/12/2008 22:40:01||Suspending computation - running CPU benchmarks
01/12/2008 22:40:32||Benchmark results:
01/12/2008 22:40:32|| Number of CPUs: 1
01/12/2008 22:40:32|| 1557 floating point MIPS (Whetstone) per CPU
01/12/2008 22:40:32|| 3100 integer MIPS (Dhrystone) per CPU
01/12/2008 22:40:33||Resuming computation

It is still continuing the calculation but has slowed from 2.2 s/ts to 62 s/ts.
Should I abort this model?
ID: 35631 · Report as offensive
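The slowdown reported above (2.2 s/ts to 62 s/ts) can be expressed as a ratio. The Python sketch below computes it and applies a threshold. The 10x threshold and the function names are assumptions chosen for illustration, not a project rule. In this thread a slowdown alone is not treated as proof: posters also look at the temperature graphic turning blue and at whether other models in the same work unit stalled at the same point.

```python
def slowdown_factor(baseline_s_per_ts, current_s_per_ts):
    """How many times slower the model now runs per timestep."""
    if baseline_s_per_ts <= 0:
        raise ValueError("baseline must be positive")
    return current_s_per_ts / baseline_s_per_ts


def looks_like_iceworld(baseline_s_per_ts, current_s_per_ts, threshold=10.0):
    """Flag a suspicious slowdown; the threshold is an assumed value."""
    return slowdown_factor(baseline_s_per_ts, current_s_per_ts) >= threshold


# Figures from the post: 2.2 s/ts before, 62 s/ts after.
factor = slowdown_factor(2.2, 62.0)
print(f"slowdown: {factor:.1f}x")
print("suspect iceworld:", looks_like_iceworld(2.2, 62.0))
```

With the figures from the post the model is running about 28 times slower per timestep than before.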
Profile Iain Inglis

Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 35632 - Posted: 3 Dec 2008, 1:12:55 UTC - in response to Message 35631.  

It is still continuing the calculation but has slowed from 2.2 s/ts to 62 s/ts.
Should I abort this model?

Yes, Bernard. It's a goner. The other Windows/Intel model in that work unit has also run into the same difficulty, which is conclusive proof.

I'll send the affected people in that work unit a PM to advise them to abort.

Thanks for reporting it.
ID: 35632 · Report as offensive
Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 35633 - Posted: 3 Dec 2008, 1:23:43 UTC
Last modified: 3 Dec 2008, 1:31:10 UTC

Hi Bernard, welcome to the forum and thank you for reporting the problem.

Yes, it's definitely an iceworld. The model stopped processing its precipitation data properly during the last one or two trickle periods before the end of phase one. It then uploaded its phase 1 data so, unusually, this provides us with a graph that illustrates what happens. The temp graph is normal, but look at this:


In my experience when this happens the model never recovers. So you'll need to abort it.

Two other members with a model from the same workunit did get past this danger point and continued crunching normally, but whereas you are crunching it on an Intel, one of these members has an AMD and the other a Mac.

Another member has a model stuck at exactly the same point. This shows that the model's defective and the problem hasn't been caused by your computer.
Cpdn news
ID: 35633 · Report as offensive
zdespi

Joined: 5 Jan 05
Posts: 4
Credit: 1,544,444
RAC: 0
Message 35650 - Posted: 5 Dec 2008, 13:13:13 UTC

Hi again,
I have recently aborted these other two models. They were also stuck, I suppose.
Task 8160591
Task 8127746

Regards
ID: 35650 · Report as offensive