Message boards : Number crunching : Iceworlds & Slowdowns hadsm3/mh - Closed - Discussion
Author | Message |
---|---|
Joined: 19 Jun 08 Posts: 2 Credit: 739,082 RAC: 0 |
If it were mine, I'd pull the plug. (They can sometimes be saved by transferring a backup to another machine type, Intel to AMD or vice versa, but there's no guarantee it will work.) Thanks. I'm going to bail on it. I don't really care that much about the credits, so it's time to move on to a new simulation. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
With this project, you get credits all the way through a model, and retain them even if a model fails or is aborted. The only thing lost is the data from the point of failure forwards. Because the point of the models is to find this failure point, and NOT to force a model to the end, the failure is actually a success. Now the researchers have another set of parameter values that they know aren't stable for a long period of time. Backups: Here |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
I've looked at the workunit that wedgef5's model belonged to. Three computers did complete it, but one's an AMD and the other two are Macs. There's also an American cruncher with an Intel whose model may have been stuck at the same point as Wedgef5's for about a month. I'll send him a private message to let him know. Cpdn news |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Background, following change to boinc 6.2.19: 10/8/2008 12:27:38 PM||Running CPU benchmarks BOINC error? Gee, there's a surprise. (5.10.nn had no problems with the exercise, nor does Prime95 Torture Test/four copies have a problem with the machine.) The failure: Frozen globe. Installed boinc 6.2.19, against my better judgment, to see the graphics on a HadSM3-MH Model I suspected had turned blue. (It did; then, I committed an abortion/mercy killing.) It fell 13+ percent behind its wombmate on C2Q 9300 running stock, Vista Home Premium x64, 8GB DDR2 RAM, formerly under boinc 5.10.13. 10/8/2008 12:30:41 PM|climateprediction.net|Computation for task hadsm3mh_km6e_006000584_0 finished Note that boinc doesn't log a Message as to why the Run "finished". http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=8093024 The other crashed Model in the Work Unit is inconclusive (-107) re. 'frozen earth'. (With one exception, my v.6 Spinups show graphics in boinc 5.10.13 after a boinc restart; HadSM3-MH Models do not, hence my reluctant excursion into boinc's latest foray into instability.) "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
There was talk about truncated benchmarks on BOINC/dev. I think that it's something to do with a higher level program running at the time the benchmarks are run. |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Thanks, Les. It's a useless exercise for CPDN anyway. I wonder what the higher level program might be, a Windows Service? Nothing else was active --> that's a CPDN-only box (except when Firefox is activated to report a problem or updates are made). Jim "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
|
Joined: 15 Aug 08 Posts: 2 Credit: 751,934 RAC: 0 |
I appear to have this issue myself. Last trickle was 4 days ago at TimeStamp 129,624 of phase 4 (9 days into the run); right now it's at 139,973 of phase 4. It is progressing but time to completion is going up instead of down. Temperature is blue. Results here: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/trickle.php?resultid=8090588 CPU is not overclocked at all. Wonder should I let it crawl to the finish line or put it down now? 13/10/2008 9:12:59 AM||Starting BOINC client version 6.2.19 for windows_intelx86 13/10/2008 9:12:59 AM||log flags: task, file_xfer, sched_ops 13/10/2008 9:12:59 AM||Libraries: libcurl/7.18.0 OpenSSL/0.9.8e zlib/1.2.3 13/10/2008 9:12:59 AM||Running as a daemon 13/10/2008 9:12:59 AM||Data directory: C:\ProgramData\BOINC 13/10/2008 9:12:59 AM||Running under account boinc_master 13/10/2008 9:12:59 AM||Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU E8500 @ 3.16GHz [Intel64 Family 6 Model 23 Stepping 6] 13/10/2008 9:12:59 AM||Processor features: fpu tsc pae nx sse sse2 pni mmx 13/10/2008 9:12:59 AM||OS: Microsoft Windows Vista: Ultimate x64 Editon, Service Pack 1, (06.00.6001.00) 13/10/2008 9:12:59 AM||Memory: 8.00 GB physical, 32.17 GB virtual 13/10/2008 9:12:59 AM||Disk: 97.66 GB total, 54.45 GB free 13/10/2008 9:12:59 AM||Local time is UTC -4 hours 13/10/2008 9:13:00 AM||Version change (6.2.18 -> 6.2.19) 13/10/2008 9:13:00 AM|climateprediction.net|URL: http://climateprediction.net/; Computer ID: 906743; location: (none); project prefs: default 13/10/2008 9:13:00 AM||No general preferences found - using BOINC defaults 13/10/2008 9:13:00 AM||Reading preferences override file 13/10/2008 9:13:00 AM||Preferences limit memory usage when active to 4095.06MB 13/10/2008 9:13:00 AM||Preferences limit memory usage when idle to 7371.11MB 13/10/2008 9:13:00 AM||Preferences limit disk usage to 9.31GB 13/10/2008 9:13:00 AM||Running CPU benchmarks 13/10/2008 9:13:31 AM||Benchmark results: 13/10/2008 9:13:31 AM|| Number of CPUs: 2 13/10/2008 9:13:31 AM|| 3345 floating point MIPS (Whetstone) per CPU 13/10/2008 9:13:31 AM|| 7026 integer MIPS (Dhrystone) per CPU 13/10/2008 9:13:32 AM|climateprediction.net|Restarting task hadsm3mh_kl8v_006000313_3 using hadsm3mh version 602 13/10/2008 9:13:32 AM|climateprediction.net|Restarting task hadcm3ivolc_l2b9_2000_80_06001703_2 using hadcm3i version 602 |
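The two timestep readings quoted in the post above allow a rough estimate of the current processing rate. The sketch below is only an illustration: it assumes the "4 days" figure is exact and that the machine crunched continuously over that period, neither of which the post confirms. The function name is made up for this example.

```python
SECONDS_PER_DAY = 86_400


def seconds_per_timestep(ts_start, ts_end, elapsed_days):
    """Average wall-clock seconds per model timestep between two readings.

    Assumes the model ran without interruption for the whole period.
    """
    steps = ts_end - ts_start
    if steps <= 0:
        raise ValueError("no progress between the two readings")
    return elapsed_days * SECONDS_PER_DAY / steps


# Readings from the post: 129,624 at the last trickle, 139,973 four days later.
rate = seconds_per_timestep(129_624, 139_973, elapsed_days=4)
print(f"{rate:.1f} s/TS")  # prints "33.4 s/TS"
```

Any real downtime during those four days would make the true figure lower than this estimate.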
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
... Wonder should I let it crawl to the finish line or put it down now? ... It has 12 trickles to go before the end of phase 4. It could take up to a week per iceworld trickle, though yours is a very fast machine - so 2-3 months to finish. I would put it out of its misery. For interest's sake you might let it run to the next trickle: the jump in sec/TS would then serve as a warning to others in that work unit ... |
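The "2-3 months" estimate in the reply above can be reproduced with simple arithmetic. This is a sketch only: the 7-day figure is the "up to a week" upper bound from the post, while the 5-day figure is a hypothetical faster pace chosen here to stand in for "a very fast machine".

```python
DAYS_PER_MONTH = 30.44  # average length of a calendar month


def months_to_finish(trickles_left, days_per_trickle):
    """Estimated months remaining, given a steady pace per trickle."""
    return trickles_left * days_per_trickle / DAYS_PER_MONTH


slow = months_to_finish(12, 7)  # upper bound quoted in the post
fast = months_to_finish(12, 5)  # hypothetical pace for a faster machine
print(f"between {fast:.1f} and {slow:.1f} months")  # prints "between 2.0 and 2.8 months"
```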
Joined: 15 Aug 08 Posts: 2 Credit: 751,934 RAC: 0 |
Thanks very much for the thoughtful reply. I shall, as you suggest, allow it to trickle once more, then bury it in the permafrost. |
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
I've got a problematic slab model of my own now: 6173911. It's not a slow-processing iceworld as I've had many times previously, but is in a seemingly infinite loop, submitting one trickle at each checkpoint - but making no progress at all. It's been aborted, and the other stalled cruncher (non-anonymous, Wintel) in that work unit informed. |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
One of my slabs turned blue a couple days ago, in late Phase 3, if I recall correctly. Grabbed one of my Samurai swords and dispatched it with ease -- so to speak. (A couple days ago, I reran a crashed Beta Spinup Run, only to have it crash in the same way, in the same place. Main site Slabs don't get that consideration, not any more.) "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Joined: 22 May 07 Posts: 35 Credit: 1,065,741 RAC: 0 |
I'm aborting this model as it has turned into a slow ice-world at 12%: 645222. |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Ed, I think two other members sharing your workunit have already hit the same problem. One of them only joined CPDN on Sunday so it's a baptism of fire for him. It's too late now, but tomorrow I'll send PMs to everybody sharing your WU and also to a couple of the people sharing Fundin's. I hope some of them have email notification of PMs enabled... Cpdn news |
Joined: 5 Jan 05 Posts: 4 Credit: 1,544,444 RAC: 0 |
|
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Thanks for reporting that. These models were very badly stuck in their iceworlds! Your links don't work (there may have been an erroneous / at the end of the addresses) so here they are again: Task 8083034 Task 8082988 I'll send private messages to the other crunchers to warn them. Cpdn news |
Joined: 14 Mar 06 Posts: 1 Credit: 372,031 RAC: 0 |
I believe my model has changed to an Ice World. http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=8162104 28/11/2008 13:50:55||Starting BOINC client version 6.2.19 for windows_intelx86 28/11/2008 13:50:55||log flags: task, file_xfer, sched_ops 28/11/2008 13:50:55||Libraries: libcurl/7.18.0 OpenSSL/0.9.8e zlib/1.2.3 28/11/2008 13:50:55||Data directory: D:\BoincData 28/11/2008 13:50:55||Running under account Bernard 28/11/2008 13:50:55||Processor: 1 GenuineIntel Intel(R) Pentium(R) M processor 1.73GHz [x86 Family 6 Model 13 Stepping 8] 28/11/2008 13:50:55||Processor features: fpu tsc pae nx sse sse2 mmx 28/11/2008 13:50:55||OS: Microsoft Windows XP: Home x86 Editon, Service Pack 3, (05.01.2600.00) 28/11/2008 13:50:55||Memory: 1022.42 MB physical, 2.40 GB virtual 28/11/2008 13:50:55||Disk: 62.89 GB total, 49.88 GB free 28/11/2008 13:50:55||Local time is UTC +0 hours 28/11/2008 13:50:55|climateprediction.net|URL: http://climateprediction.net/; Computer ID: 568448; location: home; project prefs: default 28/11/2008 13:50:55||General prefs: from climateprediction.net (last modified 15-Mar-2007 01:23:29) 28/11/2008 13:50:55||Computer location: home 28/11/2008 13:50:55||General prefs: no separate prefs for home; using your defaults 28/11/2008 13:50:55||Reading preferences override file 28/11/2008 13:50:55||Preferences limit memory usage when active to 511.21MB 28/11/2008 13:50:55||Preferences limit memory usage when idle to 920.18MB 28/11/2008 13:50:55||Preferences limit disk usage to 49.79GB 28/11/2008 13:50:56|climateprediction.net|Restarting task hadsm3mh_kl4j_006005547_3 using hadsm3mh version 602 01/12/2008 00:24:48||Running CPU benchmarks 01/12/2008 00:24:48||Suspending computation - running CPU benchmarks 01/12/2008 00:25:20||[error] FP benchmark ran only 1.171875 sec; ignoring 01/12/2008 00:25:20||[error] CPU benchmarks error 01/12/2008 00:25:22||Resuming computation 01/12/2008 22:40:01||Running CPU benchmarks 01/12/2008 22:40:01||Suspending computation - running CPU benchmarks 01/12/2008 22:40:32||Benchmark results: 01/12/2008 22:40:32|| Number of CPUs: 1 01/12/2008 22:40:32|| 1557 floating point MIPS (Whetstone) per CPU 01/12/2008 22:40:32|| 3100 integer MIPS (Dhrystone) per CPU 01/12/2008 22:40:33||Resuming computation It is still continuing the calculation but has slowed from 2.2 s/ts to 62 s/ts. Should I abort this model? |
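The BOINC messages quoted above run together on one line, which makes the two `[error]` entries easy to miss. The following is a rough sketch of how such a flattened log could be split back into entries. The timestamp pattern and the `timestamp|project|message` layout are inferred from the logs quoted in this thread, not taken from BOINC documentation, and the function names are invented for the example.

```python
import re

# A timestamp such as "01/12/2008 00:25:20" or "13/10/2008 9:13:31 AM",
# immediately followed by the "|" field separator. Pattern inferred from
# the logs in this thread (an assumption, not an official format).
TIMESTAMP = re.compile(
    r"\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2}(?: [AP]M)?(?=\|)"
)


def split_entries(flat_log):
    """Split a flattened log into (timestamp, project, message) tuples."""
    starts = [m.start() for m in TIMESTAMP.finditer(flat_log)]
    entries = []
    for begin, end in zip(starts, starts[1:] + [len(flat_log)]):
        parts = flat_log[begin:end].strip().split("|", 2)
        if len(parts) == 3:
            entries.append(tuple(part.strip() for part in parts))
    return entries


def error_messages(flat_log):
    """Return only the entries whose message starts with "[error]"."""
    return [e for e in split_entries(flat_log) if e[2].startswith("[error]")]


sample = (
    "01/12/2008 00:24:48||Running CPU benchmarks "
    "01/12/2008 00:25:20||[error] FP benchmark ran only 1.171875 sec; ignoring "
    "01/12/2008 00:25:20||[error] CPU benchmarks error "
    "01/12/2008 00:25:22||Resuming computation"
)
for stamp, _project, message in error_messages(sample):
    print(stamp, message)
```

Free text pasted after the last log entry ends up attached to that entry's message, so the final tuple may need trimming by hand.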
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
It is still continuing the calculation but has slowed from 2.2 s/ts to 62 s/ts. Yes, Bernard. It's a goner. The other Windows/Intel model in that work unit has also run into the same difficulty, which is conclusive proof. I'll send the affected people in that work unit a PM to advise them to abort. Thanks for reporting it. |
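For a sense of scale, the slowdown quoted above works out to roughly a 28-fold increase in seconds per timestep. The sketch below computes that factor and flags a model when the factor passes a threshold. The threshold of 10 is an arbitrary value picked for illustration, not a rule from the project, and the function names are hypothetical.

```python
def slowdown_factor(baseline_s_per_ts, current_s_per_ts):
    """How many times slower the model is now than at its normal pace."""
    if baseline_s_per_ts <= 0:
        raise ValueError("baseline must be positive")
    return current_s_per_ts / baseline_s_per_ts


def looks_like_iceworld(baseline_s_per_ts, current_s_per_ts, threshold=10.0):
    """Flag a model whose sec/TS has jumped by at least `threshold` times.

    The default threshold is an assumption made for this example.
    """
    return slowdown_factor(baseline_s_per_ts, current_s_per_ts) >= threshold


factor = slowdown_factor(2.2, 62.0)
print(f"{factor:.1f}x slower")  # prints "28.2x slower"
```

A slowdown on its own is only a hint; the replies in this thread also rely on the blue temperature display and on other models in the same work unit stalling at the same point.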
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi Bernard, welcome to the forum and thank you for reporting the problem. Yes, it's definitely an iceworld. The model stopped processing its precipitation data properly during the last one or two trickle periods before the end of phase one. It then uploaded its phase 1 data so, unusually, this provides us with a graph that illustrates what happens. The temp graph is normal, but look at this: [graph image not preserved] In my experience when this happens the model never recovers. So you'll need to abort it. Two other members with a model from the same workunit did get past this danger point and continued crunching normally, but whereas you are crunching it on an Intel, one of these members has an AMD and the other a Mac. Another member has a model stuck at exactly the same point. This shows that the model's defective and the problem hasn't been caused by your computer. Cpdn news |
Joined: 5 Jan 05 Posts: 4 Credit: 1,544,444 RAC: 0 |
Hi again, I have recently aborted these other two models. They were also stuck, I suppose. Task 8160591 Task 8127746 Regards |