hadam model restart errors

Andris Pavenis

Joined: 22 Oct 05
Posts: 15
Credit: 2,340,122
RAC: 0
Message 34130 - Posted: 24 Jun 2008, 4:20:15 UTC
Last modified: 24 Jun 2008, 4:26:11 UTC

After booting my computer once, I found that BOINC had crashed and restarted, and that the HADAM model time had been set back by some hours (see log below). Since then, every time the BOINC client starts, the HADAM model restarts and sends a trickle message twice.

Additionally, the consumed time was reset to something near zero. I guess this time corruption comes from BOINC rather than from the project, as I have seen similar time corruption with 2 Einstein@Home workunits more than once.

Is it worth continuing the model?

System information: Linux, Fedora 8 x86_64, etc.

Andris

hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018737 A - 11/08/2000 02:50 - H:M:S=0075:45:34 AVG=14.56 DLT=10.57
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018738 A - 11/08/2000 03:00 - H:M:S=0075:45:43 AVG=14.56 DLT= 9.36
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018739 A - 11/08/2000 03:10 - H:M:S=0075:45:53 AVG=14.56 DLT=10.00
Cleaning up graphics data...
Detaching shared memory...
23-Jun-2008 18:55:31 [---] Starting BOINC client version 5.10.45 for x86_64-pc-linux-gnu
23-Jun-2008 18:55:32 [---] log flags: task, file_xfer, sched_ops
23-Jun-2008 18:55:32 [---] Libraries: libcurl/7.18.2 NSS/3.12.0.3 zlib/1.2.3 libidn/0.6.14
23-Jun-2008 18:55:32 [---] Executing as a daemon
23-Jun-2008 18:55:32 [---] Data directory: /var/lib/boinc
23-Jun-2008 18:55:32 [Einstein@Home] Found app_info.xml; using anonymous platform
23-Jun-2008 18:55:32 [---] Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU @ 2.40GHz [Family 6 Model 15 Stepping 7]
23-Jun-2008 18:55:32 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
23-Jun-2008 18:55:32 [---] OS: Linux: 2.6.25.6-27.fc8
23-Jun-2008 18:55:32 [---] Memory: 3.87 GB physical, 3.91 GB virtual
23-Jun-2008 18:55:32 [---] Disk: 47.30 GB total, 29.55 GB free
23-Jun-2008 18:55:32 [---] Local time is UTC +3 hours
23-Jun-2008 18:55:33 [climateprediction.net] URL: http://climateprediction.net/; Computer ID: 826910; location: home; project prefs: default
23-Jun-2008 18:55:33 [3x+1@home] URL: http://allprojectstats.com/collatz/; Computer ID: 1108; location: (none); project prefs: default
23-Jun-2008 18:55:33 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: 4174592; location: home; project prefs: default
23-Jun-2008 18:55:33 [orbit@home] URL: http://orbit.psi.edu/oah/; Computer ID: 3512; location: (none); project prefs: default
23-Jun-2008 18:55:33 [Cosmology@Home] URL: http://www.cosmologyathome.org/; Computer ID: 14606; location: (none); project prefs: default
23-Jun-2008 18:55:33 [Milkyway@home] URL: http://milkyway.cs.rpi.edu/milkyway/; Computer ID: 9818; location: (none); project prefs: default
23-Jun-2008 18:55:33 [Einstein@Home] URL: http://einstein.phys.uwm.edu/; Computer ID: 1099657; location: home; project prefs: default
23-Jun-2008 18:55:33 [---] General prefs: from http://cosmologyathome.org/ (last modified 26-Jan-2008 22:13:49)
23-Jun-2008 18:55:33 [---] Host location: none
23-Jun-2008 18:55:33 [---] General prefs: using your defaults
23-Jun-2008 18:55:33 [---] Reading preferences override file
23-Jun-2008 18:55:33 [---] Preferences limit memory usage when active to 1983.22MB
23-Jun-2008 18:55:33 [---] Preferences limit memory usage when idle to 3569.80MB
23-Jun-2008 18:55:33 [---] Preferences limit disk usage to 9.31GB
23-Jun-2008 18:55:33 [climateprediction.net] Restarting task hadam3h_c_52s16_2000_2000_1_0 using hadam3 version 503
23-Jun-2008 18:55:41 [Einstein@Home] Restarting task h1_1089.05_S5R3__350_S5R3b_1 using einstein_S5R3 version 438
23-Jun-2008 18:55:41 [Einstein@Home] Restarting task h1_1089.05_S5R3__349_S5R3b_0 using einstein_S5R3 version 438
23-Jun-2008 18:55:42 [Einstein@Home] Restarting task h1_1089.05_S5R3__549_S5R3b_2 using einstein_S5R3 version 438
Beginning work on result hadam3h_c_52s16_2000_2000_1_0...
Starting model in /var/lib/boinc/projects/climateprediction.net...
Created shared memory region key = 113980 of size 5009240 bytes (version 602)
.so shmem return code = 1152
Starting model ID hadam3h_c_52s16_2000_2000_1 Phase 1
Program launched with process id # 3583
Climate model starting - use graphics to monitor progress.
Or visit the website to see the graphs for this run.
Getting pthread attributes - retval=0
Setting pthread size (576716800 bytes) - retval=0
Executing program hadam3_um_5.03_i686-pc-linux-gnu 113980 dth20l_052s16.anc ssta2000.anc sicea2000.anc
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018721 A - 11/08/2000 00:10 - H:M:S=0075:41:25 AVG=14.56 DLT= 0.00
23-Jun-2008 18:55:49 [climateprediction.net] Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
23-Jun-2008 18:55:54 [climateprediction.net] Scheduler request succeeded: got 0 new tasks
Cleaning up graphics data...
Detaching shared memory...
23-Jun-2008 18:56:11 [climateprediction.net] Task hadam3h_c_52s16_2000_2000_1_0 exited with zero status but no 'finished' file
23-Jun-2008 18:56:11 [climateprediction.net] If this happens repeatedly you may need to reset the project.
23-Jun-2008 18:56:11 [climateprediction.net] Restarting task hadam3h_c_52s16_2000_2000_1_0 using hadam3 version 503
Beginning work on result hadam3h_c_52s16_2000_2000_1_0...
Starting model in /var/lib/boinc/projects/climateprediction.net...
Created shared memory region key = 113980 of size 5009240 bytes (version 602)
.so shmem return code = 1152
Starting model ID hadam3h_c_52s16_2000_2000_1 Phase 1
Getting pthread attributes - retval=0
Setting pthread size (576716800 bytes) - retval=0
Executing program hadam3_um_5.03_i686-pc-linux-gnu 113980 dth20l_052s16.anc ssta2000.anc sicea2000.anc
Program launched with process id # 3596
Climate model starting - use graphics to monitor progress.
Or visit the website to see the graphs for this run.
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018721 A - 11/08/2000 00:10 - H:M:S=0001:13:57 AVG= 0.24 DLT= 0.00
23-Jun-2008 18:56:15 [climateprediction.net] Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
23-Jun-2008 18:56:20 [climateprediction.net] Scheduler request succeeded: got 0 new tasks
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018722 A - 11/08/2000 00:20 - H:M:S=0001:15:28 AVG= 0.24 DLT=90.95
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018723 A - 11/08/2000 00:30 - H:M:S=0001:15:37 AVG= 0.24 DLT= 8.99
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018724 A - 11/08/2000 00:40 - H:M:S=0001:15:47 AVG= 0.24 DLT= 9.99
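
The client's warning says a project reset may be needed "if this happens repeatedly", so it can help to count how often each task exits this way. The following is only a minimal sketch: the function name `count_abnormal_exits` is made up for illustration, and the regular expression assumes the message wording of the 5.10.45 client log shown above, which other client versions may not share.

```python
import re
from collections import Counter

# Assumption: message wording as printed by the 5.10.45 client in the log above.
# Only the part before the quoted word is matched, so quoting differences do not matter.
ABNORMAL_EXIT = re.compile(
    r"\[(?P<project>[^\]]+)\] Task (?P<task>\S+) exited with zero status but no"
)


def count_abnormal_exits(lines):
    """Return a Counter mapping (project, task) to the number of abnormal exits."""
    counts = Counter()
    for line in lines:
        match = ABNORMAL_EXIT.search(line)
        if match:
            counts[(match.group("project"), match.group("task"))] += 1
    return counts


# Three lines copied from the log above.
sample = [
    "23-Jun-2008 18:55:54 [climateprediction.net] Scheduler request succeeded: got 0 new tasks",
    "23-Jun-2008 18:56:11 [climateprediction.net] Task hadam3h_c_52s16_2000_2000_1_0 "
    "exited with zero status but no 'finished' file",
    "23-Jun-2008 18:56:11 [climateprediction.net] Restarting task "
    "hadam3h_c_52s16_2000_2000_1_0 using hadam3 version 503",
]

exits = count_abnormal_exits(sample)
for (project, task), n in exits.items():
    print(f"{project}: {task} exited abnormally {n} time(s)")
```

To run it over a real log, pass an open file object instead of `sample`; where the daemon writes its log depends on the packaging and is not stated in this thread.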
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 34132 - Posted: 24 Jun 2008, 8:03:07 UTC

Did you Exit from BOINC before re-booting?
Highly recommended if you don't want to crash climate models.

Several trickles after a re-start is OK. If you look at the trickle file, you'll see that they are type 'cse', whereas 'normal' trickles are type 'orig'.
Type 'cse' just tells the server that the model is back up and running.

As for continuing, is the day/month/year constantly advancing in the graphics?
If so, continue.
If it keeps jumping backwards, it may be best to abort.


Andris Pavenis

Joined: 22 Oct 05
Posts: 15
Credit: 2,340,122
RAC: 0
Message 34134 - Posted: 24 Jun 2008, 16:37:26 UTC - in response to Message 34132.  
Last modified: 24 Jun 2008, 16:37:56 UTC

Did you Exit from BOINC before re-booting?
Highly recommended if you don't want to crash climate models.


The Linux shutdown scripts stop BOINC automatically. I'm using the Fedora 8 boinc-client RPM, which runs BOINC in daemon mode.


Several trickles after a re-start is OK. If you look at the trickle file, you'll see that they are type 'cse', whereas 'normal' trickles are type 'orig'.
Type 'cse' just tells the server that the model is back up and running.

As for continuing, is the day/month/year constantly advancing in the graphics?
If so, continue.
If it keeps jumping backwards, it may be best to abort.



The text output from the BOINC client shows that there are no backward jumps (apart from going back to the last checkpoint after a restart). In daemon mode the graphics do not seem to work.
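
Since the graphics are not available, the same check can be made on the text output. The sketch below scans the model's progress lines and reports any place where the timestep (`TS`) goes backwards or the `H:M:S` elapsed time drops. The name `find_anomalies` and the regular expression are assumptions based only on the line layout seen in the log in this thread, not on any documented CPDN format.

```python
import re

# Assumption: progress lines look like the ones in the log posted in this thread, e.g.
# "<model> - PH 1 TS 0018737 A - 11/08/2000 02:50 - H:M:S=0075:45:34 AVG=14.56 DLT=10.57"
PROGRESS = re.compile(
    r"TS\s+(?P<ts>\d+)\s+\S+\s+-\s+\d{2}/\d{2}/\d{4} \d{2}:\d{2}\s+-\s+"
    r"H:M:S=(?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})"
)


def find_anomalies(lines):
    """Return (kind, previous_value, new_value) tuples for every backward jump."""
    anomalies = []
    previous = None  # (timestep, elapsed seconds) from the last progress line
    for line in lines:
        match = PROGRESS.search(line)
        if not match:
            continue  # not a model progress line
        ts = int(match.group("ts"))
        elapsed = (
            int(match.group("h")) * 3600
            + int(match.group("m")) * 60
            + int(match.group("s"))
        )
        if previous is not None:
            prev_ts, prev_elapsed = previous
            if ts < prev_ts:
                anomalies.append(("timestep rollback", prev_ts, ts))
            if elapsed < prev_elapsed:
                anomalies.append(("elapsed time reset", prev_elapsed, elapsed))
        previous = (ts, elapsed)
    return anomalies


# Four progress lines copied from the log in the first post.
sample = [
    "hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018739 A - 11/08/2000 03:10 - H:M:S=0075:45:53 AVG=14.56 DLT=10.00",
    "hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018721 A - 11/08/2000 00:10 - H:M:S=0075:41:25 AVG=14.56 DLT= 0.00",
    "hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018721 A - 11/08/2000 00:10 - H:M:S=0001:13:57 AVG= 0.24 DLT= 0.00",
    "hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018722 A - 11/08/2000 00:20 - H:M:S=0001:15:28 AVG= 0.24 DLT=90.95",
]

anomalies = find_anomalies(sample)
for kind, before, after in anomalies:
    print(f"{kind}: {before} -> {after}")
```

A single rollback right after a restart only reflects the return to the last checkpoint. The repeated drops in elapsed time are what correspond to the time corruption described in the first post.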

I tried the following:
1) suspended the CPDN model
2) restarted BOINC (/etc/init.d/boinc-client stop, then started it again)
3) resumed the CPDN model

It still exited and restarted from the checkpoint, in the same way as in the log included earlier.
