Posts by Andris Pavenis

1) Questions and Answers : Unix/Linux : Network connection problems and CPDN model crashes (Message 40930)
Posted 29 Oct 2010 by Andris Pavenis
Post:
The same problem (a crash of the CPDN model) occurred once more, about 20 seconds after restarting the model. BOINC was started by the startup scripts while the computer was booting, and the network (WLAN) was not yet up. Unfortunately, due to NetworkManager problems, the WLAN connection only comes up after the GNOME session has started.

This time boincmgr was not running, so I guess an interaction between boincmgr and the BOINC client can be excluded as the cause.

Running prime95 for 3 hours (4 threads, as one should on a Core 2 Quad) did not show any errors, and I did not expect any either.

Would it be worth trying another model to get a core dump for somebody to examine if the crash repeats? (Core dump generation for binaries not belonging to installed RPM packages was off by default. It is on now.)
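
Whether a core file is written also depends on the core-file size limit of the crashing process, which child processes inherit from their parent. The Python sketch below shows one way to raise that limit before launching a program. It uses only the standard resource and subprocess modules; the helper names allow_core_dumps and run_with_core_dumps are made up for this example, and it does not change where the kernel or the system's crash handler puts the file.

```python
import resource
import subprocess


def allow_core_dumps():
    """Raise the soft core-file size limit to the hard limit for this process.

    If the hard limit is 0, core dumps stay disabled; only a privileged
    process can raise the hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def run_with_core_dumps(argv):
    """Run a command whose process (and its children) inherit the raised limit."""
    # preexec_fn runs in the child just before exec (POSIX only).
    return subprocess.run(argv, preexec_fn=allow_core_dumps)
```

Calling allow_core_dumps() returns the resulting (soft, hard) pair, so a hard limit of 0 is easy to spot.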

2) Questions and Answers : Unix/Linux : Network connection problems and CPDN model crashes (Message 40907)
Posted 25 Oct 2010 by Andris Pavenis
Post:

One crashed task in July was a FAMOUS. The error messages show that the crash was caused by the parameter values of your model. So that is not a problem.


I'm not writing about that one.


But four HadSM crashed. Probably two at one moment and the other two at another moment. All have exit code 11 and similar error messages. One of the tasks is here. Click on stderr + to see the messages. Signal 11 in every case.


I saw all that. Unfortunately, even if a core file was generated, the Fedora 13 crash handler found that the crash was not caused by any Fedora 13 package, so the file was automatically deleted.


This is not, as far as I know, caused by network connection problems. Jorden has some explanations of the Signal 11 error in his FAQ here. I see that your computer has processed a lot of CPDN models successfully in the past. But have you installed a new version of Linux and now need the 32bit compatibility libraries? See Geophi's post.

If this is the problem please tell us.


The problem is not related to a Linux upgrade (no serious upgrades in the last month). All 32-bit compatibility libraries are also in place. The 64-bit version of Linux has been in use there for a long time.

What I saw is that, as sometimes before, there has probably been some bad interaction between boincmgr and the BOINC client when there are network connection problems (as far as I have observed, when the network is still up but requests get no response and connections time out).

I could try to reproduce the problem by
- getting a new work unit on the same system
- trying to attach GDB to the process (unfortunately, without debug info GDB would be of little use)
- messing with the network (trying to break it in various ways)


3) Questions and Answers : Unix/Linux : Network connection problems and CPDN model crashes (Message 40900)
Posted 23 Oct 2010 by Andris Pavenis
Post:
I recently had 4 CPDN models crash on http://climateapps2.oucs.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=826910

Both times it seems to be related to temporary network connection problems. I have also noticed problems earlier when:
- BOINC manager is running
- there are network connection problems

In the second BOINC log one can see that Einstein@Home also had some problems (similar to what I have observed earlier).

Andris

22-Oct-2010 07:29:14 [Milkyway@home] Sending scheduler request: To fetch work.
22-Oct-2010 07:29:14 [Milkyway@home] Requesting new tasks for GPU
22-Oct-2010 07:29:19 [Milkyway@home] Scheduler request completed: got 1 new tasks
22-Oct-2010 07:29:21 [Milkyway@home] Started download of stars-td82-2stream_20.txt
22-Oct-2010 07:29:21 [Milkyway@home] Started download of de_separation_82_3s_20_1_185629_1287721425_search_parameters
22-Oct-2010 07:29:24 [Milkyway@home] Finished download of de_separation_82_3s_20_1_185629_1287721425_search_parameters
22-Oct-2010 07:29:36 [Milkyway@home] Finished download of stars-td82-2stream_20.txt
22-Oct-2010 07:29:42 [Einstein@Home] Computation for task h1_1099.75_S5R4__244_S5GC1a_0 finished
22-Oct-2010 07:29:42 [Einstein@Home] Restarting task h1_1099.75_S5R4__224_S5GC1a_1 using einstein_S5GC1 version 105
22-Oct-2010 07:29:44 [Einstein@Home] Started upload of h1_1099.75_S5R4__244_S5GC1a_0_0
22-Oct-2010 07:29:50 [Einstein@Home] Finished upload of h1_1099.75_S5R4__244_S5GC1a_0_0
22-Oct-2010 07:38:24 [Milkyway@home] Sending scheduler request: To fetch work.
22-Oct-2010 07:38:24 [Milkyway@home] Requesting new tasks for GPU
22-Oct-2010 07:38:52 [Milkyway@home] Scheduler request failed: Couldn't resolve host name
22-Oct-2010 07:38:57 [Collatz Conjecture] Sending scheduler request: To fetch work.
22-Oct-2010 07:38:57 [Collatz Conjecture] Reporting 1 completed tasks, requesting new tasks for GPU
22-Oct-2010 07:39:20 [Collatz Conjecture] Scheduler request completed: got 1 new tasks
22-Oct-2010 07:39:20 [---] Couldn't parse preferences file - using BOINC defaults
22-Oct-2010 07:39:20 [---] Reading preferences override file
22-Oct-2010 07:39:20 [---] Preferences:
22-Oct-2010 07:39:20 [---] max memory usage when active: 4000.55MB
22-Oct-2010 07:39:20 [---] max memory usage when idle: 7200.99MB
22-Oct-2010 07:39:20 [---] max disk usage: 10.00GB
22-Oct-2010 07:39:20 [---] max CPUs used: 3
22-Oct-2010 07:39:20 [---] (to change, visit the web site of an attached project,
22-Oct-2010 07:39:20 [---] or click on Preferences)
22-Oct-2010 07:39:22 [---] Project communication failed: attempting access to reference site
22-Oct-2010 07:39:22 [Collatz Conjecture] Started download of collatz_1286499150_1156660
22-Oct-2010 07:40:11 [---] BOINC can't access Internet - check network connection or proxy configuration.
22-Oct-2010 07:40:11 [Collatz Conjecture] Temporarily failed download of collatz_1286499150_1156660: can't resolve hostname
22-Oct-2010 07:40:11 [Collatz Conjecture] Backing off 1 min 0 sec on download of collatz_1286499150_1156660
22-Oct-2010 07:40:11 [climateprediction.net] Computation for task hadsm3dhet2_jme9_006592019_8 finished
22-Oct-2010 07:40:11 [climateprediction.net] Output file hadsm3dhet2_jme9_006592019_8_1.zip for task hadsm3dhet2_jme9_006592019_8 absent
22-Oct-2010 07:40:11 [climateprediction.net] Output file hadsm3dhet2_jme9_006592019_8_2.zip for task hadsm3dhet2_jme9_006592019_8 absent
22-Oct-2010 07:40:11 [climateprediction.net] Output file hadsm3dhet2_jme9_006592019_8_3.zip for task hadsm3dhet2_jme9_006592019_8 absent
22-Oct-2010 07:40:12 [Einstein@Home] Restarting task h1_1099.75_S5R4__214_S5GC1a_0 using einstein_S5GC1 version 105
22-Oct-2010 07:40:12 [Milkyway@home] Sending scheduler request: To fetch work.
22-Oct-2010 07:40:12 [Milkyway@home] Requesting new tasks for GPU
22-Oct-2010 07:40:28 [climateprediction.net] Computation for task hadsm3dhet2_jme8_006592018_3 finished
22-Oct-2010 07:40:28 [climateprediction.net] Output file hadsm3dhet2_jme8_006592018_3_1.zip for task hadsm3dhet2_jme8_006592018_3 absent
22-Oct-2010 07:40:28 [climateprediction.net] Output file hadsm3dhet2_jme8_006592018_3_2.zip for task hadsm3dhet2_jme8_006592018_3 absent
22-Oct-2010 07:40:28 [climateprediction.net] Output file hadsm3dhet2_jme8_006592018_3_3.zip for task hadsm3dhet2_jme8_006592018_3 absent
22-Oct-2010 07:40:28 [Einstein@Home] Starting p2030_54075_19693_0073_G189.73-02.68.C_0.dm_380_2
22-Oct-2010 07:40:29 [Einstein@Home] Starting task p2030_54075_19693_0073_G189.73-02.68.C_0.dm_380_2 using einsteinbinary_ABP2 version 108
22-Oct-2010 07:40:29 [Einstein@Home] Task h1_1099.75_S5R4__224_S5GC1a_1 exited with zero status but no 'finished' file
22-Oct-2010 07:40:29 [Einstein@Home] If this happens repeatedly you may need to reset the project.
22-Oct-2010 07:40:30 [Einstein@Home] Restarting task h1_1099.75_S5R4__224_S5GC1a_1 using einstein_S5GC1 version 105
22-Oct-2010 07:40:54 [Milkyway@home] Scheduler request failed: Couldn't connect to server
22-Oct-2010 07:41:12 [Collatz Conjecture] Started download of collatz_1286499150_1156660
22-Oct-2010 07:41:48 [Collatz Conjecture] Finished download of collatz_1286499150_1156660

23-Oct-2010 13:13:10 [Milkyway@home] Computation for task de_separation_82_3s_10_1_591473_1287779991_1 finished
23-Oct-2010 13:13:10 [Milkyway@home] [coproc_debug] Assigning CUDA instance 0 to de_separation_82_2s_20_1_595037_1287780659_1
23-Oct-2010 13:13:10 [Milkyway@home] Starting de_separation_82_2s_20_1_595037_1287780659_1
23-Oct-2010 13:13:10 [Milkyway@home] Starting task de_separation_82_2s_20_1_595037_1287780659_1 using milkyway version 24
23-Oct-2010 13:13:12 [Milkyway@home] Started upload of de_separation_82_3s_10_1_591473_1287779991_1_0
23-Oct-2010 13:13:19 [Milkyway@home] Finished upload of de_separation_82_3s_10_1_591473_1287779991_1_0
23-Oct-2010 13:26:43 [Milkyway@home] Computation for task de_separation_82_2s_20_1_595037_1287780659_1 finished
23-Oct-2010 13:26:43 [Collatz Conjecture] [coproc_debug] Assigning CUDA instance 0 to collatz_1286499150_1211437_1
23-Oct-2010 13:26:43 [Collatz Conjecture] Starting collatz_1286499150_1211437_1
23-Oct-2010 13:26:43 [Collatz Conjecture] Starting task collatz_1286499150_1211437_1 using collatz version 202
23-Oct-2010 13:26:45 [Milkyway@home] Started upload of de_separation_82_2s_20_1_595037_1287780659_1_0
23-Oct-2010 13:27:21 [Einstein@Home] Task h1_1099.75_S5R4__154_S5GC1a_0 exited with zero status but no 'finished' file
23-Oct-2010 13:27:21 [Einstein@Home] If this happens repeatedly you may need to reset the project.
23-Oct-2010 13:27:21 [---] Project communication failed: attempting access to reference site
23-Oct-2010 13:27:21 [Milkyway@home] Temporarily failed upload of de_separation_82_2s_20_1_595037_1287780659_1_0: can't resolve hostname
23-Oct-2010 13:27:21 [Milkyway@home] Backing off 1 min 0 sec on upload of de_separation_82_2s_20_1_595037_1287780659_1_0
23-Oct-2010 13:27:22 [Einstein@Home] Restarting task h1_1099.75_S5R4__154_S5GC1a_0 using einstein_S5GC1 version 105
23-Oct-2010 13:27:27 [climateprediction.net] Computation for task hadsm3dhet2_jkqz_006589885_7 finished
23-Oct-2010 13:27:27 [climateprediction.net] Output file hadsm3dhet2_jkqz_006589885_7_1.zip for task hadsm3dhet2_jkqz_006589885_7 absent
23-Oct-2010 13:27:27 [climateprediction.net] Output file hadsm3dhet2_jkqz_006589885_7_2.zip for task hadsm3dhet2_jkqz_006589885_7 absent
23-Oct-2010 13:27:27 [climateprediction.net] Output file hadsm3dhet2_jkqz_006589885_7_3.zip for task hadsm3dhet2_jkqz_006589885_7 absent
23-Oct-2010 13:27:27 [Einstein@Home] Resuming task h1_1099.75_S5R4__153_S5GC1a_1 using einstein_S5GC1 version 105
23-Oct-2010 13:27:28 [climateprediction.net] Computation for task hadsm3dhet2_jkqy_006589884_5 finished
23-Oct-2010 13:27:28 [climateprediction.net] Output file hadsm3dhet2_jkqy_006589884_5_1.zip for task hadsm3dhet2_jkqy_006589884_5 absent
23-Oct-2010 13:27:28 [climateprediction.net] Output file hadsm3dhet2_jkqy_006589884_5_2.zip for task hadsm3dhet2_jkqy_006589884_5 absent
23-Oct-2010 13:27:28 [climateprediction.net] Output file hadsm3dhet2_jkqy_006589884_5_3.zip for task hadsm3dhet2_jkqy_006589884_5 absent
23-Oct-2010 13:27:28 [Einstein@Home] Resuming task h1_1099.80_S5R4__146_S5GC1a_1 using einstein_S5GC1 version 105
23-Oct-2010 13:27:36 [---] Internet access OK - project servers may be temporarily down.
23-Oct-2010 13:28:21 [Milkyway@home] Started upload of de_separation_82_2s_20_1_595037_1287780659_1_0
23-Oct-2010 13:28:31 [Milkyway@home] Finished upload of de_separation_82_2s_20_1_595037_1287780659_1_0
23-Oct-2010 13:35:27 [---] Already attached - deleting project_init.xml
23-Oct-2010 13:37:34 [Milkyway@home] Sending scheduler request: To fetch work.
23-Oct-2010 13:37:34 [Milkyway@home] Reporting 2 completed tasks, requesting new tasks for GPU
23-Oct-2010 13:37:56 [---] Project communication failed: attempting access to reference site
23-Oct-2010 13:37:57 [---] BOINC can't access Internet - check network connection or proxy configuration.
23-Oct-2010 13:37:59 [Milkyway@home] Scheduler request failed: Couldn't connect to server
23-Oct-2010 13:38:04 [Collatz Conjecture] Sending scheduler request: To fetch work.
23-Oct-2010 13:38:04 [Collatz Conjecture] Requesting new tasks for GPU
23-Oct-2010 13:38:09 [Collatz Conjecture] Scheduler request failed: Couldn't resolve host name
23-Oct-2010 13:38:59 [Milkyway@home] Sending scheduler request: To fetch work.
23-Oct-2010 13:38:59 [Milkyway@home] Reporting 2 completed tasks, requesting new tasks for GPU
23-Oct-2010 13:39:24 [Milkyway@home] Scheduler request failed: Couldn't connect to server
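
The two logs above can be checked mechanically for the pattern described in this post. The following Python sketch is one possible approach and is not part of BOINC: it flags "output file absent" and "exited with zero status" messages that occur within a chosen window of a network error message. The marker strings are copied from the logs above; the function names and the two-minute window are assumptions made for illustration, and a match shows only proximity in time, not cause.

```python
import re
from datetime import datetime, timedelta

# Matches client log lines such as:
# 22-Oct-2010 07:40:11 [climateprediction.net] Computation for task ... finished
LINE_RE = re.compile(r"^(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) \[([^\]]*)\] (.*)$")

# Message fragments taken from the logs in this post.
NETWORK_MARKERS = (
    "Couldn't resolve host name",
    "can't resolve hostname",
    "Couldn't connect to server",
    "BOINC can't access Internet",
    "Project communication failed",
)


def is_suspect(msg):
    """True for messages that indicate a task ended without proper output."""
    return msg.endswith(" absent") or "exited with zero status but no 'finished' file" in msg


def parse(lines):
    """Yield (timestamp, project, message) for every line that looks like a log entry."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            # %b assumes English month abbreviations (C/English locale).
            ts = datetime.strptime(m.group(1), "%d-%b-%Y %H:%M:%S")
            yield ts, m.group(2), m.group(3)


def suspects_near_network_errors(lines, window=timedelta(minutes=2)):
    """Return suspect events that fall within `window` of any network error."""
    events = list(parse(lines))
    net_times = [ts for ts, _, msg in events if any(k in msg for k in NETWORK_MARKERS)]
    return [
        (ts, project, msg)
        for ts, project, msg in events
        if is_suspect(msg) and any(abs(ts - n) <= window for n in net_times)
    ]
```

Feeding the saved client log to suspects_near_network_errors (for example via open(path).read().splitlines()) gives a list that can be compared across several incidents.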

4) Questions and Answers : Unix/Linux : hadam model restart errors (Message 34134)
Posted 24 Jun 2008 by Andris Pavenis
Post:
Did you Exit from BOINC before re-booting?
Highly recommended if you don't want to crash climate models.


The Linux shut-down scripts stop BOINC automatically. I'm using the Fedora 8 boinc-client RPM, which runs BOINC in daemon mode.


Several trickles after a re-start is OK. If you look at the trickle file, you'll see that they are type 'cse', whereas 'normal' trickles are type 'orig'.
type 'cse' just tells the server that the model is back up and running.

As for continuing, is the day/month/year constantly advancing in the graphics?
If so, continue.
If it keeps jumping backwards, it may be best to abort.



Text output from the BOINC client shows that there are no backjumps (except for going back to the last checkpoint after restarting). In daemon mode graphics do not seem to be working.

I tried the following:
1) suspended the CPDN model
2) restarted BOINC (/etc/init.d/boinc-client stop, followed by starting it again)
3) resumed the CPDN model

It still exited and restarted from the checkpoint in the same way as in the log included earlier.
5) Questions and Answers : Unix/Linux : hadam model restart errors (Message 34130)
Posted 24 Jun 2008 by Andris Pavenis
Post:
After booting the computer once, I found that BOINC had crashed and restarted, and that the HADAM model's time had been reset to a few hours (see log below). Since then, each time the BOINC client is started, the HADAM model restarts and sends a trickle message twice.

Additionally, the consumed time was reset to something near zero. I guess this time corruption is caused by BOINC, not the project, as I have seen similar time corruption with 2 Einstein@Home workunits more than once.

Is it worth continuing the model?

System information: Linux, Fedora 8 x86_64, etc.

Andris

hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018737 A - 11/08/2000 02:50 - H:M:S=0075:45:34 AVG=14.56 DLT=10.57
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018738 A - 11/08/2000 03:00 - H:M:S=0075:45:43 AVG=14.56 DLT= 9.36
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018739 A - 11/08/2000 03:10 - H:M:S=0075:45:53 AVG=14.56 DLT=10.00
Cleaning up graphics data...
Detaching shared memory...
23-Jun-2008 18:55:31 [---] Starting BOINC client version 5.10.45 for x86_64-pc-linux-gnu
23-Jun-2008 18:55:32 [---] log flags: task, file_xfer, sched_ops
23-Jun-2008 18:55:32 [---] Libraries: libcurl/7.18.2 NSS/3.12.0.3 zlib/1.2.3 libidn/0.6.14
23-Jun-2008 18:55:32 [---] Executing as a daemon
23-Jun-2008 18:55:32 [---] Data directory: /var/lib/boinc
23-Jun-2008 18:55:32 [Einstein@Home] Found app_info.xml; using anonymous platform
23-Jun-2008 18:55:32 [---] Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU @ 2.40GHz [Family 6 Model 15 Stepping 7]
23-Jun-2008 18:55:32 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
23-Jun-2008 18:55:32 [---] OS: Linux: 2.6.25.6-27.fc8
23-Jun-2008 18:55:32 [---] Memory: 3.87 GB physical, 3.91 GB virtual
23-Jun-2008 18:55:32 [---] Disk: 47.30 GB total, 29.55 GB free
23-Jun-2008 18:55:32 [---] Local time is UTC +3 hours
23-Jun-2008 18:55:33 [climateprediction.net] URL: http://climateprediction.net/; Computer ID: 826910; location: home; project prefs: default
23-Jun-2008 18:55:33 [3x+1@home] URL: http://allprojectstats.com/collatz/; Computer ID: 1108; location: (none); project prefs: default
23-Jun-2008 18:55:33 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: 4174592; location: home; project prefs: default
23-Jun-2008 18:55:33 [orbit@home] URL: http://orbit.psi.edu/oah/; Computer ID: 3512; location: (none); project prefs: default
23-Jun-2008 18:55:33 [Cosmology@Home] URL: http://www.cosmologyathome.org/; Computer ID: 14606; location: (none); project prefs: default
23-Jun-2008 18:55:33 [Milkyway@home] URL: http://milkyway.cs.rpi.edu/milkyway/; Computer ID: 9818; location: (none); project prefs: default
23-Jun-2008 18:55:33 [Einstein@Home] URL: http://einstein.phys.uwm.edu/; Computer ID: 1099657; location: home; project prefs: default
23-Jun-2008 18:55:33 [---] General prefs: from http://cosmologyathome.org/ (last modified 26-Jan-2008 22:13:49)
23-Jun-2008 18:55:33 [---] Host location: none
23-Jun-2008 18:55:33 [---] General prefs: using your defaults
23-Jun-2008 18:55:33 [---] Reading preferences override file
23-Jun-2008 18:55:33 [---] Preferences limit memory usage when active to 1983.22MB
23-Jun-2008 18:55:33 [---] Preferences limit memory usage when idle to 3569.80MB
23-Jun-2008 18:55:33 [---] Preferences limit disk usage to 9.31GB
23-Jun-2008 18:55:33 [climateprediction.net] Restarting task hadam3h_c_52s16_2000_2000_1_0 using hadam3 version 503
23-Jun-2008 18:55:41 [Einstein@Home] Restarting task h1_1089.05_S5R3__350_S5R3b_1 using einstein_S5R3 version 438
23-Jun-2008 18:55:41 [Einstein@Home] Restarting task h1_1089.05_S5R3__349_S5R3b_0 using einstein_S5R3 version 438
23-Jun-2008 18:55:42 [Einstein@Home] Restarting task h1_1089.05_S5R3__549_S5R3b_2 using einstein_S5R3 version 438
Beginning work on result hadam3h_c_52s16_2000_2000_1_0...
Starting model in /var/lib/boinc/projects/climateprediction.net...
Created shared memory region key = 113980 of size 5009240 bytes (version 602)
.so shmem return code = 1152
Starting model ID hadam3h_c_52s16_2000_2000_1 Phase 1
Program launched with process id # 3583
Climate model starting - use graphics to monitor progress.
Or visit the website to see the graphs for this run.
Getting pthread attributes - retval=0
Setting pthread size (576716800 bytes) - retval=0
Executing program hadam3_um_5.03_i686-pc-linux-gnu 113980 dth20l_052s16.anc ssta2000.anc sicea2000.anc
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018721 A - 11/08/2000 00:10 - H:M:S=0075:41:25 AVG=14.56 DLT= 0.00
23-Jun-2008 18:55:49 [climateprediction.net] Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
23-Jun-2008 18:55:54 [climateprediction.net] Scheduler request succeeded: got 0 new tasks
Cleaning up graphics data...
Detaching shared memory...
23-Jun-2008 18:56:11 [climateprediction.net] Task hadam3h_c_52s16_2000_2000_1_0 exited with zero status but no 'finished' file
23-Jun-2008 18:56:11 [climateprediction.net] If this happens repeatedly you may need to reset the project.
23-Jun-2008 18:56:11 [climateprediction.net] Restarting task hadam3h_c_52s16_2000_2000_1_0 using hadam3 version 503
Beginning work on result hadam3h_c_52s16_2000_2000_1_0...
Starting model in /var/lib/boinc/projects/climateprediction.net...
Created shared memory region key = 113980 of size 5009240 bytes (version 602)
.so shmem return code = 1152
Starting model ID hadam3h_c_52s16_2000_2000_1 Phase 1
Getting pthread attributes - retval=0
Setting pthread size (576716800 bytes) - retval=0
Executing program hadam3_um_5.03_i686-pc-linux-gnu 113980 dth20l_052s16.anc ssta2000.anc sicea2000.anc
Program launched with process id # 3596
Climate model starting - use graphics to monitor progress.
Or visit the website to see the graphs for this run.
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018721 A - 11/08/2000 00:10 - H:M:S=0001:13:57 AVG= 0.24 DLT= 0.00
23-Jun-2008 18:56:15 [climateprediction.net] Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
23-Jun-2008 18:56:20 [climateprediction.net] Scheduler request succeeded: got 0 new tasks
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018722 A - 11/08/2000 00:20 - H:M:S=0001:15:28 AVG= 0.24 DLT=90.95
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018723 A - 11/08/2000 00:30 - H:M:S=0001:15:37 AVG= 0.24 DLT= 8.99
hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018724 A - 11/08/2000 00:40 - H:M:S=0001:15:47 AVG= 0.24 DLT= 9.99
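
The model progress lines in the log above can be scanned for the two symptoms mentioned in this post: the timestep going backwards and the run time being reset. The Python sketch below is only an illustration. The regular expression is derived from the lines shown here, the function name is made up, and the H:M:S field is assumed to be the accumulated run time. It compares the TS counter instead of the model date, because the date may not follow the ordinary calendar.

```python
import re

# Matches lines such as:
# hadam3h_c_52s16_2000_2000_1 - PH 1 TS 0018737 A - 11/08/2000 02:50 - H:M:S=0075:45:34 AVG=14.56 DLT=10.57
PROGRESS_RE = re.compile(
    r"^(?P<model>\S+) - PH\s*(?P<phase>\d+) TS (?P<ts>\d+) \S+ - "
    r"(?P<date>\d{2}/\d{2}/\d{4} \d{2}:\d{2}) - "
    r"H:M:S=(?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})"
)


def find_regressions(lines):
    """Return (model, kind, previous, current) for every backward step.

    kind is "timestep" when the TS counter decreases and "elapsed" when
    the H:M:S value (converted to seconds) decreases.
    """
    last = {}
    findings = []
    for line in lines:
        m = PROGRESS_RE.match(line.strip())
        if not m:
            continue  # ordinary client messages are ignored
        model = m.group("model")
        ts = int(m.group("ts"))
        elapsed = int(m.group("h")) * 3600 + int(m.group("m")) * 60 + int(m.group("s"))
        if model in last:
            prev_ts, prev_elapsed = last[model]
            if ts < prev_ts:
                findings.append((model, "timestep", prev_ts, ts))
            if elapsed < prev_elapsed:
                findings.append((model, "elapsed", prev_elapsed, elapsed))
        last[model] = (ts, elapsed)
    return findings
```

A single "timestep" finding right after a restart is the expected return to the last checkpoint. A large "elapsed" drop, such as the one from about 75 hours to about 1 hour in this log, is the reset described in the post.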
6) Questions and Answers : Unix/Linux : HTTP server internal errors when sending trickle (Message 31334)
Posted 12 Nov 2007 by Andris Pavenis
Post:

What is your networking setup like? (home PC, work PC, university PC, on broadband, dialup, wireless, ...? Does it use a proxy server?)


Home PC, ADSL (currently 4M/512K), WLAN, no proxy server in use. I had this problem with Fedora 7. I updated to the released Fedora 8 (had to do a fresh install as the upgrade did not work) and right now I'm seeing the same problem again.

This problem happens much more often with HadCM3 trickles than with HadSM3 ones (HadSM3 trickles usually succeed; HadCM3 ones usually succeed only after retries).

Computer ID 750202, HadCM3 work unit ID 6081946.


7) Questions and Answers : Unix/Linux : HTTP server internal errors when sending trickle (Message 31314)
Posted 9 Nov 2007 by Andris Pavenis
Post:
I'm often (not 100% of the time) getting error messages like

2007-11-09 09:37:56 [climateprediction.net] Sending scheduler request: To send trickle-up message
2007-11-09 09:37:56 [climateprediction.net] (not requesting new work or reporting completed tasks)
hadsm3fub_e065_005907913 - PH 3 TS 0223201 A - 01/11/2063 00:30 - H:M:S=0186:04:44 AVG= 0.90 DLT= 1.00
hadsm3fub_0523_005908484 - PH 1 TS 0173377 A - 13/12/1820 00:30 - H:M:S=0046:44:41 AVG= 0.97 DLT= 0.92
2007-11-09 09:38:52 [climateprediction.net] Scheduler request failed: HTTP internal server error
2007-11-09 09:38:52 [climateprediction.net] Deferring communication for 15 min 22 sec
2007-11-09 09:38:52 [climateprediction.net] Reason: scheduler request failed

when CPDN tries to send a trickle. If it happened 100% of the time, I would perhaps suspect the firewall on my side. However, sending a trickle often goes through without errors.

A sample tcpdump capture of a failed trickle upload
is at http://ap1.pp.fi/tmp/cpdn-tcpdump.dat.bz2.

It also happens a bit too often to suspect the CPDN server (unless it is heavily overloaded). The CPDN server status page also does not show any problems.

8) Questions and Answers : Unix/Linux : hadcm3lb version 5.15 crashes when showing graphics. (Message 29537)
Posted 13 Jul 2007 by Andris Pavenis
Post:
5) triggered error by trying to view graphics

This has been a common problem. It often results from old video drivers or conflict with another program using heavy graphics. (Unfortunately, most of that experience has been in Windows.)

Does your machine have a graphics card or on-board graphics chip? In either case, the vendor might have an updated driver good for Linux.


On-board. I'm using the standard drivers that come with X11. I have had bad experiences with the ATI binary drivers for Radeon cards. They screw up the system so badly that I have to boot from a rescue CD to recover (I have not tested them for some time now. Screwing up graphics twice was enough).


I moved the project to a different computer (3.0GHz Pentium 4 HT, 1.5GB memory, Fedora Core 6). The problem remains the same: when I try to view the graphics, the CPDN application crashes and restarts from the last checkpoint.

The video card is different; lspci says:
00:02.0 VGA compatible controller: Intel Corporation 82915G/GV/910GL Integrated Graphics Controller (rev 04)
9) Questions and Answers : Unix/Linux : hadcm3lb version 5.15 crashes when showing graphics. (Message 28862)
Posted 21 May 2007 by Andris Pavenis
Post:
5) triggered error by trying to view graphics

This has been a common problem. It often results from old video drivers or conflict with another program using heavy graphics. (Unfortunately, most of that experience has been in Windows.)

Does your machine have a graphics card or on-board graphics chip? In either case, the vendor might have an updated driver good for Linux.


On-board. I'm using the standard drivers that come with X11. I have had bad experiences with the ATI binary drivers for Radeon cards. They screw up the system so badly that I have to boot from a rescue CD to recover (I have not tested them for some time now. Screwing up graphics twice was enough).
10) Questions and Answers : Unix/Linux : hadcm3lb version 5.15 crashes when showing graphics. (Message 28822)
Posted 20 May 2007 by Andris Pavenis
Post:
Your Firefox activity won't lock a CPDN file as some virus software does. CPDN doesn't react well when expected files are "missing".

Another issue arises when people try to squeeze this large and hungry software system into a too-small computer -- swapping and resultant timing relationships among OS/boinc/CPDN can become problematic. The graphics layer triggers some of that, too. Much as we'd like to have the cake and eat it, too, these PCs won't behave like a Cray.


I would not run CPDN if the computer were swapping like crazy...


All in all, considering the array of machine and OS types in which CPDN runs, and the vast array of participant's run mixes, I see CPDN as remarkably robust/resilient.


The problem appears when the model year reaches 1951 or 1952 (in the currently active model; after the error it restarts from the last checkpoint). I tried to get more info with the following steps:
1) backed up BOINC directory
2) disconnected from network (to avoid unnecessary information from being sent to server)
3) started model
4) attached GDB to application (hadcm3transum_5.15_i686-pc-linux-gnu)
5) triggered error by trying to view graphics
6) tried to get backtrace in GDB
7) stopped boinc
8) restored BOINC directory from backup

Unfortunately there is not enough information in the executables for the backtrace to be of much use. At least one can see the exception (SIGFPE) which happens at the beginning (SIGSEGV follows after that). If I had an executable with at least a bit of debug info, I could get a more useful backtrace.

Andris

#0 0xb7b11e88 in ?? ()
#1 0x3f800000 in ?? ()
#2 0x3f800000 in ?? ()
#3 0x3f800000 in ?? ()
#4 0x3f800000 in ?? ()
#5 0x00001d80 in ?? ()
#6 0x00001f80 in ?? ()
#7 0xbf87ce00 in ?? ()
#8 0xb7b05f9e in ?? ()
#9 0x080502d8 in pthread_create ()
#10 0x00000080 in ?? ()
#11 0x00000000 in ?? ()
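
One detail of the backtrace above can be checked directly: the value 0x3f800000, which appears in frames #1 to #4, is the IEEE 754 single-precision bit pattern for 1.0. That suggests (but does not prove) that GDB was walking over floating-point data on the stack and not over real return addresses, which can happen when frame information is missing. A small Python sketch of the decoding, using only the standard struct module; the helper name as_float32 is made up for this example:

```python
import struct


def as_float32(word):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", word))[0]


# The value repeated in frames #1 to #4 of the backtrace.
repeated = 0x3F800000
print(f"0x{repeated:08x} as float32: {as_float32(repeated)}")
```

The same helper can be applied to the other frame values, though a value that decodes to a plausible float may still be an address.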
11) Questions and Answers : Unix/Linux : hadcm3lb version 5.15 crashes when showing graphics. (Message 28795)
Posted 19 May 2007 by Andris Pavenis
Post:
Having its origins in a supercomputer program, it's not used to (or intended to) having to compete with other Windows programs for hardware resources.

People who run other resource heavy programs need to be a bit protective of their climate models at such times. Suspend BOINC (and the model), before running the other program(s).



Is it really so fragile? Many times I have simultaneously compiled Mozilla Firefox development versions from source (both directly and in VMware virtual machines running different Linux distributions) under Linux. I haven't seen crashes, though, except when trying to view graphics. Of course, none of these processes interacts directly with CPDN (which viewing graphics does).
12) Questions and Answers : Unix/Linux : hadcm3lb version 5.15 crashes when showing graphics. (Message 26365)
Posted 26 Jan 2007 by Andris Pavenis
Post:
Processing workunit hadcm3lbm_azon_25282074 crashes when trying to show graphics. It didn't happen at the beginning, but only when processing reached year 1952. The graphics window shows only the Earth image up to the coastlines, and after that hadcm3lb version 5.15 crashes:

Fatal signal caught, cleanup CPDN run and restart...

I tried to restart from an earlier backup, but the problem reappeared when year 1952 was reached.

System info: Linux, Fedora Core 6, Pentium 4.
13) Questions and Answers : Preferences : Failed to change e-mail address for account (Message 24222)
Posted 6 Sep 2006 by Andris Pavenis
Post:
thanks for posting this note, it should be fixed now!


Thanks, now works.
14) Questions and Answers : Preferences : Failed to change e-mail address for account (Message 24203)
Posted 6 Sep 2006 by Andris Pavenis
Post:
i got this error message today. any idea when the problem will be solved?

markus


Maybe somebody should remind them again. Otherwise the problem may remain forgotten in a queue of more urgent work for a very long time. But it would not be nice if too many of us tried to do that.
15) Questions and Answers : Preferences : Failed to change e-mail address for account (Message 24112)
Posted 28 Aug 2006 by Andris Pavenis
Post:
The e-mail address which I used about 1 year ago to register is about to go away. I tried to change the address to the current one, but did not succeed.

Here is the error message which I got:

Fatal error: Cannot redeclare send_verify_email() (previously declared in /websites/boinc/projects/cpdnboinc/html/user/edit_email_action.php:8) in /websites/boinc/projects/cpdnboinc/html/inc/email.inc on line 60

Some system information:
Linux (Fedora Core 5), Firefox 2.0b1.



