Posts by old_user6656

1) Questions and Answers : Unix/Linux : Stability Problems on SMP Linux? (Message 2889)
Posted 3 Sep 2004 by old_user6656
Post:

Hi

> I hope you are not also being bitten by SuSE 9.1!

I hope not to be. And yes, the kernel version shipped on the DVD did not work for SMP on a P4 with HT: the problems ranged from the second logical CPU being detected but not used, up to the whole system freezing. Since then, though, some updates have been applied....

I do not think this is the problem, though. No other program I use has any trouble with the HT kernel; in particular, the SETI BOINC client runs fine. The kernel itself works and reports the "two" CPUs properly.
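To double-check that the SMP kernel really exposes both logical CPUs, here is a minimal sketch in Python. It assumes the standard Linux /proc/cpuinfo layout with one `processor : N` line per logical CPU; the function names are hypothetical helpers, not part of BOINC or the CP client.

```python
import os


def count_logical_cpus(cpuinfo_text: str) -> int:
    """Count 'processor' entries in /proc/cpuinfo-style text.

    On a P4 with Hyper-Threading and a working SMP kernel this should be 2.
    """
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _ = line.partition(":")
        # Only lines of the form "processor : N" mark a logical CPU.
        if sep and key.strip() == "processor":
            count += 1
    return count


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        n = count_logical_cpus(read_cpuinfo())
        print(f"/proc/cpuinfo lists {n} logical CPU(s); os.cpu_count() = {os.cpu_count()}")
```

If both numbers come out as 2, the kernel side of HT detection is behaving as described above.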
2) Questions and Answers : Unix/Linux : Stability Problems on SMP Linux? (Message 2592)
Posted 1 Sep 2004 by old_user6656
Post:
It seems that CP / BOINC (4.05) is somewhat unstable on an SMP kernel (P4 with Hyper-Threading; SuSE 9.1, kernel 2.6.5-7.104-smp). The problem does not occur on uniprocessor machines.
I already tried detaching and reattaching the machine to get fresh copies of the CP clients, but that did not change anything. The downloaded files are identical to the ones on my uniprocessor machines:

hadsm3_4.03_i686-pc-linux-gnu
hadsm3se_4.03_i686-pc-linux-gnu
hadsm3um_4.03_i686-pc-linux-gnu

The log only shows this (repeating):
....
Model timeout at 180.00 seconds
Model crashed...retrying...restart level 0
Preparing for restart...
Rewinding a model-day...
Starting model ID 05x4_000032685 Phase 1
Stack size=4096.00 MB
Waiting for model startup, this may take a minute...
05x4_000032685 - PH 1 TS 000001 - 00/00/0000 00:00 - H:M:S=0000:00:00 AVG= 0.00 DLT= 0.00
Model timeout at 180.00 seconds
Model crashed...retrying...restart level 1
Preparing for restart...
Rewinding a model-month...
Error: Restart files for dataout/restart.month not found
Giving up, this result exceeded crash count for available restart files.
adding: ncatts.cpdc (deflated 72%)
adding: climate.cont (deflated 79%)
adding: climate.cpdc (deflated 79%)
adding: climate.doub (deflated 79%)
adding: climate.spin (deflated 79%)
adding: 05x4_000032685.xml (deflated 65%)
adding: ncatts.cpdc (deflated 72%)
adding: ncatts.cpdc (deflated 72%)
adding: ncatts.cpdc (deflated 72%)
adding: stderr_um.txt (deflated 75%)
adding: yabsd.out (deflated 93%)
adding: restart.day (deflated 43%)
2004-09-02 00:05:09 [climateprediction.net] Unrecoverable error for result 05x4_000032685_0 (process exited with code 251 (0xfb))
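Reading the log, the client seems to escalate through restart levels: after a 180-second timeout it rewinds one model-day, after the next crash one model-month, and it gives up once the restart file for that level is missing. The sketch below only models that behaviour as inferred from the log; it is not the actual CP wrapper code. The ladder, the `dataout/restart.day` path and all function names are assumptions (only `dataout/restart.month` appears in the log).

```python
from typing import Callable, List, Tuple

TIMEOUT_S = 180.0  # matches "Model timeout at 180.00 seconds" in the log

# Hypothetical restart ladder: each crash moves one level up and rewinds
# further. "dataout/restart.day" is an assumed path; only
# "dataout/restart.month" is actually named in the log.
RESTART_LADDER: List[Tuple[str, str]] = [
    ("model-day", "dataout/restart.day"),
    ("model-month", "dataout/restart.month"),
]


def run_with_restarts(
    run_model: Callable[[float], bool],        # True if the model got going before the timeout
    restart_file_exists: Callable[[str], bool],
    log: List[str],
) -> bool:
    """Retry the model with progressively older restart files; False means give up."""
    if run_model(TIMEOUT_S):
        return True
    log.append(f"Model timeout at {TIMEOUT_S:.2f} seconds")
    for level, (unit, path) in enumerate(RESTART_LADDER):
        log.append(f"Model crashed...retrying...restart level {level}")
        log.append(f"Rewinding a {unit}...")
        if not restart_file_exists(path):
            log.append(f"Error: Restart files for {path} not found")
            break
        if run_model(TIMEOUT_S):
            return True
        log.append(f"Model timeout at {TIMEOUT_S:.2f} seconds")
    log.append("Giving up, this result exceeded crash count for available restart files.")
    return False


if __name__ == "__main__":
    lines: List[str] = []
    # Simulate the failure seen here: the model never starts, and no monthly restart file exists.
    run_with_restarts(lambda t: False, lambda p: p != "dataout/restart.month", lines)
    print("\n".join(lines))
```

Under this reading, a model that hangs at timestep 1 every time will always burn through the ladder and end with the unrecoverable error, which fits the result being reported with exit code 251 (0xfb) rather than recovering.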


top shows that one process is defunct (state Z, a zombie):
26381 distrib 34 19 3480 1512 2776 S 0.0 0.3 0:00.24 hadsm3_4.03_i68
26543 distrib 34 19 0 0 0 Z 0.0 0.0 0:00.31 hadsm3um_4.03_i
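A zombie (state Z) is a process that has already exited but whose parent has not yet reaped it with wait()/waitpid(). To see which parent is holding on to it, a minimal sketch that scans /proc directly is below. It assumes the Linux /proc/&lt;pid&gt;/stat format (pid, command name in parentheses, state, parent PID, ...); the helper names are hypothetical.

```python
import os
from typing import List, Tuple


def parse_proc_stat(text: str) -> Tuple[int, str, str, int]:
    """Return (pid, comm, state, ppid) from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the *last* ')' rather than on whitespace.
    """
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen].strip())
    comm = text[lparen + 1 : rparen]
    rest = text[rparen + 1 :].split()
    state, ppid = rest[0], int(rest[1])
    return pid, comm, state, ppid


def find_zombies() -> List[Tuple[int, str, int]]:
    """List (pid, comm, ppid) for every process currently in state 'Z'."""
    zombies = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                pid, comm, state, ppid = parse_proc_stat(f.read())
        except OSError:
            continue  # process vanished or is unreadable while scanning
        if state == "Z":
            zombies.append((pid, comm, ppid))
    return zombies


if __name__ == "__main__":
    for pid, comm, ppid in find_zombies():
        print(f"zombie {pid} ({comm}), parent {ppid}")
```

On the affected machine this should list the defunct hadsm3um_4.03_i process together with its parent PID, showing which process has not yet collected its exit status.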

©2024 climateprediction.net