Posts by old_user61264

1) Questions and Answers : Unix/Linux : Multiple failures (Message 32303)
Posted 23 Jan 2008 by old_user61264
Post:
I think I've found something that helps. I installed boinc from the Ubuntu repositories (something like apt-get install boinc-client). That put an entry in /etc/init.d that explicitly starts boinc when I boot Ubuntu. More importantly, it explicitly stops boinc when I shut down the machine, which gives boinc/cpdn the time and the commands it needs to shut down gracefully.
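
Why a clean stop matters can be shown with a small, generic sketch. It is not the Ubuntu init script or anything from BOINC itself; graceful_demo and the "state saved" message are hypothetical names used only for illustration. A worker that receives SIGTERM (what a normal stop or shutdown sends) gets a chance to run its cleanup handler, whereas a process killed with SIGKILL never does.

```shell
# Hypothetical demo: a background worker traps SIGTERM and "saves state" before exiting.
graceful_demo() {
    log=$(mktemp)
    (
        # Cleanup handler: runs when the worker receives SIGTERM.
        trap 'echo "state saved" >> "$log"; exit 0' TERM
        echo "ready" >> "$log"
        # Stand-in for long-running work.
        while :; do sleep 1; done
    ) &
    pid=$!
    # Wait until the worker has installed its handler.
    until grep -q ready "$log"; do sleep 1; done
    kill -TERM "$pid"   # polite stop request, unlike kill -9
    wait "$pid"
    tail -n 1 "$log"
    rm -f "$log"
}

graceful_demo   # prints "state saved"
```

Replacing kill -TERM with kill -KILL in this sketch would end the worker immediately, and the handler line would never be written.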

I'm about halfway through the next model series, and hopeful that I'll get a success this time.

Nic out
2) Questions and Answers : Unix/Linux : Multiple failures (Message 32197)
Posted 16 Jan 2008 by old_user61264
Post:
Hey Mark,

Actually, I'm not sure what is happening when the 'client error' occurs. I really only check my CPDN numbers once a month or so. Hence, by the time I notice a failure it is usually days to weeks in the past, as CPDN has efficiently sent me a new work unit in the meantime.

The two machines run pretty much the same software (Ubuntu dual boot with WinXP, my professional software). The one real difference is that the home computer is rebooted almost every evening to play WoW with the kids, while the lab computer goes weeks between reboots.

I'll give the explicit boinc quit command a try to see if it helps. Otherwise there is a lot of new research I can do in your 'README' collection. I'll poke around.

Thanks for your help.

Nic out



The AMD errors look similar (signal 11, and error code 139, which I think is the same thing), but are much more frequent. What was happening on the PC at the moment those crashes took place? Is there anything in the Boinc messages log (or stderr/stdout)?

Is there any software in common between your home and work PCs? Something you run more frequently at home?

How do you stop Boinc running when you need it for something else? If you use 'kill -9' or similar, I'd recommend using 'boinc_cmd --quit' instead.
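
On the "signal 11 and error code 139" point in the quoted reply: shells conventionally report a process killed by signal N as exit status 128 + N, so 139 corresponds to signal 11 (SIGSEGV, a segmentation fault, on Linux). A minimal sketch of that arithmetic follows; decode_status is a hypothetical helper name, not part of BOINC.

```shell
# Hypothetical helper: interpret a shell exit status.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
decode_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo "signal $((status - 128))"
    else
        echo "exit $status"
    fi
}

decode_status 139   # prints "signal 11"
```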


3) Questions and Answers : Unix/Linux : Multiple failures (Message 32181)
Posted 15 Jan 2008 by old_user61264
Post:
Thanks for the response.

Actually, the AMD PC is at home and pretty much runs CPDN anytime I don't reboot it into WinXP to play WoW. The Intel PC is my workstation at the lab and is regularly running heavy jobs.

I start them both with:
cd ~/bin/boinc
nohup ./run_client > test.log &

and let them run until I need the CPU for something else.

Nic out




On the AMD PC, it appears that when one model fails, the other one on the dual core PC also fails within a few minutes. It's almost like they error out on an unclean shutdown of boinc, or when some other intensive process runs. If it were pure PC instability, they would be failing at various times, instead of at nearly the same time for both runs of a pair.

How do you start boinc on that PC? Does that PC run other intensive programs at various times during the day?

4) Questions and Answers : Unix/Linux : Multiple failures (Message 32178)
Posted 15 Jan 2008 by old_user61264
Post:
All,

I have two machines currently crunching climate prediction clients. They both run Ubuntu 7.10. One has a dual processor AMD chip on an Asus motherboard, and has repeatedly (20 times) ended its run with 'Client Error'. The failure is always at a different place in the run, with between 60k and 2,000k CPU seconds committed.

The second machine has an Intel dual core (4 GB memory, runs 64-bit Ubuntu), and succeeds about half the time and ends with 'Client Error' the other half.

Am I wasting my time/energy trying to do Climate Prediction? From the figures, I have the impression that I am contributing very little to the effort in spite of months of CPU time.

Thanks,

Nic out



