1)
Questions and Answers :
Unix/Linux :
Multiple failures
(Message 32303)
Posted 23 Jan 2008 by old_user61264 Post: I think I've found something that helps. I installed BOINC from the Ubuntu repositories (something like apt-get install boinc-client). That put an entry in /etc/init.d that explicitly starts BOINC when I boot Ubuntu. More importantly, it explicitly shuts BOINC down when I shut down the machine, which gives BOINC/CPDN the time and the commands to exit gracefully. I'm about halfway through the next model series, and hopeful I'll get a success this time. Nic out
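To illustrate what a graceful stop looks like, here is a minimal sketch of the kind of logic an init script's "stop" action typically performs: send SIGTERM, then wait for the process to exit before the shutdown proceeds. The function name `stop_gracefully` and the 60-second default are assumptions made for illustration; this is not the script that the Ubuntu package installs.

```shell
#!/bin/sh
# Hypothetical helper, not the actual Ubuntu init script.
# Ask a process to exit with SIGTERM and wait until it is gone.
stop_gracefully() {
    pid=$1
    timeout=${2:-60}        # seconds to wait before giving up (assumed default)

    # If the signal cannot be delivered, the process has already exited.
    kill -TERM "$pid" 2>/dev/null || return 0

    waited=0
    # kill -0 sends no signal; it only checks that the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1        # still running; the caller decides what to do next
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

The point of the wait loop is that the model gets time to finish what it is writing, instead of being cut off mid-write when the machine powers down.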
2)
Questions and Answers :
Unix/Linux :
Multiple failures
(Message 32197)
Posted 16 Jan 2008 by old_user61264 Post: Hey Mark, Actually, I'm not sure what is happening when the 'client error' occurs. I really only check my CPDN numbers once a month or so, so by the time I look, the failure is usually days to weeks old and CPDN has efficiently sent me a new work unit. The two machines run pretty much the same software (Ubuntu dual boot with WinXP, which hosts my professional software). The one real difference is that the home computer is rebooted almost every evening to play WoW with the kids, while the lab computer goes weeks between reboots. I'll give the explicit BOINC quit command a try to see if it helps. Otherwise, there is a lot of new research I can do in your 'README' collection. I'll poke around. Thanks for your help. Nic out
Quoted message: The AMD errors look similar (signal 11, and error code 139, which I think is the same thing), but much more frequent. What was happening on the PC at the moment those crashes took place? Is there anything in the BOINC messages log (or stderr/stdout)?
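On the "signal 11 and error code 139" point in the quoted message: by the usual shell convention, a process killed by signal N is reported with exit status 128 + N, so 139 corresponds to signal 11 (SIGSEGV, a segmentation fault). The sketch below shows the arithmetic; `decode_exit_status` is a hypothetical helper name, and it assumes the 139 seen in the results is an ordinary shell-style exit status.

```shell
#!/bin/sh
# Hypothetical helper: turn a shell exit status into a signal number.
# Statuses above 128 conventionally mean "killed by signal (status - 128)".
decode_exit_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo $((status - 128))
    else
        echo 0              # normal exit, no signal involved
    fi
}

sig=$(decode_exit_status 139)
echo "exit status 139 -> signal $sig"
```

On most shells, `kill -l 11` then prints the name of that signal.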
3)
Questions and Answers :
Unix/Linux :
Multiple failures
(Message 32181)
Posted 15 Jan 2008 by old_user61264 Post: Thanks for the response. Actually, the AMD PC is at home and pretty much runs CPDN anytime I don't reboot it into WinXP to play WoW. The Intel PC is my workstation at the lab and is regularly running heavy jobs. I start them both with:
    cd ~/bin/boinc
    nohup ./run_client > test.log &
and let them run until I need the CPU for something else. Nic out
Quoted message: On the AMD PC, it appears as if when one model fails, the other one on the dual core PC also fails within a few/several minutes. It's almost like they error out on an unclean shutdown of BOINC, or when some other intensive process runs. If it was pure PC instability, they would be failing at various times, instead of nearly the same time for both runs of a pair.
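A possible variation on the launch line in the post, sketched under assumptions: it keeps the same nohup start but also sends stderr to the log and records the process ID, so the client can later be signalled to exit cleanly rather than being cut off at reboot. The function name `start_detached` and the file name `boinc.pid` are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around a nohup launch.
# Usage: start_detached LOGFILE PIDFILE COMMAND [ARGS...]
start_detached() {
    logfile=$1
    pidfile=$2
    shift 2
    # Same idea as "nohup ./run_client > test.log &", with stderr captured too.
    nohup "$@" > "$logfile" 2>&1 &
    # $! is the PID of the background job just started.
    echo $! > "$pidfile"
}

# Assumed usage, mirroring the paths in the post:
#   cd ~/bin/boinc && start_detached test.log boinc.pid ./run_client
# Later, before rebooting:
#   kill -TERM "$(cat boinc.pid)"
```

Capturing stderr in the log matters here because any crash message would otherwise be lost once the terminal closes.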
4)
Questions and Answers :
Unix/Linux :
Multiple failures
(Message 32178)
Posted 15 Jan 2008 by old_user61264 Post: All, I have two machines currently crunching climate prediction models. They both run Ubuntu 7.10. One has a dual processor AMD chip on an Asus motherboard, and it has repeatedly (20x) ended the run with 'Client Error'. The failure is always at a different place in the run, with between 60k and 2,000k CPU seconds committed. The second machine has an Intel dual core (4 GB memory, runs 64-bit Ubuntu), and succeeds about half the time and ends in 'Client Error' the other half. Am I wasting my time/energy trying to do Climate Prediction? From the figures, I have the impression I am contributing very little to the effort in spite of months of CPU time. Thanks, Nic out