CPDN Crash timing.

John McLeod VII
Joined: 5 Aug 04
Posts: 172
Credit: 4,023,611
RAC: 0
Message 21999 - Posted: 12 Apr 2006, 13:41:00 UTC

I have noticed that every time that my system gets busy with other tasks (the work for which the computer is sitting on my desk) for a couple of hours, CPDN starts multiple crunching processes (the ones that actually do the work). After the system becomes less busy, CPDN crashes. This happens every time.

I believe that the process BOINC starts launches a separate task to do the crunching, and that if it loses communication with the crunching task, it starts another one. I also believe that a mutex which the cruncher locks at startup would fix this. The process that BOINC starts could then check the state of the mutex: only if the mutex is abandoned has the crunching task failed and needs to be restarted.
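
The mutex suggestion above amounts to asking the operating system whether the cruncher still exists, rather than inferring it from silence. A minimal Python sketch of that idea follows. It is not CPDN or BOINC code: it swaps the Windows abandoned-mutex check for a child-process handle check (`subprocess.Popen.poll`), and the `WorkerSupervisor` class and its method names are hypothetical.

```python
import subprocess
import sys
from typing import List, Optional


class WorkerSupervisor:
    """Hypothetical wrapper: starts at most one worker at a time.

    Liveness is taken from the OS process handle, not from whether the
    worker has answered a message recently.
    """

    def __init__(self, command: List[str]) -> None:
        self.command = command
        self.process: Optional[subprocess.Popen] = None
        self.starts = 0  # how many workers have been launched in total

    def worker_alive(self) -> bool:
        # poll() returns None while the child is still running, however
        # starved of CPU it is and however long it has been silent.
        return self.process is not None and self.process.poll() is None

    def ensure_worker(self) -> subprocess.Popen:
        # Launch a replacement only when the previous worker has really exited.
        if not self.worker_alive():
            self.process = subprocess.Popen(self.command)
            self.starts += 1
        return self.process

    def stop(self) -> None:
        # Terminate the worker and reap it so poll() reports the exit.
        if self.worker_alive():
            self.process.kill()
            self.process.wait()
```

Calling `ensure_worker()` repeatedly while the worker is merely slow returns the same process, so a second cruncher is never started on top of a live one.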


BOINC WIKI
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 22005 - Posted: 12 Apr 2006, 22:47:18 UTC
Last modified: 12 Apr 2006, 22:47:57 UTC

Yes, I've noticed something similar myself (I believe the problem was reported to the BOINC development team).

The problem seems to be that the manager runs at normal priority while the projects run at idle priority, so if something takes up all the idle CPU for long enough, the manager's 'are you alive' queries go unanswered. It then assumes that the project(s) have crashed, downloads a new one, and so on.

Ever since then I've always 'suspended' BOINC before playing games etc.
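
The priority explanation above can be illustrated with a toy model of a timeout-based liveness check. This is a sketch under stated assumptions, not BOINC's actual logic: the `HeartbeatMonitor` class, the `simulate` helper, and the 30-second timeout are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor: silence longer than `timeout` seconds means 'dead'."""

    timeout: float
    last_reply: float = 0.0

    def record_reply(self, now: float) -> None:
        # Called whenever the worker answers an 'are you alive' query.
        self.last_reply = now

    def looks_dead(self, now: float) -> bool:
        # The monitor cannot tell 'crashed' apart from 'got no CPU time'.
        return (now - self.last_reply) > self.timeout


def simulate(starved_seconds: float, timeout: float = 30.0) -> bool:
    """Return True if a healthy but CPU-starved worker would be declared dead.

    The 30-second default is an assumed value, not one taken from BOINC.
    """
    monitor = HeartbeatMonitor(timeout=timeout)
    monitor.record_reply(now=0.0)  # worker answers once, then gets no CPU
    return monitor.looks_dead(now=starved_seconds)
```

With these assumed numbers, `simulate(10.0)` is `False` and `simulate(7200.0)` (two hours of starvation) is `True`, even though the worker never crashed.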
I'm a volunteer and my views are my own.
News and Announcements and FAQ
Thyme Lawn
Volunteer moderator

Joined: 5 Aug 04
Posts: 1278
Credit: 15,779,438
RAC: 0
Message 22015 - Posted: 13 Apr 2006, 9:06:14 UTC
Last modified: 13 Apr 2006, 9:35:58 UTC

I've noticed the same thing, but I think the problem might lie in the BOINC API code. Every time it's happened to me, it has immediately followed an "exited with zero status but no 'finished' file" message.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 22028 - Posted: 13 Apr 2006, 22:14:22 UTC

I suspect that the 'zero exit' message is actually misleading in this case. My hunch is that the manager thinks the WU has crashed when it hasn't. Hence, in the worst case, you can get several copies of the project code running a single WU (probably leading to corruption fairly soon).
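
One common way to keep several copies of a program from working on the same directory is an exclusive lock file. The sketch below shows that technique in Python. It is an illustration only, not something CPDN is known to do: the `try_claim` and `release` helpers and the `cruncher.lock` file name are hypothetical.

```python
import os
import tempfile


LOCK_NAME = "cruncher.lock"  # hypothetical file name, chosen for this example


def try_claim(workunit_dir: str) -> bool:
    """Atomically claim a work unit directory; False if another copy holds it."""
    lock_path = os.path.join(workunit_dir, LOCK_NAME)
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # process can succeed in creating it.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))  # record the owner for debugging
    return True


def release(workunit_dir: str) -> None:
    """Give up the claim so another process may take the work unit."""
    os.remove(os.path.join(workunit_dir, LOCK_NAME))


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    print(try_claim(demo_dir))  # first copy gets the work unit
    print(try_claim(demo_dir))  # second copy is refused
    release(demo_dir)
```

A plain lock file has a known weakness: a worker that crashes leaves the file behind, so the claim looks held forever. An OS-level mutex that reports an abandoned state avoids this.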
I'm a volunteer and my views are my own.
News and Announcements and FAQ
John McLeod VII
Joined: 5 Aug 04
Posts: 172
Credit: 4,023,611
RAC: 0
Message 22047 - Posted: 15 Apr 2006, 0:23:00 UTC

CPDN itself is split into two pieces: one that does the actual work (and it was several of these that I saw running), and one that communicates with BOINC and also with the worker (I have only ever seen one of these running). My belief is that it is the communication between the two CPDN pieces that is causing the trouble.

Yes, I did report this on the BOINC Dev list, but got absolutely no feedback that it had been received. Yes, the machine in question went to 99% CPU usage for a couple of hours on a normal-priority process, so BOINC was getting nothing. I can replicate this completely at will, and I had yet another CPDN result crash this afternoon on that machine.

It is my opinion that if this is solved, it would probably cut the number of crashed runs by about 90%. I am also giving up on CPDN on that host in frustration (until such time as this is reported fixed).


BOINC WIKI
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 22048 - Posted: 15 Apr 2006, 0:24:51 UTC

You could set the timer so that it only runs Boinc on the hours you know you won't be using the machine? (Say, 2am - 8am or whatever).
I'm a volunteer and my views are my own.
News and Announcements and FAQ
Andrew Hingston
Volunteer moderator

Joined: 17 Aug 04
Posts: 753
Credit: 9,804,700
RAC: 0
Message 22050 - Posted: 15 Apr 2006, 8:07:09 UTC - in response to Message 22048.  

You could set the timer so that it only runs Boinc on the hours you know you won't be using the machine? (Say, 2am - 8am or whatever).

Can't see JM7 doing that ;-)

John McLeod VII
Joined: 5 Aug 04
Posts: 172
Credit: 4,023,611
RAC: 0
Message 22074 - Posted: 15 Apr 2006, 21:38:33 UTC - in response to Message 22050.  

You could set the timer so that it only runs Boinc on the hours you know you won't be using the machine? (Say, 2am - 8am or whatever).

Can't see JM7 doing that ;-)

The machine normally has bunches of idle time, even when I am using it, and it is ONLY CPDN that has this problem. CPDN ONLY crashes when the machine is running at 100% CPU with a normal-priority task for long enough that extra CPDN crunching applications are started. Setting a timer for that machine would mean that other projects would not get to run during the time that the machine might (but usually doesn't) run at 100% for a couple of hours. So, no, I am not going to set a timer to turn BOINC off. It would also cut the running time for that machine down to 1/3 of the time (as in, it would be running 8 hours / day).

BTW, I lost another CPDN result yesterday in exactly the same manner.


BOINC WIKI


©2020 climateprediction.net