WUs crash when closing BOINC

jjv
Joined: 21 Apr 12
Posts: 2
Credit: 9,006,731
RAC: 11,520
Message 54770 - Posted: 11 Sep 2016, 8:25:46 UTC

When for whatever reason I have to close BOINC or reboot my machine I dread seeing climateprediction units running. It is pretty much a certainty that at least some of them will crash and burn. Last reboot I lost five.
So is there anything I can do to limit the problem? 'Leave tasks in memory' on or off? Change the requested checkpoint limit? Suspend before closing?

JJ

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2387
Credit: 3,073,986
RAC: 370
Message 54771 - Posted: 11 Sep 2016, 9:04:25 UTC - in response to Message 54770.

"Leave tasks in memory" is best left on all the time. I always suspend tasks and exit BOINC before shutting down. The problem you mention seems to be even worse on Linux for some reason. If I run Windows tasks in WINE on Linux it is better, but I still suspend tasks and exit BOINC, leaving a decent interval between the two, and then shut down.

Problem doesn't seem to occur if I use Hibernate instead of closing down.

Lockleys
Joined: 13 Jan 07
Posts: 183
Credit: 9,541,689
RAC: 4,256
Message 54772 - Posted: 11 Sep 2016, 9:07:30 UTC - in response to Message 54770.
Last modified: 11 Sep 2016, 9:08:16 UTC

I usually close down like this: 1) go to the tasks tab and close each task individually, starting with the ones that are still waiting to run; 2) go to the projects tab and close climateprediction; 3) click File>Exit BOINC. For me, this procedure seems almost faultless. When it's closed down, I usually make a backup of the BOINC data folder, even though it is becoming unfashionable to do so as the task durations reduce from what was over a year to complete a single task when I started crunching.
By the way, having "Leave Tasks in Memory = YES" is desirable but for completely different reasons, so I do that too.

EDIT, Dave Jackson beat me.
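
The backup step mentioned above could be scripted along these lines. This is only a minimal sketch: the function name backup_boinc_data is made up for illustration, and the paths in the usage comment are assumptions that will differ between installations. It should only be run once the BOINC client has fully exited, otherwise the archive may catch files that are still being written.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_data(data_dir, backup_root):
    """Archive the BOINC data folder into backup_root and return the archive path.

    Hypothetical helper: run it only after the BOINC client has exited,
    so that no task files are open or half-written while being copied.
    backup_root should live outside data_dir.
    """
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"BOINC data folder not found: {data_dir}")

    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)

    # Timestamped name so that earlier backups are not overwritten.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base_name = backup_root / f"boinc-data-{stamp}"

    # make_archive appends the ".zip" suffix and returns the final path.
    archive = shutil.make_archive(str(base_name), "zip", root_dir=str(data_dir))
    return Path(archive)


# Example call (both paths are assumptions, adjust them to your own setup):
# backup_boinc_data("/var/lib/boinc-client", "/home/me/boinc-backups")
```

Restoring would mean unpacking the archive over the data folder while the client is stopped. Whether that is worthwhile depends on how long the tasks run, as noted above.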

jjv
Joined: 21 Apr 12
Posts: 2
Credit: 9,006,731
RAC: 11,520
Message 54773 - Posted: 11 Sep 2016, 13:36:30 UTC

Hmm, suspending tasks one by one is not really an option, since this computer is running a rather large amount of work at any time. So suspending everything and then closing is what's recommended?
Is there any idea what causes the issue? HDD congestion, BOINC shutting down too fast, etc.?

JJ

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 6866
Credit: 20,843,205
RAC: 216
Message 54774 - Posted: 11 Sep 2016, 18:03:46 UTC - in response to Message 54773.

There are a lot of open files with this project, which need to be closed before the computer shuts down. And if you have a large amount of work on that computer then there's a good chance that this won't happen if you just shut down with everything still running.

"HDD congestion, BOINC shutting down too fast ..."
Both.

This has been posted about many times over the years.
At the very least, Suspend BOINC in the menu, and wait a few seconds before Exiting from BOINC.
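
If a fixed wait feels like guesswork on a machine with a lot of work, one option is to poll the client until it reports no task as executing. The sketch below is hedged in two ways: the helper names are invented for illustration, and the parsing assumes that `boinccmd --get_tasks` prints a line of the form "active_task_state: EXECUTING" for each running task, which should be checked against the output of your own client version.

```python
import re
import subprocess
import time


def count_executing(get_tasks_output):
    """Count tasks that the client reports as still executing.

    Assumption: the task list contains one "active_task_state: EXECUTING"
    line per running task. Verify this against your own boinccmd output.
    """
    pattern = r"^\s*active_task_state:\s*EXECUTING\s*$"
    return len(re.findall(pattern, get_tasks_output, flags=re.MULTILINE))


def wait_until_idle(fetch_output, timeout=60.0, poll=2.0, sleep=time.sleep):
    """Poll until no task is executing; return True if that happened in time."""
    waited = 0.0
    while True:
        if count_executing(fetch_output()) == 0:
            return True
        if waited >= timeout:
            return False
        sleep(poll)
        waited += poll


def fetch_from_boinccmd():
    """Ask the running client for its task list (needs boinccmd on the PATH)."""
    result = subprocess.run(
        ["boinccmd", "--get_tasks"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Example call, after suspending BOINC and before exiting it:
# ok = wait_until_idle(fetch_from_boinccmd, timeout=120)
```

This only checks what the client reports. It does not prove that every file has been flushed to disk, so a short extra pause before exiting is still sensible.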

Eirik Redd
Joined: 31 Aug 04
Posts: 362
Credit: 110,737,592
RAC: 168,104
Message 54781 - Posted: 13 Sep 2016, 8:58:19 UTC - in response to Message 54772.

What you do works for me. I haven't lost a WU since June.
I do backups before serious kernel updates. Works for me.


Jean-David Beyer
Joined: 5 Aug 04
Posts: 274
Credit: 2,995,866
RAC: 1,353
Message 55856 - Posted: 6 Mar 2017, 3:22:08 UTC

Is this the same problem that everyone is talking about?

Name: wah2_sas50_mtb4_201512_13_530_010941345_1
Workunit: 10941345
Created: 24 Feb 2017, 17:37:19 UTC
Sent: 24 Feb 2017, 17:38:24 UTC
Report deadline: 6 Feb 2018, 22:58:24 UTC
Received: 5 Mar 2017, 3:55:37 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x0)
Computer ID: 1256552
Run time: 7 days 21 hours 44 min 24 sec
CPU time: 7 days 11 hours 46 min 12 sec
Validate state: Initial
Credit: 0.00
Device peak FLOPS: 1.28 GFLOPS
Application version: Weather At Home 2 (wah2) (region independent) v8.12 i686-pc-linux-gnu
Stderr output:

<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Leaving CPDN_Main::Monitor...
10:24:10 (3024): called boinc_finish(0)

</stderr_txt>
]]>

I did not shut down BOINC, but I did turn off the BOINC Manager by selecting File->ExitBoincManager. I have been doing that for years, and I do not recall this ever happening before.

Running Red Hat Enterprise Linux Server release 6.8 (Santiago)

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 6866
Credit: 20,843,205
RAC: 216
Message 55857 - Posted: 6 Mar 2017, 5:36:20 UTC

The Manager is just a GUI for seeing what the important bit, boinc-client, is doing.

And it's the client that needs to be stopped and exited from.

My process now is:

In menu -> Activity, Network activity suspended
Above it, Suspend BOINC
Wait a few seconds
In menu -> File Exit

Then there's a pop up window, which asks if I want to suspend running tasks, so I click Yes.
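
For machines without the Manager, or for anyone who wants to script the steps above, the same order can be sketched with boinccmd, the command-line tool that talks to boinc-client. The function names and the five-second pause are assumptions for illustration. The sketch also assumes that boinccmd is on the PATH and is allowed to control the local client, which on some installations means running it from the BOINC data directory or supplying the RPC password.

```python
import subprocess
import time


def shutdown_commands():
    """Return the boinccmd calls that mirror the manual menu steps, in order."""
    return [
        ["boinccmd", "--set_network_mode", "never"],  # network activity suspended
        ["boinccmd", "--set_run_mode", "never"],      # suspend computation
        ["boinccmd", "--quit"],                       # tell the client to exit
    ]


def shut_down_boinc(pause_seconds=5.0, run=subprocess.run, sleep=time.sleep):
    """Suspend network and computation, wait, then ask the client to quit.

    pause_seconds is a guess at "a few seconds"; a busy machine may need more.
    """
    network_off, run_off, quit_client = shutdown_commands()
    run(network_off, check=True)
    run(run_off, check=True)
    # Give the suspended tasks time to finish writing and close their files.
    sleep(pause_seconds)
    run(quit_client, check=True)


# Example call with a longer pause for a heavily loaded machine:
# shut_down_boinc(pause_seconds=30)
```

Setting a mode to "never" without a duration is expected to persist, so after the next start the run and network modes would probably need to be set back (for example with `--set_run_mode auto` and `--set_network_mode auto`) before tasks resume.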

**********

That should eliminate 99.9% of known problems. :)

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2387
Credit: 3,073,986
RAC: 370
Message 55858 - Posted: 6 Mar 2017, 9:04:21 UTC

"That should eliminate 99.9% of known problems. :)"

I haven't done many reboots for kernel updates recently, but in the past even doing that would occasionally crash tasks under Linux. Now I reboot as little as possible, and rather than exiting BOINC I suspend activity and then hibernate if I am turning a machine off. I have yet to have a task crash after restarting when hibernate has been used, even when I have forgotten to suspend BOINC first.

I don't know why Windows tasks are different, but when I have followed Les's advice, rebooting has yet to crash a task running under WINE.

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 972
Credit: 2,829,736
RAC: 9,746
Message 55859 - Posted: 6 Mar 2017, 10:07:31 UTC

Maybe I'm missing something, Jean-David, but that model looks like a success ...

Jean-David Beyer
Joined: 5 Aug 04
Posts: 274
Credit: 2,995,866
RAC: 1,353
Message 55862 - Posted: 6 Mar 2017, 12:53:51 UTC - in response to Message 55859.

"Maybe I'm missing something, Jean-David, but that model looks like a success ..."

I see that now. When I posted my question, it showed that result, but with zero credits. And when I looked at the other results for that work unit, there was only one other, and it had crashed early.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2387
Credit: 3,073,986
RAC: 370
Message 55863 - Posted: 6 Mar 2017, 13:33:22 UTC

"it showed that result, but with zero credits."

A while ago the credit script was set to run only once a week. Recently there have been times when either even that has not happened or the script has failed in some fashion. Just over a week ago it was run manually, and looking at my account it seems the regular weekly run has just finished.

Byron Leigh Hatch @ team Carl Sagan
Joined: 17 Aug 04
Posts: 280
Credit: 43,277,847
RAC: 5,991
Message 55867 - Posted: 6 Mar 2017, 18:58:58 UTC - in response to Message 55857.

Thanks Les Bayliss for the following:

My process now is:
In menu -> Activity, Network activity suspended
Above it, Suspend BOINC
Wait a few seconds
In menu -> File Exit
Then there's a pop up window, which asks if I want to suspend running tasks, so I click Yes.

Also, is the following of any help?

For those of us running:
Windows 10 Professional x64 Edition, (10.00.14393.00)
BOINC version 7.6.33

My experience is the following:
When for whatever reason I have to close BOINC or reboot my machine,
I have never had climateprediction tasks crash when I use the following:

1) Open BOINC Manager
2) You will see menus for File, View, Activity, Options, Tools, Help

3) Click on the Options menu
4) You will see a drop-down list
5) Scroll down to Other options
6) Click on Other options and you will see the following:

7) Under the General tab you should see "Enable Manager exit dialog?"

8) Make sure you tick that box
9) Then click OK
10) Now when you exit BOINC, this pop-up box should appear:

11) Make sure the little box is ticked: "Stop running tasks when exiting BOINC"
12) Then click OK

I hope I got it right.
Is this of any help to anyone?

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 6866
Credit: 20,843,205
RAC: 216
Message 55868 - Posted: 6 Mar 2017, 21:10:11 UTC

Hi Byron

That last pop-up that you've listed is roughly (possibly exactly) the pop-up that I mentioned. So yes, that will help those who don't already get it on Exit.


I never went to Windows 10; rather I followed MS's request while on XP, to update to a new OS, and if necessary, new hardware.
So I went from a Q6600, (Kentsfield), to an i7-3770K (Ivy Bridge), and the OS from XP to LINUX. And they were right - it's much better. :)

Byron Leigh Hatch @ team Carl Sagan
Joined: 17 Aug 04
Posts: 280
Credit: 43,277,847
RAC: 5,991
Message 55874 - Posted: 7 Mar 2017, 14:52:48 UTC - in response to Message 55868.

Thanks Les,
Best Wishes,
Byron


Copyright © 2019 climateprediction.net