Frustrating that you can't shut down without error
Author | Message |
---|---|
Joined: 21 Nov 04 Posts: 6 Credit: 902,269 RAC: 0 |
It is extremely frustrating that you cannot shut down BOINC without causing a climate model to error. It does not always happen, but it happens too frequently. I have tried suspending activity, waiting for a period of time and then exiting the BOINC process. I saw a thread about making a change in the registry and I also did that. It did not help. I used to shut down BOINC in order to allow disk defragmentation to take place, but I find that simply suspending activity for an hour or so without exiting BOINC works about as well. Unfortunately, Windows and other program updates sometimes make a shutdown necessary. If it weren't for this problem, I would have almost no errored models. Chuck |
Joined: 16 Jan 10 Posts: 1081 Credit: 6,980,320 RAC: 3,893 |
"Unfortunately, Windows and other program updates sometimes make a shutdown necessary. If it weren't for this problem, I would have almost no errored models." Turning off automatic Windows updates does help reduce errors. Shutting BOINC down manually as well should reduce the error rate to a very low level. (FAMOUS models have a relatively high repeatable error rate that is nothing to do with the computer itself - anyone with the same setup would get the error.) |
Joined: 27 Jan 07 Posts: 300 Credit: 3,288,263 RAC: 26,370 |
A couple of things you can do. In the BOINC Manager, in Advanced view, go to the Advanced menu and select "Shutdown connected client..." before you back up, defragment, or shut down your machine. I always shut down BOINC during a backup, so that BOINC restarts in a consistent state should I ever need to restore. Or, a less preferred option: in preferences, set "leave applications in memory" to off. When you suspend work (after at least one checkpoint), the application will unload from memory cleanly. Also, if you installed BOINC as a service, you may want to check the registry value WaitToKillServiceTimeout under the key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control and set it to at least 30000 (milliseconds). It must be of type REG_SZ. |
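For anyone who wants to check the registry suggestion above without opening regedit, here is a minimal Python sketch. The function names `timeout_is_sufficient` and `read_wait_to_kill_timeout` are made up for illustration, and the 30000 threshold is simply the figure from the post, not an official BOINC or CPDN recommendation. The script only reads the value and does not change anything.

```python
import sys

# Threshold taken from the post above (milliseconds); an assumption, not an official figure.
RECOMMENDED_MIN_MS = 30000

CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control"
VALUE_NAME = "WaitToKillServiceTimeout"


def timeout_is_sufficient(raw_value, minimum_ms=RECOMMENDED_MIN_MS):
    """Return True if the REG_SZ string holds an integer >= minimum_ms."""
    try:
        return int(str(raw_value).strip()) >= minimum_ms
    except ValueError:
        # Non-numeric text in the registry value counts as not sufficient.
        return False


def read_wait_to_kill_timeout():
    """Return (data, type) for the registry value, or None if unavailable."""
    if sys.platform != "win32":
        return None  # winreg exists only on Windows
    import winreg

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CONTROL_KEY) as key:
            return winreg.QueryValueEx(key, VALUE_NAME)
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    result = read_wait_to_kill_timeout()
    if result is None:
        print("Value not found (or not running on Windows).")
    else:
        data, _value_type = result
        print(f"{VALUE_NAME} = {data!r}; sufficient: {timeout_is_sufficient(data)}")
```

Changing the value still has to be done by hand (or with an elevated tool), and a reboot is normally needed before Windows picks it up.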
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
I believe it's only possible to defragment the BOINC directories (as well as everything else) if one exits from BOINC first. If BOINC is left running it does no harm, but its directories don't get defragmented. I know that some people run background backups with BOINC running; apparently this method produces restorable backups most of the time but not invariably. I take manual backups, copying and pasting. In Windows I've found that if I forget to exit from BOINC first, the pasting fails and I have to exit from BOINC and start again. The advantage is that in my experience a manual backup can always be restored. Exiting from BOINC before shutting down the computer should only take a few moments. The problem is that there's nothing in BOINC itself to indicate that this is necessary, at least for CPDN, and lots of members probably don't realise. |
Joined: 21 Nov 04 Posts: 6 Credit: 902,269 RAC: 0 |
I did everything suggested. Did the "Shutdown connected client". I have always had "leave applications in memory" off. It doesn't make any difference. A model still errored as soon as I restarted BOINC. I see messages like this in STDERR: <core_client_version>6.10.58</core_client_version> <![CDATA[ <stderr_txt> CPDN Monitor - Quit request from BOINC... CPDN Monitor - Quit request from BOINC... Global Worker:: CPDN process is not running, exiting, bRetVal = 0, checkPID=0, selfPID=3456, iMonCtr=1 Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=0, iMonCtr=2 Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=2584, selfPID=4632, iMonCtr=1 Model crash detected, will try to restart... Leaving CPDN_Main::Monitor... 11:01:12 (4632): called boinc_finish </stderr_txt> <message> <file_xfer_error> <file_name>hadam3p_eu_xexy_1981_1_006966974_0_5.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadam3p_eu_xexy_1981_1_006966974_0_6.zip</file_name> <error_code>-161</error_code> </file_xfer_error> Followed by the list of all missing ZIP files. |
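When a task report lists many of these `<file_xfer_error>` entries, it can be easier to pull them out with a short script than to read them by eye. The following is a rough Python sketch that assumes the report text has exactly the layout quoted in the post above. The function name `file_transfer_errors` is hypothetical, and the regular expression is a convenience for this one layout, not a proper parser for BOINC output.

```python
import re

# Matches one <file_xfer_error> entry as it appears in the quoted report.
FILE_XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>\s*"
    r"</file_xfer_error>"
)


def file_transfer_errors(report):
    """Return a list of (file_name, error_code) pairs found in the report text."""
    return [
        (match.group("name").strip(), int(match.group("code")))
        for match in FILE_XFER_ERROR.finditer(report)
    ]


# Sample text copied from the post above.
sample = (
    "<message> "
    "<file_xfer_error> "
    "<file_name>hadam3p_eu_xexy_1981_1_006966974_0_5.zip</file_name> "
    "<error_code>-161</error_code> "
    "</file_xfer_error> "
    "<file_xfer_error> "
    "<file_name>hadam3p_eu_xexy_1981_1_006966974_0_6.zip</file_name> "
    "<error_code>-161</error_code> "
    "</file_xfer_error>"
)

if __name__ == "__main__":
    for name, code in file_transfer_errors(sample):
        print(f"{name}: error {code}")
```

Pasting the full stderr text from a task page into `sample` would list every file the report flags, which makes it easier to compare against what is actually present in the BOINC data folder.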
Joined: 16 Jan 10 Posts: 1081 Credit: 6,980,320 RAC: 3,893 |
For reference: hadam3p_eu_xexy_1981_1_006966974_0. The next thing to check is that the BOINC application and data folders are excluded from any virus checking. A virus checker is just the kind of thing that could intervene when BOINC starts. |
Joined: 21 Nov 04 Posts: 6 Credit: 902,269 RAC: 0 |
C:\Program Files\BOINC and C:\Programdata\BOINC have always been excluded from virus scanning (Kaspersky). SETI never has this problem. I have to believe that there is some flaw in the Climate Models that is not handling a shutdown properly. Chuck |
Joined: 3 Oct 06 Posts: 43 Credit: 8,017,057 RAC: 0 |
I can shut down without models erroring out. I have no idea why you can't. I do know that I do not do anything special before shutting down. I know this remark doesn't help you, but it is my comment on the subject title of this thread. |
©2024 climateprediction.net