Models Stopped
Author | Message |
---|---|
Joined: 13 Jan 07 Posts: 195 Credit: 10,581,566 RAC: 0 |
Having been making backups since the start of my 2 models, I decided it was high time I tested the restore process. Mistake! To back up, I closed down BOINC/CCE with File > Exit. I then took a backup by copying all files from the c:\BOINC folder to a backup location (for this exercise, elsewhere on C:, although I normally back up to another PC on the network). To restore, I deleted all files and folders from c:\BOINC, copied everything back from the backup folder to c:\BOINC, and restarted BOINC/CCE. Everything appeared to be OK except for one thing: the models were just sitting there and not executing. The CPU timers were not incrementing and, when I clicked "Show Graphics", the world had no cloud or temperature gradation. What am I doing wrong, and how do I recover? /N |
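The backup step described above amounts to copying the whole BOINC folder while BOINC is shut down. The Python sketch below shows one way to do that; the function names, the timestamped folder naming and the file-count check are illustrative assumptions, not part of BOINC or CPDN tooling.

```python
import shutil
from datetime import datetime
from pathlib import Path


def count_files(root: Path) -> int:
    """Count regular files under root, as a quick sanity check on a copy."""
    return sum(1 for p in root.rglob("*") if p.is_file())


def backup_boinc_dir(data_dir: Path, backup_root: Path) -> Path:
    """Copy the entire BOINC data directory into a new timestamped folder.

    Hypothetical helper: run it only after BOINC has been shut down
    (File > Exit), so no checkpoint or state file changes mid-copy.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"boinc-backup-{stamp}"
    # copytree fails if target already exists, so an old backup is never overwritten
    shutil.copytree(data_dir, target)
    # A copy holding fewer files than the source suggests something was skipped
    if count_files(target) != count_files(data_dir):
        raise RuntimeError(f"backup {target} is incomplete")
    return target
```

A call such as `backup_boinc_dir(Path(r"C:\BOINC"), Path(r"D:\boinc-backups"))` would use placeholder paths. BOINC's startup messages include a "Data directory:" line naming the folder that holds the client's state, which is worth checking before deciding what to copy.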
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
A few questions for you: Did you get any error messages while doing either copy? What messages are displayed in the BOINC Manager? Do the tasks show as 'running' on the Work/Tasks tab? I'm a volunteer and my views are my own. News and Announcements and FAQ |
Joined: 13 Jan 07 Posts: 195 Credit: 10,581,566 RAC: 0 |
Absolutely no error messages. On my other (BBC) model, I do this twice a week as I transfer the model from desktop to laptop and vice versa, so I felt myself to be familiar with the process. It looked procedurally identical to what's usual for me, except that it was back onto the same PC. BOINC Manager messages:
01/06/2007 15:13:38||Starting BOINC client version 5.8.16 for windows_intelx86
01/06/2007 15:13:38||log flags: task, file_xfer, sched_ops
01/06/2007 15:13:38||Libraries: libcurl/7.16.0 OpenSSL/0.9.8a zlib/1.2.3
01/06/2007 15:13:38||Data directory: C:\Program Files\BOINC
01/06/2007 15:13:38||Processor: 2 GenuineIntel Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz [x86 Family 6 Model 15 Stepping 6] [fpu tsc pae nx sse sse2 mmx]
01/06/2007 15:13:38||Memory: 2.00 GB physical, 3.85 GB virtual
01/06/2007 15:13:38||Disk: 298.09 GB total, 243.17 GB free
01/06/2007 15:13:38|climateprediction.net|URL: http://climateprediction.net/; Computer ID: 668376; location: home; project prefs: default
01/06/2007 15:13:38||General prefs: from climateprediction.net (last modified 2007-05-19 15:01:48)
01/06/2007 15:13:38||Host location: home
01/06/2007 15:13:38||General prefs: no separate prefs for home; using your defaults
01/06/2007 15:13:38|climateprediction.net|Restarting task hadcm3ohe_0zos_05694590_0 using hadcm3 version 515
01/06/2007 15:13:38|climateprediction.net|Restarting task hadcm3ohe_2d8p_05758811_1 using hadcm3 version 515
01/06/2007 15:13:54|climateprediction.net|Resuming task hadcm3ohe_0zos_05694590_0 using hadcm3 version 515
01/06/2007 15:13:54|climateprediction.net|Resuming task hadcm3ohe_2d8p_05758811_1 using hadcm3 version 515
01/06/2007 15:14:03|climateprediction.net|Resuming task hadcm3ohe_0zos_05694590_0 using hadcm3 version 515
The "Resuming" entries are from when I did a suspend/resume to try to jolt the tasks into starting. Both tasks do show as running. I'm wondering whether to try reinstalling BOINC into the existing folder? Thanks for help. |
Joined: 9 Jan 07 Posts: 497 Credit: 342,899 RAC: 0 |
Lockleys, I've occasionally had tasks showing as running but not actually running. Rebooting the computer has always got them going again. Best regards, MM Visit the Scotland team |
Joined: 13 Jan 07 Posts: 195 Credit: 10,581,566 RAC: 0 |
SP: Tried that. After rebooting and opening BOINC, my CPDN tasks had disappeared entirely from the task list. Ouch! |
Joined: 13 Jan 07 Posts: 195 Credit: 10,581,566 RAC: 0 |
More: I restored from the backup again, then rebooted. This time the tasks didn't disappear, but they still didn't start. Once again, no sinister BOINC messages, just no action. |
Joined: 13 Jan 07 Posts: 195 Credit: 10,581,566 RAC: 0 |
I came back to this an hour later and found the following messages:
01/06/2007 18:42:20||Starting BOINC client version 5.8.16 for windows_intelx86
01/06/2007 18:42:20||log flags: task, file_xfer, sched_ops
01/06/2007 18:42:20||Libraries: libcurl/7.16.0 OpenSSL/0.9.8a zlib/1.2.3
01/06/2007 18:42:20||Data directory: C:\Program Files\BOINC
01/06/2007 18:42:20||Processor: 2 GenuineIntel Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz [x86 Family 6 Model 15 Stepping 6] [fpu tsc pae nx sse sse2 mmx]
01/06/2007 18:42:20||Memory: 2.00 GB physical, 3.85 GB virtual
01/06/2007 18:42:20||Disk: 298.09 GB total, 243.16 GB free
01/06/2007 18:42:20|climateprediction.net|URL: http://climateprediction.net/; Computer ID: 668376; location: home; project prefs: default
01/06/2007 18:42:20||General prefs: from climateprediction.net (last modified 2007-05-19 15:01:48)
01/06/2007 18:42:20||Host location: home
01/06/2007 18:42:20||General prefs: no separate prefs for home; using your defaults
01/06/2007 18:42:20||Suspending network activity - user request
01/06/2007 18:42:20|climateprediction.net|Restarting task hadcm3ohe_0zos_05694590_0 using hadcm3 version 515
01/06/2007 18:42:21|climateprediction.net|Restarting task hadcm3ohe_2d8p_05758811_1 using hadcm3 version 515
01/06/2007 18:43:06||Resuming network activity
01/06/2007 18:43:06|climateprediction.net|Sending scheduler request: To send trickle-up message
01/06/2007 18:43:06|climateprediction.net|(not requesting new work or reporting completed tasks)
01/06/2007 18:43:07|climateprediction.net|[file_xfer] Started upload of file hadcm3ohe_0zos_05694590_0_7.zip
01/06/2007 18:43:09|climateprediction.net|[file_xfer] Finished upload of file hadcm3ohe_0zos_05694590_0_7.zip
01/06/2007 18:43:09|climateprediction.net|[file_xfer] Throughput 245 bytes/sec
01/06/2007 18:43:11|climateprediction.net|Scheduler RPC succeeded [server version 509]
01/06/2007 18:43:11|climateprediction.net|Generated new host CPID: f6cdb01a1c13c902045e7a10e5c2b151
01/06/2007 18:43:42||Suspending network activity - user request
01/06/2007 19:30:39|climateprediction.net|Computation for task hadcm3ohe_0zos_05694590_0 finished
01/06/2007 19:30:39|climateprediction.net|Output file hadcm3ohe_0zos_05694590_0_8.zip for task hadcm3ohe_0zos_05694590_0 absent
01/06/2007 19:30:39|climateprediction.net|Output file hadcm3ohe_0zos_05694590_0_9.zip for task hadcm3ohe_0zos_05694590_0 absent
01/06/2007 19:30:39|climateprediction.net|Output file hadcm3ohe_0zos_05694590_0_10.zip for task hadcm3ohe_0zos_05694590_0 absent
01/06/2007 19:30:39|climateprediction.net|Output file hadcm3ohe_0zos_05694590_0_11.zip for task hadcm3ohe_0zos_05694590_0 absent
01/06/2007 19:30:39|climateprediction.net|Output file hadcm3ohe_0zos_05694590_0_12.zip for task hadcm3ohe_0zos_05694590_0 absent
01/06/2007 19:30:39|climateprediction.net|Output file hadcm3ohe_0zos_05694590_0_13.zip for task hadcm3ohe_0zos_05694590_0 absent
01/06/2007 19:30:39|climateprediction.net|Output file hadcm3ohe_0zos_05694590_0_14.zip for task hadcm3ohe_0zos_05694590_0 absent
01/06/2007 19:30:39|climateprediction.net|Output file hadcm3ohe_0zos_05694590_0_15.zip for task hadcm3ohe_0zos_05694590_0 absent
01/06/2007 19:30:39|climateprediction.net|Output file hadcm3ohe_0zos_05694590_0_16.zip for task hadcm3ohe_0zos_05694590_0 absent
01/06/2007 19:30:40|climateprediction.net|Deferring communication for 1 min 0 sec
01/06/2007 19:30:40|climateprediction.net|Reason: Unrecoverable error for result hadcm3ohe_0zos_05694590_0 (<file_xfer_error> <file_name>hadcm3ohe_0zos_05694590_0_8.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_0zos_05694590_0_9.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_0zos_05694590_0_10.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_0zos_05694590_0_11.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_0zos_05694590_0_12.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_0zos_05694590_0_13.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_0zos_05694590_0_14.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_0zos_05694590_0_15.zip</file_
01/06/2007 19:30:40|climateprediction.net|Computation for task hadcm3ohe_2d8p_05758811_1 finished
01/06/2007 19:30:40|climateprediction.net|Output file hadcm3ohe_2d8p_05758811_1_8.zip for task hadcm3ohe_2d8p_05758811_1 absent
01/06/2007 19:30:40|climateprediction.net|Output file hadcm3ohe_2d8p_05758811_1_9.zip for task hadcm3ohe_2d8p_05758811_1 absent
01/06/2007 19:30:40|climateprediction.net|Output file hadcm3ohe_2d8p_05758811_1_10.zip for task hadcm3ohe_2d8p_05758811_1 absent
01/06/2007 19:30:40|climateprediction.net|Output file hadcm3ohe_2d8p_05758811_1_11.zip for task hadcm3ohe_2d8p_05758811_1 absent
01/06/2007 19:30:40|climateprediction.net|Output file hadcm3ohe_2d8p_05758811_1_12.zip for task hadcm3ohe_2d8p_05758811_1 absent
01/06/2007 19:30:40|climateprediction.net|Output file hadcm3ohe_2d8p_05758811_1_13.zip for task hadcm3ohe_2d8p_05758811_1 absent
01/06/2007 19:30:40|climateprediction.net|Output file hadcm3ohe_2d8p_05758811_1_14.zip for task hadcm3ohe_2d8p_05758811_1 absent
01/06/2007 19:30:40|climateprediction.net|Output file hadcm3ohe_2d8p_05758811_1_15.zip for task hadcm3ohe_2d8p_05758811_1 absent
01/06/2007 19:30:40|climateprediction.net|Output file hadcm3ohe_2d8p_05758811_1_16.zip for task hadcm3ohe_2d8p_05758811_1 absent
01/06/2007 19:30:41|climateprediction.net|Deferring communication for 1 min 0 sec
01/06/2007 19:30:41|climateprediction.net|Reason: Unrecoverable error for result hadcm3ohe_2d8p_05758811_1 (<file_xfer_error> <file_name>hadcm3ohe_2d8p_05758811_1_8.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_2d8p_05758811_1_9.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_2d8p_05758811_1_10.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_2d8p_05758811_1_11.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_2d8p_05758811_1_12.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_2d8p_05758811_1_13.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_2d8p_05758811_1_14.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_2d8p_05758811_1_15.zip</file_
I am running with networking suspended, so I guess these errors will not have been communicated to the server. I should also say that the backup from which I restored was taken just before a decadal upload. Although the tasks failed to resume after the restore, the decadal upload went up to the server before I turned networking off again. |
Joined: 9 Jan 07 Posts: 497 Credit: 342,899 RAC: 0 |
This is all beyond me, I'm afraid, Lockleys - but I've also posted about it in the Scottish forum, so maybe some of our team-mates will be along to help in a minute. The messages all look normal up to the end of your 70-year zip file upload, but then it says "Generated new host CPID" - as if you needed yet another one of those! I wonder if that's part of the problem? |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
'Generated new host CPID' tends to happen after a backup is restored; it's not particularly significant. I can't see any specific reason for the crash in the error log. Try this: restore the backup, then immediately suspend CPU and network activity. Quit BOINC and restart it. Resume CPU activity but leave the network disabled, and see whether the tasks stop at the same point or carry on. It might be that the backup isn't right (for example, a file is missing or corrupt), in which case there's not much that can be done about it. You could try comparing the files in the backup with your other system to see whether anything is missing. Also make sure all files are set to read/write rather than read-only. If you see a task marked as 'running' but not taking any CPU time in Task Manager, try leaving it for 20 minutes or so (there is a 15-minute timer). |
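The two file checks suggested above (looking for files missing from the backup, and clearing read-only flags) can be scripted. The Python sketch below is one possible approach, assuming you have a second tree to compare against, such as an older backup of the same machine; the function names are hypothetical, and the comparison only checks which files exist, not whether their contents are valid.

```python
import os
import stat
from pathlib import Path


def relative_files(root: Path) -> set:
    """Return the set of file paths under root, relative to root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def missing_from(candidate: Path, reference: Path) -> list:
    """List files that exist in reference but not in candidate."""
    return sorted(relative_files(reference) - relative_files(candidate))


def make_writable(root: Path) -> list:
    """Clear the read-only flag on every file under root.

    Returns the relative paths that were read-only before the call.
    On Windows, setting S_IWRITE via os.chmod clears the read-only attribute.
    """
    changed = []
    for p in root.rglob("*"):
        if p.is_file():
            mode = p.stat().st_mode
            if not mode & stat.S_IWRITE:
                os.chmod(p, stat.S_IMODE(mode) | stat.S_IWRITE)
                changed.append(p.relative_to(root).as_posix())
    return sorted(changed)
```

Comparing against a different machine will also flag files that legitimately differ (for example, another project's model), so the output is a list of candidates to inspect rather than a verdict.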
Joined: 16 Feb 06 Posts: 23 Credit: 3,515,174 RAC: 0 |
I also see no failure that I know of! Keep going until the next trickle, as I see no fatal errors, then upload and check again. Is it just 'no error', as in 0x0? Not my forte, but I've pulled a few back! Oh, and merge the hosts. Rory Leave a planet to those following! |
Joined: 13 Jan 07 Posts: 195 Credit: 10,581,566 RAC: 0 |
Thanks all. I tried MikeMars's suggestion, but my tasks have not resumed. If there are no other ideas by tomorrow, I'll have to try going back to earlier backups (I still have 19 of them) and see whether they all behave the same. :( Still, a day lost would be better than 2 models lost! |
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
The CPDN models will not run, of course, if something else is running. My machine (including BOINC) has been repeatedly brought down in recent weeks by an anonymous system process running flat-out in the background: eventually mouse and keyboard control are lost and the power switch is the only option. There does appear to be a Microsoft Update problem with some of these symptoms doing the rounds at the moment, in which case I would suggest:
- run Microsoft Update (the first half of their fix is out, with more to follow, I think)
- boot the machine, but don't start BOINC (if BOINC starts automatically, then suspend and exit)
- check with Task Manager (right-click the taskbar) whether any other task is consuming a significant percentage of the CPU time
- if another task is running, let it run for two hours to see whether it finishes. (This worked on two of my machines, but on my crash-prone machine the task ran for over 12 hours without finishing.)
If nothing is running in the background but the CPDN models still won't run, then ignore this post! |
Joined: 13 Jan 07 Posts: 195 Credit: 10,581,566 RAC: 0 |
Thanks Iain. I checked Windows Task Manager, but there's nothing there pushing the CPU above 1%. |
Joined: 13 Jan 07 Posts: 195 Credit: 10,581,566 RAC: 0 |
I seem to have cracked the problem!?! I deleted all files from c:\BOINC, then loaded back the earliest backup I still have (about 3 months old). When I clicked Resume, the tasks started. So I suspended and exited, then copied my recent backup over c:\BOINC without deleting the old files. I opened BOINC and resumed, and it started straight away. Odd, or what? Perhaps this may help others. Or perhaps nobody else will ever have this anomaly. Thanks to all for assistance and ideas. |
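The fix described above is a two-stage restore: lay down an older backup, then copy the newest backup over it without deleting anything first, so any file missing from the newest backup is supplied by the older one. A minimal Python sketch of that sequence follows, assuming BOINC has been exited first; the function name and paths are hypothetical.

```python
import shutil
from pathlib import Path


def overlay_restore(data_dir: Path, older: Path, newer: Path) -> None:
    """Rebuild data_dir from an older backup, then merge a newer one on top.

    Files present in both backups end up as the newer copy; files found
    only in the older backup are kept. Exit BOINC before calling this.
    """
    # Step 1: clear out the current data directory entirely
    if data_dir.exists():
        shutil.rmtree(data_dir)
    # Step 2: restore the older backup into the now-empty location
    shutil.copytree(older, data_dir)
    # Step 3: copy the newer backup over it without deleting anything;
    # dirs_exist_ok (Python 3.8+) merges into the existing tree
    shutil.copytree(newer, data_dir, dirs_exist_ok=True)
```

The trade-off is that stale files existing only in the older backup are carried along too, which is harmless in the case above but means the result is not an exact copy of either backup.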
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
From your description, it sounds as though a file was missing from the most recent backup. Glad it's working now :-) |
Joined: 9 Jan 07 Posts: 497 Credit: 342,899 RAC: 0 |
Glad to hear you're operational again, Lockleys - well done and happy continued crunching! Regards, MM @ the Pavilion |
Joined: 13 Jan 07 Posts: 195 Credit: 10,581,566 RAC: 0 |
I thought I was operational again - it certainly looks like that from my end of the periscope - but when I look at my results on the server, they suddenly look alarming, with all sorts of errors listed. See http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6243274 and http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6170819 for my two models. Should I be worrying? /Cheers |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
You can worry if you like. But it's not compulsory. :) The messages that get uploaded to the server first stay there, as there's no mechanism for replacing them with later messages. (This is just part of the way BOINC is designed for the many other DC projects.) Just carry on crunching, and rely on what you see in the Manager on your computer. |
Joined: 9 Jan 07 Posts: 497 Credit: 342,899 RAC: 0 |
Les Bayliss wrote: You can worry if you like. Les! Please don't worry, Lockleys! Best regards, MM @ the Pavilion |
©2024 cpdn.org