Computation error

old_user279673
Joined: 18 Feb 06
Posts: 4
Credit: 76,344
RAC: 0
Message 35257 - Posted: 15 Oct 2008, 18:29:15 UTC

Hi,

I've been running two climate simulations side by side on my quad-core CPU. Each had been running for 350 hours and was 18.5% complete (with ~1200 hrs left to completion).

However, after a reboot (due to a Windows Update) one of them showed as "Computation error".

I closed BOINC and reloaded it and it now says "Ready To Report". Obviously something went wrong. Is there anything I can do, given the amount of time I've invested in it?

The specific task is: hadcm3istd_1414_1920_160_15995109_4

I've closed BOINC to stop it from being uploaded (given that its current state is Ready to Report) in case I can somehow salvage and resume it.

Thanks,

Tim
ID: 35257
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 35258 - Posted: 15 Oct 2008, 19:39:06 UTC

Models can only be restarted from a backup of the entire BOINC folder (version 5), or the entire BOINC data folder (version 6), made BEFORE the failure.
See my sig for a link to details.

The failure was probably due to restarting the computer without first stopping BOINC.


Backups: Here
ID: 35258
old_user279673
Joined: 18 Feb 06
Posts: 4
Credit: 76,344
RAC: 0
Message 35259 - Posted: 15 Oct 2008, 20:41:15 UTC
Last modified: 15 Oct 2008, 20:46:44 UTC

Ok, so that particular simulation is now unrecoverable? No backups are automatically generated?

I'm actually quite particular about BOINC, and closed it down (fully exited the program, as confirmed by the systray icon disappearing) some 20 seconds prior to selecting to reboot Windows. I always monitor my CPU temperatures and they dropped, as usual, confirming BOINC was closed.

However, when the computer rebooted it hung before booting into Windows - on a black screen. After leaving it five minutes in case it was doing something behind the scenes, I eventually had to hit the Reset button. Nothing else was functioning. The reboot worked this time, but then the computation error was shown when BOINC was restarted.

Looks like I found out about the need for making manual backups too late. =/

Thanks,

Tim
ID: 35259
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 35260 - Posted: 15 Oct 2008, 22:35:21 UTC

It would be very nice if BOINC could back its own project data up regularly. The problem is that you need to have exited completely from BOINC before the backup is made, otherwise files might be in use and the backup wouldn't be restorable. Once you've stopped BOINC it can't do anything for itself, for the models or for you.

Have a look at the backup methods in the README and choose the one that suits you best. With a fast quad like yours it would probably be a good idea to back up every couple of days. So if you ever did need to restore a backup to rescue a crashed model you wouldn't need to repeat much crunching.
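
The backup routine described in this thread (exit BOINC completely, then copy the entire data folder) could be scripted roughly as follows. This is only a sketch, not an official CPDN or BOINC tool: the function name `backup_boinc_data`, the archive naming, and the choice of a zip archive are assumptions made for illustration, and the script does not check that BOINC has actually been exited first.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_data(data_dir, backup_root):
    """Zip the whole BOINC data folder into a timestamped archive.

    Hypothetical helper: it assumes BOINC has already been exited
    completely, so that no model files are in use while copying.
    """
    src = Path(data_dir).resolve()
    dest = Path(backup_root).resolve()
    if not src.is_dir():
        raise FileNotFoundError(f"BOINC data folder not found: {src}")
    # Refuse to write the archive inside the folder being archived.
    if dest == src or src in dest.parents:
        raise ValueError("backup_root must be outside the BOINC data folder")
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # make_archive appends ".zip" and returns the path of the archive.
    archive = shutil.make_archive(
        str(dest / f"boinc-backup-{stamp}"),
        "zip",
        root_dir=str(src.parent),
        base_dir=src.name,
    )
    return Path(archive)
```

A call such as `backup_boinc_data(r"C:\ProgramData\BOINC", r"D:\boinc-backups")` would be typical, with both paths adjusted to the machine in question. Restoring means exiting BOINC again and putting the archived folder back in place of the current one.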
Cpdn news
ID: 35260
old_user279673
Joined: 18 Feb 06
Posts: 4
Credit: 76,344
RAC: 0
Message 35281 - Posted: 16 Oct 2008, 18:04:28 UTC
Last modified: 16 Oct 2008, 18:05:09 UTC

Indeed. A quick RAR backup of the whole folder takes only a little over 60 seconds with half a gig of data in there, after everything's closed/cleared. Very easy to do. Wish I'd somehow been warned beforehand, I'd have taken one every day!

Thanks. :)
ID: 35281
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 35282 - Posted: 16 Oct 2008, 19:28:46 UTC

We've been urging people to make backups for years. The problem is to get people to read what's posted.


Backups: Here
ID: 35282
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 35288 - Posted: 16 Oct 2008, 22:48:58 UTC

Tímo, go to the links in Les's signature and mine. They will take you straight to the most useful CPDN information pages and forum threads.
Cpdn news
ID: 35288
old_user279673
Joined: 18 Feb 06
Posts: 4
Credit: 76,344
RAC: 0
Message 35300 - Posted: 17 Oct 2008, 12:04:54 UTC - in response to Message 35282.  

We've been urging people to make backups for years. The problem is to get people to read what's posted.

Actually I ran BOINC standalone, and had never been on any BOINC forums (never saw the need to) until I got the error, so would never have known. I googled to find an answer, and it was only that that brought me here.

Tímo, go to the links in Les's signature and mine. They will take you straight to the most useful CPDN information pages and forum threads.


Thnx. :)
ID: 35300
MattShizzle
Joined: 23 Aug 09
Posts: 5
Credit: 726,744
RAC: 133
Message 38090 - Posted: 11 Oct 2009, 4:08:35 UTC

I've gotten "computation error" for 3 of my 4 completed results, without any sort of update.
ID: 38090
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 38099 - Posted: 12 Oct 2009, 8:32:51 UTC

Hi Matt

This 22 error you've been getting is a nuisance because it covers so many different possible things that may have gone wrong. I can only suggest backing up your Boinc Data folder regularly in case the same thing happens again and restoring it if the task crashes. The server will accept whatever new trickles and files you upload. The longer a task lasts the more useful it is to do this.

Have a look at the README collection about problems to see if there's anything you should be doing and aren't. There's a link in my signature. There's also a README about how to back up and restore if you haven't done it before. Les's quick manual method works perfectly.
Cpdn news
ID: 38099
MattShizzle
Joined: 23 Aug 09
Posts: 5
Credit: 726,744
RAC: 133
Message 38110 - Posted: 13 Oct 2009, 2:46:02 UTC
Last modified: 13 Oct 2009, 3:06:11 UTC

The problem is that I don't usually even know until it's already been reported as such, so backing it up might not help. I have Boinc running when I'm online, playing games and even when I'm away from my computer for a little, so I might not check up on it for hours at a time. Apparently, as it uploads every year of data at least and I still get points for these, it is still contributing, but I'd just like to see more complete cycles. It does seem to either get the error within a week of running or complete: the one that completed had been going since I've been on here and only finished maybe 2 weeks ago (though the deadline was sometime in late summer 2010!). I'm pretty bad with computer stuff so a lot of that is beyond me.

Edit: I read the links and couldn't make head or tail of them. Way too technical for me. Note I use IE because I was completely unable to figure out how to use any other browser. I can't find a BOINC or climate prediction folder anywhere.
ID: 38110
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,085,690
RAC: 2,334
Message 38112 - Posted: 13 Oct 2009, 5:16:26 UTC - in response to Message 38110.  

Hi, Matt:

The BOINC folder is in the ProgramData folder. You can't find it because Windows hides this folder by default. To make it visible, open “Control Panel” and click “Folder Options”. Once “Folder Options” is open, click the “View” tab. Scroll down to “Hidden files and folders” and select “Show hidden files, folders, and drives”. Then click “OK”. This will make it visible.
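
For anyone who would rather not change the Explorer setting, a script can check for the folder directly, because the hidden attribute only affects what Explorer displays. The sketch below is an illustration under stated assumptions: the two candidate paths are the ProgramData location mentioned above and the Windows XP location that appears in the startup messages later in this thread, and the function name `find_boinc_data_dir` is made up for this example. Other installations may keep the data folder elsewhere.

```python
from pathlib import Path

# Candidate locations taken from this thread; not an exhaustive list.
CANDIDATES = [
    r"C:\ProgramData\BOINC",
    r"C:\Documents and Settings\All Users\Application Data\BOINC",
]


def find_boinc_data_dir(candidates=CANDIDATES):
    """Return the first candidate that exists as a folder, else None."""
    for candidate in candidates:
        path = Path(candidate)
        # is_dir() is true for hidden folders as well as visible ones.
        if path.is_dir():
            return path
    return None


if __name__ == "__main__":
    found = find_boinc_data_dir()
    print(found if found else "No BOINC data folder found in the usual places")
```

The most reliable source for the real location is the "Data directory:" line near the top of the BOINC Manager messages.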

ID: 38112
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 38114 - Posted: 13 Oct 2009, 11:11:29 UTC
Last modified: 13 Oct 2009, 11:15:32 UTC

Hi Matt

You can see what happened to your past models by clicking on your own name on the forum then following the links to your computer and its tasks. One of the crashed models is here. If you click on + you'll see messages about how the model progressed. Unfortunately we can only see these messages once a model's finished, completed or crashed, so we can't see whether anything's going wrong as it happens.

I don't think you need to do anything very complicated. Try this first.

I think your models may be crashing because you turn off the computer without exiting from Boinc first, and it sounds as if you turn off the computer every day. Mostly this does no harm but sooner or later when you turn off the computer the model will be caught at a moment when it's trying to record data on your disk and it will crash. This is a nuisance.

As there are two ways to exit from Boinc we need to know whether you have it installed as a 'service' or not. Like many people you may not know. Could you please copy for us the first 20 or so lines of your messages in your Boinc manager.

Press the Shift key, click on the first then the last line you want to show us. They'll all be highlighted. Click 'Copy selected messages'. Then in a post here go to Page - Paste (or File - Paste) and they should appear in your post.

We should then be able to tell you exactly how to exit from Boinc. (A Boinc exit is quick and easy; we just need to tell you the method for your Boinc installation.)
Cpdn news
ID: 38114
MattShizzle
Joined: 23 Aug 09
Posts: 5
Credit: 726,744
RAC: 133
Message 38117 - Posted: 13 Oct 2009, 14:20:19 UTC - in response to Message 38114.  

Here they are:

10/13/2009 8:36:55 AM||Starting BOINC client version 6.2.28 for windows_intelx86
10/13/2009 8:36:55 AM||log flags: task, file_xfer, sched_ops
10/13/2009 8:36:55 AM||Libraries: libcurl/7.19.0 OpenSSL/0.9.8i zlib/1.2.3
10/13/2009 8:36:55 AM||Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
10/13/2009 8:36:55 AM||Running under account Owner
10/13/2009 8:36:58 AM||Processor: 2 GenuineIntel Intel(R) Pentium(R) 4 CPU 3.40GHz [x86 Family 15 Model 3 Stepping 4]
10/13/2009 8:36:58 AM||Processor features: fpu tsc sse sse2 mmx
10/13/2009 8:36:58 AM||OS: Microsoft Windows XP: Home x86 Editon, Service Pack 3, (05.01.2600.00)
10/13/2009 8:36:58 AM||Memory: 1021.78 MB physical, 2.40 GB virtual
10/13/2009 8:36:58 AM||Disk: 232.88 GB total, 90.41 GB free
10/13/2009 8:36:58 AM||Local time is UTC -4 hours
10/13/2009 8:36:58 AM|climateprediction.net|URL: http://climateprediction.net/; Computer ID: 1001743; location: (none); project prefs: default
10/13/2009 8:36:58 AM|lhcathome|URL: http://lhcathome.cern.ch/lhcathome/; Computer ID: 9811710; location: (none); project prefs: default
10/13/2009 8:36:58 AM|Milkyway@home|URL: http://milkyway.cs.rpi.edu/milkyway/; Computer ID: 104520; location: (none); project prefs: default
10/13/2009 8:36:58 AM|Cosmology@Home|URL: http://www.cosmologyathome.org/; Computer ID: 60243; location: (none); project prefs: default
10/13/2009 8:36:58 AM|World Community Grid|URL: http://www.worldcommunitygrid.org/; Computer ID: 1030424; location: (none); project prefs: default
10/13/2009 8:36:58 AM||General prefs: from World Community Grid (last modified 31-Dec-1969 19:00:01)
10/13/2009 8:36:58 AM||Host location: none
10/13/2009 8:36:58 AM||General prefs: using your defaults
10/13/2009 8:36:58 AM||Reading preferences override file
10/13/2009 8:36:58 AM||Preferences limit memory usage when active to 510.89MB
10/13/2009 8:36:58 AM||Preferences limit memory usage when idle to 766.33MB
10/13/2009 8:36:58 AM||Preferences limit disk usage to 18.63GB
10/13/2009 8:38:56 AM|Cosmology@Home|Restarting task wu_100309_230540_1_1_0 using camb version 216
ID: 38117
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 38120 - Posted: 14 Oct 2009, 1:07:17 UTC

Hi Matt

Thanks for the messages.

Your Boinc isn't running as a service, otherwise it would say 'Running as a daemon'.

Exiting from Boinc is easy. Right-click on the Boinc icon at the bottom of the screen (it's like a fried egg on a grill) and click Exit. The icon will disappear. Now it's safe for the model if you shut down the computer.

When you restart the computer the Boinc icon should reappear. If it ever doesn't, just go to Start > Programs > Boinc > click on Boinc manager. The icon should jump into place.


Get used to doing that. Later if you want to try making a backup (e.g. when you have a bit of spare time, say at the weekend) let us know and I'll explain an easy method click-by-click.
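
The check on the startup messages can also be done mechanically. The sketch below assumes the copied messages have the `timestamp|project|message` layout seen in the log posted above, and it looks for the same two things: the "Data directory:" line and the 'Running as a daemon' text mentioned in this reply. The function names are hypothetical, and the exact wording of the daemon message is taken from this thread rather than from BOINC documentation.

```python
def parse_message_line(line):
    """Split one copied message into (timestamp, project, message).

    Returns None if the line does not have the two '|' separators.
    """
    parts = line.rstrip("\n").split("|", 2)
    if len(parts) != 3:
        return None
    return tuple(parts)


def summarize_startup(lines):
    """Collect the data folder and the service hint from startup messages."""
    info = {"data_dir": None, "daemon": False}
    for line in lines:
        parsed = parse_message_line(line)
        if parsed is None:
            continue
        message = parsed[2]
        if message.startswith("Data directory:"):
            # Keep everything after the first colon, e.g. the C:\... path.
            info["data_dir"] = message.split(":", 1)[1].strip()
        elif "Running as a daemon" in message:
            # Wording as quoted in this thread; treat it as an assumption.
            info["daemon"] = True
    return info
```

Run on the messages posted above, this would report the `C:\Documents and Settings\All Users\Application Data\BOINC` data folder and no daemon line, which matches the conclusion that Boinc is not installed as a service on that machine.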
Cpdn news
ID: 38120
MattShizzle
Joined: 23 Aug 09
Posts: 5
Credit: 726,744
RAC: 133
Message 38121 - Posted: 14 Oct 2009, 3:03:26 UTC - in response to Message 38120.  

OK, that sounds pretty easy to me. Just have to remember to do it - not used to it, and I turn off my computer at bedtime and I'm often a bit drunk then. I'm guessing you don't have to shut off BOINC if you go into standby mode without actually turning the computer off - I usually do that when I'm eating or away from my computer for whatever reason for more than half an hour or so.
ID: 38121
Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 38123 - Posted: 14 Oct 2009, 16:38:20 UTC - in response to Message 38121.  

... I'm guessing you don't have to shut off BOINC if you go into standby mode without actually turning the computer off ...

It would be a good idea to stop BOINC in that situation, though you would have to remember to start it again when the computer comes out of hibernation. Or you could abandon hibernation and just close down as usual at the end of the day - the model will finish earlier!

Though BOINC is designed to cope with lots of these kinds of situations - starting, stopping, busy computers etc. - it is nonetheless the case that it's exactly at these times when model errors tend to happen. If you want to complete more models then the models will have to be protected a bit (or backed up). I don't like making backups so I make one at the beginning of a run and take care not to crash the model before it's finished; other people take lots of backups and thrash their computers, restoring the backup if the model crashes. It just depends what each person wants to do.
ID: 38123
old_user597798
Joined: 15 Oct 09
Posts: 1
Credit: 381,051
RAC: 0
Message 38215 - Posted: 28 Oct 2009, 21:31:03 UTC

I am getting this error myself, around 35% to 40% of the way through the run. It is a Fortran error. It looks like I might have to learn to make backups too, if that will help with this error. And no, I do not turn off my computer without exiting Boinc first.
ID: 38215
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2168
Credit: 64,541,825
RAC: 6,664
Message 38216 - Posted: 28 Oct 2009, 22:18:38 UTC

Moved this thread from the "climateprediction.net Science" forum to the Windows one as the people reporting problems are running in that OS. The problems are not related to climateprediction.net science.
ID: 38216
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 38217 - Posted: 29 Oct 2009, 1:50:57 UTC
Last modified: 29 Oct 2009, 1:54:21 UTC

Hi Quinsha

The Visual Fortran runtime error in a popup window is a nuisance because we don't know the cause. It says Fortran because that's the programming language the models are written in. If the error message appears repeatedly you could

* make backups in case the error crashes the model, though often the model continues successfully
* exclude Boinc from anti-virus scans
* completely exit from Boinc then restart it
* exit from Boinc and reboot the computer
* set CPDN to No new tasks before all the climate models on the problem computer have finished, then reset CPDN once all the models have finished and reported.

But sometimes the Fortran runtime error just stops appearing for no apparent reason. This happened to me. I don't know why the error message appeared a few times or why it then disappeared.
Cpdn news
ID: 38217