climateprediction.net home page
Computation error
Message boards : Cafe CPDN : Computation error

Author Message
Ron Voss
Joined: 15 Jun 07
Posts: 3
Credit: 971,868
RAC: 0
Message 58984 - Posted: 9 Nov 2018, 20:59:24 UTC

Most of my BOINC projects take a few minutes to a few hours, but CP takes days to run, so I was bummed that after two days (blocking other projects because "Switch between tasks every N minutes" isn't working) my two CP tasks aborted simultaneously with "Computation error" after a restart, despite checkpointing. So I'm sorry to have to abandon CP; I don't want it wasting more cycles. These were my first two tasks after rejoining CP since being away a few years; I don't remember why I left, perhaps for the same reason. 16 GB iMac, macOS 10.13.6.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 6909
Credit: 20,843,205
RAC: 108
Message 58985 - Posted: 9 Nov 2018, 21:36:54 UTC

BOINC has to be Suspended, and then Exited BEFORE any computer restart.
And the only model type that's currently available for the Mac is very touchy anyway.

Also, see Why Macs are on the way out at cpdn at the top of the Macintosh section.

Thanks for trying. This isn't an easy project to handle.

Ron Voss
Joined: 15 Jun 07
Posts: 3
Credit: 971,868
RAC: 0
Message 58986 - Posted: 9 Nov 2018, 22:30:24 UTC - in response to Message 58985.

I meant a restart of BOINC; Mac wasn't rebooted. But I would think (naively?) the last CP checkpoint should *always* survive *any* kind of restart. Thanks for your efforts; I'm a volunteer board mod elsewhere.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 6909
Credit: 20,843,205
RAC: 108
Message 58987 - Posted: 10 Nov 2018, 6:15:50 UTC

That's another of the many problems with this project - the large number of files that are open.
I don't recall anyone having looked into it, but "checkpointing" may not be the same as "all files saved". Stopping or shutting down parts of it may just happen to occur while some of the files are still waiting to be saved.
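
To illustrate the gap between "checkpoint reached" and "all files saved", here is a minimal Python sketch of one common way to make a single checkpoint file survive an abrupt stop: write to a temporary file, force it to disk, then rename it over the old one. This is not how the CPDN models work (nobody in this thread has looked at that); the function names and the JSON state format are made up for the example.

```python
import json
import os
import tempfile


def write_checkpoint(path, state):
    """Replace `path` with `state` so a reader sees the old or the new file, never half of one.

    Hypothetical helper for illustration only; not part of BOINC or CPDN.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to push it to disk
        os.replace(tmp_path, path)  # swap the new file in over the old one
    except BaseException:
        # Clean up the partial temp file; the previous checkpoint is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_checkpoint(path):
    """Load the last checkpoint that was completely written."""
    with open(path) as f:
        return json.load(f)
```

If the process is killed before the rename, the previous checkpoint is still intact. A model that keeps many files open would need every one of them handled this way (or a single marker file written last), which is the kind of thing that can go wrong when a task is stopped mid-save.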

Back when there were still graphics with some info about the model's state on them, I used to wait until the countdown timer (to the next checkpoint) showed zero, and then a bit longer, before I Suspended that model. And each model was Suspended individually, before Suspending BOINC, and then Exiting BOINC.
I don't know how much overkill this was, but it worked, and it didn't take long.

The new modelling programs seem to need lots of TLC on certain operating systems, and on certain versions of them.
For example, Windows 10 may be the cause of a lot of the failures with the South American models (sas25), many of which fail at about 3 minutes.
But on my Linux Mint computers, running the latest version of WINE, with a Windows version of BOINC, I don't have that problem.


Copyright © 2019 climateprediction.net