All work units received since 1-Aug-18 get a "Computation error"

RogerM
Joined: 31 Aug 04
Posts: 2
Credit: 4,192,205
RAC: 2,521
Message 58699 - Posted: 5 Sep 2018, 19:48:48 UTC

The work units run for varying lengths of time, so I have been burning through a lot of CPU time without getting any credit since the project came back online on 1-Aug-18. Here's a typical work unit: https://www.cpdn.org/cpdnboinc/workunit.php?wuid=11584208. And here's another: https://www.cpdn.org/cpdnboinc/workunit.php?wuid=11606997. The work units also appear to fail on other computers. Is this a known issue, and is there something I can do about it?

Thank you.

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 1811
Credit: 36,354,717
RAC: 10,605
Message 58715 - Posted: 6 Sep 2018, 16:43:51 UTC - in response to Message 58699.

I started looking at that PC's failures, from the ones around early Aug until now. At first I thought that maybe you had incredibly bad luck with the SAM25 models, which have a pretty high failure rate overall. But then I saw that you had PNW, CAM and CAF failures as well. All the SAM and CAM failures were signal 11, while the PNW ones weren't.

Did anything change on your PC around August 1st?

Besides basic maintenance, such as blowing out the air ducts with compressed air while the PC is shut down and ensuring that there is some space between the vents and the surface it sits on, you could exclude the BOINC program files and data folders from antivirus scanning.

For the failures that occurred after the task had run for quite some time and returned some trickles, the stderr log shows lots of "suspends". CPDN tasks are more prone to failure when they are suspended often. In the computing preferences in BOINC Manager, you should un-tick "Suspend when computer is in use" and "Suspend when non-BOINC usage is above xx percent", and tick "Leave non-GPU tasks in memory when suspended".
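
For anyone who would rather set these three options outside BOINC Manager, here is a minimal Python sketch that builds the matching entries for a global_prefs_override.xml file. It assumes the element names run_if_user_active, suspend_cpu_usage and leave_apps_in_memory from the BOINC client's override file format; the helper name build_override_xml is made up for illustration, so check the names against the documentation for your client version before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumed element names from BOINC's global_prefs_override.xml format;
# verify them against the documentation for your client version.
SETTINGS = {
    "run_if_user_active": "1",    # un-tick "Suspend when computer is in use"
    "suspend_cpu_usage": "0",     # 0 = no limit on non-BOINC CPU usage
    "leave_apps_in_memory": "1",  # tick "Leave non-GPU tasks in memory when suspended"
}


def build_override_xml(settings=SETTINGS):
    """Return the override preferences as an XML string (hypothetical helper)."""
    root = ET.Element("global_preferences")
    for name, value in settings.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_override_xml())
```

The printed text would be saved as global_prefs_override.xml in the BOINC data directory, and the client would then need to re-read its preferences, for example after a restart.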

Finally, you might want to set the climateprediction.net project to "No new tasks", remove it from BOINC Manager, and then re-add it, to make sure there are no corrupted files in the projects/climateprediction.net directory.
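
The same three steps can also be sketched for the boinccmd command-line tool. The Python snippet below only builds the command lines and does not run them. It assumes boinccmd's --project and --project_attach options; the project URL and account key shown are placeholders, not real values, and the helper name reattach_commands is made up for illustration.

```python
def reattach_commands(project_url, account_key, boinccmd="boinccmd"):
    """Build (but do not run) the boinccmd calls for a clean re-attach.

    project_url and account_key are supplied by the caller; nothing here
    knows the real project URL or your account key.
    """
    return [
        # 1. Stop fetching new work so running tasks can finish first.
        [boinccmd, "--project", project_url, "nomorework"],
        # 2. Detach, which removes the project and its files from the client.
        [boinccmd, "--project", project_url, "detach"],
        # 3. Attach again so the project files are downloaded fresh.
        [boinccmd, "--project_attach", project_url, account_key],
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for cmd in reattach_commands("https://example.invalid/project/", "YOUR_ACCOUNT_KEY"):
        print(" ".join(cmd))
```

Detaching discards any unfinished tasks, so it is worth letting the running models complete between the first and second step.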

Jim1348
Joined: 15 Jan 06
Posts: 356
Credit: 14,883,302
RAC: 49,539
Message 58716 - Posted: 6 Sep 2018, 19:26:19 UTC - in response to Message 58715.

I started looking at that PC's failures, from the ones around early Aug until now. At first I thought that maybe you had incredibly bad luck with the SAM25 models, which have a pretty high failure rate overall. But then I saw that you had PNW, CAM and CAF failures as well. All the SAM and CAM failures were signal 11, while the PNW ones weren't.

I am not sure of the difference between a "signal 11" failure and anything else. But I had a pnw25 fail recently without signal 11. It may have been due to lack of space on my ramdisk; I was not around at the time to check it.
https://www.cpdn.org/cpdnboinc/result.php?resultid=21263860

It seems that otherwise each of RogerM's failures can be explained by bad luck (very bad luck). It is very strange.
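
On the "signal 11" question: on POSIX systems, signal number 11 is conventionally SIGSEGV, a segmentation fault, meaning the program tried to access memory it was not allowed to. The short Python sketch below looks the name up from the number. It assumes the number reported in the CPDN task output follows the same numbering; the helper name describe_signal is made up for illustration.

```python
import signal


def describe_signal(number):
    """Return the symbolic name for a signal number, if this platform defines one."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The platform has no signal with this number.
        return f"unknown signal {number}"


if __name__ == "__main__":
    print(describe_signal(11))
```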

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 6909
Credit: 20,843,205
RAC: 108
Message 58717 - Posted: 6 Sep 2018, 19:49:18 UTC

It may also be overclocking.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 6909
Credit: 20,843,205
RAC: 108
Message 58719 - Posted: 6 Sep 2018, 20:49:56 UTC

Do you Suspend BOINC, then Exit BOINC before shutting down the computer?

Do you allow Windows to apply updates while climate models are running?

WB8ILI
Joined: 1 Sep 04
Posts: 129
Credit: 47,913,774
RAC: 31,208
Message 58726 - Posted: 7 Sep 2018, 19:50:53 UTC

I am having the same problem (signal 11) on one of my computers. I know there were a lot of segmentation violations in the past, but as I remember, most of those were on Linux machines. Mine, however, is a Windows 10 laptop.

I am not overclocking, CPU temp is reasonable, not installing Windows updates, and not suspending work.


https://www.cpdn.org/cpdnboinc/show_host_detail.php?hostid=1317652

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 1811
Credit: 36,354,717
RAC: 10,605
Message 58727 - Posted: 7 Sep 2018, 22:05:35 UTC - in response to Message 58726.
Last modified: 7 Sep 2018, 22:06:52 UTC

I am having the same problem (signal 11) on one of my computers. I know there were a lot of segmentation violations in the past, but as I remember, most of those were on Linux machines. Mine, however, is a Windows 10 laptop.

I am not overclocking, CPU temp is reasonable, not installing Windows updates, and not suspending work.


https://www.cpdn.org/cpdnboinc/show_host_detail.php?hostid=1317652

It looks like all your recent failures were SAM25 models from batch 742. All of those had at least one failure on another PC before you downloaded them. This batch has a high failure rate relative to a lot of other batches. However, I appear to have been lucky so far: from that batch I have 4 completions and 3 more running, each with at least one trickle, and no failures.

You're running two EU25s now, so hopefully you'll have more luck with them.

sinusoid
Joined: 7 Dec 07
Posts: 1
Credit: 10,549,749
RAC: 24,003
Message 59002 - Posted: 13 Nov 2018, 17:15:55 UTC - in response to Message 58727.

I am having the same issues. Most of the recent ones are WAH, and they keep having errors. No overclocking, and over half the time I am not even on the computer while it is working.

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 1811
Credit: 36,354,717
RAC: 10,605
Message 59004 - Posted: 14 Nov 2018, 0:39:48 UTC - in response to Message 59002.

I am having the same issues. Most of the recent ones are WAH, and they keep having errors. No overclocking, and over half the time I am not even on the computer while it is working.


Most of the errors on that computer are on SAM25 models with signal 11 errors. The SAM25 models are very sensitive and quite a few computers are having those problems. Hopefully you'll pick up some of the different WAH2 regions from now on.



Copyright © 2019 climateprediction.net