climateprediction.net home page
50-100% Error while computing

Lasse Hintze Tøndering-Jensen

Joined: 18 Jun 14
Posts: 1
Credit: 1,235,717
RAC: 0
Message 55594 - Posted: 29 Jan 2017, 10:11:40 UTC

Having run BOINC CP tasks for a while on about 10 computers running Windows 7, Windows 10 and Windows Server 2012, I have recently noticed that a huge share of the tasks, 50-100%, end in “Error while computing”. What is up with that?

For example, I have a 48-processor Intel Xeon E5-2697 machine that has run 31 tasks in about a week (4 are still in progress). Of those, 26 ended in “Error while computing” and only one completed.

Another machine, a 24-processor Intel Xeon E5-2620, has run 12 tasks in about a week, all of which ended in “Error while computing”.
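To put numbers on the two machines above, here is a quick sketch that counts only finished tasks (the 4 still in progress are left out). The function name is just for illustration.

```python
def error_rate(errored: int, completed: int) -> float:
    """Fraction of finished tasks that ended in an error."""
    finished = errored + completed
    if finished == 0:
        raise ValueError("no finished tasks to compute a rate from")
    return errored / finished


# 48-processor machine: 31 tasks, 4 in progress, 26 errored, 1 completed.
xeon_48 = error_rate(errored=26, completed=1)

# 24-processor machine: 12 tasks, all errored.
xeon_24 = error_rate(errored=12, completed=0)

print(f"48-processor machine: {xeon_48:.1%}")  # 96.3%
print(f"24-processor machine: {xeon_24:.1%}")  # 100.0%
```

So both machines sit at the very top of the 50-100% range quoted in the title.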

Strangely enough, I do get some credit for what I take to be incomplete tasks. So, two questions. First, why does this project have such an insane number of computing errors compared to other projects? No other project I participate in comes even remotely close to this error rate. Second, seeing that you do get credit for errors, do you get enough for the amount of processing power you put in, or does a huge part end up in the bin?

I’m seriously considering dropping CP altogether, because I don’t want to waste CPU power on tasks that constantly fail to complete.
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4314
Credit: 16,377,675
RAC: 3,657
Message 55595 - Posted: 29 Jan 2017, 11:17:17 UTC

A number of questions which might give hints. I get very few errors, but my boxes are only 2/4-core machines.
1. Are BOINC working and data directories excluded from antivirus scans? If an anti-virus program has a lock on a file when BOINC wants to write to it, it will cause a crash.

2. Are the machines overclocked? This makes crashes more likely.

3. How fast are the disks compared to the throughput of work? I recall a previous thread where it was suggested that disk writes not keeping up with the amount of work done by so many cores was a problem.

4. Do you regularly have non-BOINC things happening on the computer which quickly take CPU time away from BOINC? CPDN does seem more prone to problems due to this.

5. Do you have “Leave non-GPU tasks in memory when suspended” ticked? If not, ticking it seems to greatly reduce the number of errors.

I am sure there are a couple more that I have missed, but my brain isn't really in gear yet.
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 55602 - Posted: 29 Jan 2017, 20:18:55 UTC

The only other general one that I can think of is “Suspend work if CPU usage is above”.
Unless this is either turned off or set to maximum (I'm not sure what it's called these days), the task will keep being suspended and resumed every time your own use of the computer pushes CPU usage over the limit.
You can get away with this for a while, but sooner or later you'll catch the program at a critical point, and it'll fail.

Some ideas for improving the chances of a task surviving are here.

PS
I think that in the BOINC versions since then, the “Suspend work if CPU usage is above” setting needs to be 100%, not zero, to turn it off.
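For anyone who would rather set the two options discussed in this thread in a file than through BOINC Manager, the sketch below generates a preferences override fragment. Nothing in this thread confirms the file or tag names: it assumes that the client reads local overrides from global_prefs_override.xml in the BOINC data directory and that the tags are leave_apps_in_memory and suspend_cpu_usage, so check both against the BOINC documentation for your client version. The function name is hypothetical.

```python
def build_prefs_override(leave_apps_in_memory: bool,
                         suspend_cpu_usage_pct: float) -> str:
    """Return XML text for a BOINC preferences override file.

    Assumption: the tag names below match what the BOINC client expects
    in global_prefs_override.xml. Verify before relying on them.
    """
    if not 0 <= suspend_cpu_usage_pct <= 100:
        raise ValueError("CPU usage threshold must be between 0 and 100")
    leave_flag = 1 if leave_apps_in_memory else 0
    return (
        "<global_preferences>\n"
        f"   <leave_apps_in_memory>{leave_flag}</leave_apps_in_memory>\n"
        f"   <suspend_cpu_usage>{suspend_cpu_usage_pct:g}</suspend_cpu_usage>\n"
        "</global_preferences>\n"
    )


# Keep suspended tasks in memory, and use the 100% threshold from the PS
# above so that other CPU activity should never trigger a suspend.
print(build_prefs_override(leave_apps_in_memory=True,
                           suspend_cpu_usage_pct=100))
```

The threshold is left as a parameter because the value that disables the setting may differ between client versions; 100 is used here only because that is the value suggested in the PS.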

