climateprediction.net home page
VANISHING WU'S

Message boards : Number crunching : VANISHING WU'S

JIM
Joined: 31 Dec 07
Posts: 1123
Credit: 20,460,788
RAC: 2,008
Message 47972 - Posted: 14 Jan 2014, 14:44:03 UTC


Alex Plantema
Joined: 3 Sep 04
Posts: 122
Credit: 26,128,640
RAC: 1,316
Message 47973 - Posted: 14 Jan 2014, 14:49:30 UTC - in response to Message 47972.  

Right, but aborting them seems of little use as long as they are picked up by someone else immediately.
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 47974 - Posted: 14 Jan 2014, 16:51:44 UTC - in response to Message 47973.  

> Right, but aborting them seems of little use as long as they are picked up by someone else immediately.


True, I had the same experience with one of mine. I've raised it with Andy.

I'm a volunteer and my views are my own.
News and Announcements and FAQ
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7233
Credit: 23,154,247
RAC: 170
Message 47975 - Posted: 14 Jan 2014, 20:15:35 UTC

Alex

These models are set to be issued up to five times until one run succeeds, so someone is going to have to run them.
There are still a lot of "serial killers" out there, plus all the "set and forget, never look at the forums" people, so let them run this lot and get them off the queue. Just abort any you find on your machine.
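To make the resend rule concrete, here is a minimal Python sketch of it as described in this post. It assumes a simplified model: one new copy goes out after each crashed or aborted task, up to five copies in total, stopping at the first success. `run_workunit` and `MAX_ATTEMPTS` are hypothetical names for illustration, not the project's or BOINC's actual server code.

```python
# Simplified sketch of the resend rule described above.
# Assumptions (not the real server code): a workunit gets one fresh copy
# after each failed task (crashed or aborted), until one copy succeeds
# or five copies have been issued.
from collections.abc import Iterable
from itertools import islice

MAX_ATTEMPTS = 5  # "set to run five times"


def run_workunit(outcomes: Iterable[bool]) -> tuple[str, int]:
    """Return (status, copies issued) for a hypothetical workunit.

    Each item in `outcomes` is True if the next host to get a copy
    completes it successfully, or False if it crashes or is aborted.
    """
    issued = 0
    for ok in islice(outcomes, MAX_ATTEMPTS):
        issued += 1
        if ok:
            return "success", issued  # one success is enough
    # All five copies used up: the workunit is given up on.
    # Fewer copies seen so far: it is still waiting to be resent.
    status = "failed" if issued == MAX_ATTEMPTS else "pending"
    return status, issued


# Three hosts crash or abort it, the fourth completes it.
print(run_workunit([False, False, False, True]))  # ('success', 4)
```

In this model, aborting a copy simply hands the workunit on to the next host, which is why aborted models reappear on someone else's machine until the five copies are used up.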


Backups: Here
Alex Plantema
Joined: 3 Sep 04
Posts: 122
Credit: 26,128,640
RAC: 1,316
Message 47977 - Posted: 14 Jan 2014, 22:25:45 UTC - in response to Message 47975.  

It looks like they crash anyway, so aborting them saves processor time.
Bonsai911
Joined: 9 Sep 04
Posts: 226
Credit: 29,832,224
RAC: 0
Message 47978 - Posted: 14 Jan 2014, 23:22:28 UTC

Aborted, and forget about them.
JIM
Joined: 31 Dec 07
Posts: 1123
Credit: 20,460,788
RAC: 2,008
Message 47979 - Posted: 15 Jan 2014, 1:17:57 UTC


DadX
Joined: 30 Aug 06
Posts: 24
Credit: 1,245,326
RAC: 0
Message 47987 - Posted: 15 Jan 2014, 18:16:06 UTC

If I keep on aborting these jobs, will my machines get blacklisted?
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,263,895
RAC: 599
Message 47988 - Posted: 15 Jan 2014, 18:48:14 UTC
Last modified: 15 Jan 2014, 18:48:46 UTC

If you mean, will one of us ask the programmers to blacklist your computer(s) so that you can't receive new tasks until you've posted on the forum and sorted out the problem, then no, of course not. When we come across computers that appear to be serial model crashers, the first thing we do is look in detail at a selection of their models to see what the problem seems to be. We also check whether the crashes have been going on for quite a while without the owner doing anything about it or posting on the forum for advice. The last thing we want to do is penalise people for one-off or short-term problems, which any of us could have to deal with from time to time.

You and your computers don't fall into these serial-crasher categories. In fact it's rare for members who post on the forum to be serial model crashers.

If instead you were wondering whether the server will reduce the number of tasks you're allowed per core per day, then yes: I'd expect the server to treat each aborted model like a crashed task. This is the BOINC rule. However, even if you abort so many models that your daily quota drops to zero new models per core, after midnight your local time the quota will go back up to one per core per day.
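As a rough illustration of the quota behaviour above, here is a minimal Python sketch. It assumes each aborted or crashed task lowers the per-core quota by exactly one (the post doesn't say by how much) and that midnight raises anything below one back to one per core. `HostQuota` and its methods are hypothetical names, not BOINC's real scheduler code.

```python
# Simplified sketch of the daily-quota behaviour described above.
# Assumptions (not BOINC's actual server code): every aborted or crashed
# task lowers the per-core quota by one, never below zero, and at the
# host's local midnight a quota below one is raised back to one per core.
from dataclasses import dataclass


@dataclass
class HostQuota:  # hypothetical name, for illustration only
    cores: int
    per_core: int  # new tasks allowed per core per day

    def record_failure(self) -> None:
        """An aborted task is counted the same as a crashed one."""
        self.per_core = max(self.per_core - 1, 0)

    def midnight_reset(self) -> None:
        """At local midnight the quota goes back to at least one per core."""
        self.per_core = max(self.per_core, 1)

    def tasks_allowed_today(self) -> int:
        return self.cores * self.per_core


host = HostQuota(cores=4, per_core=3)
for _ in range(5):  # abort five bad models
    host.record_failure()
print(host.tasks_allowed_today())  # 0
host.midnight_reset()
print(host.tasks_allowed_today())  # 4
```

Under these assumptions, aborting a handful of bad models can only cost a day's new work at most, which matches the point that the quota recovers overnight.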

I've aborted quite a few of these defective models myself and advise everybody else to do the same. I believe these models are likely to crash at some stage anyway, so nothing is gained by wasting electricity and computer power on crunching that cannot be of any help to the researchers.
Cpdn news
DadX
Joined: 30 Aug 06
Posts: 24
Credit: 1,245,326
RAC: 0
Message 47993 - Posted: 16 Jan 2014, 2:54:17 UTC

Thanks, mo.v.
I had in mind the latter, not the former, but didn't think the process through.
On my "value" machines and with my workday, it would be rare indeed to have more than one or two WUs to abort per day, let alone per CPU.

Energise main phasers, All weapons to full power.
Target that WU (hadcm3n_7k6p_1980_40_008437444_1)
Phaser one, fire.
Phaser two, fire.

Target destroyed.
Confirmed.
Alan K
Joined: 22 Feb 06
Posts: 335
Credit: 16,789,387
RAC: 2,093
Message 47994 - Posted: 16 Jan 2014, 9:49:53 UTC

I'd picked up 7: hadcm3n_7l9l, _7aou, _7aol, _7mok, _7gl5, _7c00 and _7c07. 7l9l is still running; the rest are all aborted, and it looks as if someone else has got them. All had been marked "no resubmission" :-((
JIM
Joined: 31 Dec 07
Posts: 1123
Credit: 20,460,788
RAC: 2,008
Message 48016 - Posted: 21 Jan 2014, 15:37:01 UTC


Alan K
Joined: 22 Feb 06
Posts: 335
Credit: 16,789,387
RAC: 2,093
Message 48017 - Posted: 21 Jan 2014, 16:47:19 UTC - in response to Message 47994.  

7l9l completed OK for me yesterday without any apparent problems. Mine was the fourth go at it.
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1019
Credit: 5,511,676
RAC: 2,196
Message 48018 - Posted: 21 Jan 2014, 16:49:49 UTC - in response to Message 48016.  
Last modified: 21 Jan 2014, 19:06:36 UTC

There are also some legitimate resends and the project staff may be reluctant to tinker with the general BOINC settings to sort out a specific problem. I got one valid model in among the 7-series. [Edit: by "valid" I mean not from the 7-series.]

These reissues and a number of other problems may stem from the project's original and creditable desire to "reward" participants by retaining job information that other projects would quickly delete. However, the graphics that used to provide some interest in the longer jobs have gone, partly because jobs were chopped into smaller pieces (at the request of participants) and partly through neglect. Rather than fixing bugs in user facilities, the project invariably seems to opt for deleting them! I think a major design correction is needed: either adopt the more common BOINC "easy come, easy go" philosophy, or invest seriously in the kind of reliability and user experience that would make the big jobs a pleasure to complete.
JIM
Joined: 31 Dec 07
Posts: 1123
Credit: 20,460,788
RAC: 2,008
Message 48020 - Posted: 22 Jan 2014, 0:43:33 UTC


mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,263,895
RAC: 599
Message 48023 - Posted: 22 Jan 2014, 2:08:28 UTC

Every night at midnight, your local time, your work quota will still be put back to one model per core per day.

While waiting for more good models we just have to get on with work from other projects, of which we should all be able to choose several we think are worth spending computer time on. The WCG projects are all highly respected.
Les Bayliss
Volunteer moderator
Posts: 7233
Credit: 23,154,247
RAC: 170
Message 48025 - Posted: 22 Jan 2014, 5:25:40 UTC - in response to Message 48020.  

Hi Jim

The best way to avoid these bad tasks is to set the project to "No new tasks" and just keep an eye on the News thread and the Server Status page.

Art Masson
Joined: 16 Oct 11
Posts: 252
Credit: 15,751,502
RAC: 41
Message 48030 - Posted: 23 Jan 2014, 2:52:46 UTC

I have a work unit: hadcm3n_7vwx_1980_40_008452644_1
It has been running successfully and is about 25% complete. Should I abort it?
geophi
Volunteer moderator
Posts: 1931
Credit: 41,487,636
RAC: 2,320
Message 48031 - Posted: 23 Jan 2014, 3:12:35 UTC - in response to Message 48030.  

> I have a work unit: hadcm3n_7vwx_1980_40_008452644_1
> It has been running successfully and is about 25% complete. Should I abort it?

That is a resend of a task originally sent out in September that was never completed. So it isn't one of the bad batch, and you should probably keep crunching it.
Art Masson
Posts: 252
Credit: 15,751,502
RAC: 41
Message 48035 - Posted: 24 Jan 2014, 1:37:25 UTC - in response to Message 48031.  

Will do. It's the only WU I have and it's now at 33%, so I'll keep crunching.
Art

©2020 climateprediction.net