climateprediction.net home page
VANISHING WU'S

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 47974 - Posted: 14 Jan 2014, 16:51:44 UTC - in response to Message 47973.  

Right, but aborting them seems of little use as long as they are picked up by someone else immediately.


True, I had the same experience with one of mine. I've raised it with Andy.

I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 47974
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 47975 - Posted: 14 Jan 2014, 20:15:35 UTC

Alex

These models are set to be issued up to five times, until one run succeeds, so someone's going to have to run them.
There's still a lot of "serial killers" out there, plus all of the "set and forget, and never look at the forums" people, so let them run this lot and get them off the queue. Just abort them if you find any on your machine.


Backups: Here
ID: 47975
Alex Plantema
Joined: 3 Sep 04
Posts: 126
Credit: 26,363,193
RAC: 0
Message 47977 - Posted: 14 Jan 2014, 22:25:45 UTC - in response to Message 47975.  

It looks like they crash anyway, so aborting them saves processor time.
ID: 47977
Bonsai911
Joined: 9 Sep 04
Posts: 228
Credit: 30,229,255
RAC: 3,258
Message 47978 - Posted: 14 Jan 2014, 23:22:28 UTC

aborted, and forget about them
ID: 47978
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,053,321
RAC: 4,417
Message 47979 - Posted: 15 Jan 2014, 1:17:57 UTC

The number of "tasks in progress" is back up to 48054. When I looked at it before the shutdown it was at about 43000. So it would seem that the bogus release pumped about 5000 old WU's into the stream. That's about 25,000 aborts to get rid of all of them!
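
The estimate can be reproduced with a few lines of arithmetic. This is only a rough sketch: the figures are the approximate ones quoted in the post, the limit of five issues per work unit is the one mentioned earlier in the thread, and the variable names are made up for illustration.

```python
# Rough reconstruction of the estimate; all figures are approximate.
tasks_before_shutdown = 43_000   # "about 43000" before the shutdown
tasks_now = 48_054               # current "tasks in progress"
max_issues_per_wu = 5            # assumption: each WU is issued up to five times

# Extra work units that appeared after the release (roughly 5000).
extra_wus = tasks_now - tasks_before_shutdown

# Worst case: every issue of every extra WU has to be aborted by someone.
aborts_needed = round(extra_wus, -3) * max_issues_per_wu

print(extra_wus, aborts_needed)
```

The result is a worst case: any work unit that happens to complete successfully stops being reissued, so the real number of aborts would be somewhat lower.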

ID: 47979
DadX
Joined: 30 Aug 06
Posts: 27
Credit: 1,481,548
RAC: 768
Message 47987 - Posted: 15 Jan 2014, 18:16:06 UTC

If I keep on aborting these jobs will my machines get blacklisted?
ID: 47987
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 47988 - Posted: 15 Jan 2014, 18:48:14 UTC
Last modified: 15 Jan 2014, 18:48:46 UTC

If you mean will one of us ask the programmers to minus your computer(s) so that you can't receive new tasks until you've posted on the forum and sorted out the problem, then no, of course not. When we come across computers that appear to be serial model crashers the first thing we do is look in detail at a selection of the models to see what the problem seems to consist of. We also look at whether the problem of crashed models has been going on for quite a while without the owner doing anything about it and without posting on the forum for advice. The last thing we want to do is penalise people for one-off or short-term problems which any of us could have to deal with from time to time.

You and your computers don't fall into these serial-crasher categories. In fact it's rare for members who post on the forum to be serial model crashers.

If instead you were wondering whether the server will reduce the number of tasks you're allowed per core per day, then yes, I'd expect the server to treat each aborted model like a crashed task. This is the BOINC rule. However, even if you abort so many models that your daily quota is reduced to zero new models per core, after your midnight the quota will be increased to one per core per day again.
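
As a rough illustration of the quota behaviour just described, here is a toy model in Python. It is not BOINC's scheduler code: the class name, the starting quota of 4 and the "minus one per abort" step are all assumptions made for the sketch. Only the floor of zero and the return to one task per core after midnight come from the description above.

```python
from dataclasses import dataclass


@dataclass
class HostQuota:
    """Toy model of a per-core daily task quota (hypothetical, not BOINC code)."""
    per_core: int = 4  # assumed starting quota; the real value is set by the project

    def record_abort(self) -> None:
        # An aborted task is treated like a crashed one.
        # Assumption: the quota drops by one each time, never below zero.
        self.per_core = max(0, self.per_core - 1)

    def midnight_reset(self) -> None:
        # After the host's midnight the quota is at least one task per core again.
        self.per_core = max(1, self.per_core)


quota = HostQuota()
for _ in range(6):        # abort more tasks than the quota allows
    quota.record_abort()
print(quota.per_core)     # quota is exhausted for today

quota.midnight_reset()
print(quota.per_core)     # one task per core is allowed again
```

The point of the sketch is that aborting can never lock a machine out for longer than the rest of that day.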

I've aborted quite a few of these defective models myself and advise everybody else to do the same. I believe that these models are likely to crash anyway at some stage so there's nothing to be gained through wasting electricity and computer power on crunching that cannot be of any help to the researchers.
Cpdn news
ID: 47988
DadX
Joined: 30 Aug 06
Posts: 27
Credit: 1,481,548
RAC: 768
Message 47993 - Posted: 16 Jan 2014, 2:54:17 UTC

Thanks mo.v
I had in mind the latter, not the former, but didn't think the process through.
On my "value" machines and with my workday it would be rare indeed if there was more than one or two WUs to abort per day, let alone per CPU.

Energise main phasers, All weapons to full power.
Target that WU (hadcm3n_7k6p_1980_40_008437444_1)
Phaser one, fire.
Phaser two, fire.

Target destroyed.
Confirmed.
ID: 47993
Alan K
Joined: 22 Feb 06
Posts: 484
Credit: 29,579,234
RAC: 4,572
Message 47994 - Posted: 16 Jan 2014, 9:49:53 UTC

I'd picked up 7: hadcm3n_7l9l, _7aou, _7aol, _7mok, _7gl5, _7c00 and _7c07. 7l9l is still running; the rest were all aborted, and it looks as if someone else has got them. All had been marked no resubmission :-((
ID: 47994
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,053,321
RAC: 4,417
Message 48016 - Posted: 21 Jan 2014, 15:37:01 UTC

I see that the hadcm3n_7xxx WU's are still with us. I aborted 2 of them this morning. Judging from the "tasks in progress" I'd say that there are about 2K of them still in the hopper.

One reason it is taking so long to get rid of them is that, with all the recent download problems with hadam3p and the need to abort all the hadcm3n "7's", my daily download quota is probably about 1 or 2 WU's per day per machine. Most of the rest of us are probably in the same condition. Is there some way the project staff can increase these quotas so we can clear them faster? I don't mind downloading and aborting them (all-you-can-eat bandwidth in America from most ISP's), but I am only getting about 1 per machine per day.

ID: 48016
Alan K
Joined: 22 Feb 06
Posts: 484
Credit: 29,579,234
RAC: 4,572
Message 48017 - Posted: 21 Jan 2014, 16:47:19 UTC - in response to Message 47994.  

7l9l completed OK for me yesterday without any apparent problems. Mine was the fourth go at it.
ID: 48017
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1079
Credit: 6,903,221
RAC: 6,722
Message 48018 - Posted: 21 Jan 2014, 16:49:49 UTC - in response to Message 48016.  
Last modified: 21 Jan 2014, 19:06:36 UTC

There are also some legitimate resends and the project staff may be reluctant to tinker with the general BOINC settings to sort out a specific problem. I got one valid model in among the 7-series. [Edit: by "valid" I mean not from the 7-series.]

These reissues and a number of other problems may perhaps come from the project's original and creditable desire to "reward" participants by retaining job information that on other projects would be rapidly deleted. However, the graphics that used to provide some interest in the longer jobs have gone, partly through the chopping up of jobs into smaller pieces (at the request of participants) and partly through neglect. Rather than fix bugs in user facilities the project invariably seems to opt for deleting them instead! There needs, I think, to be a major design correction: either adopt the more common BOINC "easy come, easy go" philosophy or seriously invest in the kind of reliability and user experience that would make the big jobs a pleasure to complete.
ID: 48018
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,053,321
RAC: 4,417
Message 48020 - Posted: 22 Jan 2014, 0:43:33 UTC

I am always careful to check whether a download is one of the 7 series before I abort it. The problem is that all of these aborts have lowered my daily quota so much that when real WU's come along I may not get any. My daily quota may have already been filled for that day with "7".

One of my machines has not been able to download any real (i.e. non-7) WU's for several days now and is completely out of CPDN work. It is running WCG projects 24/7. Every day or two I delete 1 (or 2) 7-series WU's.
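
For anyone who wants to automate the "is it a 7?" check, a minimal sketch in Python follows. The pattern is inferred only from the task names quoted in this thread (for example hadcm3n_7k6p_1980_40_008437444_1); the function name and the second sample name are hypothetical.

```python
import re

# Assumption: affected tasks are named "hadcm3n_7" followed by three more
# characters and an underscore, as in the names quoted in this thread.
SEVEN_SERIES = re.compile(r"^hadcm3n_7[0-9a-z]{3}_")


def looks_like_7_series(task_name: str) -> bool:
    """Return True if the task name matches the 7-series naming pattern."""
    return SEVEN_SERIES.match(task_name) is not None


# The second name is made up, purely to show a non-matching case.
for name in ["hadcm3n_7k6p_1980_40_008437444_1", "hadam3p_example_task_0"]:
    print(name, looks_like_7_series(name))
```

A match is only a first hint. Legitimate resends can carry the same prefix, so it is worth looking at the task's history before aborting it.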

ID: 48020
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 48023 - Posted: 22 Jan 2014, 2:08:28 UTC

Every night at midnight your time you'll still have your work quota put back to one model per core per day.

While waiting for more good models we just have to get on with work from other projects, of which we should all be able to choose several we think are worth spending computer time on. The WCG projects are all highly respected.
Cpdn news
ID: 48023
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 48025 - Posted: 22 Jan 2014, 5:25:40 UTC - in response to Message 48020.  

Hi Jim

The best way to avoid these bad tasks is to set the project for No new tasks, and just keep an eye on the News thread and the Server Status page.

ID: 48025
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 20
Message 48030 - Posted: 23 Jan 2014, 2:52:46 UTC

I have a work unit: hadcm3n_7vwx_1980_40_008452644_1
It has been running successfully and is at about 25% complete....should I abort it??
ID: 48030
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2167
Credit: 64,403,322
RAC: 5,085
Message 48031 - Posted: 23 Jan 2014, 3:12:35 UTC - in response to Message 48030.  

I have a work unit: hadcm3n_7vwx_1980_40_008452644_1
It has been running successfully and is at about 25% complete....should I abort it??

That was a resend of a task that was sent out originally in September and never completed. So, it wasn't one of the bad batch and you should probably continue crunching it.
ID: 48031
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 20
Message 48035 - Posted: 24 Jan 2014, 1:37:25 UTC - in response to Message 48031.  

Will do... it's the only WU I have and it's now at 33%, so I'll keep crunching.
Art
ID: 48035
ed2353
Joined: 15 Feb 06
Posts: 137
Credit: 33,347,857
RAC: 0
Message 48040 - Posted: 24 Jan 2014, 10:54:27 UTC

With the recent backup of the server and reset of the updating there were a few tasks available this morning.
Some of them were the "No Resubmission" variety.
I noticed that those had a completion year of 2023, so they should be easy to recognize in your Task list.
ID: 48040
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 20
Message 48056 - Posted: 26 Jan 2014, 15:04:03 UTC

Any idea when there will be some actual new work units?? I have three machines working on other projects, but just wondering why we are seeing such a long gap in new work units.
ID: 48056

©2024 climateprediction.net