climateprediction.net home page
VANISHING WU'S


JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,053,321
RAC: 4,417
Message 47123 - Posted: 19 Sep 2013, 17:14:57 UTC

What happened? Yesterday, when I checked the "server status" page, there were ~39,000 WUs waiting to be downloaded. Now there are 2116. I can't believe that project members downloaded 37,000 WUs overnight. Did the admins pull them for some reason?

Richard Haselgrove
Joined: 1 Jan 07
Posts: 925
Credit: 34,100,818
RAC: 11,270
Message 47124 - Posted: 19 Sep 2013, 17:20:32 UTC - in response to Message 47123.  

Andy told some of us "In case there is a query on the boards: I have been asked to pause the current workunits in the queue and put out another batch of workunits, the scientists want this other batch of workunits computed before the current workunits in the queue, so you will shortly see a drop of the queue to 2200." - obviously, no mod with access to the news thread has picked that up yet.

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,053,321
RAC: 4,417
Message 47125 - Posted: 19 Sep 2013, 20:02:45 UTC

Thanks. I just wondered what had happened to them. As far as WUs go, the more the merrier.

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 47129 - Posted: 20 Sep 2013, 3:51:12 UTC

Is there any way to tell this priority batch -- say by the 4-letter code?
I can give the models some help - even by suspending others - if that would be useful

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,053,321
RAC: 4,417
Message 47160 - Posted: 25 Sep 2013, 4:05:24 UTC

I see that the WU counter on the server is now at zero. Does this mean that the WUs that the scientists wanted run immediately have been downloaded and are being processed, or are they still to be generated?

Also, will the 37,000 WUs that were pulled from the queue be restored soon? It was nice having work available again.

astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 47161 - Posted: 25 Sep 2013, 4:59:35 UTC
Last modified: 25 Sep 2013, 5:10:00 UTC

It seems a safe assumption that the small set, now absorbed, will be allowed to ferment (meaning that time will be allowed for all tasks downloaded by dodgy machines [and there are too many] to crash and be reissued to machines with a chance to complete them [I received one]).

Given that the larger lot was also on the "wanted" list, it seems a safe assumption that lot will be reloaded, eh?

Patience...


[Edit] Eirik,

As best I can determine, they have "8xxx" names. My guesstimate resulted from the original issue date.
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 47163 - Posted: 25 Sep 2013, 5:54:25 UTC - in response to Message 47161.  

Astro, your analysis seems logical to me. Might also explain why the queued WUs on the server declined slowly at first and then declined faster - at first lots of failures due to "dodgy machines" getting re-issued - then - more and more get picked up by machines that can hack them, at least for a while. Size of the volunteers' input queue probably also a factor.

Anyhow, with the big server queue before the priority batch came out, few volunteers should be running out of work soon.

So yeah - patience is a virtue.

And thanks - I see several 8xxx running and issued on the 19th and 20th UTC - I'll persuade BOINC to give them priority.

Thanks

e




JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,053,321
RAC: 4,417
Message 47169 - Posted: 26 Sep 2013, 4:31:05 UTC
Last modified: 26 Sep 2013, 4:32:42 UTC

So, if I understand your answer: all of the priority WUs have been downloaded, but I can expect the download queue to remain empty for some time as the scientists wait to see how many of them get crashed by the less stable computers (and less careful or knowledgeable crunchers). That way they can reissue them quickly to other, hopefully more capable, machines.

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 47170 - Posted: 26 Sep 2013, 4:53:27 UTC - in response to Message 47169.  

So if I understand your answer. All of the priority WU's have been downloaded, but, I can expect the download queue to remain empty for some time as the Scientist wait to see how many of them get crashed by the less stable computers (and less careful or knowledgeable) crunchers. That way they can redownload them fast to other hopefully more capable machines.


A nitpick -- the priority WU's have been downloaded AT LEAST ONCE - or twice -- etc.
The failure rate on newly downloaded WUs and the time to failure depend on the fraction of "impaired boxes" and the "days_extra_work" on those machines. This is a feature of distributed computing. CPDN has extremely long run-times to "success" -- that is another thing to think about.

How long the download queue remains empty depends on lots of factors -- like how many of the priority WUs are actually running and not failed now. And the strategy at CPDN for dealing with the inevitable residue of priority WUs that hit too many "loser boxes" - none of which we volunteers can estimate.

BUT - at some point -- fairly soon I'd expect -- there will be new WUs available - probably a lot of them

astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 47171 - Posted: 26 Sep 2013, 4:59:50 UTC - in response to Message 47169.  
Last modified: 26 Sep 2013, 5:01:36 UTC

There seems to be a LIFO (Last In, First Out) algorithm working here. What we don't know is how long the staff will procrastinate before reloading the earlier set.

Meanwhile, we wait...

[EDIT] Eirik beat me to it, and with a better reply.
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 47173 - Posted: 26 Sep 2013, 5:20:27 UTC

And we don't know WHY the sudden interest in this smaller, more "urgent" batch.
That may influence the release of a larger batch, either the ones that have been deferred, or a new lot.

Perhaps that group of researchers thinks it's on to something, and wants a more precise look at some data.

Time will tell.
Ommmm
Or Moooo

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1079
Credit: 6,904,878
RAC: 6,593
Message 47194 - Posted: 28 Sep 2013, 17:07:19 UTC

8k tasks at the moment, so something is back ...

Nick Perry
Joined: 12 Jul 05
Posts: 11
Credit: 528,541
RAC: 0
Message 47287 - Posted: 12 Oct 2013, 11:31:54 UTC

Arising from earlier posts: I have been having problems running CPDN, which I have finally resolved (dry joint on MB and intermittent PSU fault). Will this mean my 'reliability score' improves, or will my system remain on the Bad Boy list?

What I'd really like is to be able to reload a failed task and rerun it to make sure.
Prior to the debacle of the 19th century industrial revolution, Earth's ecology had been an object of particular admiration for commentators in the Western Spiral Arm of the Galaxy.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4314
Credit: 16,378,503
RAC: 3,632
Message 47290 - Posted: 12 Oct 2013, 12:33:22 UTC - in response to Message 47287.  

You will have to wait till some more work units come around, unfortunately.

Nick Perry
Joined: 12 Jul 05
Posts: 11
Credit: 528,541
RAC: 0
Message 47292 - Posted: 12 Oct 2013, 17:02:45 UTC - in response to Message 47290.  

Just got a single unit which died while somebody else was crunching it. That'll be a good enough test I think. If not then the argument for a new system becomes moot!
Prior to the debacle of the 19th century industrial revolution, Earth's ecology had been an object of particular admiration for commentators in the Western Spiral Arm of the Galaxy.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4314
Credit: 16,378,503
RAC: 3,632
Message 47309 - Posted: 13 Oct 2013, 7:20:56 UTC - in response to Message 47292.  

Just as long as the task didn't fail because of a problem with the task - I note the other task in the same work unit is also showing as a computation error.

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,053,321
RAC: 4,417
Message 47335 - Posted: 17 Oct 2013, 16:00:07 UTC

The new work queue has now been empty for 4 or 5 days. My machines are getting to the point where they are starting to beg for work. It would be nice if some of those 37,000 WUs that were pulled to run the special batch were returned to the queue. (Hint, hint).

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 47336 - Posted: 17 Oct 2013, 17:22:40 UTC - in response to Message 47335.  

Talk to the researchers at the other science centres. :)

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 47359 - Posted: 20 Oct 2013, 13:06:17 UTC

A few points to consider

That "priority batch" issued back about September 20 -- first results started coming in maybe a week ago, lots more to come, considering median speed of crunchers out there.

There is work available -- doesn't show on server status page because it's being snarfed up as fast as it shows up. The models my (mid-sized) bunch of cores have got the last few days all have _1900_ in their names - significant? maybe not. Maybe rebasing some of the models back to early time-base, or maybe not.

For the 95% of volunteers who never read these forums -- who's to know that the work-queue is shorter or longer?

Like Les said (re-interpreting a bit)
The various research centres that use CPDN are probably continuously re-evaluating what models are top priority - for them.
CPDN is a shared science resource - not a huge one like CERN (or the ZGS in my youth) -- but a valuable resource. And one with a looong turnaround time. How long to get your centre's batch 50% done? A few months? Maybe.

It's easy for me to picture the various researchers squabbling to get priority on CPDN. "MY project needs your resources -- it's free, ain't it?" "No - we got some good results last batch - we need to get in there first!" etc.

Fortunately, all we volunteers have to do is crunch, crunch, crunch. We will never know whatever priority fights the users of our machines go through -- they will have to work that out amongst themselves, and the CPDN staff will be the final silent arbiter.

Hope this prolix post makes the (probable) situation clearer.
But 95% of the actual producers won't know or care.

e



Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 47396 - Posted: 23 Oct 2013, 8:36:34 UTC - in response to Message 47395.  

No.


©2024 climateprediction.net