VANISHING WU'S

JIM
Joined: 31 Dec 07
Posts: 1123
Credit: 20,460,788
RAC: 2,008
Message 47123 - Posted: 19 Sep 2013, 17:14:57 UTC


ID: 47123
Richard Haselgrove
Joined: 1 Jan 07
Posts: 384
Credit: 8,655,572
RAC: 650
Message 47124 - Posted: 19 Sep 2013, 17:20:32 UTC - in response to Message 47123.  

Andy told some of us "In case there is a query on the boards: I have been asked to pause the current workunits in the queue and put out another batch of workunits, the scientists want this other batch of workunits computed before the current workunits in the queue, so you will shortly see a drop of the queue to 2200." Obviously, no mod with access to the news thread has picked that up yet.
ID: 47124
JIM
Joined: 31 Dec 07
Posts: 1123
Credit: 20,460,788
RAC: 2,008
Message 47125 - Posted: 19 Sep 2013, 20:02:45 UTC


ID: 47125
Eirik Redd
Joined: 31 Aug 04
Posts: 368
Credit: 137,845,641
RAC: 18,633
Message 47129 - Posted: 20 Sep 2013, 3:51:12 UTC

Is there any way to tell this priority batch -- say by the 4-letter code?
I can give the models some help - even by suspending others - if that would be useful
ID: 47129
JIM
Joined: 31 Dec 07
Posts: 1123
Credit: 20,460,788
RAC: 2,008
Message 47160 - Posted: 25 Sep 2013, 4:05:24 UTC


ID: 47160
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 94,916,150
RAC: 20
Message 47161 - Posted: 25 Sep 2013, 4:59:35 UTC
Last modified: 25 Sep 2013, 5:10:00 UTC

It seems a safe assumption that the small set, now absorbed, will be allowed to ferment (meaning that time will be allowed for all tasks downloaded by dodgy machines [and there are too many] to crash and be reissued to machines with a chance to complete them [I received one]).

Given that the larger lot was also on the "wanted" list, it seems a safe assumption that lot will be reloaded, eh?

Patience...


[Edit] Eirik,

As best I can determine, they have "8xxx" names. My guesstimate resulted from the original issue date.
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 47161
Eirik Redd
Joined: 31 Aug 04
Posts: 368
Credit: 137,845,641
RAC: 18,633
Message 47163 - Posted: 25 Sep 2013, 5:54:25 UTC - in response to Message 47161.  

Astro, your analysis seems logical to me. Might also explain why the queued WUs on the server declined slowly at first and then declined faster - at first lots of failures due to "dodgy machines" getting re-issued - then - more and more get picked up by machines that can hack them, at least for a while. Size of the volunteers' input queue probably also a factor.

Anyhow, with the big server queue before the priority batch came out, few volunteers should be running out of work soon.

So yeah - patience is a virtue.

And thanks - I see several 8xxx running and issued on the 19th and 20th UTC - I'll persuade BOINC to give them priority.
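For anyone wanting to pick those tasks out of a long list, here is a rough Python sketch. It assumes, without confirmation from the project, that a task name is underscore-separated and contains one four-character code, and that the priority batch is the set whose code starts with "8". The task names and the helper `is_priority` are invented for illustration.

```python
import re

# Assumption: one underscore-delimited field is a four-character code,
# and priority-batch codes begin with "8". The names below are made up.
PRIORITY_CODE = re.compile(r"(?:^|_)(8[0-9a-z]{3})(?:_|$)")

def is_priority(task_name):
    """Return True if the name contains a four-character code starting with 8."""
    return PRIORITY_CODE.search(task_name) is not None

tasks = [
    "model_eu_8abc_1900_1",   # hypothetical priority task
    "model_eu_7xyz_1960_2",   # hypothetical ordinary task
    "model_pnw_8q2k_1900_0",  # hypothetical priority task
]

priority = [t for t in tasks if is_priority(t)]
print(priority)
```

The resulting list is only a guide to which tasks to favour by hand, for example by suspending the others in the BOINC Manager.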

Thanks

e




ID: 47163
JIM
Joined: 31 Dec 07
Posts: 1123
Credit: 20,460,788
RAC: 2,008
Message 47169 - Posted: 26 Sep 2013, 4:31:05 UTC
Last modified: 26 Sep 2013, 4:32:42 UTC


ID: 47169
Eirik Redd
Joined: 31 Aug 04
Posts: 368
Credit: 137,845,641
RAC: 18,633
Message 47170 - Posted: 26 Sep 2013, 4:53:27 UTC - in response to Message 47169.  


ID: 47170
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 94,916,150
RAC: 20
Message 47171 - Posted: 26 Sep 2013, 4:59:50 UTC - in response to Message 47169.  
Last modified: 26 Sep 2013, 5:01:36 UTC

There seems to be a LIFO (Last In, First Out) algorithm working here. What we don't know is how long the staff will procrastinate before reloading the earlier set.
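To make the LIFO idea concrete, here is a minimal Python sketch. It only illustrates last-in, first-out ordering: the batch labels and function names are invented, and nothing here is known to reflect how the CPDN server actually schedules work.

```python
# A plain list used as a stack: the most recently loaded batch is issued first.
queue = []

def load_batch(stack, name):
    """Push a batch onto the top of the stack."""
    stack.append(name)

def next_batch(stack):
    """Pop the most recently loaded batch, or None if nothing is queued."""
    return stack.pop() if stack else None

load_batch(queue, "earlier large batch")  # loaded first
load_batch(queue, "priority batch")       # loaded last

issued = [next_batch(queue), next_batch(queue)]
print(issued)
```

Under this ordering the later, smaller batch goes out before the earlier one, which matches what volunteers observed.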

Meanwhile, we wait...

[EDIT] Eirik beat me to it, and with a better reply.
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 47171
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7233
Credit: 23,154,247
RAC: 170
Message 47173 - Posted: 26 Sep 2013, 5:20:27 UTC

And we don't know WHY the sudden interest in this smaller, more "urgent" batch.
That may influence the release of a larger batch, either the ones that have been deferred, or a new lot.

Perhaps that group of researchers thinks it's on to something, and wants a more precise look at some data.

Time will tell.
Ommmm
Or Moooo

ID: 47173
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1019
Credit: 5,511,676
RAC: 2,196
Message 47194 - Posted: 28 Sep 2013, 17:07:19 UTC

8k tasks at the moment, so something is back ...
ID: 47194
Nick Perry
Joined: 12 Jul 05
Posts: 11
Credit: 528,541
RAC: 107
Message 47287 - Posted: 12 Oct 2013, 11:31:54 UTC

Arising from earlier posts: I have been having problems running CPDN, which I have finally resolved (dry joint on MB and intermittent PSU fault). Will this mean my 'reliability score' improves, or will my system remain on the Bad Boy list?

What I'd really like is to be able to reload a failed task and rerun it to make sure.
ID: 47287
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2739
Credit: 3,394,851
RAC: 1,929
Message 47290 - Posted: 12 Oct 2013, 12:33:22 UTC - in response to Message 47287.  

You will have to wait till some more work units come around, unfortunately.
ID: 47290
Nick Perry
Joined: 12 Jul 05
Posts: 11
Credit: 528,541
RAC: 107
Message 47292 - Posted: 12 Oct 2013, 17:02:45 UTC - in response to Message 47290.  

Just got a single unit which died while somebody else was crunching it. That'll be a good enough test I think. If not then the argument for a new system becomes moot!
ID: 47292
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2739
Credit: 3,394,851
RAC: 1,929
Message 47309 - Posted: 13 Oct 2013, 7:20:56 UTC - in response to Message 47292.  

Just as long as the task didn't fail because of a problem with the task - I note the other task in the same work unit is also showing as a computation error.
ID: 47309
JIM
Joined: 31 Dec 07
Posts: 1123
Credit: 20,460,788
RAC: 2,008
Message 47335 - Posted: 17 Oct 2013, 16:00:07 UTC


ID: 47335
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7233
Credit: 23,154,247
RAC: 170
Message 47336 - Posted: 17 Oct 2013, 17:22:40 UTC - in response to Message 47335.  

Talk to the researchers at the other science centres. :)

ID: 47336
Eirik Redd
Joined: 31 Aug 04
Posts: 368
Credit: 137,845,641
RAC: 18,633
Message 47359 - Posted: 20 Oct 2013, 13:06:17 UTC

A few points to consider

That "priority batch" issued back about September 20 -- first results started coming in maybe a week ago, lots more to come, considering median speed of crunchers out there.

There is work available -- doesn't show on server status page because it's being snarfed up as fast as it shows up. The models my (mid-sized) bunch of cores have got the last few days all have _1900_ in their names - significant? maybe not. Maybe rebasing some of the models back to early time-base, or maybe not.

For the 95% of volunteers who never read these forums -- who's to know that the work-queue is shorter or longer?

Like Les said (re-interpreting a bit)
The various research centres that use CPDN are probably continuously re-evaluating what models are top priority - for them.
CPDN is a shared science resource - not a huge one like CERN (or the ZGS in my youth) -- but a valuable resource. And one with a looong turnaround time. How long to get your centre's batch 50% done? A few months? Maybe.

It's easy for me to picture the various researchers squabbling to get priority on CPDN. "MY project needs your resources -- it's free aint it?" "No - we got some good results last batch - we need to get in there first!" etc.

Fortunately, all we volunteers have to do is crunch, crunch, crunch. We will never know whatever priority fights the users of our machines go through -- they will have to work that out amongst themselves, and the CPDN staff will be the final silent arbiter.

Hope this prolix post makes the (probable) situation clearer.
But 95% of the actual producers won't know or care.

e



ID: 47359
Misty
Joined: 14 Feb 06
Posts: 50
Credit: 7,856,758
RAC: 0
Message 47395 - Posted: 23 Oct 2013, 7:55:56 UTC

Any news of when we can expect more work?
ID: 47395

©2020 climateprediction.net