Smaller Work Units

el_gallo_azul
Joined: 29 Nov 13
Posts: 14
Credit: 5,526,173
RAC: 0
Message 47832 - Posted: 22 Dec 2013, 1:38:59 UTC - in response to Message 45478.  

Yes, I agree. I came to this forum to request smaller work units. I've got four climateprediction.net work units running at the moment, each with a typical elapsed time of 160 hours and about 120 hours remaining.

Most (possibly all) of the other BOINC projects that I run (currently 7 projects) are usually "ready to report" in less than 2 hours for each work unit.

I saw the comment "is unlikely to be made any smaller", and I accept that and I will continue to plug away with these 4 that I have running, but I believe it would be overall faster and more accurate for climateprediction.net to divide the jobs into smaller chunks (if possible).
ID: 47832
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,053,321
RAC: 4,417
Message 47833 - Posted: 22 Dec 2013, 2:16:13 UTC

I well remember running the old 160-year WU's. They took forever to finish. I was running them on a 1.2 GHz single core machine with 256 MB of RAM. Average completion time was in excess of 3,300 hours! Running about 20 hours a day they took about 8 or 9 months to finish 1 WU. So don't complain, 160 hours is nothing.

ID: 47833
Alex Plantema
Joined: 3 Sep 04
Posts: 126
Credit: 26,363,193
RAC: 0
Message 47834 - Posted: 22 Dec 2013, 15:17:27 UTC
Last modified: 22 Dec 2013, 15:17:52 UTC

@Les: A climate model should not depend on rounding differences between processors; it would be highly unstable.
Re: smaller tasks: It seems easier to me to make checkpoints more trustworthy than to split models into shorter runs.
@JIM: My longest task took 20974460 seconds (5826 hours or 242.76 days) and was awarded 52254.72 credits.
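
As a quick check of the figures quoted above, the conversion from seconds to hours and days can be sketched in Python. The function name `elapsed_breakdown` is made up for illustration; only the 20974460-second figure comes from the post.

```python
# Convert a task's elapsed time from seconds into hours and days.
SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24


def elapsed_breakdown(seconds: int) -> tuple[float, float]:
    """Return (hours, days) for an elapsed time given in seconds."""
    hours = seconds / SECONDS_PER_HOUR
    days = hours / HOURS_PER_DAY
    return hours, days


hours, days = elapsed_breakdown(20_974_460)
print(f"{hours:.0f} hours, {days:.2f} days")
```

Rounded the same way as in the post, this gives 5826 hours and 242.76 days, so the quoted numbers are consistent with each other.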
ID: 47834
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 47837 - Posted: 22 Dec 2013, 17:19:45 UTC

Hola Gallo Azul

Both the regional Hadam and the global Hadcm models are already divided up, so each task is already part of a much longer climate run. I don't think Andy, our programmer, intends to divide the long climate models into smaller sections. The more pieces each long climate run consists of, the more difficult it is to ensure that every part is processed and the more server storage space is required.

Alex, I think the real problem at the moment is that, although the regional Hadam models are usually reliable, too many of the Hadcm global models fail at 25%, 50%, etc., at the end of each model decade. This stability problem doesn't depend on whether we crunch complete models or slice them up.
ID: 47837
brown
Joined: 24 Feb 06
Posts: 10
Credit: 10,142,658
RAC: 0
Message 47841 - Posted: 22 Dec 2013, 19:19:46 UTC - in response to Message 47837.  

Have to agree!
I do not really care if the work goes on forever as long as it remains stable.
There is nothing worse than hundreds of hours of work going down the pan because they failed to pass checkpoints, or having to revert a day's work for 8 work units because one pulled a hissy fit at 25%.
Any way to restore one unit and allow the others to continue?
Shame these things were not more stable or at least backed themselves up. :-(
ID: 47841
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 47861 - Posted: 24 Dec 2013, 15:08:05 UTC - in response to Message 47841.  

...Shame these things were not more stable or at least backed themselves up. :-(


In the old days we used to do manual backups of our running models (when they took many months to run, losing a model was a bit of a downer). But with the introduction of hyperthreading / multiple cores, it became very difficult to restore models individually, so nobody bothers anymore.

I'm a volunteer and my views are my own.
ID: 47861
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,053,321
RAC: 4,417
Message 47865 - Posted: 25 Dec 2013, 4:20:51 UTC

I still make backups every 3 or 4 days, usually when I shut down BOINC for some other reason. It would be nice if there was a reasonably easy way to restore individual WU's from a backup without setting all the other running WU's back to the same point.

ID: 47865
Sebh007
Joined: 17 Jan 13
Posts: 9
Credit: 8,916,783
RAC: 296
Message 48179 - Posted: 17 Feb 2014, 17:20:06 UTC

Not sure that this is the right place, but can't see anywhere better. I first got into BOINC via CPDN and I'm very happy that my various PCs run CPDN which I believe in wholeheartedly.

Like so many others, I get sad when there are no tasks available and all that lovely computing power is going to waste, so I like to try and do something useful by occupying the CPUs with something else that I consider worthwhile.

My current 'second choice' is malariacontrol.net which has much smaller work units. To try to make sure that I give CPDN the very best chance of running, I have set the preferences to 99% CPDN and 1% malariacontrol.net which I hoped would achieve what I wanted, namely CPDN running the vast majority of the time but any spare time being allocated to malariacontrol.net.

The problem that this causes is that when there are no tasks available from CPDN, the other project loads up a mass of short workunits with relatively short deadlines, and if CPDN is running, then it gets demoted in favour of the short deadline tasks because they become high priority. So, short of suspending them, I can't get CPDN to run (mostly because it has such long deadlines).

Of course, if there were smaller work units then deadlines could be shorter and the problem would go away, but as smaller work units are not an option, is there any way to give CPDN its preferred status, i.e. to always run as long as there is a work unit available to work on, but not waste capacity at the same time?

ID: 48179
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 48181 - Posted: 17 Feb 2014, 22:30:09 UTC - in response to Message 48179.  

I think that the only way to do something like this is to use the 2 cache settings. (Under Network usage in your account preferences.)

The first one is "Computer is connected to ...", which is now thought of as a "High water" mark,
and the second is "Maintain enough work for an additional", which is now a "Low water" mark.

Because the recent batches of work have been of around 5,000 units, they're gone in a few hours, so you'll need to use very small values for those 2 settings.
You could start with 0.2 and 0.1 and see how many tasks that downloads from the other projects. And you'll need to choose projects that have very short task runs.

If you get too many tasks, and the cache takes too long to refill, by the time BOINC starts checking cpdn for work, they could all have come and gone.
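
To get a rough feel for how many short tasks a given cache setting might pull in, here is a minimal sketch. It assumes tasks of about 2 hours (the figure mentioned earlier in the thread for other projects) and a hypothetical 4-core machine, and it ignores how BOINC actually estimates run times, so treat the numbers as ballpark only. `tasks_to_fill_buffer` is an illustrative name, not a BOINC function.

```python
import math


def tasks_to_fill_buffer(buffer_days: float, task_hours: float, cores: int = 1) -> int:
    """Rough count of tasks needed to cover `buffer_days` of work on `cores` cores.

    Simplifying assumption: every task takes exactly `task_hours` and
    every core is busy around the clock.
    """
    hours_of_work = buffer_days * 24 * cores
    return math.ceil(hours_of_work / task_hours)


# Compare a few cache sizes (in days) for 2-hour tasks on 4 cores.
for buffer_days in (0.1, 0.2, 1.0):
    count = tasks_to_fill_buffer(buffer_days, task_hours=2.0, cores=4)
    print(f"{buffer_days} days of cache -> about {count} tasks")
```

Under these assumptions the small values suggested above keep the queue to a handful of tasks, whereas a full day of cache would queue dozens, which is the situation the post warns about.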


ID: 48181
Sebh007
Joined: 17 Jan 13
Posts: 9
Credit: 8,916,783
RAC: 296
Message 48183 - Posted: 17 Feb 2014, 23:21:35 UTC - in response to Message 48181.  

Thanks Les - I think!

Sadly, either I don't have the two variables you refer to or I just can't find them.

I'm running v7.2.39 and the nearest I can get to what you are talking about is Tools-Computing preferences-network usage where I have two settings that are vaguely related to what you describe: Minimum work buffer and Maximum work buffer. These are on an individual project basis. Are these what you mean?

If so, I assume that I set the CPDN values to a very low Minimum and a 'high' Maximum, and the 'other' project to a very low Minimum and a barely higher Maximum, do I? Not sure why I'm not seeing fields labelled the way that you describe, but perhaps you can enlighten me?

Thanks.
ID: 48183
Sebh007
Joined: 17 Jan 13
Posts: 9
Credit: 8,916,783
RAC: 296
Message 48184 - Posted: 17 Feb 2014, 23:36:49 UTC - in response to Message 48183.  

Apologies. Found them eventually.

Having thought about it though, is what you suggest really right? Surely I want to be sure that I have plenty of work for CPDN all the time, which means that it should connect whenever it wants to (0, or continuously) and have a big buffer of several days' work stored, perhaps 30 (after all, work units of 700 hours or more are not that unusual). The 'other' project should have relatively infrequent connections and a minimal number of work units stored, so it should have, say, 0.1 days?

Hope I've got that right!
ID: 48184
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 48186 - Posted: 18 Feb 2014, 0:00:19 UTC

What I'm suggesting is something like this:

Think of it as a buffet, where every now and then something interesting shows up. So instead of loading up with lots of plates of various things while you wait for this "other" item, I suggest that you just get one sandwich, or something similar that's small, and then keep going back at frequent intervals to:

A) Check if the special item is there yet, or
B) Get another small item.

As only a few thousand climate data sets are released at a time, and then only weeks to months apart, you don't want your computer busy with so much work from other projects that it doesn't check cpdn for several days.

PS
The 2 settings that I mentioned are in a section that's universal across ALL projects.
Once you attach to another project, BOINC will communicate what your settings are here, so that the other project(s) will follow those rules.

You said in your first post:
I have set the preferences to 99% CPDN and 1% malariacontrol.net


If you were talking about Resource share (which isn't a percentage, just a proportion), then your BOINC will always look at cpdn first when it comes time for more work.
It will only look elsewhere if it finds that there's nothing here.

And it only takes one task per core from cpdn to keep your computer occupied for days.
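
To illustrate why resource share is a proportion rather than a percentage, the sketch below normalises a set of share values into fractions. `share_fractions` is a hypothetical helper for illustration, and the 100/300 pair is a made-up example; it says nothing about how BOINC schedules work internally.

```python
def share_fractions(shares: dict[str, float]) -> dict[str, float]:
    """Turn raw resource-share values into fractions of the total."""
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


# The 99 / 1 split from the first post happens to add up to 100 ...
print(share_fractions({"cpdn": 99, "malariacontrol.net": 1}))

# ... but the values need not sum to 100; only the ratio matters.
print(share_fractions({"cpdn": 100, "malariacontrol.net": 300}))
```

In the second case cpdn ends up with a quarter of the total even though its share value is 100, which is why the number should not be read as a percentage.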


ID: 48186
Sebh007
Posts: 9
Credit: 8,916,783
RAC: 296
Message 48209 - Posted: 21 Feb 2014, 7:07:48 UTC - in response to Message 48186.  

Perfect! Thanks.
ID: 48209