Smaller Work Units

Author	Message
el_gallo_azul Send message Joined: 29 Nov 13 Posts: 14 Credit: 5,526,173 RAC: 0	Message 47832 - Posted: 22 Dec 2013, 1:38:59 UTC - in response to Message 45478. Yes, I agree. I came to this forum to request smaller work units. I've got four climateprediction.net work units running at the moment, with a typical elapsed time of 160 hours and 120 hours remaining. Most (possibly all) other BOINC projects that I run (currently 7 projects) are usually "ready to report" in less than 2 hours for each work unit. I saw the comment "is unlikely to be made any smaller", and I accept that and I will continue to plug away with these 4 that I have running, but I believe it would be overall faster and more accurate for climateprediction.net to divide the jobs into smaller chunks (if possible). ID: 47832 · Reply Quote

JIM Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,073,265 RAC: 1,555	Message 47833 - Posted: 22 Dec 2013, 2:16:13 UTC I well remember running the old 160 year WU's. They took forever to finish. I was running them on a 1.2 GHz single core machine with 256 MB of RAM. Average completion time was in excess of 3,300 hours! Running about 20 hours a day they took about 8 or 9 months to finish 1 WU. So don't complain, 160 hours is nothing. ID: 47833 · Reply Quote

Alex Plantema Send message Joined: 3 Sep 04 Posts: 126 Credit: 26,363,193 RAC: 0	Message 47834 - Posted: 22 Dec 2013, 15:17:27 UTC Last modified: 22 Dec 2013, 15:17:52 UTC @Les: A climate model should not depend on rounding differences between processors; it would be highly unstable. Re: smaller tasks: It seems easier to me to make more trustworthy checkpoints than splitting models in shorter runs. @JIM: My longest task took 20974460 seconds (5826 hours or 242.76 days) and was awarded with 52254.72 credits. ID: 47834 · Reply Quote
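
The conversion quoted above (20974460 seconds to hours and days) can be checked with a few lines of Python. This is only a sanity check of the arithmetic in the post; the function name is made up for illustration and is not part of BOINC.

```python
# Sanity check of the run time quoted in the post above.
# seconds_to_hours_and_days is a hypothetical helper, not a BOINC function.

def seconds_to_hours_and_days(seconds):
    """Convert a run time in seconds to (hours, days)."""
    hours = seconds / 3600    # 3600 seconds per hour
    days = seconds / 86400    # 86400 seconds per day
    return hours, days

hours, days = seconds_to_hours_and_days(20974460)
print(int(hours), round(days, 2))  # whole hours, and days to two decimals
```

The whole-hour and two-decimal-day figures agree with the 5826 hours and 242.76 days given in the post.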

mo.v Volunteer moderator Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0	Message 47837 - Posted: 22 Dec 2013, 17:19:45 UTC Hola Gallo Azul Both the regional Hadam and the global Hadcm models are already divided up, so each task is already part of a much longer climate run. I don't think Andy, our programmer, intends to divide the long climate models into smaller sections. The more pieces each long climate run consists of, the more difficult it is to ensure that every part is processed and the more server storage space is required. Alex, I think the real problem at the moment is that although the regional Hadam models are usually reliable, too many of the Hadcm global models fail at 25%, 50% etc at the end of each model decade. This stability problem doesn't depend on whether we crunch complete models or slice them up. Cpdn news ID: 47837 · Reply Quote

brown Send message Joined: 24 Feb 06 Posts: 10 Credit: 10,142,658 RAC: 0	Message 47841 - Posted: 22 Dec 2013, 19:19:46 UTC - in response to Message 47837. Have to agree! I do not really care if the work goes on forever as long as it remains stable. There is nothing worse than hundreds of hours of work going down the pan because of them failing to pass checkpoints. Or having to revert back on a day's work for 8 work units because one pulled a hissy fit at 25%. Any way to restore one unit and allow the others to continue? Shame these things were not more stable or at least backed themselves up. :-( ID: 47841 · Reply Quote

MikeMarsUK Volunteer moderator Send message Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0	Message 47861 - Posted: 24 Dec 2013, 15:08:05 UTC - in response to Message 47841. ...Shame these things were not more stable or at least backed themselves up. :-( In the old days we used to do manual backups of our running models (when they took many months to run, losing a model was a bit of a downer). But with the introduction of hyperthreading / multiple cores, it became very difficult to restore models individually, so nobody bothers anymore. I'm a volunteer and my views are my own. News and Announcements and FAQ ID: 47861 · Reply Quote

JIM Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,073,265 RAC: 1,555	Message 47865 - Posted: 25 Dec 2013, 4:20:51 UTC I still make backups every 3 or 4 days, usually when I shut down Boinc for some other reason. It would be nice if there was a reasonably easy way to restore individual WU's from a backup without setting all the other running WU's back to the same point. ID: 47865 · Reply Quote

Sebh007 Send message Joined: 17 Jan 13 Posts: 9 Credit: 8,916,783 RAC: 296	Message 48179 - Posted: 17 Feb 2014, 17:20:06 UTC Not sure that this is the right place, but can't see anywhere better. I first got into BOINC via CPDN and I'm very happy that my various PCs run CPDN which I believe in wholeheartedly. Like so many others, I get sad when there are no tasks available and all that lovely computing power is going to waste, so I like to try and do something useful by occupying the CPUs with something else that I consider worthwhile. My current 'second choice' is malariacontrol.net which has much smaller work units. To try to make sure that I give CPDN the very best chance of running, I have set the preferences to 99% CPDN and 1% malariacontrol.net which I hoped would achieve what I wanted, namely CPDN running the vast majority of the time but any spare time being allocated to malariacontrol.net. The problem that this causes is that when there are no tasks available from CPDN, the other project loads up a mass of short workunits with relatively short deadlines, and if CPDN is running, then it gets demoted in favour of the short deadline tasks because they become high priority. So, short of suspending them, I can't get CPDN to run (mostly because it has such long deadlines). Of course if there were smaller work units then deadlines could be shorter and the problem would go away, but as smaller work units are not an option, is there any way to give CPDN its preferred status, i.e. to always run as long as there is a work unit available to work on, but not waste capacity at the same time? ID: 48179 · Reply Quote

Les Bayliss Volunteer moderator Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 48181 - Posted: 17 Feb 2014, 22:30:09 UTC - in response to Message 48179. I think that the only way to do something like this is to use the 2 cache settings. (Under Network usage in your account preferences.) The first one is Computer is connected to ... which is now thought of as a "High water" mark, and the second is Maintain enough work for an additional, which is now a "Low water" mark. Because the recent batches of work have been of around 5,000 units, they're gone in a few hours, so you'll need to use very small values for those 2 settings. You could start with 0.2 and 0.1 and see how many tasks that downloads from the other projects. And you'll need to choose projects that have very short task runs. If you get too many tasks, and the cache takes too long to refill, by the time BOINC starts checking cpdn for work, they could all have come and gone. Backups: Here ID: 48181 · Reply Quote
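
For anyone who prefers a local override file to the web preferences, the two cache settings described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a tested recipe: it assumes the BOINC client reads a global_prefs_override.xml file from its data directory, and that the elements work_buf_min_days and work_buf_additional_days correspond to the first and second settings respectively. The helper name build_override is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: these element names match BOINC's global_prefs_override.xml.
# build_override is a hypothetical helper, not part of BOINC.

def build_override(min_days, additional_days):
    """Return override XML holding the two work-buffer (cache) settings."""
    root = ET.Element("global_preferences")
    # First setting ("Computer is connected to ..."), in days.
    ET.SubElement(root, "work_buf_min_days").text = str(min_days)
    # Second setting ("Maintain enough work for an additional"), in days.
    ET.SubElement(root, "work_buf_additional_days").text = str(additional_days)
    return ET.tostring(root, encoding="unicode")

# Starting values suggested in the post above.
print(build_override(0.2, 0.1))
```

The printed XML would then be saved by hand in the BOINC data directory, whose location varies by operating system, and the client told to re-read its local preferences.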

Sebh007 Send message Joined: 17 Jan 13 Posts: 9 Credit: 8,916,783 RAC: 296	Message 48183 - Posted: 17 Feb 2014, 23:21:35 UTC - in response to Message 48181. Thanks Les - I think! Sadly either I don't have the two variables you refer to or I just can't find them. I'm running v7.2.39 and the nearest I can get to what you are talking about is Tools-Computing preferences-network usage where I have two settings that are vaguely related to what you describe: Minimum work buffer and Maximum work buffer. These are on an individual project basis. Are these what you mean? If so, I assume that I set the CPDN values to a very low Minimum and a 'high' Maximum, and the 'other' project to a very low Minimum and a barely higher Maximum. Not sure why I'm not seeing fields labelled the way that you describe, but perhaps you can enlighten me? Thanks. ID: 48183 · Reply Quote

Sebh007 Send message Joined: 17 Jan 13 Posts: 9 Credit: 8,916,783 RAC: 296	Message 48184 - Posted: 17 Feb 2014, 23:36:49 UTC - in response to Message 48183. Apologies. Found them eventually. Having thought about it though, is what you suggest really right? Surely I want to be sure that I have plenty of work for CPDN all the time, which means that it should connect whenever it wants to (0 or continuously) and have a big buffer of several days' work stored - ?30? (after all, work units of 700 hours or more are not that unusual), and the 'other' project should have relatively infrequent connections and a minimal number of work units stored, so that should have, say, 0.1 days? Hope I've got that right! ID: 48184 · Reply Quote

Les Bayliss Volunteer moderator Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 48186 - Posted: 18 Feb 2014, 0:00:19 UTC What I'm suggesting is something like this: Think of it as a buffet, and every now and then something interesting shows up. So instead of loading up with lots of plates of various things while you wait for this "other" item, I suggest that you just get one sandwich, or something similar that's small, and then keep going back at frequent intervals to: A) Check if the special item is there yet, or B) Get another small item. As only a few thousand climate data sets are released at a time, and then only weeks to months apart, you don't want your computer busy with so much work from other projects that it doesn't check cpdn for several days. PS The 2 settings that I mentioned are in a section that's universal across ALL projects. Once you attach to another project, BOINC will communicate what your settings are here, so that the other project(s) will follow those rules. You said in your first post: I have set the preferences to 99% CPDN and 1% malariacontrol.net. If you were talking about Resource share (which isn't a percentage, just a proportion), then your BOINC will always look at cpdn first when it comes time for more work. It will only look elsewhere if it finds that there's nothing here. And it only takes one task per core from cpdn to keep your computer occupied for days. ID: 48186 · Reply Quote
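
The remark that Resource share is a proportion rather than a percentage can be illustrated with a small calculation. This sketch only shows the arithmetic of turning share numbers into fractions; it does not model how the BOINC scheduler actually picks work, and the function name is made up for illustration.

```python
# Illustration only: resource shares are relative weights, so only their
# ratio matters. share_fractions is a hypothetical helper, not a BOINC API.

def share_fractions(shares):
    """Map each project's share number to its fraction of the total."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one share must be positive")
    return {name: value / total for name, value in shares.items()}

# The 99 and 1 split from the first post.
print(share_fractions({"climateprediction.net": 99, "malariacontrol.net": 1}))

# Scaling both numbers by ten leaves the fractions unchanged.
print(share_fractions({"climateprediction.net": 990, "malariacontrol.net": 10}))
```

Both calls give the same fractions, which is why share values such as 990 and 10 behave the same as 99 and 1.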

Sebh007 Send message Joined: 17 Jan 13 Posts: 9 Credit: 8,916,783 RAC: 296	Message 48209 - Posted: 21 Feb 2014, 7:07:48 UTC - in response to Message 48186. Perfect! Thanks. ID: 48209 · Reply Quote