climateprediction.net home page
Why is Climate Prediction grabbing all the CPU time


Message boards : Number crunching : Why is Climate Prediction grabbing all the CPU time

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 29557 - Posted: 16 Jul 2007, 7:18:22 UTC
Last modified: 16 Jul 2007, 7:33:22 UTC


Since you don't want the moderators to answer your question (because we're all running CPDN-beta), all I can do is suggest that you read the older posts in this thread, which already answer it.

But to reiterate for the millionth time... Running CPDN with a resource share too low to finish the model before the deadline *will* cause the BOINC Manager to panic and hog the computer. Erasing the overall debt will have the wrong effect (since you're erasing CPDN's debt to the other projects, not the other way around). As stated before, this is the expected and designed behaviour of the BOINC Manager. CPDN can only influence the BOINC Manager's scheduling behaviour indirectly (via the deadline).
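
The arithmetic behind that can be sketched in a few lines of Python. This is only an illustration, not the BOINC client's actual code: the function name, the 3000-hour model, and the simple "share times uptime" estimate are all assumptions made for the example.

```python
def can_finish_at_share(cpu_hours_needed, days_to_deadline,
                        hours_on_per_day, resource_share):
    """Return True if a model can finish before its deadline while receiving
    only its nominal resource share of the machine's uptime.

    Illustrative only: the real BOINC scheduler is more detailed than this.
    """
    # CPU hours the project gets if the share is honoured strictly.
    available = days_to_deadline * hours_on_per_day * resource_share
    return available >= cpu_hours_needed


# Hypothetical numbers: a 3000-hour model, a deadline one year away,
# and a machine that is switched on 24 hours a day.
print(can_finish_at_share(3000, 365, 24, 0.25))  # False: 2190 h available
print(can_finish_at_share(3000, 365, 24, 0.50))  # True: 4380 h available
```

When the check fails, the only way to meet the deadline is to give the model more than its share for a while, which is the "panic" behaviour described above.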

Why don't you ask about this issue on the BOINC forum (http://boinc.berkeley.edu/dev/)? There is also a more detailed description of the 5.4.x BOINC Manager's work scheduler on the Unofficial BOINC Wiki, and an explanation of 5.8.x's scheduler on the BOINC forum.
I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 29557
Pooh Bear 27
Joined: 5 Feb 05
Posts: 465
Credit: 1,914,189
RAC: 0
Message 29560 - Posted: 16 Jul 2007, 10:34:22 UTC

(the resource share for CP is much lower than what I use for other projects, as CP doesn't have a fixed deadline....)

True, CP does not NEED to be in by its deadline; it is BOINC (not CP) that forces the project to use a deadline. Since there is a deadline, BOINC will do what it thinks it needs to do. CP is just listening to BOINC. There is nothing CP can do about it.

ID: 29560
old_user461194
Joined: 15 Jul 07
Posts: 1
Credit: 0
RAC: 0
Message 29562 - Posted: 16 Jul 2007, 11:31:05 UTC - in response to Message 29560.  

Just started on CP/BOINC, on a 1.6GHz machine, but am going to have to abort. The model does affect my system performance, and the project would require 6166 hours of dedicated computing to finish. Given how long per day I'm able to/am prepared to keep the machine up and running, etc., I'd expect to get through somewhere around 2010.
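
For anyone wanting to check that kind of estimate, here is a small Python sketch of the arithmetic. The 6166 hours is the figure quoted above; the start date (the day of this post) and the 6 hours of crunching per day are assumptions picked only for illustration.

```python
from datetime import date, timedelta
import math


def estimated_finish(start, cpu_hours_needed, hours_per_day):
    """Estimate the calendar date on which a model would finish, assuming
    it gets the whole CPU for `hours_per_day` hours every day."""
    days = math.ceil(cpu_hours_needed / hours_per_day)
    return start + timedelta(days=days)


# Assumed: started 16 Jul 2007, machine crunching 6 hours a day.
print(estimated_finish(date(2007, 7, 16), 6166, 6))
# Same model on a machine left on around the clock.
print(estimated_finish(date(2007, 7, 16), 6166, 24))
```

At 6 hours a day the finish date lands in 2010, in line with the estimate in the post; run 24 hours a day, the same model would finish in 2008.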

Sorry, way too big a bite.

ID: 29562
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 29563 - Posted: 16 Jul 2007, 11:54:24 UTC


A slower machine can still do useful climate modelling work:

* Try running the 'slab' model rather than the 'coupled' model. The slab model has much smaller requirements for CPU time and memory. You can control which you run (via Your Account / Project Preferences / Set HadSM3 to 'ticked', and HadCM3 to 'unticked').

Alternatively, you could run the http://www.apsathome.org/ project, which is doing localised atmospheric physics work which will be used to improve climate models in the future. The work units only take a few hours even on older PCs.
I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 29563
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 29567 - Posted: 16 Jul 2007, 16:11:24 UTC
Last modified: 16 Jul 2007, 23:08:06 UTC

Hi User

If you disable the screensaver and only look at your globe when you actually want to, using the graphics button in BOINC Manager, you'll find that the processing of the model is faster. You may also find that it was the screensaver interfering with your computer's performance, not the processing of the model.
Cpdn news
ID: 29567
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 29569 - Posted: 16 Jul 2007, 18:02:08 UTC
Last modified: 16 Jul 2007, 18:02:34 UTC

azwoody,

Do I have enough worthless cobbles to qualify me to make a Moderator comment?

Your ad hominem attack on Les was unwarranted, ill-advised, and rude. Les' advice helped numerous participants over a lengthy period of time; Les deserves praise, not derision or insults. Please be more circumspect/mature in future posts.

Mike gave you the correct reply.
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 29569
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 29572 - Posted: 16 Jul 2007, 18:54:23 UTC - in response to Message 29376.  
Last modified: 16 Jul 2007, 18:57:25 UTC

This is not "BS" and it's not a bug --- there is nothing we on CPDN can do other than make (useless) short workunits that would last a day or so such as other projects. In the "old days" of BOINC the scheduler was a simple "round robin" which (for example) would run us for an hour, then SETI for an hour, then Einstein for an hour if you had the share settings at 33/33/33. Then things got a bit more "cleverer" on BOINC scheduling and we get the problems/notions of short & long term "debt" etc.
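
To make the "old days" behaviour concrete, here is a minimal Python sketch of a share-weighted round robin. It is not taken from BOINC's source: the function name and the rule that each turn's length is proportional to the project's share are assumptions for illustration only.

```python
from itertools import cycle, islice


def round_robin(shares, base_slice_hours=1.0):
    """Yield (project, hours) turns forever, always in the same order.

    Each turn is scaled by the project's share, so equal shares give
    equal turns of `base_slice_hours` each.
    """
    total = sum(shares.values())
    count = len(shares)
    for project in cycle(shares):
        yield project, base_slice_hours * count * shares[project] / total


# The 33/33/33 example from the post: an hour each, in turn.
shares = {"CPDN": 33, "SETI": 33, "Einstein": 33}
for project, hours in islice(round_robin(shares), 6):
    print(f"run {project} for {hours:.1f} h")
```

Note that nothing in this loop looks at deadlines or debt, which is exactly what distinguishes it from the later scheduler.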

This was all done for what was hoped to be a useful design & good purpose -- however if you run your PC part-time, and CPDN is <50%, it's probably going to be pretty useless and get into a "long-term debt situation" where it tries to run CPDN for a while (otherwise it would take 5 years for the workunit to finish).

For CPDN to have "day long" workunits would require uploading/downloading about 50MB per day, and syncing up tens of thousands of start dumps etc. It would be a pretty pointless task and may result in a lot of "garbage" science, since there's no telling whether sending out 160 year-long workunits to 160 different types of computers would add up to a meaningful, single 160-year climate model run.

And anyway, I don't see why people get offended or mad if CPDN runs two months if it all balances in the end. If you're a dedicated BOINC/volunteer computing cruncher, aren't you in it for the long haul? So what if it's on CPDN for two months then SETI for two months then Einstein for two months etc? It just seems we went from people screaming about a lack of work (before the scheduler changes) to people now screaming there's too much work!


ID: 29572
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 29577 - Posted: 16 Jul 2007, 23:32:06 UTC

I'm sure the BOINC people in Berkeley would call it an 'optimised' scheduler.....and in the longer term it is.
Cpdn news
ID: 29577
EclipseHA
Joined: 28 Aug 04
Posts: 42
Credit: 1,443,857
RAC: 0
Message 29579 - Posted: 17 Jul 2007, 4:52:24 UTC
Last modified: 17 Jul 2007, 4:58:45 UTC

You folks at CPDN are lost...

The Mods don't even run the "production code" and provide answers based on theory, but not on how this project ran since the beginning...... And a machine at 1.8 GHz is now slow????????????

I've been running CPDN with the same share (Linux FC3) for years, and even with updating the CC from BOINC, task switching was happening on a "normal" basis - that is until I had a WU complete (It did take about a year).

The machine is on 24/7, and it's been 25% CPDN and 50% Einstein for a long time (and 750% LHC but there's never work there, so it's a non-issue....)

Reality is that all worked fine until I got a new CPDN WU a couple months back. Now, that's all that crunches! Seems to me it is a CPDN problem, in that the date for "completion" is now set in such a way that for a machine that's a year or two old (on Linux at least), the only way for BOINC to schedule CPDN is to give it all resources for weeks or months!

You guys seem to assume that everyone is running the latest HW, but you're wrong - The mods are ignoring the problem - that is clear.

As I said, I will be more than willing to provide any information you need to debug the problem you're having with CPDN NOW consuming all the BOINC time, where it hasn't done so for most of the time I've been crunching CPDN.

Pointing me at the main BOINC site doesn't help, as your admins need to work with the BOINC developers to solve YOUR problem.

Not a problem for me as your attitude led me to just abort the WU that's sucking all the time!

Hey guys, I've got almost 300K credits with CPDN, and been contributing for a long time.. This isn't a "newbie" reporting a problem, but someone that has used "production code" for a long time....
ID: 29579
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 29580 - Posted: 17 Jul 2007, 6:55:21 UTC
Last modified: 17 Jul 2007, 6:59:08 UTC


All that rudeness misdirected towards Les, and you didn't even bother to read the links?

I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 29580
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 29582 - Posted: 17 Jul 2007, 12:25:23 UTC - in response to Message 29579.  
Last modified: 17 Jul 2007, 12:39:07 UTC

Azwoody said
The Mods don't even run the "production code" and provide answers based on theory, but not on how this project ran since the beginning......

Incorrect. The mods, as well as beta-testing and running SAP models, have run a lot of BBC models, which are the same as the cpdn coupled ocean models. I suggest you look at the stats for MikeMars, Thyme Lawn, Astro, Geophi and Les among others. I'm not in the same league because until recently I only had the one computer detailed below.
And a machine at 1.8 GHz is now slow????????????

Nobody at cpdn complains about 1.8GHz machines. I still have a 7-year-old 1.3GHz computer that ran a 160-year model even though it doesn't meet the minimum specs for that type of model. It's now running a slab model.

Unfortunately you have your computers hidden so we can't check how fast you're crunching your cpdn WU. It does help if members making queries about crunching unhide their computers for the duration of the conversation. I have previously made this request in the News thread.

Reality is that all worked fine until I got a new CPDN WU a couple months back. Now, that's all that crunches!

It has been explained that the boinc scheduler now works differently. It has also been explained that the BS (boinc scheduler) will do its best in the long term to achieve the exact proportions you have specified.

Seems to me it is a CPDN problem, in that the date for "completion" is now set in such a way that for a machine that's a year or two old (on Linux at least), the only way for BOINC to schedule CPDN is to give it all resources for weeks or months!

It's only a cpdn problem insofar as we have longer workunits than any other project and the BS now copes with this situation in a different way from before. It's a multi-project problem insofar as many crunchers from many projects don't understand how the BS works and there are frequent questions about it on many project forums.

Cpdn did not design the BS. Boinc in Berkeley did. If crunchers are patient and trust the BS to behave as it is designed to do, it isn't a problem at all. Except perhaps for some crunchers who have over-committed their computer/s. The problem of overcommitment was the same under the old version of the BS.

Overcommitment isn't a problem specific to less-fast computers. Any computer can be overcommitted by its owner.

Cpdn accepts workunits submitted beyond the deadline, but everyone here encourages crunchers to be realistic about what the computer/s they have can do.

You guys seem to assume that everyone is running the latest HW, but you're wrong - The mods are ignoring the problem - that is clear.

No we don't. Other mods have helped me keep my old computer up and running long beyond its trash-by date. We look up members' computer specs here and from the two other cpdn forums every day. All we care about is whether the computers have a realistic chance of completing the WUs while the researchers still need them and whether they have enough RAM to avoid model crashes.

As I said, I will be more than willing to provide any information you need to debug the problem you're having with CPDN NOW consuming all the BOINC time, where it hasn't done so for most of the time I've been crunching CPDN.

It's not a bug. Your BS is working as designed.

Pointing me at the main BOINC site doesn't help, as your admins need to work with the BOINC developers to solve YOUR problem.

Several of us DO regularly keep an eye on the boinc_dev forum to get a wider perspective.

Not a problem for me as your attitude led me to just abort the WU that's sucking all the time!

It's a pity from the point of view of the research that the WU didn't complete.

Hey guys, I've got almost 300K credits with CPDN, and been contributing for a long time.. This isn't a "newbie" reporting a problem, but someone that has used "production code" for a long time....

Thank you for your past contribution. If you do one day decide to return, your contribution will again be welcomed.

Cpdn welcomes all crunchers who can make a positive contribution to our projects and the forums.

Cpdn news
ID: 29582
Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 29584 - Posted: 17 Jul 2007, 14:25:14 UTC

And for the sake of balance, may I say a word on behalf of the sometimes-silent majority. For the vast majority of crunchers, the mods do a fine job of helping us up to and beyond our expectations. When we see a firestorm of debate, we aren't always quick enough to say so. But thanks. And for the courtesy and patience you bring to bear on our sometimes complex and difficult-to-handle issues.
ID: 29584
EclipseHA
Joined: 28 Aug 04
Posts: 42
Credit: 1,443,857
RAC: 0
Message 29614 - Posted: 19 Jul 2007, 4:06:36 UTC

Hey folks, I'm giving you real world data, for a LINUX box..

Nowhere in this thread has "Linux" even been acknowledged. We've also got mods that have an RAC=0 on the production code, and while they may be running the beta, the term "linux" again is missing.

Here's the full story:

Been running CPDN for years on Linux and Windows, and all has been fine.

Been updating the BOINC CC regularly, and all has been fine.

Been mixing CPDN, Einstein, etc, with CPDN at about 25%, and all has been fine.

I had a CPDN WU finish a couple months back on the LINUX box <-note "LINUX".

The new WU on LINUX started using all the available CPU about a week after it started. The only way new work from other projects would be downloaded was to suspend CPDN. I tried all that was suggested (resetting stuff in client_state.xml), and the end result was that CPDN would dominate the machine within a few days.

Ok, so what changed in my config? The HW? nope... The OS? nope... The BOINC CC? nope... A new CPDN WU? Yes!!!!!!

The LINUX box is running FC3, and is my webserver. It's been constant for a long, long time, including the level of hits I get.

Seems with the new CPDN WU, something has changed. Could it be the deadline estimate has "shrunk" since the last CPDN WU I'd processed on this LINUX box? (early 2006 was when I got it, IIRC), or could it be a change in the LINUX application? Or the way the "benchmark code" plays with the CPDN app...

Anyway, it's not BS that this is what I've seen.

What I'm sensing from you all is that you really don't know what's happening, and I think LINUX is a missing variable... Any mods running current production code on LINUX?

As I've said, I'll provide you any additional information you may need, but the "CPDN Centric" behavior started with the last WU I got on this LINUX box - and that's not BS....

I've killed the CPDN WU on the LINUX box, and plan on crunching to completion the CPDN WUs I have on Windows (where everything is fine).
ID: 29614
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 29615 - Posted: 19 Jul 2007, 5:44:43 UTC
Last modified: 19 Jul 2007, 5:46:58 UTC

You may be talking about Linux, but this is NOT the Linux forum.
That's on the Questions and Answers board.

And no, it seems that NO moderator is running production Linux models.

But so what? The job of the moderators is to police the boards and remove anything that is not Child Friendly. It's NOT to be experts in all things cpdn, BOINC, or any combination of matters about DC.

Moderators were selected from long-time cpdn crunchers, and these same long-time cpdn crunchers were/are also the people who have been answering questions. For a long time.

What you need is someone who is doing a lot of crunching on other projects as well, and who therefore may know what the problem and answer is for multi-project crunchers.
And also, it will have to be someone who also looks at this site regularly.

So your best bet would be to ask about it on the BOINC/dev boards, where people from all over visit.

The only thing that I know about the problem is that the newer versions of the CC work totally differently to early versions, and anyone running a project with very long WUs, such as cpdn, and who only allocates a small percentage of time to cpdn (perhaps less than 50%-60%), will probably be stuck on cpdn for months.
Which is something that I expect multi-project people to know, just from keeping up to date with BOINC/dev postings about changes. (Such as the reason for the sudden withdrawal of 5.10.7)

ID: 29615
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 29620 - Posted: 19 Jul 2007, 12:52:18 UTC
Last modified: 19 Jul 2007, 13:02:49 UTC

I'm sure I've never seen anything on any of the forums to suggest that long term debt is conceptually different or handled in a different way on Windows, Macs or Linux.

There are already several threads about this on the boinc_dev forum where I am sure that Jorden and other Linux users will be delighted to help. Just tell him that Mo and Les referred you across. (There are Linux users on this forum giving each other excellent help in the appropriate section.)

http://boinc.berkeley.edu/dev/index.php
Further info at
http://boincfaq.mundayweb.com/index.php?language=1&view=168&sessionID=ebf3861a4daee36ebcc0de99a4b31b6a
http://boinc.ssl.berkeley.edu/sched.php
Cpdn news
ID: 29620
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 29623 - Posted: 19 Jul 2007, 14:42:47 UTC - in response to Message 29620.  

I'm going to post on the BOINC developers list that perhaps projects with long workunits running in tandem with other projects should probably use a simpler "round-robin" algorithm for scheduling runs.

ID: 29623
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 29626 - Posted: 19 Jul 2007, 15:21:43 UTC

Good idea. The current BS can't take into account that cpdn is the only project that accepts results past their deadline. Nor would we want the BS to take decisions about how far past the deadline is acceptable to Oxford.

If Berkeley can't offer a choice of BSs for a while, could the cpdn deadline be lengthened? That would surely make climate models demand a bit less time when they're first started by multi-project crunchers?


Cpdn news
ID: 29626
EclipseHA
Joined: 28 Aug 04
Posts: 42
Credit: 1,443,857
RAC: 0
Message 29633 - Posted: 20 Jul 2007, 0:49:14 UTC - in response to Message 29626.  

Good idea. The current BS can't take into account that cpdn is the only project that accepts results past their deadline. Nor would we want the BS to take decisions about how far past the deadline is acceptable to Oxford.

If Berkeley can't offer a choice of BSs for a while, could the cpdn deadline be lengthened? That would surely make climate models demand a bit less time when they're first started by multi-project crunchers?

Thank you mo.v and Carl. Extending the deadline (as it's not really used anyway) could really help this problem.

As far as the CC being a bit smarter, it wouldn't have to be CPDN specific. It could, for example, be based simply on the size of the WU. If the WU will take more than 30(?) days from beginning to end, round robin it, otherwise use the normal scheduling.
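
Something like the following Python sketch is what I have in mind. It is purely hypothetical: the policy names, the function, and the way the run length is estimated are made up for illustration and are not BOINC options or code, and the 30-day threshold is only the guess mentioned above.

```python
LONG_WU_DAYS = 30  # guessed threshold, not a real BOINC setting


def choose_policy(estimated_cpu_hours, hours_on_per_day, resource_share):
    """Pick a (hypothetical) scheduling policy for one work unit.

    The WU's length is estimated as the wall-clock days it would need if
    it only ever received its resource share of the machine's uptime.
    """
    if hours_on_per_day <= 0 or resource_share <= 0:
        raise ValueError("uptime and resource share must be positive")
    days = estimated_cpu_hours / (hours_on_per_day * resource_share)
    return "round_robin" if days > LONG_WU_DAYS else "normal"


# A 6166-hour model at a 25% share on a 24/7 machine: far beyond 30 days.
print(choose_policy(6166, 24, 0.25))  # round_robin
# A 10-hour work unit at a 50% share: finished within a day.
print(choose_policy(10, 24, 0.50))    # normal
```

The point of keying on WU size rather than on the project is that any project with very long work units would get the same treatment.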

I know, like many here, I started CPDN as a "fallback" project - when there were only 4-5 projects, there were times when work was scarce. It had a low resource share, and would crunch full bore when the other projects had issues. But it still would get a slice at other times. Now I have a share of 25% to 60% for CPDN, with CPDN getting a much bigger chunk than some projects, but I'm not willing to have CPDN-only machines.
ID: 29633
old_user431016
Joined: 12 Feb 07
Posts: 2
Credit: 12,690
RAC: 0
Message 29635 - Posted: 20 Jul 2007, 6:16:16 UTC

Well I have just returned to climateprediction.net because I noticed they have units that will now run on a Mac Pro Intel box. I immediately noticed that while I only allowed enough resource usage for climate to run on one 'core', it just kept downloading more and more jobs with estimated run times of 1660 hours and a completion date in 2008. However, it would start a new job every once in a while, so I now have more than one with time spent on it. If I allow things to continue I will have no climate jobs that will complete before the deadline unless climate takes over all eight cores. That seems to be what this thread is alluding to. I had no intention of letting one project take over my machine. When jobs have such long duration times the idiosyncrasies of BOINC step in, causing all sorts of funnies to happen, like no status being shown yet running full out on the core. It makes doing maintenance on the system difficult if one does not want to lose hours of processing. I will monitor this for a while. I have set BOINC to accept no new tasks from climate to stop the loading of the queue with climate jobs.
ID: 29635
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 29636 - Posted: 20 Jul 2007, 7:21:18 UTC


Yes, that's a good idea: most of us use 'no more tasks' to control the number of jobs (I think it's in one of the READMEs as a tip).

The problem is that if something suspends a model, BOINC decides it needs more work, and downloads another one.

I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 29636

©2024 climateprediction.net