Message boards : Number crunching : sulphur seems slower than slab

old_user85254
Joined: 27 Jun 05
Posts: 74
Credit: 199,198
RAC: 0
Message 18070 - Posted: 11 Dec 2005, 20:55:38 UTC - in response to Message 18063.  

River
No offence intended.
I'm just patiently saying that the project's requirements are what they are, and volunteers have to live with that.
CPDN is not open-ended like SETI, etc. I think it may have been envisaged as having a 5-year life, but I'm not even sure when that period started. It wasn't even clear that funding would be available for experiment 2 until about a third of the way into this year. But one of the project people said (on the community forum, where most discussion takes place) that it had been obtained, along with funding for another person for the team, and funding for a REALLY big, 'high resolution' sub-project.
Which may or may not be divided up into small bits as you would prefer. No other details have come to light, possibly because of the intense work on spinup (which I'm now running), getting ready for experiment 2, and work on the BBC project.

As you are using a large number of computers, one possibility that would help you is if it ever becomes possible to use cluster computing. This gets talked about now and then, but it may need a 64-bit OS to work.
Carl said the idea was feasible because of the supercomputer origins of the programs we are running.

If you do decide to leave, keep checking back. Something may turn up. But finding where it is being talked about is the problem.



ID: 18070

old_user85254
Joined: 27 Jun 05
Posts: 74
Credit: 199,198
RAC: 0
Message 18071 - Posted: 11 Dec 2005, 21:26:21 UTC - in response to Message 18063.  
Last modified: 11 Dec 2005, 21:34:35 UTC

River
No offence intended.

Thanks.


I'm just patiently saying that the project's requirements are what they are, and volunteers have to live with that.


And I guess I am saying that the project's requirements have just increased by a factor of three, without warning and without thanks to those who were welcome a month ago.

When we say 'my box can't cope with this', being told to like it or leave feels churlish.

It does seem to me that a three-fold increase in the minimum acceptable commitment to the project should be accompanied by, at the very least, a comment from the project recognising that some users will inevitably have to leave.

The project's requirements are what they are - but they are not what they were a month ago.


CPDN is not open-ended like SETI, etc. I think it may have been envisaged as having a 5-year life, but I'm not even sure when that period started. It wasn't even clear that funding would be available for experiment 2 until about a third of the way into this year.


Good point. And I guess I am saying that a separate experiment is bound to have different requirements, and therefore it is unfair to simply port everyone across from the old experiment assuming they will accept the new ones. Some kind of user opt-in, or some automated selection of the faster boxes, was needed.

Moore's law seems to apply to science too - every three years we can expect the computing needs of the scientists to quadruple (a doubling every 18 months), and looked at in that light the increase from slab to sulphur is just what we'd expect.

I hope CPDN gets a lot more grants in future; maybe they will need even faster boxes. If so, I hope the project team will reflect on the responses this time round and figure out a more tactful way to break the news that the slower boxes are no longer considered helpful.

What is exciting and new to those who can cope is the end of participation for those who can't. And "parting is such sweet sorrow", as the bard put it...

...

If you do decide to leave, keep checking back.


I will, thanks

Something may turn up. But finding where it is being talked about is the problem.


Which is why my ideal solution is a preference setting saying I will accept work up to XXXX hours of predicted run time, based on my benchmarks. Then I can stay attached forever, accept that I may not get any work, and then one day I'll look at the GUI and say "hey, I'm helping CPDN save the planet again!".
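
As a back-of-envelope illustration of what I mean (a sketch in Python with made-up names and numbers, not real BOINC scheduler code):

    # Predicted wall-clock hours for a work unit on a given host.
    def predicted_hours(wu_flops, bench_flops_per_sec, on_frac):
        cpu_seconds = wu_flops / bench_flops_per_sec   # benchmark-based estimate
        return cpu_seconds / on_frac / 3600.0          # stretched by % of time the box is on

    # The preference: only send work the host expects to finish under the user's cap.
    def should_send(wu_flops, bench_flops_per_sec, on_frac, max_hours_pref):
        return predicted_hours(wu_flops, bench_flops_per_sec, on_frac) <= max_hours_pref

    # Example: a sub-GHz box benchmarking ~5e8 flops/s, offered a sulphur model
    # guessed at ~2e16 flops, with my cap set to 3000 hours:
    print(should_send(2e16, 5e8, 1.0, 3000.0))   # False - no sulphur for this box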

But yes, I will keep checking back every few months or so.
River~~
ID: 18071

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 18074 - Posted: 11 Dec 2005, 22:08:47 UTC
Last modified: 11 Dec 2005, 22:20:10 UTC

How about joining us on the community forum, here: http://www.climateprediction.net/board/index.php ?
Even if you stop crunching, you can still read and discuss.
To post, you have to sign up for the board. Sorry.
And there was some discussion about the increase in model sizes ages ago, with a recruiting thread for spinup which has ended up as a 10-page discussion. So far.

Currently getting a lot of attention are the massive explosions in Hemel Hempstead. One of our crunchers is 2.5 miles due west.
This is in "Movies, fun, etc." (which is for misc items). It's already up to 2 pages.

edit
And don't forget to read my post about why really powerful computers may not be a good thing, here: http://www.climateprediction.net/board/viewtopic.php?t=3376

ID: 18074

old_user56785
Joined: 23 Feb 05
Posts: 55
Credit: 240,119
RAC: 0
Message 18075 - Posted: 11 Dec 2005, 22:54:30 UTC - in response to Message 18071.  
Last modified: 11 Dec 2005, 23:15:13 UTC

But yes, I will keep checking back every few months or so.
River~~


~~

Rome wasn't built in a day....

It takes time to convince people of needed changes, especially if they've had one of those sleepless nights (I guess I'm right, Les).
With the RAC you have at the moment it would be a loss for CPDN to see you go, and I know my machines will go the same way at the next change (coupled spinup alphas require a 3.2 GHz machine).

The short-term solution I see is to get your machines to max specs, provided you have the resources and authorisation to do that.

For the long term, Carl has to come up with a parallel cpu/machine version for 32-bit. If not, then I predict that CPDN is going to lose quite a lot of participants and crunchers. Recently a lot of crunchers have bought new, top-of-the-range machines, but at the pace things are going they will soon have to buy mini-supercomputers.

The general computer user does not need a state-of-the-art machine for the work they have to do. Don't expect this user to buy such a machine, or machines, just for CPDN. Besides, if they did, it would only move the hardware cost from science to the public sector, with the downside that it is probably less cost- and energy-effective to crunch on all those public machines.

ID: 18075

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2167
Credit: 64,403,322
RAC: 5,085
Message 18077 - Posted: 11 Dec 2005, 23:27:54 UTC - in response to Message 18075.  

...I know my machines will go the same way at the next change (coupled spinup alphas require a 3.2 GHz machine).

Yes, but we're making assumptions here. The current spinup supposedly requires a 3.2 GHz P4 or equivalent, but that is only because it is 200 years and they need the oceans fairly fast. Experiment 2 uses the coupled model but only needs to run 50 years. The sulphur model right now is 75 years. So an experiment 2 model will likely take about the same time as a sulphur model (fewer years, but fewer optimizations and additional, albeit relatively fast, ocean steps).
For the long term, Carl has to come up with a parallel cpu/machine version for 32-bit. If not, then I predict that CPDN is going to lose quite a lot of participants and crunchers. Recently a lot of crunchers have bought new, top-of-the-range machines, but at the pace things are going they will soon have to buy mini-supercomputers.

One would think a parallel version of the model would work okay, and a 64-bit version too, so there are potential speedups there. But the nice optimizations that Carl/Tolu compiled into the latest version of sulphur won't be there, since they can't get the ocean stable with those optimizations.
ID: 18077

old_user56785
Joined: 23 Feb 05
Posts: 55
Credit: 240,119
RAC: 0
Message 18082 - Posted: 12 Dec 2005, 2:55:13 UTC - in response to Message 18077.  

Geophi,
Thanks; indeed you remind me that the 3.2 GHz requirement was due to time constraints.

One would think a parallel version of the model would work okay, and a 64-bit version too, so there are potential speedups there. But the nice optimizations that Carl/Tolu compiled into the latest version of sulphur won't be there, since they can't get the ocean stable with those optimizations.

I have been sketching a long-term solution. I know Carl/Tolu have encountered problems, but have they given up on it (already)?

Possibly it could work on 32 bits by running the ocean on a networked machine, sort of asynchronously but controlled by the main machine.

Creative minds create solutions. Frustrated minds throw up restrictions.
ID: 18082

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2167
Credit: 64,403,322
RAC: 5,085
Message 18084 - Posted: 12 Dec 2005, 4:19:08 UTC - in response to Message 18082.  

I have been sketching a long-term solution. I know Carl/Tolu have encountered problems, but have they given up on it (already)?

Already? Although the compiler is complex, there are only so many options, and the result (instability with SSE2 optimizations turned on) was not unexpected. Others before Carl had seen the same thing trying to run a coupled model on a PC. It's one of the reasons why the original model was 64-bit.

From a post on the spinup forum...
Basically, over the past 4-6 weeks I have uncovered a lot of serious issues pertaining to running a coupled model as a 32-bit job a la our old slab model. The main problem is that the coupling between the atmosphere & ocean is very sensitive; the atmosphere works fine in 32-bit but the ocean wants 64-bit, so when you couple them all hell can break loose! Most annoyingly, things seemed to drift and go unstable, but you wouldn't see it until timestep 20,000 or so!

But we seem to have worked these issues out OK. The problem is that the higher optimizations, such as the "-ax" options on the Intel Fortran compiler (SSE2 extensions, Pentium IV instructions, etc.), truncate the 32-bit floats enough that eventually they wreck things. So we are not using the "-ax" optimizations now, although we are able to use the version 9 compiler, which seems pretty fast anyway.


Possibly it could work on 32 bits by running the ocean on a networked machine, sort of asynchronously but controlled by the main machine.

Well, you'd have to sketch that out for me. The way the model runs is one day of atmosphere, then the same day of ocean, then the next day of atmosphere, the next day of ocean, and so on. It's done that way for speed.
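
In outline it's just an alternating day loop - a toy sketch in Python (the step functions are stand-ins, not the real model code):

    def step_atmosphere(atmos, ocean):
        return atmos + 0.1 * ocean    # stand-in for one model day of atmosphere

    def step_ocean(ocean, atmos):
        return ocean + 0.01 * atmos   # stand-in for one model day of ocean

    def run_coupled(days, atmos, ocean):
        for _ in range(days):
            atmos = step_atmosphere(atmos, ocean)   # one day of atmosphere...
            ocean = step_ocean(ocean, atmos)        # ...then the same day of ocean
        return atmos, ocean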

Creative minds create solutions. Frustrated minds throw up restrictions.

True, but then again, the creative person coming up with the solution should probably have an in-depth knowledge of compilers and of the intricacies of coupled climate models.
ID: 18084

old_user85254
Joined: 27 Jun 05
Posts: 74
Credit: 199,198
RAC: 0
Message 18089 - Posted: 12 Dec 2005, 7:43:40 UTC - in response to Message 18075.  
Last modified: 12 Dec 2005, 7:58:30 UTC

The short-term solution I see is to get your machines to max specs, provided you have the resources and authorisation to do that.


Neither. Most of the sub-GHz boxes are mine but can't be stretched to make sulphur viable, in my opinion. A couple of the sub-GHz boxes belong to friends, and I can't presume to occupy a friend's computer 24/7 for half a year ahead. The 2.8 GHz boxes belong to a local charity; CPDN is running with the permission of one individual who has the discretion to allow it, but the organization would not contemplate running those boxes outside the working day. And I won't ask, for fear of rocking the boat and losing what we already have from them.


For the long term, Carl has to come up with a parallel cpu/machine version for 32-bit.


I'm with Les on this one - if the new science needs 64-bit or needs super-minis, then that is what the new science needs. You can only break a given computing task into so many chunks before it goes unstable, and science that borders on the unstable is no good to anyone.

To take an extreme example, no amount of parallelization could get slab running on an infinite network of ZX Spectrums. You might get the code to run, but the results would be pure noise.

So if sulphur and the oceans need the faster boxes, my suggestion is that the scientists also find some different science that can usefully be done on the slower boxes alongside the sulphur and ocean runs. One option might be to run yet more slabs in the short term, and in the longer term to find other interesting questions that can be split into slab-sized chunks.

Running two sets of science at once, one for fast boxes and one for slower boxes, would make the best use of the resources the project already has. It entails (sorry to keep coming back to this) some kind of selection process between boxes, whether user-driven by prefs or automated via benchmarks and %-on stats.

If not, then I predict that CPDN is going to lose quite a lot of participants and crunchers.


Agreed - and your comment suggests a lever for funding applications: "we have a huge computing resource that is zero cost to public funds, and which we will lose if it is not re-deployed at the end of the slab runs" could attract a small, separate research grant - all the scientists need to do is identify some important question that could be answered in such a proposal.

And the beauty of BOINC is that even if those boxes are lost to CPDN, very few will be lost to science.

Predictor, for example, can do useful work on boxes too slow for the slab model (though it does need 128 MB of RAM). And in a year or so Orbit will be online, and current estimates are that Orbit WUs will be very short and so very suitable for slower boxes. Orbit might turn out to be totally unnecessary, but if it finds something it would have more [pun] impact on the planet [/pun] than CPDN. If Predictor finds a cure for mad cow disease or cancer, those potential benefits are not trivial either. Einstein & LHC benefit human curiosity, and you can see from my stats that I personally rate that as important too. And there are now other projects on the BOINC front page that will attract other CPDN retirees.


ID: 18089

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 18090 - Posted: 12 Dec 2005, 8:52:38 UTC

The fast machines are needed for the spinup simply because of its long, 200-year run, which was estimated to take 4 months on a 3.2 GHz P4.
When it started there was plenty of time. But the models weren't stable. By the time the deadline was a bit over 4 months away, Carl thought he had cracked it, and started asking for volunteers willing to run 24/7 on fast computers. And still the models were unstable. By the time he had it worked out, there was only a bit over 2 months to do the 4-month run.
The headache for the researchers will be deciding how much of the run will suffice to get started, although I suspect that they may use the data from the simpler data sets, which have very few of their parameters adjusted. Then, when the more complex runs start finishing, they can generate more data sets from those. Just my idle musing on the problem.
BUT! I also think that the coupled ocean runs, while more complex, will not take much more time to complete than a sulphur model, and that the time to completion will be long, perhaps the usual year.
Which means that if you can dedicate a computer to CPDN for a few months with nothing else running, they won't be a problem.
It's just the testers who need very fast, dedicated machines.
We'll see come February.

ID: 18090

old_user1607
Joined: 26 Aug 04
Posts: 19
Credit: 446,376
RAC: 0
Message 18137 - Posted: 13 Dec 2005, 13:37:13 UTC


Given the extra length required for these sulphur models and the termination of slab model generation, wouldn't it make sense to update the technical requirements page? I don't believe that an 800 MHz machine will be able to finish one before the deadline (based on the fact that I have a 1 GHz box that is estimating it will exceed the deadline by a few days). Or perhaps an update to provide model-specific requirements?

ID: 18137

old_user56785
Joined: 23 Feb 05
Posts: 55
Credit: 240,119
RAC: 0
Message 18153 - Posted: 13 Dec 2005, 17:03:36 UTC - in response to Message 18137.  


Given the extra length required for these sulphur models and the termination of slab model generation, wouldn't it make sense to update the technical requirements page? I don't believe that an 800 MHz machine will be able to finish one before the deadline (based on the fact that I have a 1 GHz box that is estimating it will exceed the deadline by a few days). Or perhaps an update to provide model-specific requirements?



Carl has never said that the slabs were terminated, but he has stopped slabs because the science side wants more sulphur results coming their way.

I have asked about the status of slabs, but I'm afraid Carl missed that question.

As 'Time to complete' is a bit dodgy nowadays, one has to run the model for a while and calculate the end time from the progress made, as in the sketch below.
As an indication: on a 1 GHz PIII I expect a sulphur model to complete within 200 days (that's 24/7 and on 32-bit Windows).
((I don't like to say it, but it might take longer on the same box under Linux.))
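
The calculation is trivial - something like this (Python, illustrative numbers only):

    # End-time estimate from progress so far.
    def days_remaining(elapsed_cpu_hours, fraction_done, crunch_hours_per_day=24.0):
        total_hours = elapsed_cpu_hours / fraction_done   # projected total CPU time
        return (total_hours - elapsed_cpu_hours) / crunch_hours_per_day

    # Example: 300 CPU hours in and 6.5% done, crunching 24/7:
    print(days_remaining(300.0, 0.065))   # about 180 days to go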

If slabs are not needed anymore and the science side can't find shorter work a la slab, then the 'requirements page' should indeed be altered:

- 800 MHz to 1 GHz would be the minimum for 24/7 crunchers.
- Faster machines would be required for boxes not running 24/7.

Personally, I would also add the requirement/advice to make regular backups, run with 'Network activity suspended', and a hint to be careful with other CPU-intensive applications.
ID: 18153

old_user5994
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 18168 - Posted: 13 Dec 2005, 22:36:51 UTC - in response to Message 18153.  

I have the situation where a sulphur model has put one of my computers into EDF (earliest-deadline-first) mode:

2005-11-27 06:55:21 [climateprediction.net] Restarting result 47wx_200297153_0 using sulphur_cycle version 4.19
2005-11-27 06:55:21 [climateprediction.net] Restarting result 1txs_000106329_1 using hadsm3 version 4.13
2005-11-27 06:55:22 [---] Suspending work fetch because computer is overcommitted.
2005-11-27 06:55:22 [---] Using earliest-deadline-first scheduling because computer is overcommitted.


I have something like -1,800,000 seconds of long-term debt built up because of this. Not sure yet whether I am going to let that stand ... But, ya gotta be careful ...

ID: 18168

old_user85254
Joined: 27 Jun 05
Posts: 74
Credit: 199,198
RAC: 0
Message 18268 - Posted: 16 Dec 2005, 7:34:09 UTC - in response to Message 18168.  

I have the situation where a sulphur model has put one of my computers into EDF (earliest-deadline-first) mode:

...using sulphur_cycle version 4.19
...


I have something like -1,800,000 seconds of long-term debt built up because of this. Not sure yet whether I am going to let that stand ... But, ya gotta be careful ...


There was a suggestion to cancel sulphur WUs using app versions before 4.22 -- but I don't know whether this was endorsed by the project team or just thrown in by a participant, and I can't find the post I was thinking of -- would Carl or Tolu comment please?
ID: 18268

old_user85254
Joined: 27 Jun 05
Posts: 74
Credit: 199,198
RAC: 0
Message 18270 - Posted: 16 Dec 2005, 7:51:46 UTC - in response to Message 18137.  
Last modified: 16 Dec 2005, 7:52:15 UTC


Given the extra length required for these sulphur models and the termination of slab model generation, wouldn't it make sense to update the technical requirements page?


Agree absolutely. If 800 MHz is a sensible limit for slab, then 2 GHz would be about right for sulphur.

I don't believe that an 800 MHz machine will be able to finish one before the deadline (based on the fact that I have a 1 GHz box that is estimating it will exceed the deadline by a few days). Or perhaps an update to provide model-specific requirements?


Depends whether you want to be safe in all cases, or to avoid excluding machines that can do the work. Boxes at the same nominal speed vary considerably in their throughput. My 700 MHz PIII does over twice the throughput of my 700 MHz Celeron, for example, and AMD chips generally outdo Intel at the same clock speed.

The 800 MHz limit seems to have been chosen so that all 800 MHz boxes were very comfortable with slab, and the equivalent limit for sulphur might be around 2 GHz.

I would suggest the following for sulphur:
a) advise a 2 GHz limit
b) automatically enforce a 1 GHz limit, so that boxes slower than that are not given sulphur WUs (this project has never enforced its advice in the past, but BOINC does offer the facility to do so)
ID: 18270

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 18276 - Posted: 16 Dec 2005, 8:49:19 UTC

All of the FAQ / tech info / front end of the project was written in pre-BOINC days.
This info is still applicable to slab, as it is used by the OU course, and will be for years to come. And Win 98SE still works there.
As for the 800 MHz limit, people still tried running on machines as slow as 231 MHz.

Another problem is that some people don't allocate much time to cp.
One person took well over a year on a fairly fast computer. Another took a similar time on a very slow machine, but running 24/7.
Useful results can be produced by 'slow' computers, so I don't see a need to complicate things more than they are. It's more necessary to know WHY the limits are there.
But it's a good subject for discussion. :)

ID: 18276

old_user85254
Joined: 27 Jun 05
Posts: 74
Credit: 199,198
RAC: 0
Message 18281 - Posted: 16 Dec 2005, 9:20:36 UTC - in response to Message 18276.  

All of the FAQ / tech info / front end of the project was written in pre-BOINC days.


**Exactly**

And when increasing the minimum level of commitment, the project *needs*, in my opinion, to update the words that describe that minimum commitment.


This info is still applicable to slab, as it is used by the OU course, and will be for years to come. And Win 98SE still works there.

But on the BOINC incarnation of the project, slab is currently not being offered, so it makes no sense to still publish advice based on slab.

By all means keep the classic advice for the classic platform - I agree that would be appropriate - but surely it is time for some separate words of advice for BOINC participants, now that the minimum practical hardware is so different between classic-slab and BOINC-sulphur?


As for the 800 MHz limit, people still tried running on machines as slow as 231 MHz.


I've successfully slabbed on 500 MHz boxes, and on classic I ran a model on a 200 MHz Celeron, but that one ended early due to instability and I always wondered if it would have done so on a box within the spec...


Another problem is that some people don't allocate much time to cp.
One person took well over a year on a fairly fast computer. Another took a similar time on a very slow machine, but running 24/7.


An issue, rather than a problem, I'd say. When updating the words, you address this issue by saying that the minimum-spec machine is X GHz if on 24/7, and proportionally faster for machines that are only on part-time.


Useful results can be produced by 'slow' computers, so I don't see a need to complicate things more than they are.


The complication comes when people who have been running borderline machines successfully on slab, or even sub-spec machines, then meet a sulphur. Many users, myself included, take a great deal of effort to read up on a project just before and just after joining, then leave it running on autopilot once it seems happy.

As the slab models come in, you will continue to get people panicking when they see 6,000-hour completion times, 22-million-second completion times on the command line, etc. People will, as I did, detach such machines as a reflex action. Therefore an automated screen to prevent sulphur going to boxes below a certain threshold will save the project long waits for boxes that detach. Updated info will save users some anxiety, even if they don't see it until after they have the first sulphur on board and wonder what is happening.

In short, if sulphur is a new experiment, then new experimental guidelines are in order.


It's more necessary to know WHY the limits are there.


Agreed, and if the explanation relates to an experiment (slab) that is no longer offered on the BOINC platform, then users/donors will "know" the wrong limits and will "know" only the outdated reasons.


But it\'s a good subject for discussion. :)

I've taken you at your word here... ;-)
ID: 18281

astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 18294 - Posted: 16 Dec 2005, 19:40:23 UTC
Last modified: 16 Dec 2005, 20:03:26 UTC

While in the sulphur beta test, we anticipated the need for expanded documentation for processing sulphur work units. I started such a page, and others added to it and made it better. Crandles honed it and put it in the Wiki. (We don't have access to the CPDN FAQ, as far as I know.)

Admittedly, it isn't "up front" and easy to find. It should be. However, I don't fault Carl or Tolu. They've been "drinking from a firehose" since this project began. And, if I may be forgiven another cliche, while they're up to their butts in alligators it's hard for them to work on draining the (documentation) swamp.

Edit: More on sulphur Work Unit requirements: http://boinc-doc.net/boinc-wiki/index.php?title=Sulphur_Cycle

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 18294

old_user3434
Joined: 30 Aug 04
Posts: 77
Credit: 1,785,934
RAC: 0
Message 18317 - Posted: 17 Dec 2005, 19:26:22 UTC - in response to Message 18294.  
Last modified: 17 Dec 2005, 19:40:01 UTC

While my machines should have the power to complete the sulphur models, the "Computer is overcommitted" factor has hit me hard when moving to BOINC 5.2.13.

From what I read, can I actually expect that all affected machines will restrict themselves to CPDN-only over the course of months??

That would be a pity; I'd like the other projects to get their fair share :(
(Also, what happens if the other projects' debt is 6 months' worth of CPU time - would CPDN not run for the next 6 months while the others work down their debt? Worst case, I would expect the scheduling/debt mechanism not to stabilize, but actually to destabilize, with ever-increasing debt cycles between CPDN and the other projects *ugh*.)

PS.
Any effective and safe tips to overcome the "overcommitted" state BOINC thinks it is in?

-- edit --
Just got a very nice hint from a team member:

Since I didn't run BOINC for several months (finishing SETI Classic), the efficiency values for each computer ("% of time BOINC client is running") had naturally dropped close to 0%.
I had completely forgotten about that important scheduling factor.

With a bit of luck, the overcommitted factor will vanish as soon as my systems are back to >95% values :)
Scientific Network : 44800 MHz - 77824 MB - 1970 GB
ID: 18317

old_user5994
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 18323 - Posted: 18 Dec 2005, 4:00:27 UTC

If this starts to happen, there are a couple of things you can try.

One is to force the issue by suspending various projects, making the system spend some time on the others.

The other is to edit the long-term debt numbers.

I have the potential for a similar problem on one machine. It has an LTD of 1,084,000 seconds built up ... this machine has had both a slab and a sulphur model running on it. The good news is that both models are 80% done, with one only having 5 more days to go (another month?).
ID: 18323

John McLeod VII
Joined: 5 Aug 04
Posts: 172
Credit: 4,023,611
RAC: 0
Message 18324 - Posted: 18 Dec 2005, 4:25:23 UTC - in response to Message 18317.  

While my machines should have the power to complete the sulphur models, the "Computer is overcommitted" factor has hit me hard when moving to BOINC 5.2.13.

From what I read, can I actually expect that all affected machines will restrict themselves to CPDN-only over the course of months??

That would be a pity; I'd like the other projects to get their fair share :(
(Also, what happens if the other projects' debt is 6 months' worth of CPU time - would CPDN not run for the next 6 months while the others work down their debt? Worst case, I would expect the scheduling/debt mechanism not to stabilize, but actually to destabilize, with ever-increasing debt cycles between CPDN and the other projects *ugh*.)

PS.
Any effective and safe tips to overcome the "overcommitted" state BOINC thinks it is in?

-- edit --
Just got a very nice hint from a team member:

Since I didn't run BOINC for several months (finishing SETI Classic), the efficiency values for each computer ("% of time BOINC client is running") had naturally dropped close to 0%.
I had completely forgotten about that important scheduling factor.

With a bit of luck, the overcommitted factor will vanish as soon as my systems are back to >95% values :)

You can edit the values for active_frac and on_frac in the client_state.xml file to both be 1.0, if that is closer to reality now (they will fall a bit afterwards). Make certain that you shut BOINC down before opening the file, and start BOINC again after you save the edit.
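
If memory serves, those values live in the host's <time_stats> block and look roughly like this after the edit (the exact layout may vary between client versions):

    <time_stats>
        <on_frac>1.000000</on_frac>
        <active_frac>1.000000</active_frac>
    </time_stats>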

Yes, if CPDN takes a year running in EDF, then it would be expected not to request any work for (1 year / CPDN resource_frac) - 1 year. Example: a CPDN resource fraction of 0.5 on a host that takes a year running 24/7 would run CPDN for a year and other things for a year (unless the other projects had no work available and the computer ran dry at some point - then a CPDN result would be downloaded, delaying the balance). Repeat.
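
As a quick sanity check of that arithmetic (plain Python, nothing BOINC-specific):

    # Time the other projects get before CPDN asks for work again.
    def drought_years(cpdn_run_years, cpdn_resource_frac):
        return cpdn_run_years / cpdn_resource_frac - cpdn_run_years

    print(drought_years(1.0, 0.5))    # 1.0 -> a year of other projects' work
    print(drought_years(1.0, 0.25))   # 3.0 -> three years at a 25% share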


BOINC WIKI
ID: 18324