climateprediction.net home page
Posts by John McLeod VII

1) Message boards : Number crunching : What if my BOINC project does not finish by the deadline? (Message 24105)
Posted 26 Aug 2006 by John McLeod VII
Post:
As long as trickles are received on a somewhat regular basis, you get credit for each trickle. Each trickle has vital information, so it is not wasted. Finishing after the deadline is not an issue if the server believes your unit is still alive. I have actually seen a WHOLE WU uploaded with all the trickles at once, and the person got their credits. This was seven months after they downloaded the unit. See this WU. As you can see, every trickle on it has the same date, within seconds of each other, from first to last. Sent date and received date were approximately seven months apart.

So the tolerance on this project is wide open. Go ahead, keep crunching your small amount. You are doing good work.

Occasionally (once every few months), the developers go through the results and see which ones have not trickled in the last 6 months or so. These are declared dead and re-issued. The person that waited 7 months to trickle was a bit lucky that there was not a check during the last month.
2) Message boards : Number crunching : Question on no internet situation (Message 23778)
Posted 26 Jul 2006 by John McLeod VII
Post:
Shouldn't be a problem, I think other people do this too.

The way to do this:

Stop crunching.
Copy the ENTIRE directory structure onto a CD key.
Start crunching. Don't do this for the final report.

On the machine with Inet connection.
Stop crunching.
Start crunching from the CD Key.
Do the connection to CPDN. This will report and possibly get a new task.
Stop crunching from the CD Key.
Start crunching.

If this was a final report with a download, take the CD Key back to the crunching machine.
Copy the ENTIRE directory structure from the CD key back to the HD.
Start crunching.
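The copy steps above could be scripted roughly as follows. This is only a minimal sketch, assuming BOINC has already been stopped before it runs; the function name copy_boinc_tree and any paths passed to it are hypothetical, not part of BOINC or CPDN.

```python
import shutil
from pathlib import Path


def copy_boinc_tree(src: str, dst: str) -> Path:
    """Copy the ENTIRE directory structure from src to dst.

    Assumes crunching has been stopped first, so no file changes mid-copy.
    Any existing copy at dst is removed so stale files do not survive.
    """
    src_path, dst_path = Path(src), Path(dst)
    if dst_path.exists():
        shutil.rmtree(dst_path)
    # copytree creates dst (and missing parents) and copies everything below src.
    shutil.copytree(src_path, dst_path)
    return dst_path
```

The same function would serve in both directions: hard disk to key before the trip, and key back to hard disk afterwards.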
3) Questions and Answers : Windows : Model running when it is not supposed to. (Message 23756)
Posted 25 Jul 2006 by John McLeod VII
Post:
I have several projects attached to my machines. On one of them, sulphur_4.22_windows_intelx86.exe is taking 50% of the CPU time and sulphur_um_4.22_windows_intelx86.exe is taking 20% of the CPU time even though BOINC believes that another project should be running. That project is getting approximately nothing because other higher projects are running taking some CPU time.
4) Questions and Answers : Wish list : Removing hosts (Message 22196)
Posted 19 Apr 2006 by John McLeod VII
Post:
In most projects (I have not checked this one) there is a delete host link that is only available if the host has no results associated. Most projects eventually delete the link between the host and the result, and then the host can be deleted. Again, I have not checked to see if the results are ever purged in this project.
5) Questions and Answers : Preferences : Cross project ID... (Message 22195)
Posted 19 Apr 2006 by John McLeod VII
Post:
Are the email addresses exactly the same in all of the projects? Including case? Have you double checked? There was a recent change where the server would automatically change email addresses to lower case for new projects. Advice is to use lower case.

Things that must happen in order for the CPIDs to line up.

1) The email addresses must be the same.
2) There must be a path from project server to host to project server that covers all projects.
3) Join date bug:
If you join Project A and attach to host 1, and then join Project B and attach to host 2, and then join Project C and attach to hosts 1 and 2, the CPIDS will never line up. In this case, you will have to temporarily attach Project A to host 2 (at least until the CPIDs line up).
4) The hosts must attach to each of the projects at least twice.
5) The projects must publish the stats.xml files.
6) The stats site(s) must pick up the stats.xml files.

For the adventurous, it is possible to modify all of the xml files on a host to change the internal CPIDs (still won't help if the email addresses are different).
6) Questions and Answers : Preferences : Query (Message 22194)
Posted 19 Apr 2006 by John McLeod VII
Post:
This is how much of the CPU time will be allowed to be devoted to graphics. A low number here will tend to make the graphics jerky, but the processing faster. A fast graphics card that does most of the graphics computation itself will not use much CPU time anyway.

My advice is that if you care more about crunch times, you set the graphics % fairly low (5% or so), or turn off the graphics entirely.
7) Questions and Answers : Preferences : Swapping makes computer unusable (Message 22192)
Posted 19 Apr 2006 by John McLeod VII
Post:
CPDN uses only the RAM that it absolutely needs in order to run. There is no way of reducing the amount of RAM that CPDN uses.

If CPDN is swapping too much, I would suggest the only recourse is to detach and try some other BOINC projects (NOT BURP as their RAM requirements are even higher).
8) Questions and Answers : Getting started : Retrieve email address/pw and combine 2 accounts (Message 22190)
Posted 19 Apr 2006 by John McLeod VII
Post:
You can log on with the authenticator in the account_*.xml file (pick the one for this project).
9) Questions and Answers : Getting started : How do I access an account w/o the email address? (Message 22189)
Posted 19 Apr 2006 by John McLeod VII
Post:
If you have a machine attached to the project, open the account_*.xml file for the project. In the file, you will find an authenticator. This is the same as the userID that you would have been sent in an email. With this you should be able to log on.
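For anyone who would rather not open the files by hand, here is a minimal sketch of the lookup. It assumes the value sits between <authenticator> and </authenticator> tags inside each account_*.xml file; that tag name is an assumption drawn from the wording above, and the function name find_authenticators is hypothetical.

```python
import glob
import os
import re


def find_authenticators(boinc_dir: str) -> dict:
    """Map each account_*.xml file name to the authenticator found in it.

    Assumes the value sits between <authenticator> and </authenticator>.
    Files without such a tag are skipped.
    """
    pattern = re.compile(r"<authenticator>\s*([^<\s]+)\s*</authenticator>")
    found = {}
    for path in sorted(glob.glob(os.path.join(boinc_dir, "account_*.xml"))):
        with open(path, encoding="utf-8", errors="replace") as handle:
            match = pattern.search(handle.read())
        if match:
            found[os.path.basename(path)] = match.group(1)
    return found
```

Pick the entry whose file name matches this project, and treat the value like a password.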
10) Questions and Answers : Windows : Comments for 'Generic solutions to models' sticky (Message 22188)
Posted 19 Apr 2006 by John McLeod VII
Post:
I have noticed one thing about the crash during heavy CPU usage. The CPDN application is split into two pieces - one that uses almost no CPU (M if I recall correctly) and one that uses all of the available CPU (UM if I recall). In any case, on a single CPU system with one CPDN WU running, multiple UM processes were started. I believe that if this could be prevented, this crash would stop happening.

The only time that CPDN crashed on that machine, multiple CPDN results were started for the same result. I believe that if the first thing that happened during the execution of UM were to create a mutex based on the name of the result, then this crash would stop.

Let us know when it has been fixed.
11) Questions and Answers : Windows : Computer Is Overcommitted (Message 22104)
Posted 16 Apr 2006 by John McLeod VII
Post:
1) Adding projects allocates less time to the projects that are already on the host. This makes EDF and NWF more likely.

2) Long Term Debt is more of a Long Term CPU Balance. A negative number means that the project has used more than its share of the CPU. (I have a couple of hosts where the LTD for CPDN is -2,000,000 or so.)

4) It is not 60%, 80%, and 100%. They are resource shares, so it is 60 / (60 + 80 + 100) or 1/4 (25%), 80 / (60 + 80 + 100) or 1/3 (33%), and 100 / (60 + 80 + 100) or 5/12 (42%). This does affect the future, but not the past, as there is no instant adjustment of the CPU balances.

5) It thinks that your host cannot afford to download extra work right now because of the resource shares. If having a result at the full resource share of the next project that would be asked for work would put the result over the deadline, the host is fully committed and will not download work.
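The resource-share arithmetic in point 4 can be checked with a few lines of Python. This is only an illustration of the division, not BOINC's scheduler code; the project names A, B and C are hypothetical placeholders for the three shares.

```python
from fractions import Fraction


def share_fractions(shares):
    """Convert raw resource shares into fractions of total CPU time."""
    total = sum(shares.values())
    return {name: Fraction(value, total) for name, value in shares.items()}


# Project names are hypothetical; the numbers are the ones from the post.
fractions = share_fractions({"A": 60, "B": 80, "C": 100})
for name, frac in fractions.items():
    print(f"{name}: {frac} ({float(frac):.0%})")
# A: 1/4 (25%)
# B: 1/3 (33%)
# C: 5/12 (42%)
```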
12) Questions and Answers : Windows : CPDN Crash timing. (Message 22074)
Posted 15 Apr 2006 by John McLeod VII
Post:
You could set the timer so that it only runs Boinc on the hours you know you won't be using the machine? (Say, 2am - 8am or whatever).

Can't see JM7 doing that ;-)

The machine normally has bunches of idle time, even when I am using it, and it is ONLY CPDN that has this problem. CPDN ONLY crashes when the machine is running at 100% CPU with a normal process task for long enough so that extra CPDN crunching applications are started. Setting a timer for that machine would mean that other projects would not get to run during the time that the machine might (but usually doesn't) run at 100% for a couple of hours. So, no, I am not going to set a timer to turn BOINC off. It would also cut the running time for that machine down to 1/3 of the time (as in it would be running 8 hours / day).

BTW, I lost another CPDN result yesterday in exactly the same manner.
13) Message boards : Number crunching : Anyone have experience with CC 5.4.x and climate predictor here? (Message 22073)
Posted 15 Apr 2006 by John McLeod VII
Post:
Running on 5.4.0 and newer versions (now 5.4.3) and no problems so far - SpinUp, Sulphur, AM3 models on 3 machines.

No problems that weren't already there with earlier versions of BOINC. Notably, if the host is swamped with real work, extra CPDN crunch processes are started, and then CPDN dies, but this was happening with older versions of BOINC as well.
14) Questions and Answers : Windows : CPDN Crash timing. (Message 22047)
Posted 15 Apr 2006 by John McLeod VII
Post:
CPDN itself is split into two pieces: one that does the actual work (and it was several of these that I saw running), and one that communicates with BOINC and also with the worker (I have only ever seen one of these running). My belief is that it is the communications between the two CPDN pieces that is causing the trouble.

Yes, I did report this on the BOINC Dev list, but got absolutely no feedback that it had been received. Yes, the machine in question went to 99% CPU usage for a couple of hours on a normal priority process - so BOINC was getting nothing. I can replicate this completely at will, and I had yet another CPDN result crash this afternoon on that machine.

It is my opinion that if this is solved, it would probably cut the number of crashed runs by about 90%. I am also giving up on CPDN on that host in frustration (until such a time as this is reported fixed).
15) Questions and Answers : Windows : CPDN Crash timing. (Message 21999)
Posted 12 Apr 2006 by John McLeod VII
Post:
I have noticed that every time that my system gets busy with other tasks (the work for which the computer is sitting on my desk) for a couple of hours, CPDN starts multiple crunching processes (the ones that actually do the work). After the system becomes less busy, CPDN crashes. This happens every time.

I believe that the process that BOINC starts launches a task to do the crunching, and if the process that BOINC starts loses communications with the crunching task, another crunching task is started. I also believe that a mutex that the cruncher locks when it is started would work. This way, the process that BOINC starts could check the state of the mutex. If it is abandoned, the crunching task failed, and needs to be restarted.
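The single-instance idea can be sketched as below. This is not CPDN's code: a lock file created with O_EXCL stands in for a named mutex so that the example runs on any platform, and the class name ResultLock, the lock directory and the result name are all hypothetical.

```python
import os


class ResultLock:
    """Single-instance guard keyed on a result name.

    Stand-in for a named mutex: the lock is a file created with O_EXCL,
    which fails if the file already exists.
    """

    def __init__(self, lock_dir: str, result_name: str):
        self.path = os.path.join(lock_dir, result_name + ".lock")
        self.held = False

    def acquire(self) -> bool:
        """Return True if this process now owns the result, else False."""
        try:
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False  # another cruncher already owns this result
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        os.close(fd)
        self.held = True
        return True

    def release(self) -> None:
        """Give up ownership so that a later cruncher can start."""
        if self.held:
            os.remove(self.path)
            self.held = False
```

A cruncher would call acquire() first thing and exit at once if it returns False. One difference from a real mutex: a lock file is not marked abandoned when its owner crashes, so the abandoned-state check described above would need something extra, such as testing whether the recorded process ID is still alive.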
16) Message boards : Number crunching : sulphur seems slower than slab (Message 18324)
Posted 18 Dec 2005 by John McLeod VII
Post:
While my machines should have the power to complete the Sulphur Models, the "Computer is overcommitted" factor has hit me hard when moving to BOINC 5.2.13

From what I read, can I actually expect that all affected machines will physically restrict themselves to CPDN-only over the course of months ??

That would be a pity, I'd like the other Projects to get their fair share :(
(also, what will happen if the other Projects debt is 6 months worth of CPU time, would CPDN not run for the next 6 months then as the others work down their debt? Worst case, I would expect the Scheduling/Debt mechanism not to stabilize, but actually destabilize with ever-increasing debt cycles between CPDN and other Projects *ugh* )

PS.
Any effective and safe tips to overcome that "overcommitted" BOINC thinks it is?

-- edit --
Just got a very nice hint from a Team Member :

Since I didn't run BOINC for several months (finishing SETI Classic), the efficiency values for each Computer ("% of time BOINC client is running") had dropped close to 0% naturally.
I completely forgot about that important scheduling factor.

With a bit of luck, the overcommitted factor will vanish as soon as my Systems are back on >95% values :)

You can edit the values for active_frac and on_frac in the client_state.xml file to both be 1.0 if this is closer to reality now (they will fall a bit). Make certain that you shut BOINC down before opening the file, and that you start BOINC again after you save the edit.
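A minimal sketch of that edit, for those who would rather script it than use a text editor. It works on the file contents as a string and assumes each value appears as <active_frac>...</active_frac> and <on_frac>...</on_frac> (lowercase tags); that layout and the function name reset_time_fractions are assumptions, so check your own file first and keep a backup copy.

```python
import re


def reset_time_fractions(state_text: str, value: float = 1.0) -> str:
    """Return state_text with active_frac and on_frac set to value.

    Works on the file contents as a string; reading and writing the file
    (with BOINC shut down) is left to the caller.
    """
    for tag in ("active_frac", "on_frac"):
        state_text = re.sub(
            rf"<{tag}>[^<]*</{tag}>",
            f"<{tag}>{value:.6f}</{tag}>",
            state_text,
        )
    return state_text
```

Other tags are left untouched, so the rest of the file passes through unchanged.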

Yes, if CPDN takes a year running in EDF, then it would be expected not to have any work requested for (1 year / CPDN resource_frac) - 1 year. Ex: CPDN with a resource fraction of 0.5 on a host that takes a year running 24/7 would run CPDN for a year and other things for a year (unless the other project had no work available and the computer ran dry at some point - then a CPDN result would be downloaded, thus delaying the balance). Repeat.
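The formula above is easy to try with other numbers. This is just the arithmetic from the post written out, not anything taken from the BOINC scheduler; the function name catch_up_time is hypothetical.

```python
def catch_up_time(edf_years: float, resource_frac: float) -> float:
    """Years with no new work requested after edf_years of running alone.

    Implements (edf_years / resource_frac) - edf_years from the post.
    """
    if not 0 < resource_frac <= 1:
        raise ValueError("resource_frac must be in (0, 1]")
    return edf_years / resource_frac - edf_years


print(catch_up_time(1.0, 0.5))   # 1.0
print(catch_up_time(1.0, 0.25))  # 3.0
```

So a quarter share that hogs the host for a year would be followed by three years of the other projects catching up.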
17) Message boards : Number crunching : Sulphur model hung (Message 17438)
Posted 25 Nov 2005 by John McLeod VII
Post:
Anything relevant in stdoutdae.txt or stderrdae.txt? (In the BOINC folder.)


Looking again, nothing relevant.
18) Message boards : Number crunching : Sulphur model hung (Message 17427)
Posted 25 Nov 2005 by John McLeod VII
Post:
I had a Sulphur model hang several times for no apparent reason. If I restarted BOINC, it would continue on for a while, and then hang again.

The machine is running as a service. I know other projects have gotten into the habit of displaying a dialog box on error. This is an extremely bad idea as a service does not normally have the ability to display a dialog box, and the symptom is a hang for no apparent reason.
19) Message boards : Number crunching : completion date (Message 16973)
Posted 4 Nov 2005 by John McLeod VII
Post:
Deadlines are flexible in CPDN, but if you're running BOINC 5.* and other projects you could find that it runs CPDN exclusively in an attempt to meet your model's deadline.

Using 5.2.5. I'm not sure, but I think it should have already gone into EDF mode; maybe it will soon. It's due 2/24/06 and should take about 2 months of constant crunching to complete.

With 2 months of constant crunching left and a deadline of 2/24/06, it will be late December before it needs to enter EDF. It may enter EDF a bit earlier because of other work on the system, and it may leave EDF at times because it is a little ahead.
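The date estimate above is simple subtraction: deadline minus remaining crunch time. A small sketch, treating "2 months" as 60 days (an assumption) and ignoring other work on the host; the function name latest_start is hypothetical.

```python
from datetime import date, timedelta


def latest_start(deadline: date, crunch_days_left: int) -> date:
    """Latest date at which exclusive crunching still meets the deadline."""
    return deadline - timedelta(days=crunch_days_left)


# "2 months" is approximated as 60 days, which is an assumption.
print(latest_start(date(2006, 2, 24), 60))  # 2005-12-26
```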
20) Message boards : Number crunching : Sulphur (Phase 2) question (Message 16440)
Posted 4 Oct 2005 by John McLeod VII
Post:
It depends a bit on the model being run. I believe that there are some fast-processing models (ice balls) that get much faster through the run, and there are others that get a bit slower.


