Posts by David G. Pickett

1) Message boards : Cafe CPDN : Disk Leak (Message 39590)
Posted 19 Apr 2010 by David G. Pickett
Post:
My BOINC was not doing anything. At first I thought this was all student admins on spring break at the beach, but then I noticed a message about lack of disk space and found I had maxed out at my disk limit. I added some more headroom and got some work units.

It seems Climate Prediction has grown to 30 GB of disk. Is this normal, or is it a disk allocation leak?
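One way to see which project is actually holding the space is to total the file sizes under each project's directory. The following is a minimal sketch, not a BOINC tool: the path `/var/lib/boinc-client/projects` is a hypothetical example and will differ by platform and install.

```python
import os


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def per_subdir_usage(root: str) -> dict[str, int]:
    """Return {subdirectory name: bytes used} for each immediate subdirectory of root."""
    return {
        entry.name: dir_size_bytes(entry.path)
        for entry in os.scandir(root)
        if entry.is_dir(follow_symlinks=False)
    }


if __name__ == "__main__":
    # Hypothetical location of the BOINC projects directory; adjust for your system.
    projects = "/var/lib/boinc-client/projects"
    if os.path.isdir(projects):
        usage = per_subdir_usage(projects)
        for name, size in sorted(usage.items(), key=lambda kv: -kv[1]):
            print(f"{size / 1e9:8.2f} GB  {name}")
```

Running it twice, some days apart, would show whether one project's directory keeps growing while no work is being done.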
2) Questions and Answers : Wish list : Visual Fortran Run-Time Error FAQ/Fix (Message 33807)
Posted 16 May 2008 by David G. Pickett
Post:
I just reset the project each time it does this. I suspect other users just drop the project or shrug. This relationship is supposed to be lightweight on our part. From what I have read, the cause is blamed on I/O errors, access conflicts, BOINC, time sync, etc., and some postings explain log messages that turn out to be unrelated.

I wish you'd write a FAQ, if not a fix!
3) Questions and Answers : Windows : Visual Fortran Run-Time Error (Message 33779)
Posted 15 May 2008 by David G. Pickett
Post:
I just reset the project each time it does this. I suspect other users just drop the project or shrug. This relationship is supposed to be lightweight on our part. From what I have read, the cause is blamed on I/O errors, access conflicts, BOINC, time sync, etc., and some postings explain log messages that turn out to be unrelated.

Don't you think it's time for a FAQ, if not a fix?
4) Questions and Answers : Windows : All errors, and since May, nothing but errors are accumulating (Message 13398)
Posted 13 Jun 2005 by David G. Pickett
Post:
> David
>
> The programs used by the other projects are just toys compared with hadsm, so
> there is no point in comparing CPDN with them.
> Also, the -5 error covers several problems not covered more exactly by other
> error codes.
> And if the program encounters a problem, it rewinds a day and tries again,
> then a month, and then a year.
> Considering that hadsm is a million+ lines of fortran written to run on 64bit
> supercomputers, getting it to work on desktop machines is a real feat.
> The Met Office computers are here:
> http://www.meto.gov.uk/research/nwp/numerical/computers/index.html
> Sigh.
>
> > the lack of a good diagnosis is troubling
> Andrew pointed you to a forum containing the pages about diagnostic programs.
> If you need a more exact link, try this:
> http://www.climateprediction.net/board/viewtopic.php?t=2126
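The quoted reply describes an escalating recovery: on a problem the model rewinds a day and retries, then a month, then a year. The control flow can be sketched as below. This is a hypothetical illustration, not hadsm's code: it assumes a checkpoint is available for every model day, and the 30- and 360-day distances are assumptions. A real model would presumably change something before retrying; here that is left to the `step` function.

```python
from typing import Callable, Optional

# Hypothetical rollback distances in model days, tried in order: a day,
# a month, a year. The 30- and 360-day values are assumptions for this sketch.
ROLLBACK_STEPS = (1, 30, 360)


def run_with_rollback(step: Callable[[int], bool], start_day: int, end_day: int) -> int:
    """Advance a model from start_day to end_day, one model day per step.

    step(day) returns True if that day computed cleanly. On a failure, restart
    from progressively older points (1 day, then 30, then 360 days back).
    The escalation resets once the run gets past the day that failed. If the
    failure survives the largest rollback, give up and raise RuntimeError.
    Returns the day reached (end_day on success).
    """
    day = start_day
    fail_day: Optional[int] = None  # latest day that failed in the current streak
    attempt = 0                     # index of the next rollback distance to use
    while day < end_day:
        if step(day):
            day += 1
            if fail_day is not None and day > fail_day:
                fail_day, attempt = None, 0  # got past the trouble spot
            continue
        if attempt >= len(ROLLBACK_STEPS):
            raise RuntimeError(f"unrecoverable failure at model day {day}")
        fail_day = day if fail_day is None else max(fail_day, day)
        day = max(start_day, day - ROLLBACK_STEPS[attempt])
        attempt += 1
    return day
```

A one-off glitch is absorbed by the one-day rewind, while a failure that recurs at the same point exhausts all three rollbacks and ends the run, which is roughly the distinction the thread is arguing about.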

Aside from the lack of parallel processing units, the Athlon's arithmetic abilities are about the same as any other 64-bit computer's, super or otherwise, especially if you are talking about being predictable enough to program for robust calculation. Now, with a very big problem involving prediction, maybe negative pressure is a real possibility: rare, but real. I do not buy that there is an undiscovered lack of predictability in the Athlon FP results. I do not buy that getting closer to the problem will improve my perspective. I started in computers in the '60s, got close enough to fix them at the gate-circuit component level, got lots of work from people who were afraid of the mantissa, exponent, justify and normalize, and now I am far enough from this sort of problem to have perspective.

My take is that the Internet computing model, as generalized by BOINC, is much like RAID5: lots of unreliable but redundant computers. With all the wonder-stuff running around on gossamer inside these chips, never mind the hard life some systems have had (I am deep into my second power supply, and who knows what the last one did in its death throes), who should be surprised if FP units are prone to the occasional bad result? The computational model should be reasonably robust against that, or it can never reliably do any relatively large computation.

I am just a volunteer host, and in that role, the only message this model should send me is that N of M of my calculations have been contradicted by other units and verified by a third to be wrong, so I probably have a flaky CPU. All I am getting is hearsay, rumor, and condolences from other lepers.
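The redundancy scheme described above (two hosts disagree, a third breaks the tie, and each host is told N of M) can be sketched as a majority vote per work unit plus a per-host tally. This is a hypothetical illustration under the assumption that each result can be reduced to a comparable value such as a checksum; it is not BOINC's validator API, and real projects plug in their own comparison logic.

```python
from collections import Counter, defaultdict
from typing import Optional


def validate(results: dict[str, str]) -> tuple[Optional[str], list[str]]:
    """Majority vote over redundant results for one work unit.

    results maps a host id to that host's answer (say, a checksum of its output).
    Returns (agreed answer, hosts that disagreed with it), or (None, []) when no
    answer has a strict majority, meaning another host's result is needed.
    """
    if not results:
        return None, []
    answer, votes = Counter(results.values()).most_common(1)[0]
    if 2 * votes <= len(results):
        return None, []
    return answer, [host for host, r in results.items() if r != answer]


def host_report(workunits: list[dict[str, str]]) -> dict[str, tuple[int, int]]:
    """Per host: (results contradicted by the majority, results that were decided)."""
    wrong: dict[str, int] = defaultdict(int)
    decided: dict[str, int] = defaultdict(int)
    for results in workunits:
        answer, losers = validate(results)
        if answer is None:
            continue  # still waiting for a tie-breaking host
        for host in results:
            decided[host] += 1
        for host in losers:
            wrong[host] += 1
    return {host: (wrong[host], decided[host]) for host in decided}
```

For example, `validate({"A": "x", "B": "y"})` has no majority, and adding a third host `"C": "x"` settles it with B as the odd one out; `host_report` then gives each host exactly the "N of M" message asked for.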

Now, I said a third, not four, so when that is said, it is called hyperbole, an appeal to emotion like "straw man" and "you're another," used when one has no real logical argument. They have courses on this in college, too, called rhetoric, I think. I just read the book. Please eschew such.

I know nobody likes to go "back to the drawing board," but maybe the Athlon intolerance is an indication of an error on one of those many lines of code: some bit done in the wrong order in some critical cases, or with too little precision. (Maybe it is just a way of beating up on friends of the cheaper underdog? Your Bentley is no good; you should have sprung for the Rolls!) Hopefully, you have read up on Horner's method (one of my favorites) and similar ways to make calculations robust, portable and fast. I am all too aware of the dangers of combining delicate computations, never mind taking these results and extrapolating from them over and over. Certainly, something as massive and delicate as climate prediction would be very sensitive to the accumulation of error. To paraphrase one researcher, climate prediction can be thrown out of whack by a bonfire at a beach party.
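For readers who have not met it, Horner's method evaluates a polynomial with one multiply and one add per coefficient, never forming powers of x. The sketch below pairs it with Kahan compensated summation, a separate technique chosen here only to illustrate the accumulation-of-error point; neither is claimed to be what hadsm uses.

```python
import math


def horner(coeffs: list[float], x: float) -> float:
    """Evaluate coeffs[0]*x**n + ... + coeffs[n] by Horner's method.

    Uses n multiplications and n additions and never forms x**k explicitly.
    """
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc


def naive_sum(values: list[float]) -> float:
    """Plain left-to-right accumulation; rounding error can build up."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values: list[float]) -> float:
    """Kahan compensated summation: carry forward the low-order bits lost at each add."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for v in values:
        y = v - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total


if __name__ == "__main__":
    print(horner([2.0, -6.0, 2.0, -1.0], 3.0))  # 2*27 - 6*9 + 2*3 - 1 = 5.0
    data = [1.0] + [1e-16] * 10
    # Each 1e-16 is below half an ulp of 1.0, so the naive loop never moves off 1.0;
    # the compensated sum recovers the small terms, as math.fsum confirms.
    print(naive_sum(data), kahan_sum(data), math.fsum(data))
```

The small example shows the kind of silent loss the post worries about: ten tiny contributions vanish entirely in naive accumulation, which over millions of steps is exactly how error compounds.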

Well, that is as close to Johnny Storm as I want to get!
5) Questions and Answers : Windows : All errors, and since May, nothing but errors are accumulating (Message 13345)
Posted 12 Jun 2005 by David G. Pickett
Post:
> If your WU crashed midway and then all further ones fail to get
> anywhere, then it looks as though something happened to your machine around 27
> May.
>
> The trouble with -5 errors is that it is a general error code, so it could be
> a software conflict or hardware. The best place to look for advice is probably
> here: http://www.climateprediction.net/board/index.php?c=1
> especially in 'other problems' and 'hardware related', and also here:
> http://www.climateprediction.net/board/viewforum.php?f=4
> for the thread on compatible software.

Well, somewhere back in March I went up from ME to XP, which I love except for the things (old peripherals and 16-bit apps) it can't run well enough. I think my platform is stable: it runs SETI and Protein under BOINC fine, and ran SETI the old way for years. I certainly stress test it enough, being a confirmed power user.

The lack of problems elsewhere and the lack of a good diagnosis is troubling: after all, this is just a program that does some FP arithmetic and some net I/O. If it gets a bad value at a checkpoint, it should send in the traces and move on, not assume a hardware error. Even if there is a hardware error, the huge pool of machines should be doing redundant calculations for verification, and if two hosts fail a unit, maybe the unit shows a flaw in the underlying program. Of course, it'd be nice to tell a user if his box fails n of m units that all processed correctly on 2 other hosts. Maybe first you should check whether there is a pattern of failures on one flavor of FP CPU. Maybe this is a BOINC shortfall. Certainly, if all these CPUs were in a room at IBM or Intel or Sun or HP, and some started spitting out negatives, they'd figure out whether it was hardware or software. If we can't, it says the BOINC approach is not there yet!
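Checking for "a pattern of failing on one flavor of FP CPU" amounts to grouping results by CPU family and comparing each family's failure rate with the pool-wide rate. A minimal sketch follows; the family labels, the 2x factor and the 20-result minimum are arbitrary assumptions, and a plain ratio test like this does not control for other factors (overclocking, OS, cooling) or establish statistical significance.

```python
from collections import defaultdict
from typing import Iterable


def tally_by_family(records: Iterable[tuple[str, bool]]) -> dict[str, tuple[int, int]]:
    """Map CPU family -> (failed results, total results) from (family, succeeded) pairs."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for family, ok in records:
        counts[family][1] += 1
        if not ok:
            counts[family][0] += 1
    return {fam: (c[0], c[1]) for fam, c in counts.items()}


def flag_families(
    records: Iterable[tuple[str, bool]], factor: float = 2.0, min_results: int = 20
) -> list[str]:
    """CPU families whose failure rate is at least `factor` times the pool-wide
    rate, considering only families with at least `min_results` results."""
    counts = tally_by_family(records)
    failed = sum(f for f, _ in counts.values())
    total = sum(n for _, n in counts.values())
    if failed == 0:
        return []  # no failures at all (or no records): nothing to flag
    overall = failed / total
    return sorted(
        fam
        for fam, (f, n) in counts.items()
        if n >= min_results and f / n >= factor * overall
    )
```

With made-up data of 30 failures in 100 results on one family and 5 in 200 on another, only the first is flagged. A flagged family would then point either at a hardware quirk or, as the post suggests, at code that only misbehaves on that FP unit.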
6) Questions and Answers : Windows : Processing hung, all results are errors (Message 13331)
Posted 11 Jun 2005 by David G. Pickett
Post:
> See my reply (links, really) to this thread:
>
> http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=2675
>
> You have a lot of -5 errors, which are sometimes indicative of hardware
> errors: RAM, CPU, power supply? The basic maintenance and tests should give
> you a reasonable handle on whether your system is hardware stable.

OK, my box is quite stable and runs two other BOINC projects error-free. If Athlons and their FP code are a known problem in your BOINC code, I'd get busy finding it, as it may be a generic problem that is only detected on Athlons. If you don't understand the problem well enough to fix it, you probably don't understand it well enough to predict its full impact.

Why do you keep taking CPU time, and even assigning credit, if you can't achieve anything?
7) Questions and Answers : Windows : All errors, and since May, nothing but errors are accumulating (Message 13330)
Posted 11 Jun 2005 by David G. Pickett
Post:
It seems nothing good is happening: all my results since May have received 0 credit, and all of them show this error:

Result ID 640114
Name 12xw_000070997_0
Workunit 426951
Created 19 Mar 2005 16:17:19 UTC
Sent 19 Mar 2005 19:08:44 UTC
Received 28 May 2005 7:27:59 UTC
Server state Over
Outcome Client error
Client state Computing
Exit status -5 (0xfffffffb)
Host ID 135493
Report deadline 2 Mar 2006 0:28:44 UTC
CPU time 1949466.45
stderr out 4.25
- exit code -5 (0xfffffffb)



Granted credit 1701.32
Client version ---
Trickle # 18
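The two forms of the exit status above are the same 32-bit value: -5 in two's complement is 0xfffffffb. The sketch below shows the conversion both ways and, assuming the CPU time field is in seconds, how much computing this one failed result consumed. The helper names are illustrative only.

```python
def as_u32_hex(code: int) -> str:
    """Format a signed exit code as 32-bit two's-complement hex, e.g. -5 -> 0xfffffffb."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def from_u32(value: int) -> int:
    """Reinterpret a 32-bit unsigned value as a signed (two's-complement) integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def cpu_days(cpu_seconds: float) -> float:
    """Convert a CPU time in seconds to days (assumes the field is in seconds)."""
    return cpu_seconds / 86400


if __name__ == "__main__":
    print(as_u32_hex(-5))                  # 0xfffffffb
    print(from_u32(0xFFFFFFFB))            # -5
    print(round(cpu_days(1949466.45), 2))  # 22.56
```

So this single errored result represents roughly 22.56 days of CPU time before it ended with the generic -5 code.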

8) Questions and Answers : Windows : Processing hung, all results are errors (Message 13216)
Posted 8 Jun 2005 by David G. Pickett
Post:
A few days back, I started getting this, so I had to unhook and resubscribe (update and reset were no good). I don't know yet whether it will start doing good work, as my history shows a lot of errors and no credit:

6/6/2005 10:07:25 PM|climateprediction.net|Finished upload of 0776_000014261_0_1.zip
6/6/2005 10:07:25 PM|climateprediction.net|Throughput 5512 bytes/sec
6/6/2005 10:07:25 PM|climateprediction.net|Finished upload of 0776_000014261_0_2.zip
6/6/2005 10:07:25 PM|climateprediction.net|Throughput 143234 bytes/sec
6/6/2005 10:07:25 PM|climateprediction.net|Started upload of 0776_000014261_0_3.zip
6/6/2005 10:07:25 PM|climateprediction.net|Started upload of 0776_000014261_0_4.zip
6/6/2005 10:07:28 PM|climateprediction.net|Finished upload of 0776_000014261_0_3.zip
6/6/2005 10:07:28 PM|climateprediction.net|Throughput 5011 bytes/sec
6/6/2005 10:07:28 PM|climateprediction.net|Finished upload of 0776_000014261_0_4.zip
6/6/2005 10:07:28 PM|climateprediction.net|Throughput 5512 bytes/sec
6/6/2005 10:07:28 PM|climateprediction.net|Started upload of 0776_000014261_0_5.zip
6/6/2005 10:07:30 PM|climateprediction.net|Finished upload of 0776_000014261_0_5.zip
6/6/2005 10:07:30 PM|climateprediction.net|Throughput 7262 bytes/sec
6/6/2005 10:08:11 PM||Insufficient work; requesting more
6/6/2005 10:08:11 PM|climateprediction.net|Requesting 8640.00 seconds of work
6/6/2005 10:08:11 PM|climateprediction.net|Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
6/6/2005 10:08:12 PM|climateprediction.net|Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
6/6/2005 10:08:12 PM|climateprediction.net|Message from server: No work sent
6/6/2005 10:08:12 PM|climateprediction.net|Message from server: (reached daily quota of 1 results)
6/6/2005 10:08:12 PM|climateprediction.net|No work from project
6/6/2005 10:08:12 PM|climateprediction.net|Deferring communication with project for 22 hours, 6 minutes, and 49 seconds
6/6/2005 10:47:58 PM||Insufficient work; requesting more
6/6/2005 10:47:58 PM|climateprediction.net|Requesting 8640.00 seconds of work
6/6/2005 10:47:58 PM|climateprediction.net|Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
6/6/2005 10:47:59 PM|ProteinPredictorAtHome|Resuming result h0009A_1_59098_3 using mfoldB125 version 4.31
6/6/2005 10:47:59 PM|SETI@home|Pausing result 22ja05ab.14868.17857.73576.108_0 (left in memory)
6/6/2005 10:47:59 PM|climateprediction.net|Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
6/6/2005 10:47:59 PM|climateprediction.net|Message from server: No work sent
6/6/2005 10:47:59 PM|climateprediction.net|Message from server: (reached daily quota of 1 results)
6/6/2005 10:47:59 PM|climateprediction.net|No work from project
6/6/2005 10:47:59 PM|climateprediction.net|Deferring communication with project for 22 hours, 3 minutes, and 15 seconds





©2024 climateprediction.net