Posts by EveningStarNM

1) Message boards : Number crunching : Possibly lost task result and other problems (Message 50194)
Posted 16 Sep 2014 by EveningStarNM
Post:
BOINC's thermal throttling is really very, very crude. Your "run tasks at 85% of processor capacity" will result in

Run for eight seconds, pause for 1 second
Run for eight seconds, pause for 2 seconds

or something like that...

You would be far better off limiting the number of CPU cores that BOINC is allowed to use to something like 75%, but allowing the cores in use to run at 100%...

Wow. I definitely misunderstood that "Use at most X% CPU time" setting. I kinda hate taking one of the cores out of the queue, but it seems prudent. Thanks for the information.
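
To make the difference between the two settings concrete, here is a rough Python sketch. It is not BOINC's actual scheduling code: the function names are made up for illustration, the 4-core machine is an assumption, and the real client may time its run/pause cycle differently.

```python
def throttle_pause_seconds(run_seconds: float, cpu_time_pct: float) -> float:
    """Pause needed after `run_seconds` of work so the long-run
    average CPU time equals `cpu_time_pct` (hypothetical helper)."""
    if not 0 < cpu_time_pct <= 100:
        raise ValueError("cpu_time_pct must be in (0, 100]")
    return run_seconds * (100.0 - cpu_time_pct) / cpu_time_pct


def core_equivalents(total_cores: int, cores_pct: float, cpu_time_pct: float) -> float:
    """Average number of fully busy cores for a given pair of settings.
    Assumes the core count is rounded down, with at least one core in use."""
    cores_in_use = max(1, int(total_cores * cores_pct / 100.0))
    return cores_in_use * cpu_time_pct / 100.0


# "Use at most 85% CPU time": every task is stopped and restarted over and over.
pause = throttle_pause_seconds(run_seconds=8, cpu_time_pct=85)
print(f"run 8 s, pause about {pause:.2f} s")

# Assumed 4-core machine: all cores throttled to 85% vs. 75% of cores at 100%.
print(core_equivalents(4, cores_pct=100, cpu_time_pct=85))
print(core_equivalents(4, cores_pct=75, cpu_time_pct=100))
```

Under these assumptions the throttled setup averages slightly more work (about 3.4 busy cores against 3), but it does so by constantly suspending tasks, whereas the core limit leaves three tasks running uninterrupted.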
2) Message boards : Number crunching : Possibly lost task result and other problems (Message 50190)
Posted 15 Sep 2014 by EveningStarNM
Post:
Thank you for your reply, Iain.
The HADCM3S model as currently configured submits two trickles and two Zip files, but only one trickle appears on the Web site. If it's marked as a success it's a success.

Thanks for that explanation. Do the trickles contain any log information that can be used for debugging? While that would be more useful for the project's programmers, it might also be useful if users can alter their BOINC configurations to accommodate CPDN.
CPDN does not validate in the usual BOINC sense - i.e. require tasks to agree with each other. The models are simply too numerically complicated for that.

Then I suppose we can expect the validate state to always be "Initial". That's understandable.
The nuances of 'claimed' and 'granted' don't apply to CPDN because it doesn't validate. The credits will appear when the credit script is run. The number of credits is being reviewed for that particular model type.

Okay. I assume that credit is granted if useful results are returned regardless of the number of credits granted. All I'm really interested in is that the number of credits granted is greater than zero. That's really the only clue we have that what we're doing is beneficial.
The scientific results do not appear in the stderr log. That log provides information collected from the running of a model on a particular machine and will therefore vary from machine to machine. The log you mention simply reports that the model is constantly being suspended as you use the computer, which is the default BOINC setting.

That's exactly the kind of information that I'm interested in. I don't expect to see scientific results in STDERR, but I do want to know about the system status. That machine, for instance, does nothing but run BOINC full time, and BOINC is set to always run tasks at 85% of processor capacity even when it's in use. BOINC is its only job. It doesn't even have a keyboard, mouse, or monitor attached to it. It's controlled through an RDP session. (I have too many computers at home, so I figure I might as well put the extras to work for BOINC projects.) Tasks should never be suspended, and STDERR doesn't give any clues about why the suspensions were requested or how long the suspensions lasted. I'm a bit puzzled by this. I'll be grateful if you can offer any ideas about why those suspend requests might be made. Unfortunately, I did not have that machine set to keep tasks in memory when suspended, and I read a comment that suggested that might cause problems when checkpoints are saved. Could that have been the reason?
If you want to see the kind of analysis the project produces then have a look at the project's publication page. The papers there are credible, appropriate and appear in respected journals. That's as good as it can get for a BOINC project.

I expect that those papers would be even more informative if more useful results were returned. Hopefully, the bugs in the failure-prone applications will be worked out soon.
3) Message boards : Number crunching : Possibly lost task result and other problems (Message 50166)
Posted 14 Sep 2014 by EveningStarNM
Post:
I wanted to find out if the computer time I have given to CPDN was worth anything, so I spent some time examining task results. Notably, of the six tasks I'd downloaded recently, four failed with "Error while computing" (17020157, 17020615, 17020630, and 17020637). Since others have said those applications frequently fail, I've disabled them in my preferences, wondering why CPDN is wasting time distributing tasks that are unsuitable for so many BOINC volunteers.

I also discovered that one of my tasks from last March, 16161297, has status "Timed out - no response". However, that task is listed among my "Credit Pending" tasks (yes, I know that doesn't mean there's any actual credit pending), which makes it seem like it was completed. Or maybe not? Did the two of us that returned trickles return any results that are useful in any way?

It's interesting that it's a 3m8m Coupled Model Full Resolution Ocean model that didn't fail immediately. In fact, of the five systems that have tried to run that WU, two returned trickles, although both did suffer ultimate failures. Unfortunately, Stderr reports don't appear to be returned with the trickles, so it's difficult to know what happened on those machines (one of which was one of mine, but I can't find the log for that WU from so long ago). I'm hoping that something might be revealed in the trickle that will explain why the WU did not fail immediately on two Windows 7 machines, perhaps providing a clue to the CPDN programmers about how to get more useful results from this project.

But I was most interested in seeing if one of my current tasks that appeared to run to completion, 17020616, was a useful expense of 30 hours of CPU time. Looking at the report page, I'm not sure. There are several issues:

1) Only one "trickle" was returned, at timestep 25,920, CPU time 60,604, even though the total CPU time for the task was only 50,955.87 secs.

2) The "Validate state" for the task is perpetually "initial".

3) No credit was claimed or granted.

4) The "Outcome" was "Success", the "Client state" was "done", and the "Exit status" was 0x0.

5) There is NOTHING in the Stderr log except a long, looooonnnng, series of "Suspended CPDN Monitor - Suspend request from BOINC..." messages.

Have the results from that effort been lost?
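
For reference, the mismatch in issue 1 can be checked with a few lines of Python. The two figures are the ones quoted from the report page; the variable names are only for illustration, and nothing here explains which of the two counters is the correct one.

```python
# CPU-time figures from the report page for task 17020616, in seconds.
trickle_cpu_seconds = 60_604      # CPU time reported with the single trickle
total_cpu_seconds = 50_955.87     # total CPU time shown for the finished task

# The trickle claims more CPU time than the whole task used.
excess_seconds = trickle_cpu_seconds - total_cpu_seconds

print(f"trickle: {trickle_cpu_seconds / 3600:.2f} h")
print(f"total:   {total_cpu_seconds / 3600:.2f} h")
print(f"excess:  {excess_seconds:.2f} s")
```

The trickle reports roughly 9,648 seconds more CPU time than the task total, which is why the two numbers cannot both describe the same run.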

I'm still running an ANZ task, and it's got about 161 hours left to run. I'm considering aborting it. But before I do, I want to know if /anything/ I can do for CPDN is actually useful, or if I'm just wasting my time. So far, I am not encouraged by the results of my efforts.
4) Message boards : Number crunching : My stats page at BOINCstats/BAM is totally useless. (Message 50152)
Posted 12 Sep 2014 by EveningStarNM
Post:
Go to your account here on climateprediction.net, and click "climateprediction.net preferences", edit them, and on "Run only the selected applications", select the applications you want to run.

On your BOINC Manager Client, click update.

PS: This is not unique to climateprediction. Lots of other BOINC projects let you choose their respective applications when there is more than one.


Thanks! I don't play much with BOINC settings, and I'd never noticed that. I've just been letting it do its thing, intervening only when something crashed.
5) Message boards : Number crunching : My stats page at BOINCstats/BAM is totally useless. (Message 50149)
Posted 12 Sep 2014 by EveningStarNM
Post:
You could limit yourself to model types you trust: EU, ANZ, PNW - all fine.

If I'm not mistaken, I'd have to manually cancel the tasks that I don't trust and hope that I'll download one of those that you mentioned, right? I don't see -- and I don't expect -- anything in the project preferences that allows such fine-grained control of model selection.

The thing I like about BOINC is that, if I pick the right projects, I don't have to manage it. I don't want to have to manage it. I just want it to work without crashing my machines or wasting their time.

I'm still running a 180-hour ANZ task and a 30-hour HADCM3S 1HAB task. Frankly, I'll be surprised, but delighted, if they don't end in errors. We'll see.

Meanwhile, all 1CFP tasks should be removed from the project and not distributed. They don't appear to return good results ever.
6) Message boards : Number crunching : My stats page at BOINCstats/BAM is totally useless. (Message 50146)
Posted 12 Sep 2014 by EveningStarNM
Post:
...you could start running models again.
There's still plenty of the new hadcm3 "short" models, which take less than 24 hours to run, but have two huge zips.


Well, I tried again. I downloaded four tasks to run on a fresh and clean installation of Windows Server 2008 R2 and BOINC 7.2.42. I did upgrade VirtualBox from the bundled version to 4.3.12 because the old version kept crashing all three of the machines I ran it on, resulting in BSODs. But, so far, two of the tasks have resulted in "errors while computing". See 17020637 and 17020630. Both tasks died with too many "INITTIME: Atmosphere basis time mismatch" errors.

I don't believe in waste of any kind, including CPU cycles, so I'll go back to the projects I trust.
7) Message boards : Number crunching : My stats page at BOINCstats/BAM is totally useless. (Message 50143)
Posted 12 Sep 2014 by EveningStarNM
Post:
Sorry, but this project isn't about playing games with credits. It's wholly about climate modelling.


I should have conceded that, of course, you're right -- in exactly the same way that singing in the cotton fields wasn't the reason the slaves were picking cotton, but they sang anyway.
8) Message boards : Number crunching : My stats page at BOINCstats/BAM is totally useless. (Message 50142)
Posted 12 Sep 2014 by EveningStarNM
Post:
Sorry, but this project isn't about playing games with credits. It's wholly about climate modelling.

The current problem has been fixed, although why the script went so badly wrong still isn't known yet.

Nothing further can be done about credits on external sites, but you could start running models again.
There's still plenty of the new hadcm3 "short" models, which take less than 24 hours to run, but have two huge zips.


I'll make a deal with you: You live your life your way, and I'll live my life my way. Does that sound okay with you? But maybe I'll risk wasting CPU power on this project again. I had a lot of trouble with failed downloads and computing errors last year, so I gave up on CPDN and started giving computer time to projects that didn't exhibit such problems. I'll load a few tasks and see what happens.
9) Message boards : Number crunching : My stats page at BOINCstats/BAM is totally useless. (Message 50136)
Posted 12 Sep 2014 by EveningStarNM
Post:
After all, you might not have been given any credits at all for work done; now that does make my blood boil.


That actually happened to me with a couple of tasks on another project a while back. It was wonderful. I got a lot of mileage out of it with my friends. I was far behind them (I'm using three old machines for BOINC while they're using their latest and greatest single machines -- and kicking my ass), and it gave me something to fend off the teasing. They didn't make me buy the drinks that week even though they knew I was lying my ass off about how much credit I should have gotten.

Now that's fun! But this is going to cost me big time.
10) Message boards : Number crunching : My stats page at BOINCstats/BAM is totally useless. (Message 50131)
Posted 11 Sep 2014 by EveningStarNM
Post:
Unfortunately, our descendants /will/ have to rely on their PhDs in a whole slew of sciences, some of which haven't even been invented yet, in order to survive -- assuming, of course, that we stop dumbing down our education system and give them one they can use. But until we get our political system cleaned up and wrest control of it from the corrupt elite, there's not much we can do for them except nibble away at the edges with tiny efforts like this. Sadly, that doesn't seem likely to happen any time soon. So I'm just trying to have a little fun wherever I can find it. This friendly bet, which I was destined to lose, was a small pleasure, but an important one. (Ironically, I'm the sysadmin in the group and have more computing power available than any of them -- which made me the butt of endless jokes).
11) Message boards : Number crunching : My stats page at BOINCstats/BAM is totally useless. (Message 50129)
Posted 11 Sep 2014 by EveningStarNM
Post:
It will take months to smooth out that erroneous -- and gigantic -- bump. I am not a happy camper. I enjoyed the competition I was having with some friends, but now, instead of having BOINCstats/BAM do the comparisons for us, especially with regard to recent average credit, we'd have to do the calculations by hand for my account. Unfortunately, that's too much work for a casual thing like this, and I have to drop out of the competition.

It's not about the credit. It's about the fun in competing with friends -- even though I was never even close to winning until CPDN totally f*****d up my stats. What's worse is that this seems to have been happening to others for a while.

Bummer.

Please fix it.
12) Message boards : Number crunching : No Tasks Available (Message 47429)
Posted 29 Oct 2013 by EveningStarNM
Post:
Regarding resources:

The News and Announcements thread at the top of this section of the board is the recommended place to watch for announcements of all types.



Believe me, I checked that. The last news was more than a month old.
13) Message boards : Number crunching : No Tasks Available (Message 47428)
Posted 29 Oct 2013 by EveningStarNM
Post:
N.B. This topic is widely discussed on the boards...


I saw something similar mentioned in a post created Sept. 19 ("Vanishing WUs", http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=7671#47123), but that's all I saw. I haven't checked the other boards. "Number Crunching" seemed vague enough to encompass my topic.
14) Message boards : Number crunching : No Tasks Available (Message 47424)
Posted 28 Oct 2013 by EveningStarNM
Post:
I downloaded and installed BOINC and added the climateprediction.net project several days ago, but BOINC's event log has consistently reported that the "Project has no tasks available". The server status page also shows that no tasks are ready to send. Is it true, as it seems, that this project already has all of the computing resources it needs?



