climateprediction.net home page
Posts by Marc

1) Questions and Answers : Wish list : Manually expire old work units (Message 23851)
Posted 5 Aug 2006 by Marc
Post:
I'm not sure about that, but eventually the CPDN workload will expire, if that's what you mean. It generally allows about a year before work units expire, and in fact some of mine are coming up on that expiration date in just a couple of months. It just seems like there ought to be a way to mark a workload as "you're never going to get another trickle report, so move along". If there is a 60-day rule in there, maybe CPDN is confused about the status of those other work units because the machine they were originally assigned to is still active, doing another workload.
2) Questions and Answers : Wish list : Manually expire old work units (Message 23847)
Posted 4 Aug 2006 by Marc
Post:
A feature I would like to see on climateprediction.net at some point is the ability to tell CPDN when a workunit will never be resumed. I had some less stable machines at work that couldn't handle CPDN, plus I had a few problems with the client at home and had to reinstall the client.

That left a few CPDN workloads unfinished. CPDN shows several of those workloads as still "In Progress". In reality, only the job assigned to my account last night is actually running; the rest were aborted and/or lost. All other workloads could be reassigned to another CPDN member.
3) Questions and Answers : Windows : Setting graphics view defaults (Message 23810)
Posted 30 Jul 2006 by Marc
Post:
After updating to the most recent version of the BOINC client, it seems most of the settings are now propagating correctly. Thanks!
4) Questions and Answers : Windows : Setting graphics view defaults (Message 23809)
Posted 30 Jul 2006 by Marc
Post:
Ah. Thanks for the tip.

...but apparently, that's insufficient. I made my changes, saved them, went back to the BOINC manager, clicked Update, waited until it said "scheduler request succeeded" in the Messages tab (which should mean it completed an update), and loaded the graphics. No change. None of the new settings are taking effect. Confirmed the screen saver is using the same settings. Checked to make sure the web site had saved my settings. It had. Went to edit the settings, and they had all gone back to defaults. Set everything again. Saved. Updated again. Displayed graphics. Still shows temperature map, BBC panel, no rotation (fixed at Africa). What gives?
5) Questions and Answers : Windows : Setting graphics view defaults (Message 23793)
Posted 30 Jul 2006 by Marc
Post:
Okay, suppose that despite the occasional crash resulting from the screen saver, I just like the screen saver and/or graphics, and I don't really want to:

Start off with the temperature view and BBC side-panel
C for clouds view
F for fluffy clouds
S to start planet rotation
Z to hide the side-panel
0 to show the star field

...every single time. That is my preferred view, however graphics-intensive it is, and I want to keep it that way. How do I make those settings stick?
6) Message boards : Number crunching : Domino effect leads to unrecoverable errors (Message 19064)
Posted 5 Jan 2006 by Marc
Post:
At what priority is Outlook running?


"Normal" priority.

How long should the BOINC core (client) wait for child processes (the CPDN application)? If 30 minutes is not enough, should it wait even longer in every case (e.g. Windows shutdown/restart)? I definitely would not wait half an hour for Windows to restart. There must be a timeout limit... which was probably exceeded due to Outlook resource demands.


That's tricky, and I understand why 30 minutes seems like a long time... but for someone who leaves their workstation on overnight and over weekends, and is processing a workload that can easily take 3 months to complete, even 72 hours is not all that long. Even so, a timeout should not mean "throw away the whole workload and start over"; it just means going back to the last checkpoint, right?

(iii) solution on OS level.

I think you're hinting at something I would agree with -- just because a process has slightly lower priority doesn't mean it should come to a screeching halt. Ideally Windows should probably execute tasks in a manner more like Linux. I don't know how anyone would change that.

I would first check Outlook's priority and do some maintenance on Outlook (compact the database, defrag the disk, since such applications tend to fragment large files, which slows them down), etc.
What AV solution are you using? There can be a connection with e-mail client.


I have Symantec Anti-Virus Corporate Edition running, and despite my IT department warning me not to install "unauthorized patches" I've gone ahead and run Office Update to get everything up to where Microsoft thinks it should be. Though Outlook seems stable at the moment, errors have increased. Especially errors like this:
1/4/2006 4:25:13 PM|climateprediction.net|Unrecoverable error for result sulphur_igu2_000861626_0 (<file_xfer_error> <file_name>sulphur_igu2_000861626_0_1.zip</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error><file_xfer_error> <file_name>sulphur_igu2_000861626_0_2.zip</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error><file_xfer_error> <file_name>sulphur_igu2_000861626_0_3.zip</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error><file_xfer_error> <file_name>sulphur_igu2_000861626_0_4.zip</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error><file_xfer_error> <file_name>sulphur_igu2_000861626_0_5.zip</file_name> <error_code>-161</error_code> <error_message></error_message></file_xfer_error>)
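For anyone who wants to pull the affected files out of a message like the one above, here is a minimal Python sketch. The function name `failed_transfers` and the regular expression are my own, written against the tag layout visible in that log line; this is not part of BOINC, and it makes no claim about what error code -161 means.

```python
import re

# Matches one <file_xfer_error> entry as laid out in the log line above.
# The pattern is an assumption based on that single sample, not a BOINC spec.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>"
)


def failed_transfers(message: str) -> list[tuple[str, int]]:
    """Return (file name, error code) pairs found in a log message."""
    return [
        (match.group("name"), int(match.group("code")))
        for match in XFER_ERROR.finditer(message)
    ]


# Shortened sample: the first two entries from the log line above.
sample = (
    "Unrecoverable error for result sulphur_igu2_000861626_0 ("
    "<file_xfer_error> <file_name>sulphur_igu2_000861626_0_1.zip</file_name> "
    "<error_code>-161</error_code> <error_message></error_message></file_xfer_error>"
    "<file_xfer_error> <file_name>sulphur_igu2_000861626_0_2.zip</file_name> "
    "<error_code>-161</error_code> <error_message></error_message></file_xfer_error>)"
)

for name, code in failed_transfers(sample):
    print(name, code)
```

Run against the full message, this should list all five .zip files with the same code, which at least makes it easy to see that every upload for that result failed the same way.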
7) Message boards : Number crunching : Domino effect leads to unrecoverable errors (Message 18951)
Posted 3 Jan 2006 by Marc
Post:
Yes yes, I know unrecoverable errors happen with climateprediction.net, and that sometimes they're hardware-specific, sometimes workload-specific, and occasionally it's a bug in the particular BOINC Science Project being run.

...but I think in this case you will agree that tightening the climateprediction.net code against other applications' errors would lessen the frequency of the error I saw.

In my case, due to not knowing BOINC very well, I ended up with three climateprediction.net workloads, two of which had been deferred. I just got into the office, unlocked my screen (I leave Outlook running 24/7 so that it can process client-side e-mail rules), and found that Outlook had basically "zombie"'d, eating up all the CPU resources I have on my Windows XP Pro commercial desktop. Well, maybe it's just redrawing or reloading content from the Exchange server? So I left to run errands for 20-30 minutes. When I came back, I found this in my BOINC (5.2.13) log:

1/3/2006 8:25:20 AM|climateprediction.net|Unrecoverable error for result 15mu_300074520_0 (There are no child processes to wait for. (0x80) - exit code 128 (0x80))
1/3/2006 8:25:20 AM||request_reschedule_cpus: process exited
1/3/2006 8:25:21 AM|climateprediction.net|Computation for result 15mu_300074520_0 finished
1/3/2006 8:25:22 AM|climateprediction.net|Restarting result 1dht_100084807_0 using hadsm3 version 413

At this point I terminated Outlook (which freed up enough CPU resources that BOINC could resume). I detached the SETI@Home project I wasn't using anyway and looked back at the log... the second of three workloads had also crashed:

1/3/2006 8:53:27 AM|SETI@home|Resetting project
1/3/2006 8:53:27 AM||request_reschedule_cpus: exit_tasks
1/3/2006 8:53:28 AM|SETI@home|Detaching from project
1/3/2006 8:53:28 AM||request_reschedule_cpus: project op
1/3/2006 8:53:55 AM|climateprediction.net|Unrecoverable error for result 1dht_100084807_0 ( - exit code -5 (0xfffffffb))
1/3/2006 8:53:55 AM||request_reschedule_cpus: process exited
1/3/2006 8:53:55 AM|climateprediction.net|Computation for result 1dht_100084807_0 finished
1/3/2006 8:53:55 AM|climateprediction.net|Restarting result 1eoq_100086367_0 using hadsm3 version 413
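
As a side note on reading these logs: each line appears to be three fields separated by "|" (timestamp, project, message), and the exit codes are printed both as a signed number and as 32-bit hex, so -5 and 0xfffffffb are the same value. A small Python sketch, assuming that layout; the helper names are hypothetical and not part of BOINC:

```python
def split_log_line(line: str) -> tuple[str, str, str]:
    """Split a log line into (timestamp, project, message).

    Assumes the 'timestamp|project|message' layout seen in the log above;
    the project field is empty for client-level messages.
    """
    timestamp, project, message = line.split("|", 2)
    return timestamp, project, message


def as_unsigned32(code: int) -> int:
    """View a signed exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def as_signed32(value: int) -> int:
    """View an unsigned 32-bit value as a signed (two's complement) exit code."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


line = "1/3/2006 8:53:55 AM||request_reschedule_cpus: process exited"
print(split_log_line(line))
print(hex(as_unsigned32(-5)))   # same value the log shows next to "exit code -5"
print(as_signed32(0xFFFFFFFB))
```

This only shows how the two notations in the log relate; it says nothing about why the application exited with those codes.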

At this point I started reading the BOINC Wiki about unrecoverable computation errors and found a note that I should report this to the particular project affected.

Wondering how I was going to clear these errors and whether those two crashed workloads would need to be manually reported before they could be removed, I decided to try automatic handling and request a server update. The update seems to have worked to report the completed segments at least.

Still, if this is indeed a resource starvation problem like I think it is ("There are no child processes to wait for"), the client should have waited for resources to be free, not errored out. Right?



