climateprediction.net home page
CPDN you may recycle these, I know I killed them.

Message boards : Number crunching : CPDN you may recycle these, I know I killed them.

old_user3566
Joined: 30 Aug 04
Posts: 9
Credit: 15,780
RAC: 0
Message 7992 - Posted: 29 Jan 2005, 1:48:09 UTC
Last modified: 29 Jan 2005, 1:49:46 UTC

262958 252242 26 Sep 2004 4:14:10 UTC --- In Progress Unknown New 0.00 0.00

30569 20718 30 Aug 2004 23:18:54 UTC --- In Progress Unknown New 179644.00 453.68

<a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/results.php?userid=3566&amp;PHPSESSID=bbdead271526d179c24d7c5d8dd95971">http://climateapps2.oucs.ox.ac.uk/cpdnboinc/results.php?userid=3566&amp;PHPSESSID=bbdead271526d179c24d7c5d8dd95971</a>

Don't wanna make ya wait a year.
-----------------------
Click to see my tag
<a href="http://boinc.mundayweb.com/one/stats.php?userID=1049">My tag</a>
SNAFU'ed? Turn the Page! :D
old_user11965
Joined: 4 Sep 04
Posts: 61
Credit: 80,585
RAC: 0
Message 8002 - Posted: 29 Jan 2005, 3:47:30 UTC
Last modified: 29 Jan 2005, 3:48:21 UTC

I've got a couple of results that can be recycled, too:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=139624
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=281738

In the first, the machine died a horrible death at the hands of a mutinous power supply. The second machine was detached when I realized there was no way it could possibly finish the work unit within the deadline.

trane

Kenneth Larsen
Joined: 26 Aug 04
Posts: 59
Credit: 438,133
RAC: 0
Message 8068 - Posted: 29 Jan 2005, 17:22:09 UTC

I have quite a few units too that may be recycled, if possible:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=25426
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=26392
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=281885
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=426705
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=450785
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=496961

Quite a lot, I know :-(
Honza
Volunteer moderator
Joined: 5 Aug 04
Posts: 390
Credit: 2,475,242
RAC: 0
Message 8072 - Posted: 29 Jan 2005, 17:36:56 UTC
Last modified: 29 Jan 2005, 17:37:34 UTC

I believe there is an automatic resend mechanism. Each WU can be sent to up to 5 users. If your model crashes, the crash should be reported automatically to the server database.
Still, it would be good to know the reason for the crash and prevent further loss of computation.
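The resend rule described above can be sketched as a toy model in Python. It assumes the cap of 5 copies per WU mentioned in this post; `WorkUnit`, `issue`, `on_client_error` and `MAX_RESULTS_PER_WU` are hypothetical names, not the actual BOINC or CPDN server code.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption: the "each WU can be sent to 5 users" figure from this post.
MAX_RESULTS_PER_WU = 5


@dataclass
class WorkUnit:
    wu_id: int
    results: List[int] = field(default_factory=list)  # result IDs issued so far
    failed: bool = False

    def issue(self, result_id: int) -> Optional[int]:
        """Send out another copy of the WU, unless the cap has been reached."""
        if len(self.results) >= MAX_RESULTS_PER_WU:
            self.failed = True  # cap reached: no further copies are sent
            return None
        self.results.append(result_id)
        return result_id

    def on_client_error(self, new_result_id: int) -> Optional[int]:
        """Called when a crash is reported automatically; try to resend."""
        return self.issue(new_result_id)


wu = WorkUnit(wu_id=1)
wu.issue(1)                   # first copy goes out
for rid in range(2, 7):       # then five reported crashes in a row
    wu.on_client_error(rid)
print(wu.results, wu.failed)  # [1, 2, 3, 4, 5] True
```

Note that the sketch only reacts to crashes that are actually reported; a machine that dies silently never triggers `on_client_error`.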
old_user36084
Joined: 15 Jan 05
Posts: 31
Credit: 1,249,348
RAC: 0
Message 8076 - Posted: 29 Jan 2005, 18:48:45 UTC
Last modified: 29 Jan 2005, 18:55:02 UTC

I’ve got some WUs that BOINC downloaded after a fatal crash, but I have since restored BOINC to a previous state. I copied the BOINC directory before restoring, so can I copy the new WUs into the working directory …/projects/climate prediction.net? Do I need to update any *.xml files in the main BOINC directory so that it recognises these WUs and starts them when the present models are completed?

BTW, BOINC did not do much work on these new WUs, hence the CP server thinks they are in progress and new; see <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/results.php?hostid=93819">results</a>.

These restored WUs are sending trickles to the CP server with no problems; see <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=93819">computer summary</a>. My only concern is that the CP server believes these restored WUs are over, with an outcome of client error; so when a model completes, will the CP server accept it?
old_user1132
Joined: 25 Aug 04
Posts: 28
Credit: 6,522,252
RAC: 0
Message 8121 - Posted: 30 Jan 2005, 9:39:45 UTC - in response to Message 8076.  


> My only concern is the CP server believes these restored WUs are
> over with the outcome of client error; so when it completes the model will
> the CP server accept it ?
>

My experience has been that the CPDN central BOINC servers are pretty good at sorting out this kind of problem. Once they see trickles coming in for a given model from the same client, the records seem to get merged. With a model restored from a backup, you start getting credit again once you pass the point of the last valid trickle held by the server.
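How credit resumes after a restore can be illustrated with a toy model. It assumes, hypothetically, that each trickle carries the model timestep it has reached and that credit is counted in new timesteps; `ModelRecord` and `receive_trickle` are illustrative names, not the CPDN server's actual logic.

```python
from dataclasses import dataclass


@dataclass
class ModelRecord:
    """Server-side view of one model (hypothetical structure)."""
    last_valid_step: int = 0  # timestep of the last valid trickle held by the server
    credited_steps: int = 0   # total timesteps credited so far

    def receive_trickle(self, timestep: int) -> int:
        """Return the number of newly credited timesteps for this trickle."""
        if timestep <= self.last_valid_step:
            # A model restored from backup re-sends trickles for work the
            # server has already seen, so it earns nothing yet.
            return 0
        new_steps = timestep - self.last_valid_step
        self.last_valid_step = timestep
        self.credited_steps += new_steps
        return new_steps


record = ModelRecord()
for step in (100, 200, 300):  # original run trickles, then the machine crashes
    record.receive_trickle(step)
# Restored from a backup taken at step 150: replayed trickles earn nothing
print([record.receive_trickle(s) for s in (200, 300, 400)])  # [0, 0, 100]
```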

Andrew

<a href="http://cpdnforum.info">CPDNforum</a>
old_user26363
Joined: 22 Oct 04
Posts: 1
Credit: 289,691
RAC: 0
Message 8959 - Posted: 9 Feb 2005, 1:02:23 UTC

There is a need, I believe, for users to be able to manually flag a CPDN WU as dead. This is unique to CPDN.

Most BOINC projects have deadlines of a fortnight at most for WUs to be crunched. If a result is not returned, by default the server reissues it. Thus the timescale for analysing results is not massively delayed.

CPDN, with its lengthy run times (necessary due to the size of the WUs), can be unaware for months that a result has failed. The WU hangs in limbo on the server as "in progress". It could take a year before it is reissued, only to suffer the same delay. That, I expect, harms CPDN's ability to analyse results.

Where there is a complete hardware failure, the client computer will not return an error message. Thus the result stays as "in progress".

I have a number of these because of problems at my end: one PC had an intermittent CPU/motherboard fault, so its hard disk was reformatted and it downloaded new CPDN WUs without the old ones being properly terminated. These dead WUs include:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=309504
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=312529
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=317599
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=329727
old_user2147
Joined: 27 Aug 04
Posts: 55
Credit: 1,106,201
RAC: 0
Message 8963 - Posted: 9 Feb 2005, 1:58:46 UTC - in response to Message 8959.  
Last modified: 9 Feb 2005, 2:00:32 UTC

> There is a need, I believe, for users to be able to manually flag a CPDN WU
> as dead. This is unique to CPDN.
>
> Most BOINC projects have deadlines of a fortnight at most for WUs to be
> crunched. If a result is not returned, by default the server reissues it. Thus
> the timescale for analysing results is not massively delayed.
>
> CPDN, with its lengthy run times (necessary due to the size of the WUs), can be
> unaware for months that a result has failed. The WU hangs in limbo on the
> server as "in progress". It could take a year before it is reissued, only to
> suffer the same delay. That, I expect, harms CPDN's ability to analyse
> results.
>
> Where there is a complete hardware failure, the client computer will not
> return an error message. Thus the result stays as "in progress".
>

I agree with this post, and I also feel there is an additional point supporting this argument: since a single CP WU generally takes anywhere from 3 weeks to several months to complete, there is an even greater probability that a computing or operator error will occur for any given WU, simply because of the long run time. Mains AC power failures, accidental reboots or shutdowns, hardware failures, and other unexpected issues DO occur from time to time, and are unavoidable. If this were not the case, the commercial computing and telecom industries would not have been ardently, and unsuccessfully, chasing the elusive "five nines" (99.999% availability) for the past several decades!!!
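The point about long run times can be made concrete. Assuming, purely for illustration, an independent and constant chance p of a fatal event on each day, the probability that a WU running for n days is hit at least once is 1 - (1 - p)^n, which grows steadily with n. The 0.5% daily figure below is a made-up example, not a measured rate, and `failure_probability` is a hypothetical helper.

```python
def failure_probability(daily_risk: float, run_days: int) -> float:
    """Chance that at least one fatal event hits a WU during its run.

    Assumes (hypothetically) an independent, constant chance `daily_risk`
    of a fatal event (power cut, accidental reboot, hardware fault) each day.
    """
    if not 0.0 <= daily_risk <= 1.0:
        raise ValueError("daily_risk must be a probability")
    survive = (1.0 - daily_risk) ** run_days  # no fatal event on any day
    return 1.0 - survive


# Illustrative figure only: a 0.5% chance per day of something going wrong
for days in (21, 90, 180):  # 3 weeks up to "several months"
    print(f"{days:>3} days: {failure_probability(0.005, days):.1%}")
```

Whatever the real daily rate is, the same per-day risk compounds into a much larger overall risk for a months-long CPDN model than for a WU that finishes in a day or two.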

I've run in several gauntlets for a DC project called Seventeen-or-Bust, which is a subset of GIMPS (Mersenne Primes Search). There is an area on the project board that shows all the WUs (factoring exponents, actually) assigned to your user_ID. The same page also has a function that allows the user to "release" any of their assigned WUs immediately back to the WU pool. This works excellently in theory and in practice, and came in really handy when I fouled up a service install as a newcomer to the project.
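A user-initiated "release" like the one described for Seventeen-or-Bust could look roughly like this. It is a hypothetical sketch, not that project's code or anything CPDN offers; `WorkPool`, `assign` and `release` are invented names.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Set


class WorkPool:
    """Toy scheduler with a user-initiated release (all names hypothetical)."""

    def __init__(self, wu_ids: Iterable[int]) -> None:
        self.unassigned: Deque[int] = deque(wu_ids)  # WUs waiting to be sent out
        self.assigned: Dict[str, Set[int]] = {}      # user_id -> WUs they hold

    def assign(self, user_id: str) -> int:
        """Hand the next waiting WU to a user."""
        wu_id = self.unassigned.popleft()
        self.assigned.setdefault(user_id, set()).add(wu_id)
        return wu_id

    def release(self, user_id: str, wu_id: int) -> None:
        """Return an assigned WU to the pool now, instead of after a deadline."""
        owned = self.assigned.get(user_id, set())
        if wu_id not in owned:
            raise KeyError(f"WU {wu_id} is not assigned to {user_id}")
        owned.remove(wu_id)
        # Front of the queue, so the next request picks it up first.
        self.unassigned.appendleft(wu_id)


pool = WorkPool([10, 11, 12])
first = pool.assign("strat")        # gets WU 10
pool.release("strat", first)        # service install went wrong: give it back
print(pool.assign("another_user"))  # 10: the released WU goes straight back out
```

Putting a released WU at the front of the queue is one design choice; it gets abandoned work back into circulation immediately rather than after a deadline expires, which is the benefit this post argues for.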

JMO

Strat
old_user23880
Volunteer tester
Joined: 10 Oct 04
Posts: 223
Credit: 4,664
RAC: 0
Message 8965 - Posted: 9 Feb 2005, 2:28:51 UTC

I believe that there are more possible CPDN models (all the possible parameter combinations) than we can realistically ever run, even if we all recruit extra friends, family and machines. So some possible models will probably never be crunched. The important thing seems to be to complete as many models as we can so that the researchers have the largest possible data set, rather than worrying about the fate of particular failed models. They will be automatically reissued anyway. The delays before models are reissued would only matter if we were obliged to complete all possible models before a particular date.

So don't worry and just keep crunching, if possible sorting out the problems that caused the crash.

old_user3566
Joined: 30 Aug 04
Posts: 9
Credit: 15,780
RAC: 0
Message 8966 - Posted: 9 Feb 2005, 2:43:31 UTC - in response to Message 8965.  
Last modified: 9 Feb 2005, 2:47:17 UTC

Yes, but it's still a waste to have 1000 or more WUs waiting for one more 'completed' result before being validated.

Edit: to add, even projects that have a short due date have their turnaround lengthened by a considerable percentage while waiting for that last non-error completion.
-----------------------
Click to see my tag
<a href="http://boinc.mundayweb.com/one/stats.php?userID=1049">My tag</a>
SNAFU'ed? Turn the Page! :D
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 8971 - Posted: 9 Feb 2005, 4:26:54 UTC

I read in another thread (somewhere) that if the server hasn't had a trickle from a host for 6 weeks, it labels that model 'dead' and reissues it, up to the limit of 5 attempts, I guess.
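The rule reported here can be written as a simple predicate. Both thresholds (6 weeks without a trickle, at most 5 attempts) are the unconfirmed figures from this thread, and `should_reissue` is a hypothetical helper rather than the real server logic.

```python
from datetime import datetime, timedelta, timezone

# Both thresholds are unconfirmed figures from this thread, not documented policy.
SILENCE_LIMIT = timedelta(weeks=6)
MAX_ATTEMPTS = 5


def should_reissue(last_trickle: datetime, attempts_so_far: int, now: datetime) -> bool:
    """Decide whether a silent model should be labelled dead and sent out again."""
    silent_for = now - last_trickle
    return silent_for > SILENCE_LIMIT and attempts_so_far < MAX_ATTEMPTS


now = datetime(2005, 2, 9, tzinfo=timezone.utc)
print(should_reissue(datetime(2004, 12, 1, tzinfo=timezone.utc), 1, now))  # True: 10 weeks silent
print(should_reissue(datetime(2005, 1, 20, tzinfo=timezone.utc), 1, now))  # False: under 3 weeks
print(should_reissue(datetime(2004, 12, 1, tzinfo=timezone.utc), 5, now))  # False: attempts used up
```

A timeout like this is the only thing that catches a host that died outright, since, as noted earlier in the thread, a complete hardware failure never sends an error message.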

But I haven't seen any "official" documentation.

Les
old_user3566
Joined: 30 Aug 04
Posts: 9
Credit: 15,780
RAC: 0
Message 9717 - Posted: 21 Feb 2005, 20:54:41 UTC - in response to Message 8971.  
Last modified: 21 Feb 2005, 21:41:56 UTC

Edit:
Never mind, 6 weeks is a long time.
-----------------------
Click to see my tag
<a href="http://boinc.mundayweb.com/one/stats.php?userID=1049">My tag</a>
SNAFU'ed? Turn the Page! :D


©2024 climateprediction.net