climateprediction.net home page
CPN WU reset?




old_user372

Joined: 7 Aug 04
Posts: 8
Credit: 121,482
RAC: 0
Message 2308 - Posted: 31 Aug 2004, 10:34:00 UTC

After a few days of constant chugging away, my computer was starting to lag up. I quit BOINC from the system tray and restarted my computer.

When it restarted, BOINC had revised the completion time of my current CPN WU from 770 hours remaining to 33000 hours remaining... I thought this was quite odd, as it still showed the correct 'CPU Time' that the model had been running for so far, and the WU name was the same.

When I looked at the screensaver, I had jumped backwards to timestep 200 (I was in the 30000s). No error messages were displayed in the messages window before or after the reset, but I seem to have lost the last 38 hours of work.

Anybody know why? Is this a bug?

-- Oh, and I just noticed that the CPN WU has blown out to 90000 hours remaining.
STE\/E

Joined: 15 Aug 04
Posts: 57
Credit: 10,360,323
RAC: 1,102
Message 2313 - Posted: 31 Aug 2004, 11:38:06 UTC
Last modified: 31 Aug 2004, 11:38:32 UTC

-- Oh, and I just noticed that the CPN WU has blown out to 90000 hours remaining.
=========

Wow, you're set then for the next 3750 days, or about 10.27 years, before you will need another Work Unit... hehe ;)


Jord

Joined: 5 Aug 04
Posts: 250
Credit: 93,274
RAC: 0
Message 2315 - Posted: 31 Aug 2004, 12:02:23 UTC

Sounds more like you have a problem with your motherboard chipset drivers. Try reinstalling them.
--------------------
Jord™
old_user372

Joined: 7 Aug 04
Posts: 8
Credit: 121,482
RAC: 0
Message 2317 - Posted: 31 Aug 2004, 12:16:21 UTC - in response to Message 2315.  

> Sounds more like you have a problem with your motherboard chipset drivers. Try
> reinstalling them.

Erm, no, it's quite clearly a CPDN software fault :)
Andrew Hingston
Volunteer moderator

Joined: 17 Aug 04
Posts: 753
Credit: 9,804,700
RAC: 0
Message 2318 - Posted: 31 Aug 2004, 12:41:00 UTC

The rewind to an earlier stage is deliberate; the program has identified a possible calculation error and is trying again.

The fact that it is processing very slowly shows that all is still not well. If you are able to look at the graphics you will probably find that the temp is very low, or even at absolute zero. For obvious reasons this is known as a 'slow processing iceball'. You will find plenty of references on the main CPDN board.

The model should eventually terminate prematurely and upload a new WU. If you aren't that patient, you can 'reset project' and force that. This behaviour can happen because of the parameters in the model, and you may have no further trouble, but I'm afraid that this is often an indication of a hardware/system problem. Look in the technical section of the CPDN main forum and the announcements in the hardware forum, especially those on hardware tests and maintenance. I'm afraid that running a climate model is a heavy duty task for any machine, and personally I would try running the hardware tests at this stage just to eliminate the possibility of a problem there.
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2184
Credit: 64,822,615
RAC: 5,275
Message 2428 - Posted: 31 Aug 2004, 22:55:44 UTC - in response to Message 2318.  

No. This is not what has happened. He went from TS 30000 to 200 (when he looked). What it has done is go back to day 0 of the run and start over, even though he was well into the 2nd year of the model run. This happened to me (and, I believe, to Martin Sykes a couple of times) in the last couple of weeks. In my case it happened after a power failure shut my computer down. I was 6 trickles into the run when it reset to day 0 but kept the hours it had already worked, so it had a terrible time estimating time to completion and calculating sec/ts. There were no resets to the previous day/month/year, just back to day 0. There appears to be a problem with the software under certain conditions, where a reset to day 0 occurs (for whatever reason, but it shouldn't happen) but the time spent working on the model is not reset.
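
As a rough illustration of why the estimate goes haywire, here is a minimal sketch. It assumes the remaining time is a simple linear extrapolation from the CPU time logged so far and the current timestep; that is an assumption for illustration, not confirmed client behaviour. The function names, the total number of timesteps and the CPU hours are all hypothetical.

```python
def sec_per_timestep(cpu_seconds: float, timestep: int) -> float:
    """Average CPU seconds spent per completed timestep."""
    if timestep <= 0:
        raise ValueError("timestep must be positive")
    return cpu_seconds / timestep


def hours_remaining(cpu_seconds: float, timestep: int, total_timesteps: int) -> float:
    """Naive linear extrapolation of the remaining run time, in hours."""
    steps_left = total_timesteps - timestep
    return sec_per_timestep(cpu_seconds, timestep) * steps_left / 3600.0


# Hypothetical numbers, chosen only for illustration.
TOTAL_TIMESTEPS = 100_000   # assumed length of the model run
CPU_SECONDS = 200 * 3600    # assumed 200 hours of CPU time already logged

# Same logged CPU time, but the timestep counter has gone back to 200.
before = hours_remaining(CPU_SECONDS, 30_000, TOTAL_TIMESTEPS)
after = hours_remaining(CPU_SECONDS, 200, TOTAL_TIMESTEPS)

print(f"before reset: {before:.0f} h remaining")
print(f"after reset:  {after:.0f} h remaining")
```

With these made-up inputs the estimate jumps from under 500 hours to nearly 100,000, because the old CPU time is now divided by a tiny timestep count. The same mismatch inflates sec/ts.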

George

> The rewind to an earlier stage is deliberate; the program has identified a
> possible calculation error and is trying again.
>
> The fact that it is processing very slowly shows that all is still not well.
> If you are able to look at the graphics you will probably find that the temp
> is very low, or even at absolute zero. For obvious reasons this is known as a
> 'slow processing iceball'. You will find plenty of references on the main CPDN
> board.
>
> The model should eventually terminate prematurely and upload a new WU. If you
> aren't that patient, you can 'reset project' and force that. This behaviour
> can happen because of the parameters in the model, and you may have no further
> trouble, but I'm afraid that this is often an indication of a hardware/system
> problem. Look in the technical section of the CPDN main forum and the
> announcements in the hardware forum, especially those on hardware tests and
> maintenance. I'm afraid that running a climate model is a heavy duty task for
> any machine, and personally I would try running the hardware tests at this
> stage just to eliminate the possibility of a problem there.


©2024 cpdn.org