DLT=0.00 ? and then a massive value like 400

Questions and Answers : Unix/Linux : DLT=0.00 ? and then a massive value like 400

old_user812
Joined: 12 Aug 04
Posts: 52
Credit: 121,983
RAC: 0
Message 1719 - Posted: 26 Aug 2004, 10:22:03 UTC

I get lots of lines ending with DLT=0.00. On the other Linux boxes this doesn't happen - there is a number there, variously 2 or 3 or 4, but on this machine I get loads of 0.00 and occasionally a very large value, like 400:

00gd_300025574 - PH 1 TS 003312 - 10/02/1811 00:00 - H:M:S=0003:04:00 AVG= 3.33 DLT= 0.00
00gd_300025574 - PH 1 TS 003313 - 10/02/1811 00:30 - H:M:S=0003:10:40 AVG= 3.45 DLT=400.85

I'm suspicious of this because, as well as the odd value, it's running on the FreeBSD box under linux (8) emulation and I don't quite trust it yet. Also, though the box has been running the model for a couple of days now, I see no trickle under its identity (on the native Linux boxes I see at least one every 24 hrs). Any ideas?
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 1742 - Posted: 26 Aug 2004, 14:16:15 UTC

wow, well if that's correct it's doing about 6 minutes per timestep (from 3312 to 3313 is 6 minutes 40 seconds of CPU Time)! can you paste some more lines?
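The 6 minutes 40 seconds figure can be checked from the H:M:S column of the two quoted lines. Below is a minimal Python sketch, assuming the status-line layout matches the samples in this thread; the regular expression and the `parse_line` helper are illustrative and not part of the CPDN client:

```python
import re

# Layout assumed from the status lines quoted in this thread.
LINE_RE = re.compile(
    r"TS (?P<ts>\d+) - .*? - H:M:S=(?P<h>\d+):(?P<m>\d+):(?P<s>\d+)"
    r"\s+AVG=\s*(?P<avg>[\d.]+)\s+DLT=\s*(?P<dlt>[\d.]+)"
)

def parse_line(line):
    """Return timestep, cumulative CPU seconds, AVG and DLT, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    cpu = int(m["h"]) * 3600 + int(m["m"]) * 60 + int(m["s"])
    return {"ts": int(m["ts"]), "cpu_seconds": cpu,
            "avg": float(m["avg"]), "dlt": float(m["dlt"])}

before = parse_line("00gd_300025574 - PH 1 TS 003312 - 10/02/1811 00:00 - H:M:S=0003:04:00 AVG= 3.33 DLT= 0.00")
after = parse_line("00gd_300025574 - PH 1 TS 003313 - 10/02/1811 00:30 - H:M:S=0003:10:40 AVG= 3.45 DLT=400.85")

elapsed = after["cpu_seconds"] - before["cpu_seconds"]
print(elapsed, divmod(elapsed, 60))  # 400 (6, 40)
```

The difference in cumulative CPU time is 400 seconds, which lines up with the reported DLT=400.85 if DLT is read as CPU seconds spent on the latest timestep.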
old_user812
Joined: 12 Aug 04
Posts: 52
Credit: 121,983
RAC: 0
Message 1747 - Posted: 26 Aug 2004, 14:54:29 UTC - in response to Message 1742.  
Last modified: 26 Aug 2004, 14:56:05 UTC

> wow, well if that's correct it's doing about 6 minutes per timestep (from 3312
> to 3313 is 6 minutes 40 seconds of CPU Time)! can you paste some more lines?

Sure, here you go:

00j0_300025669 - PH 1 TS 001713 - 06/01/1811 16:30 - H:M:S=0001:11:27 AVG= 2.50 DLT= 0.00
00j0_300025669 - PH 1 TS 001714 - 06/01/1811 17:00 - H:M:S=0001:11:27 AVG= 2.50 DLT= 0.00
00j0_300025669 - PH 1 TS 001715 - 06/01/1811 17:30 - H:M:S=0001:11:27 AVG= 2.50 DLT= 0.00
00j0_300025669 - PH 1 TS 001716 - 06/01/1811 18:00 - H:M:S=0001:11:27 AVG= 2.50 DLT= 0.00
00j0_300025669 - PH 1 TS 001717 - 06/01/1811 18:30 - H:M:S=0001:11:27 AVG= 2.50 DLT= 0.00
00j0_300025669 - PH 1 TS 001718 - 06/01/1811 19:00 - H:M:S=0001:11:27 AVG= 2.50 DLT= 0.00
00j0_300025669 - PH 1 TS 001719 - 06/01/1811 19:30 - H:M:S=0001:11:27 AVG= 2.49 DLT= 0.00
00j0_300025669 - PH 1 TS 001720 - 06/01/1811 20:00 - H:M:S=0001:11:27 AVG= 2.49 DLT= 0.00
00j0_300025669 - PH 1 TS 001721 - 06/01/1811 20:30 - H:M:S=0001:11:27 AVG= 2.49 DLT= 0.00
00j0_300025669 - PH 1 TS 001722 - 06/01/1811 21:00 - H:M:S=0001:11:27 AVG= 2.49 DLT= 0.00
00j0_300025669 - PH 1 TS 001723 - 06/01/1811 21:30 - H:M:S=0001:11:27 AVG= 2.49 DLT= 0.00
00j0_300025669 - PH 1 TS 001724 - 06/01/1811 22:00 - H:M:S=0001:11:27 AVG= 2.49 DLT= 0.00
00j0_300025669 - PH 1 TS 001725 - 06/01/1811 22:30 - H:M:S=0001:11:27 AVG= 2.49 DLT= 0.00
00j0_300025669 - PH 1 TS 001726 - 06/01/1811 23:00 - H:M:S=0001:11:27 AVG= 2.48 DLT= 0.00
00j0_300025669 - PH 1 TS 001727 - 06/01/1811 23:30 - H:M:S=0001:11:27 AVG= 2.48 DLT= 0.00
00j0_300025669 - PH 1 TS 001728 - 07/01/1811 00:00 - H:M:S=0001:11:27 AVG= 2.48 DLT= 0.00
00j0_300025669 - PH 1 TS 001729 - 07/01/1811 00:30 - H:M:S=0001:17:57 AVG= 2.71 DLT=390.79

^ and here, it hangs for an age (or seems to, however cpu load is still around 100%), and it seems to have restarted

old_user812
Joined: 12 Aug 04
Posts: 52
Credit: 121,983
RAC: 0
Message 1750 - Posted: 26 Aug 2004, 15:57:28 UTC - in response to Message 1747.  
Last modified: 26 Aug 2004, 15:59:04 UTC

It seemed to run into a problem there. Here is some further output:

00j0_300025669 - PH 1 TS 001727 - 06/01/1811 23:30 - H:M:S=0001:11:27 AVG= 2.48 DLT= 0.00
00j0_300025669 - PH 1 TS 001728 - 07/01/1811 00:00 - H:M:S=0001:11:27 AVG= 2.48 DLT= 0.00
00j0_300025669 - PH 1 TS 001729 - 07/01/1811 00:30 - H:M:S=0001:17:57 AVG= 2.71 DLT=390.79
Model unstable -- timesteps > two minutes CPU time
Model crashed...retrying...restart level 0
Preparing for restart...
Rewinding a model-day...
Starting model ID 00j0_300025669 Phase 1
Waiting for model startup, this may take a minute...
Stack size=64.00 MB
00j0_300025669 - PH 1 TS 007201 - 01/05/1811 00:30 - H:M:S=0005:29:38 AVG= 2.75 DLT= 0.02
00j0_300025669 - PH 1 TS 007202 - 01/05/1811 01:00 - H:M:S=0005:29:49 AVG= 2.75 DLT=10.96
00j0_300025669 - PH 1 TS 007203 - 01/05/1811 01:30 - H:M:S=0005:29:50 AVG= 2.75 DLT= 0.99
00j0_300025669 - PH 1 TS 007204 - 01/05/1811 02:00 - H:M:S=0005:29:51 AVG= 2.75 DLT= 1.00
00j0_300025669 - PH 1 TS 007205 - 01/05/1811 02:30 - H:M:S=0005:29:52 AVG= 2.75 DLT= 1.00
00j0_300025669 - PH 1 TS 007206 - 01/05/1811 03:00 - H:M:S=0005:29:54 AVG= 2.75 DLT= 1.96
00j0_300025669 - PH 1 TS 007207 - 01/05/1811 03:30 - H:M:S=0005:29:55 AVG= 2.75 DLT= 0.99
00j0_300025669 - PH 1 TS 007208 - 01/05/1811 04:00 - H:M:S=0005:30:05 AVG= 2.75 DLT=10.72
00j0_300025669 - PH 1 TS 007209 - 01/05/1811 04:30 - H:M:S=-2147483648:-8:-8 AVG= nan DLT= nan
00j0_300025669 - PH 1 TS 007210 - 01/05/1811 05:00 - H:M:S=0005:30:07 AVG= 2.75 DLT= nan
00j0_300025669 - PH 1 TS 007211 - 01/05/1811 05:30 - H:M:S=0005:30:08 AVG= 2.75 DLT= 1.00
00j0_300025669 - PH 1 TS 007212 - 01/05/1811 06:00 - H:M:S=0005:30:09 AVG= 2.75 DLT= 1.00
00j0_300025669 - PH 1 TS 007213 - 01/05/1811 06:30 - H:M:S=0005:30:10 AVG= 2.75 DLT= 0.99

.. so it looks like it's jumped a few months! DLT is returning values properly now, and it's chugging along...


astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 93,138,107
RAC: 18,299
Message 1754 - Posted: 26 Aug 2004, 16:33:10 UTC

Greetings, fahrenheit,

Are you able to run the viz? I ask because we had some v-e-r-y slow-processing ice-balls (entire earth blue in viz) early in the 'Classic' experiment. (I had two that slowed to ~1/100th of normal speed.) Two meter hailstones and other unearthly phenomena....

At least your Model rewinds. It will do that three times, each time farther than the last, before deciding to upchuck the results to the server and get a new Model to process.

Surprisingly, the failed Models can be as useful to the scientists as 'good' runs. (This part of the experiment evaluates parameter ensembles.)

The zero Deltas are strange -- not seen in Alpha or Beta, IIRC. (Perhaps caused by freebsd...?) The large Deltas are used by hadsm3 to determine whether the Model (hadsm3um) crashed.
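For anyone scanning their own output for these large Deltas, a rough Python sketch follows. The 120-second limit is an assumption taken from the "timesteps > two minutes CPU time" message quoted earlier in the thread, and `slow_timesteps` is a hypothetical helper; it is not how hadsm3 itself makes the decision:

```python
import re

# Assumption: limit taken from the "two minutes CPU time" log message.
UNSTABLE_SECONDS = 120.0

# Captures the timestep number and a numeric DLT value; "DLT= nan" lines are skipped.
DLT_RE = re.compile(r"TS (\d+) .*DLT=\s*([\d.]+)")

def slow_timesteps(lines, limit=UNSTABLE_SECONDS):
    """Return (timestep, dlt) pairs whose DLT exceeds the limit."""
    flagged = []
    for line in lines:
        m = DLT_RE.search(line)
        if m and float(m.group(2)) > limit:
            flagged.append((int(m.group(1)), float(m.group(2))))
    return flagged

sample = [
    "00j0_300025669 - PH 1 TS 001728 - 07/01/1811 00:00 - H:M:S=0001:11:27 AVG= 2.48 DLT= 0.00",
    "00j0_300025669 - PH 1 TS 001729 - 07/01/1811 00:30 - H:M:S=0001:17:57 AVG= 2.71 DLT=390.79",
    "00j0_300025669 - PH 1 TS 007208 - 01/05/1811 04:00 - H:M:S=0005:30:05 AVG= 2.75 DLT=10.72",
    "00j0_300025669 - PH 1 TS 007209 - 01/05/1811 04:30 - H:M:S=-2147483648:-8:-8 AVG= nan DLT= nan",
]
print(slow_timesteps(sample))  # [(1729, 390.79)]
```

Only the 390.79-second step is flagged; the 10.72-second step stays under the assumed limit and the nan line is ignored.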

Cheers.
________________________________________________
Indeed I tremble for my country when I reflect that God is just.
-- Thomas Jefferson
old_user812
Joined: 12 Aug 04
Posts: 52
Credit: 121,983
RAC: 0
Message 1763 - Posted: 26 Aug 2004, 17:45:33 UTC - in response to Message 1754.  

Hi

> Are you able to run the viz? I ask because we had some v-e-r-y slow-processing
> ice-balls (entire earth blue in viz) early in the 'Classic' experiment.

I'll check that out when I'm @ home

> At least your Model rewinds. It will do that three times, each time farther
> than the last, before deciding to upchuck the results to server and get a new
> Model to process.

This one wound forward! Or at least it appeared to. It got to here:

00j0_300025669 - PH 1 TS 001728 - 07/01/1811 00:00 - H:M:S=0001:11:27 AVG= 2.48 DLT= 0.00
00j0_300025669 - PH 1 TS 001729 - 07/01/1811 00:30 - H:M:S=0001:17:57 AVG= 2.71 DLT=390.79
Model unstable -- timesteps > two minutes CPU time
Model crashed...retrying...restart level 0

then "rewound" to here - jumping ahead by 5472 timesteps:

Preparing for restart...
Rewinding a model-day...
Starting model ID 00j0_300025669 Phase 1
Waiting for model startup, this may take a minute...
Stack size=64.00 MB
00j0_300025669 - PH 1 TS 007201 - 01/05/1811 00:30 - H:M:S=0005:29:38 AVG= 2.75 DLT= 0.02
00j0_300025669 - PH 1 TS 007202 - 01/05/1811 01:00 - H:M:S=0005:29:49 AVG= 2.75 DLT=10.96
00j0_300025669 - PH 1 TS 007203 - 01/05/1811 01:30 - H:M:S=0005:29:50 AVG= 2.75 DLT= 0.99

Currently, it's at:

00j0_300025669 - PH 1 TS 009477 - 18/06/1811 10:30 - H:M:S=0007:08:39 AVG= 2.71 DLT= 0.00
00j0_300025669 - PH 1 TS 009478 - 18/06/1811 11:00 - H:M:S=0007:08:39 AVG= 2.71 DLT= 0.00
00j0_300025669 - PH 1 TS 009479 - 18/06/1811 11:30 - H:M:S=0007:08:39 AVG= 2.71 DLT= 0.00
00j0_300025669 - PH 1 TS 009480 - 18/06/1811 12:00 - H:M:S=0007:08:39 AVG= 2.71 DLT= 0.00
00j0_300025669 - PH 1 TS 009481 - 18/06/1811 12:30 - H:M:S=0007:08:39 AVG= 2.71 DLT= 0.00

but now I'm back to the pattern of DLT=0.00 for a long time then a single massive value. Sometimes it will crash and restart, but more often it just sits there for a while (a minute) then starts crunching again.
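The 5472-timestep jump, and the model dates printed next to each timestep, can be reproduced with a little arithmetic. This Python sketch rests on three assumptions inferred from the quoted lines rather than from any CPDN documentation: each timestep is 30 model minutes, the model uses a 360-day calendar of twelve 30-day months, and timestep 0 falls on 01/12/1810 00:00. The `model_date` helper is hypothetical:

```python
STEPS_PER_DAY = 48                   # 30 model minutes per timestep, as in the log lines
DAYS_PER_MONTH = 30                  # assumption: 360-day calendar, twelve 30-day months
START_YEAR, START_MONTH = 1810, 12   # assumption: timestep 0 is 01/12/1810 00:00

def model_date(ts):
    """Convert a timestep number to a DD/MM/YYYY HH:MM model date string."""
    days, step = divmod(ts, STEPS_PER_DAY)
    months, day = divmod(days, DAYS_PER_MONTH)
    month_index = (START_MONTH - 1) + months
    year = START_YEAR + month_index // 12
    month = month_index % 12 + 1
    minutes = step * 30
    return f"{day + 1:02d}/{month:02d}/{year} {minutes // 60:02d}:{minutes % 60:02d}"

jump = 7201 - 1729
print(jump, jump / STEPS_PER_DAY)  # 5472 114.0
print(model_date(1729))            # 07/01/1811 00:30
print(model_date(7201))            # 01/05/1811 00:30
```

Under these assumptions the restart landed 114 model days ahead of the failed timestep, and the computed dates agree with the ones in the quoted output.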

old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 1766 - Posted: 26 Aug 2004, 18:11:14 UTC - in response to Message 1763.  

hmm, that's an odd one, perhaps the Linux emulation is just doing weird things? The models are sensitive things even in the best of conditions, so when people start throwing massively overclocked or emulated PCs at them, who knows what will happen! :-)

PS for astro -- like the TJ quote! still haven't received my mail-in ballot here yet!

old_user812
Joined: 12 Aug 04
Posts: 52
Credit: 121,983
RAC: 0
Message 1836 - Posted: 27 Aug 2004, 7:04:33 UTC - in response to Message 1766.  

> hmm, that's an odd one, perhaps the Linux emulation is just doing weird
> things? The models are sensitive things even in the best of conditions, so
> when people start throwing massively overclocked or emulated PC's who knows
> what will happen! :-)

Yeah, I'll knock emulation on the head until CPDN knows about FreeBSD. I don't trust the way it's running under emulation. I *could* tweak the makefile so that boinc builds a binary with an acceptable internal name... what do you think?

BTW the cvs for boinc builds 4.06 - do you know if this will work with CPDN?


old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 1845 - Posted: 27 Aug 2004, 7:55:39 UTC - in response to Message 1836.  

well if you build a freebsd boinc client there's still the problem that it will attach to us and we can't send anything since we don't have freebsd climate models etc. I didn't see the boinc 4.06, it should be fine. I guess I'll go see what the changes were and if it's worth us updating the links.

Honza
Volunteer moderator
Joined: 5 Aug 04
Posts: 390
Credit: 2,475,242
RAC: 0
Message 1846 - Posted: 27 Aug 2004, 8:03:38 UTC - in response to Message 1836.  

> BTW the cvs for boinc builds 4.06 - do you know if this will work with CPDN?

I lost my link to CVS for BOINC - can you post?
Honza
Volunteer moderator
Joined: 5 Aug 04
Posts: 390
Credit: 2,475,242
RAC: 0
Message 1847 - Posted: 27 Aug 2004, 8:05:37 UTC - in response to Message 1845.  

> well if you build a freebsd boinc client there's still the problem that it
> will attach to us and we can't send anything since we don't have freebsd
> climate models etc. I didn't see the boinc 4.06, it should be fine. I guess
> I'll go see what were the changes and if it's worth us updating the links.

It would be nice to tell the BOINC guys to update the BOINC homepage:
- a couple of words about the 4.0x generation
- and that CPDN has launched under BOINC (prior to SETI and Predict) under version 4.05.
old_user812
Joined: 12 Aug 04
Posts: 52
Credit: 121,983
RAC: 0
Message 1901 - Posted: 27 Aug 2004, 17:47:58 UTC - in response to Message 1846.  

> > BTW the cvs for boinc builds 4.06 - do you know if this will work with CPDN?
>
> I lost my link to CVS for BOINC - can you post?

here ya go:

cvs -d :pserver:anonymous@alien.ssl.berkeley.edu:/home/cvs/cvsroot checkout boinc
old_user812
Joined: 12 Aug 04
Posts: 52
Credit: 121,983
RAC: 0
Message 1906 - Posted: 27 Aug 2004, 18:42:06 UTC - in response to Message 1845.  

> well if you build a freebsd boinc client there's still the problem that it
> will attach to us and we can't send anything since we don't have freebsd
> climate models etc. I didn't see the boinc 4.06, it should be fine. I guess
> I'll go see what were the changes and if it's worth us updating the links.

OK I shall build 4.06 on a linux box

Would it be possible to have a POSIX-standard climate model :)
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 1909 - Posted: 27 Aug 2004, 18:54:56 UTC

I guess if SETI@Home/BOINC is using 4.06 I will upgrade our stuff to that as well (still haven't seen anything "official" yet).

©2019 climateprediction.net