Posts by old_user1204

1) Questions and Answers : Macintosh : Error on upload, will this/or did this cause a problem? (Message 6140)
Posted 16 Nov 2004 by old_user1204
Post:
> As long as the final trickle (phase 3, timestep 259248) went through, and
> there are no "result#_[1-5].zip" files in your boinc/climateprediction.net
> directory, it's probably OK.

The final trickle did indeed make it and there are no result*.zip files
anywhere so it looks like it's OK.

Thanks for the help. I just wanted to make sure that all those months
of computing made it to y'all. :)
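
The check described above (no leftover result zips in the project directory) can be sketched in a few lines of Python. This is only an illustration: the function name is made up here, and the pattern `*_[1-5].zip` is an assumption read off the quoted "result#_[1-5].zip" wording.

```python
import glob
import os


def leftover_result_zips(project_dir):
    """Return sorted names of files in project_dir matching *_[1-5].zip.

    The pattern is an assumption based on the quoted advice; it is not
    taken from the BOINC or climateprediction.net sources.
    """
    pattern = os.path.join(glob.escape(project_dir), "*_[1-5].zip")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))
```

An empty list would correspond to the "probably OK" case in the quoted reply.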
2) Questions and Answers : Macintosh : Error on upload, will this/or did this cause a problem? (Message 6035)
Posted 11 Nov 2004 by old_user1204
Post:
<P>I also have this problem. I just finished a run with BOINC client 4.05 and the
<A HREF="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=26225">Result-ID #26225</A> shows...</P>
<PRE>
zip warning: Too many open files
zip warning: could not open for reading: 01dwba.ph34c10.x2.nc
zip warning: zip file empty
zip I/O error: Too many open files

zip error: Temporary file failure (zi3wIeEY)
</PRE>
<P>Indeed, one of the files "01dwba.ph36c10.x2.nc" didn't zip up correctly...</P>
<PRE>
G4 /Applications/BOINC-CPDN/projects/climateprediction.net/01dw_300026781 $ ll 01dwba.ph3*
-rw-r----- 1 strick admin 242743 8 Nov 04:25 01dwba.ph30c10.x2.nc.zip
-rw-r----- 1 strick admin 242777 8 Nov 04:25 01dwba.ph31c10.x2.nc.zip
-rw-r----- 1 strick admin 242844 8 Nov 04:25 01dwba.ph32c10.x2.nc.zip
-rw-r----- 1 strick admin 242817 8 Nov 04:25 01dwba.ph33c10.x2.nc.zip
-rw-r----- 1 strick admin 22 8 Nov 04:25 01dwba.ph34c10.x2.nc.zip
-rw-r----- 1 strick admin 264936 13 Oct 12:38 01dwba.ph36c10.x2.nc
</PRE>
<P>Is there a way to re-zip &amp; re-upload the results? Do I need to change anything in my setup (like upgrading to a newer version of the BOINC client)? I could use a clue here. :) </P>
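<P>For what it's worth, re-zipping by hand might look like the Python sketch below. It is a guess at a workaround, not the project's own zip step: the function names are made up for illustration, and whether the server would accept archives rebuilt this way is unknown. Each archive is opened and closed before the next file is touched, which keeps the number of open descriptors small, and a <I>.zip</I> with no members (22 bytes is the size of an empty archive) is treated as missing.</P>

```python
import glob
import os
import resource
import zipfile


def open_file_limit():
    """Return the (soft, hard) per-process limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def has_members(zip_path):
    """True if zip_path is a readable archive with at least one member."""
    if not os.path.isfile(zip_path) or not zipfile.is_zipfile(zip_path):
        return False
    with zipfile.ZipFile(zip_path) as zf:
        return len(zf.namelist()) > 0


def rezip_missing(run_dir):
    """Zip each *.nc file whose .zip is missing or empty, one at a time."""
    created = []
    nc_files = sorted(glob.glob(os.path.join(glob.escape(run_dir), "*.nc")))
    for nc_path in nc_files:
        zip_path = nc_path + ".zip"
        if has_members(zip_path):
            continue
        # The with-block closes the archive before the next file is opened,
        # so only a couple of descriptors are in use at any moment.
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.write(nc_path, arcname=os.path.basename(nc_path))
        created.append(os.path.basename(zip_path))
    return created
```

<P>Calling <I>open_file_limit()</I> first shows the descriptor ceiling the process is running under, which is the limit the "Too many open files" message refers to.</P>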
3) Questions and Answers : Macintosh : Cannot get a model to run (Message 2785)
Posted 3 Sep 2004 by old_user1204
Post:
<P>Looks like the model failed right out of the gate. The messages do show that it uploaded the failure results back, so in a little while your log should show boinc downloading another model to run. My question is: does the new model run, or does it fail in the same manner?</P>
<P>Also check the stderr_um.txt file for the failed model to see if it gave any error messages. It should be in &lt;Dir-where-BOINC-is&gt;/projects/climateprediction.net/1pmo_000100690/stderr_um.txt, and it might give a clue.</P>
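<P>A small Python sketch of that check, assuming only the file name <I>stderr_um.txt</I> from above (the function name is made up for illustration). It separates "file missing" from "file present but empty", since an empty file means the model left no message at all.</P>

```python
import os


def read_model_stderr(run_dir):
    """Return the text of run_dir/stderr_um.txt, or None if it is missing.

    An empty string means the file exists but the model wrote nothing.
    """
    path = os.path.join(run_dir, "stderr_um.txt")
    if not os.path.isfile(path):
        return None
    with open(path, "r", errors="replace") as fh:
        return fh.read()
```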
BCNU,<BR>
Vance
4) Questions and Answers : Macintosh : Disappearing run. Diagnostics? (Message 1884)
Posted 27 Aug 2004 by old_user1204
Post:
<P>Thanks for your help! The <I>stderr_um.txt</I> and <I>stdout_um.txt</I> files for the <I>00c4</I> project were both zero length. The ps listing showed </P>
<PRE>
G4 /Applications/BOINC-CPDN/projects/climateprediction.net/00c4_300025421 $ ps aux | grep had
strick 925 96.2 3.4 93344 45132 p1 RN Wed06PM 2220:52.21 hadsm3um_4.03_powerpc-apple-darwin 24090 912
strick 912 0.0 0.1 30216 1168 p1 SN Wed06PM 0:18.29 hadsm3_4.03_powerpc-apple-darwin 00c3_300025420
strick 913 0.0 0.1 30216 1164 p1 SN Wed06PM 0:11.18 hadsm3_4.03_powerpc-apple-darwin 00c4_300025421
strick 924 0.0 0.0 0 0 p1 ZN 31Dec69 0:00.00 (hadsm3um_4.03_po)
strick 2434 0.0 0.0 18172 340 std S+ 8:37AM 0:00.01 grep had
</PRE>
<P>So it appears that the <I>00c4</I> run did indeed die off, probably the zombie process pid 924 above. Which makes me wonder why pid 913, which was probably the parent process, didn't catch the child process exit status? If it <I>wait()</I>'ed appropriately the child should have been cleaned up. Odd.</P>
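<P>To illustrate the <I>wait()</I> point, here is a generic Unix sketch in Python. It is not the hadsm3 wrapper's code, and the function names are made up. A child that has exited stays in the process table as a zombie (the "Z" in the ps STAT column) until its parent collects the exit status.</P>

```python
import os


def spawn_child(exit_code):
    """Fork a child that exits immediately with exit_code; return its pid."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit at once, skipping parent cleanup
    return pid


def reap(pid):
    """Block until the child exits, then return its exit code.

    If the child was killed by a signal, return the negated signal number.
    Collecting the status is what removes the zombie entry.
    """
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

<P>Between <I>spawn_child()</I> returning and <I>reap()</I> being called, the child is in the same state as pid 924 in the listing above. A parent that never makes the <I>waitpid</I> call leaves it there until the parent itself exits.</P>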
<P>Anyway, doing a CTRL-C shut everything down cleanly, and on restart the log showed that, yes indeed, the <I>00c4</I> model had crashed. Then it uploaded the results to y'all and downloaded a new run.</P>
<PRE>
Starting model ID 00c4_300025421 Phase 1
Waiting for model startup, this may take a minute...
Stack size=48.00 MB
00c4_300025421 - PH 1 TS 007633 - 00/00/0000 00:00 - H:M:S=0011:53:10 AVG= 5.61 DLT= 0.00
Model crashed...retrying...restart level 2
Preparing for restart...
Rewinding a model-year...
Error: Restart files for dataout/restart.year not found
Giving up, this result exceeded crash count for available restart files.
... entries about zipping up files...
2004-08-27 08:47:40 [climateprediction.net] Unrecoverable error for result 00c4_300025421_0 (process exited with code 251 (0xfb))
2004-08-27 08:47:40 [climateprediction.net] Unrecoverable error for result 00c4_300025421_0 (process exited with code 251 (0xfb))
2004-08-27 08:47:40 [climateprediction.net] Computation for result 00c4_300025421 finished
2004-08-27 08:47:40 [climateprediction.net] Started upload of 00c4_300025421_0_1.zip
...
</PRE>
<P>So that's a wrap. Thank you very much for helping me diagnose this. Things are on-track and crunching away again.</P>
BCNU,<BR>
Vance
5) Questions and Answers : Macintosh : Disappearing run. Diagnostics? (Message 1794)
Posted 26 Aug 2004 by old_user1204
Post:
<P>I was running 2 runs on <B>boinc_4.05_powerpc-apple-darwin</B> on a PowerMac dual 1.25 GHz G4, 1.25 GB RAM MacOS 10.3.5. Runs started around 2004-08-25 18:54:30 for <I>00c3_300025420_0 using hadsm3 version 4.03</I> and <I>00c4_300025421_0 using hadsm3 version 4.03</I>.</P>
<P>Checking after coming home today 8/26, it appears that run 00c4 has disappeared. The thing is I don't know what diagnostics to look for to see <B>why</B> it disappeared. The log file for that model shows...</P>
<PRE>
00c4_300025421 - PH 1 TS 007633 - 10/05/1811 00:30 - H:M:S=0011:53:10 AVG= 5.61 DLT= 2.69
00c4_300025421 - PH 1 TS 007634 - 10/05/1811 01:00 - H:M:S=0011:53:23 AVG= 5.61 DLT=13.27
00c4_300025421 - PH 1 TS 007635 - 10/05/1811 01:30 - H:M:S=0011:53:25 AVG= 5.61 DLT= 1.94
00c4_300025421 - PH 1 TS 007636 - 10/05/1811 02:00 - H:M:S=0011:53:27 AVG= 5.61 DLT= 1.95
</PRE>
<P>Then nothing else, no messages or errors; that run just stopped reporting. Kicking in viz on that run shows a blue planet. Checking my account on the website shows no status for the 00c4 run. So 2 questions: 1) How do I determine if this was just a "normal" failed model or something else (like a bug)? That is, how do I diagnose this? 2) How do I get boinc to report home to y'all about 00c4 status, or will it just do that on its own in time and then download a new model?
</P>
<P>Thanks for your time.<br>
BCNU,<br>
Vance</P>
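<P>One way to see where a run stopped reporting is to parse the progress lines shown above and record the last timestep per run. The Python sketch below assumes the line layout is exactly as in that excerpt; the function names are made up, PH and TS are read as phase and timestep, H:M:S as elapsed time, and the AVG and DLT values are carried through as plain numbers without interpretation.</P>

```python
import re

# Layout assumed from the log excerpt in this post; not an official format.
LINE_RE = re.compile(
    r"^\s*(?P<run>\S+)\s+-\s+PH\s*(?P<phase>\d+)\s+TS\s*(?P<timestep>\d+)\s+-\s+"
    r"(?P<model_date>\S+\s+\S+)\s+-\s+H:M:S=(?P<h>\d+):(?P<m>\d+):(?P<s>\d+)\s+"
    r"AVG=\s*(?P<avg>[\d.]+)\s+DLT=\s*(?P<dlt>[\d.]+)\s*$"
)


def parse_progress(line):
    """Parse one progress line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    g = match.groupdict()
    return {
        "run": g["run"],
        "phase": int(g["phase"]),
        "timestep": int(g["timestep"]),
        "model_date": g["model_date"],
        "elapsed_seconds": int(g["h"]) * 3600 + int(g["m"]) * 60 + int(g["s"]),
        "avg": float(g["avg"]),
        "dlt": float(g["dlt"]),
    }


def latest_timesteps(lines):
    """Map each run name to the highest timestep seen in lines."""
    latest = {}
    for line in lines:
        rec = parse_progress(line)
        if rec is not None:
            latest[rec["run"]] = max(latest.get(rec["run"], 0), rec["timestep"])
    return latest
```

<P>If one run's latest timestep stops advancing while another's keeps climbing, that matches the "stopped reporting" symptom described here.</P>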



