Posts by old_user26469

1) Questions and Answers : Unix/Linux : Trickles going through, but uploads not (Message 13567)
Posted 18 Jun 2005 by old_user26469
Post:
> Hey, good news. Glad you're getting somewhere. Linux and Mac users usually end
> up with problems and no answers.

:)

Anyway, uploads are working OK now (I've diked that broken nameserver out until I figure out what's wrong with it; you get used to weird bugs like this when running diverse platforms).

Arguably the HTTP proxy in use should not be part of the client state, but that's a BOINC thing, not something CPDN can do anything about.

2) Questions and Answers : Unix/Linux : Trickles going through, but uploads not (Message 13554)
Posted 18 Jun 2005 by old_user26469
Post:
> I'll do a packet capture shortly and see what's going on at that level.

Got it. client_state.xml has preserved the name of an old HTTP proxy (renamed and moved some time ago), and boinc is using it in preference to the value of $HTTP_PROXY (which is in any case unset, as I don't want boinc to use a proxy anymore).
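For anyone wanting to check their own state file for a stale proxy, something like the following might help. It is only a sketch: it makes no assumption about the real element names in client_state.xml beyond "the tag mentions proxy", and the sample XML (including the host name oldproxy.example.org) is made up for illustration.

```python
import xml.etree.ElementTree as ET


def find_proxy_settings(xml_text):
    """Return (tag, text) pairs found inside any element whose tag mentions 'proxy'."""
    root = ET.fromstring(xml_text)
    settings = []
    for elem in root.iter():
        if "proxy" not in elem.tag.lower():
            continue
        # Walk the matching element and everything below it, keeping non-empty values.
        for node in elem.iter():
            text = (node.text or "").strip()
            if text and (node.tag, text) not in settings:
                settings.append((node.tag, text))
    return settings


# Hypothetical sample: the real client_state.xml layout may well differ.
SAMPLE = """<client_state>
  <proxy_info>
    <http_server_name>oldproxy.example.org</http_server_name>
    <http_server_port>3128</http_server_port>
  </proxy_info>
  <platform_name>i686-pc-linux-gnu</platform_name>
</client_state>"""

if __name__ == "__main__":
    for tag, value in find_proxy_settings(SAMPLE):
        print(tag, "=", value)
```

To run it against a real file, read the file's contents and pass them to find_proxy_settings() in place of SAMPLE.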

... and now it's trying to upload to Bern and only one of my nameservers can see it:

hades 32 ~% dig @192.168.14.14 phkup21.unibe.ch

; <<>> DiG 9.3.1rc1 <<>> @192.168.14.14 phkup21.unibe.ch
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- [...]
; <<>> DiG 9.3.1rc1 <<>> @192.168.14.16 phkup21.unibe.ch
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- [...]
3) Questions and Answers : Unix/Linux : Trickles going through, but uploads not (Message 13551)
Posted 18 Jun 2005 by old_user26469
Post:
> I'm not a Linux person, but the usual answers seem to be to check the firewall
> / proxy server settings.
> Otherwise, what has changed since the previous model? Hardware, software, ISP,
> etc.

Proxy server: none needed. Firewall, ISP, hardware: no change. OS: a minor kernel upgrade (2.6.10 -> 2.6.11.10), but that's all.

I'll do a packet capture shortly and see what's going on at that level.
4) Questions and Answers : Unix/Linux : Trickles going through, but uploads not (Message 13358)
Posted 12 Jun 2005 by old_user26469
Post:
> Is your model trying to upload to Oxford,
> cpdn-upload1.comlab.ox.ac.uk/cpdn_cgi/file_upload_handler

http://cpdn-upload1.comlab.ox.ac.uk/cpdn_cgi/file_upload_handler, I'm afraid.

I still think it must be my end, but what could it be?
5) Questions and Answers : Unix/Linux : Trickles going through, but uploads not (Message 13340)
Posted 11 Jun 2005 by old_user26469
Post:
So the day before yesterday I finished a run (with BOINC 4.19 on Linux). It's trickling fine (host 60701), but it's not uploading:

2005-06-09 17:08:08 [climateprediction.net] Temporarily failed upload of 2kui_200141547_1_1.zip
2005-06-09 17:08:08 [climateprediction.net] Backing off 1 minutes and 0 seconds on transfer of file 2kui_200141547_1_1.zip
2005-06-09 17:08:08 [climateprediction.net] Started upload of 2kui_200141547_1_3.zip
2005-06-09 17:08:23 [climateprediction.net] Temporarily failed upload of 2kui_200141547_1_2.zip
2005-06-09 17:08:23 [climateprediction.net] Backing off 1 minutes and 0 seconds on transfer of file 2kui_200141547_1_2.zip

and it's still waiting:

2005-06-11 17:59:59 [climateprediction.net] Temporarily failed upload of 2kui_200141547_1_1.zip
2005-06-11 17:59:59 [climateprediction.net] Backing off 3 hours, 54 minutes, and 7 seconds on transfer of file 2kui_200141547_1_1.zip

Yet scheduler RPCs are succeeding fine:
2005-06-11 07:04:49 [climateprediction.net] Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
2005-06-11 07:05:51 [climateprediction.net] Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded

Is this my end, or the remote end? If it's the remote end, that's OK: all I need to do is wait... but I fear it isn't, as I haven't seen anyone else mention this problem.
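To see how far the backoff has grown without reading the log by eye, the "Backing off" lines can be turned into seconds. This is a rough sketch based only on the message wording in the log lines above; the function name is mine, and other BOINC versions may phrase the message differently.

```python
import re

UNIT_SECONDS = {"hour": 3600, "minute": 60, "second": 1}
BACKOFF_RE = re.compile(r"Backing off (.+?) on transfer of file (\S+)")
PART_RE = re.compile(r"(\d+) (hour|minute|second)s?")


def backoff_seconds(line):
    """Return (filename, backoff in seconds) for a 'Backing off' line, else None."""
    match = BACKOFF_RE.search(line)
    if match is None:
        return None
    duration, filename = match.groups()
    # Sum every "<number> <unit>" part, e.g. "3 hours, 54 minutes, and 7 seconds".
    total = sum(int(n) * UNIT_SECONDS[unit] for n, unit in PART_RE.findall(duration))
    return filename, total


if __name__ == "__main__":
    line = ("2005-06-11 17:59:59 [climateprediction.net] Backing off 3 hours, "
            "54 minutes, and 7 seconds on transfer of file 2kui_200141547_1_1.zip")
    print(backoff_seconds(line))
```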
6) Questions and Answers : Unix/Linux : Is it uploaded or not? Things are unclear. (Message 7511)
Posted 23 Jan 2005 by old_user26469
Post:
> The five summary files were successfully uploaded. (Don't know why there were
> temporary failures but are probably related to server connection.)

At the very instant it reported failures with some of them, it was uploading others to the same place. Perhaps the upload server was overloaded and throttling connections, or something?

> The other 365 zipped files are stored on your computer for possible future
> use. (You can burn them to CD-R or DVD-R if you choose, or delete them for the

I have hundreds of unused CD-Rs: this seems like an excellent use for a few of them.

> It is
> recommended that you move the zip'd folder to an Archive-folder outside your
> boinc folder; that permits faster backups of the boinc folder.)

I must confess that I classify the BOINC stuff as `transient' and don't back it up in my regular schedule at all. (Everything's RAIDed anyway, so the chances of hardware failure eating anything are minimal.)

> It will take awhile for the "success" post to occur in your stats, but the
> credits should already be posted.

I've never really seen the point of credit-hunting: I'm doing this because as long as the machines are sitting idle they may as well be doing something to help people ameliorate the effects of all the power they're using. :)

> Edit: Oops, sorry, George, your post arrived while I was doing my hunt-n-peck
> typing thing. Didn't mean to step on you.

:)
7) Questions and Answers : Unix/Linux : Is it uploaded or not? Things are unclear. (Message 7504)
Posted 23 Jan 2005 by old_user26469
Post:
Well, my first upload's just finished on this i686 Linux box. Perhaps.

The total set of messages emitted during phase completion/upload (with interleaved junk from the next run removed) was:

Phase over, going into post_processing()
In pre_initialise_phase (part 1 of 3)
In initialise_phase (part 2 of 3)
Calculating global means for files .pa|.x3|.nc
Calculating regional means for .pa|.x3|.nc
Calculating global means for files .pd|.x3|.nc
Calculating regional means for .pd|.x3|.nc
Calculating global means for files .pe|.x3|.nc
Calculating regional means for .pe|.x3|.nc
Calculating global means for files .pf|.x3|.nc
Calculating regional means for .pf|.x3|.nc
2005-01-23 17:50:55 [climateprediction.net] Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
2005-01-23 17:50:55 [climateprediction.net] Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
Calculating global means for files .pg|.x3|.nc
Calculating regional means for .pg|.x3|.nc
Post-processing successful!
Finished a complete run, now you can upload!
adding: 20qfaa.pa.gmts.x1.nc (deflated 34%)
[413 lines removed]
adding: 20qfca.phq6c10.x3.nc (deflated 57%)
adding: 20qfca.pw.8yac.x3.nc (deflated 17%)
adding: restart.day (deflated 47%)
2005-01-23 17:54:24 [climateprediction.net] Computation for result 20qf_100115223 finished
2005-01-23 17:54:25 [climateprediction.net] Starting result 35gz_100168540_1 using hadsm3 version 4.04
2005-01-23 17:54:25 [climateprediction.net] Started upload of 20qf_100115223_1_1.zip
2005-01-23 17:54:25 [climateprediction.net] Started upload of 20qf_100115223_1_2.zip
2005-01-23 17:56:26 [climateprediction.net] Temporarily failed upload of 20qf_100115223_1_2.zip
2005-01-23 17:56:26 [climateprediction.net] Backing off 1 minutes and 0 seconds on transfer of file 20qf_100115223_1_2.zip
2005-01-23 17:56:26 [climateprediction.net] Started upload of 20qf_100115223_1_3.zip
2005-01-23 17:56:30 [climateprediction.net] Finished upload of 20qf_100115223_1_1.zip
2005-01-23 17:56:30 [climateprediction.net] Throughput 24681703 bytes/sec
2005-01-23 17:56:30 [climateprediction.net] Started upload of 20qf_100115223_1_4.zip
2005-01-23 17:58:26 [climateprediction.net] Temporarily failed upload of 20qf_100115223_1_3.zip
2005-01-23 17:58:26 [climateprediction.net] Backing off 1 minutes and 0 seconds on transfer of file 20qf_100115223_1_3.zip
2005-01-23 17:58:26 [climateprediction.net] Started upload of 20qf_100115223_1_5.zip
2005-01-23 17:58:31 [climateprediction.net] Temporarily failed upload of 20qf_100115223_1_4.zip
2005-01-23 17:58:31 [climateprediction.net] Backing off 1 minutes and 0 seconds on transfer of file 20qf_100115223_1_4.zip
2005-01-23 17:58:31 [climateprediction.net] Started upload of 20qf_100115223_1_2.zip
2005-01-23 17:58:44 [climateprediction.net] Finished upload of 20qf_100115223_1_5.zip
2005-01-23 17:58:44 [climateprediction.net] Throughput 18624993 bytes/sec
2005-01-23 17:59:07 [climateprediction.net] Finished upload of 20qf_100115223_1_2.zip
2005-01-23 17:59:07 [climateprediction.net] Throughput 35330073 bytes/sec
2005-01-23 17:59:26 [climateprediction.net] Started upload of 20qf_100115223_1_3.zip
2005-01-23 17:59:31 [climateprediction.net] Started upload of 20qf_100115223_1_4.zip
2005-01-23 17:59:55 [climateprediction.net] Finished upload of 20qf_100115223_1_4.zip
2005-01-23 17:59:55 [climateprediction.net] Throughput 26055248 bytes/sec
2005-01-23 18:00:06 [climateprediction.net] Finished upload of 20qf_100115223_1_3.zip
2005-01-23 18:00:06 [climateprediction.net] Throughput 44871253 bytes/sec

Leaving aside the utterly insane bandwidth figures (45Mb/s? On a 22Kb/s-upload ADSL line?) and the gratuitous temporary failures in the absence of network problems (is there a really short timeout somewhere?), 90% of the generated zip files haven't been uploaded, and are still sitting on my disk in the projects/climateprediction.net/20qf_100115223 subdirectory.
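A quick sanity check on those throughput numbers, using only the timestamps and figures in the log above. If the first file really had moved at the reported rate for the whole time between "Started upload" and "Finished upload", it would have to be several gigabytes. I don't know how the client actually computes its throughput figure, so this is just arithmetic, not a diagnosis.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def implied_size(started, finished, bytes_per_sec):
    """Bytes transferred if the reported rate had held from start to finish."""
    elapsed = datetime.strptime(finished, TIME_FORMAT) - datetime.strptime(started, TIME_FORMAT)
    return elapsed.total_seconds() * bytes_per_sec


# 20qf_100115223_1_1.zip: started 17:54:25, finished 17:56:30, "24681703 bytes/sec".
size = implied_size("2005-01-23 17:54:25", "2005-01-23 17:56:30", 24681703)
print(f"implied size: {size / 1e9:.2f} GB")  # prints "implied size: 3.09 GB"

# The largest reported figure, converted from bytes/sec to megabits/sec.
print(f"peak rate: {44871253 * 8 / 1e6:.2f} Mbit/s")  # prints "peak rate: 358.97 Mbit/s"
```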

The server's stats for result 420728 still say `Server state: In progress / Client state: Unknown'.

What's going on? Are the heaps of non-uploaded files needed? Is the upload actually complete? Neither the client nor the server has given me any reason to believe that, but nothing's been uploaded for half an hour now.

Am I jumping at shadows?
8) Questions and Answers : Unix/Linux : 4.13 (Message 6874)
Posted 12 Dec 2004 by old_user26469
Post:
> if you ran the process in the background, then you have to
> manage it manually. do a ps -aux | grep boinc*

That should be either `ps -ef' or `ps aux'.
9) Questions and Answers : Unix/Linux : Model run lost due to extreme reaction to minor network problems (Message 6723)
Posted 7 Dec 2004 by old_user26469
Post:
> My initial guess is that the model was at a critical point or in the middle of
> writing to a file.

Curses. What bad luck.

> NFS I/O is expensive and this would have been more obvious when writing large
> amt. of data.

It didn't die at a day-end or a month-end, as near as I can tell: it was probably one of the big I/O spikes during the lengthy (radiosity?) computations that got interrupted.

I've cleared enough space up to avoid using NFS for the BOINC state.

> Fault recovery is not dictated by the BOINC core client but by the worker
> process.

Ah. OK, the worker died. Very well; there's not much any of us can do about that. (The error response from BOINC itself was amusing: something about the disk possibly being full, followed by, er, downloading a new model; surely if the disk were full this wouldn't be a very good thing.)

`The disk is full.' *WHAM* `Well, it's fuller now!'

> The original model might appear to be ok but its probably in an inconsistent
> state by now.

Curses. :(

> The partial results ( P1 & p2 ) are available to you & us and you'll
> still earn credits for model
> years acquired until the crash..

OK, so I'll archive the intermediate model data just as if it were OK then.

(I take it the dead model directory can be removed from the CPDN tree after archiving... nothing seems to be referencing it anymore, after all.)
10) Questions and Answers : Unix/Linux : Model run lost due to extreme reaction to minor network problems (Message 6675)
Posted 7 Dec 2004 by old_user26469
Post:
I just lost a 90%-completed model run (my first) because BOINC's response to a temporary NFS-failure-induced I/O error caused by a machine reboot is to kill the model and try to download another one. The original model is perfectly OK, but BOINC refuses to process it anymore.

Might I suggest that this is a slightly excessive reaction to one and a half minutes without network service? CPDN requires vast amounts of disk space, and there are probably many sites where the BOINC directories must be NFS-mounted if BOINC is to run at all: but guaranteeing zero downtime of NFS servers over a period of months is entirely impractical.

BOINC recovers from local system failures: it should recover from remote ones as well.
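To make the suggestion concrete, here is roughly the sort of behaviour I mean: retry an I/O operation for a while when it fails with an error that looks transient, rather than giving up on the first failure. This is purely an illustration. It is not how BOINC or the model handles I/O, the helper name retry_io is made up, and the choice of which errno values count as transient is my assumption.

```python
import errno
import time

# Assumed set of "probably transient" errors for a network filesystem.
TRANSIENT_ERRNOS = {errno.EIO, errno.ETIMEDOUT, errno.ESTALE}


def retry_io(operation, attempts=5, delay=1.0, sleep=time.sleep):
    """Call operation(); on a transient OSError, wait and retry with a doubling delay."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as exc:
            # Give up on non-transient errors, or once the attempts are used up.
            if exc.errno not in TRANSIENT_ERRNOS or attempt == attempts:
                raise
            sleep(delay)
            delay *= 2


def read_state(path):
    """Example use: read a file, tolerating short outages of the underlying mount."""
    def _read():
        with open(path, "rb") as handle:
            return handle.read()
    return retry_io(_read)
```

With the defaults this waits 1 + 2 + 4 + 8 = 15 seconds in total before giving up, so surviving a minute and a half of NFS downtime would need more attempts or a longer initial delay.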
11) Questions and Answers : Unix/Linux : Change of kernel (Message 6198)
Posted 17 Nov 2004 by old_user26469
Post:
If you shut down the client cleanly, you'll lose nothing.

If you shut it down by forcibly killing it, you'll lose a few minutes' work (it saves at the start of every day --- and at the start of every month, as well, into a separate file, although I'm not quite sure why.)
12) Questions and Answers : Unix/Linux : viz 4.04 segfault immediately after mapping window (Message 5579)
Posted 23 Oct 2004 by old_user26469
Post:
Obviously the solution is

% LD_LIBRARY_PATH=. ./hadsm3viz_4.04_i686-pc-linux-gnu

i.e., use the copies of libGLU and libglut that ship with climateprediction.net (and which I hadn't noticed were there.)

Everything then works. (Sorry for the noise.)
13) Questions and Answers : Unix/Linux : viz 4.04 segfault immediately after mapping window (Message 5573)
Posted 23 Oct 2004 by old_user26469
Post:
nix@hades 447 .../.boinc/projects/climateprediction.net% ./hadsm3viz_4.04_i686-pc-linux-gnu
searching for active shmem
using model_id 31kk_100163435
Segmentation fault

This might be a C++ ABI problem:

libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0x4eee3000)
[...]
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x4ec42000)

(Everything on this Athlon IV system including GLUT is built with GCC 3.4.2 and is using the corresponding version of libstdc++, libstdc++.so.6; viz seems to be built with GCC 3.3.2, with a slightly different C++ ABI and version of libstdc++, and is using things like std::basic_string which have changed internal representation between libstdc++5 and libstdc++6, and calling GLUT...)
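Spotting this sort of mix-up can be automated by scanning ldd output for more than one libstdc++ soname. A small sketch follows; the function name is mine, and the sample text is just the two ldd lines quoted above. Finding two versions only shows that both get loaded into one process, not that they are the cause of a crash.

```python
import re

# Matches the soname at the start of an ldd line such as
# "libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0x4eee3000)".
SONAME_RE = re.compile(r"^\s*(libstdc\+\+\.so\.\d+)\s+=>", re.MULTILINE)


def libstdcxx_versions(ldd_output):
    """Return the distinct libstdc++ sonames mentioned in ldd output, sorted."""
    return sorted(set(SONAME_RE.findall(ldd_output)))


LDD_SAMPLE = """libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0x4eee3000)
[...]
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x4ec42000)"""

if __name__ == "__main__":
    found = libstdcxx_versions(LDD_SAMPLE)
    if len(found) > 1:
        print("mixed libstdc++ versions:", ", ".join(found))
```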


Some rudimentary valgrind results:

nix@hades 453 .../.boinc/projects/climateprediction.net% valgrind --tool=memcheck ./hadsm3viz_4.04_i686-pc-linux-gnu
==1239== Memcheck, a memory error detector for x86-linux.
==1239== Copyright (C) 2002-2004, and GNU GPL'd, by Julian Seward et al.
==1239== Using valgrind-2.2.0, a program supervision framework for x86-linux.
==1239== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
==1239== For more details, rerun with: -v
==1239==
searching for active shmem
using model_id 31kk_100163435
==1239== Syscall param ioctl(generic) contains uninitialised or unaddressable byte(s)
==1239== at 0x44DFE174: ioctl (in /lib/libc-2.3.2.so)
==1239== Address 0x52BFD7B0 is on thread 1's stack
==1239==
==1239== Invalid read of size 2
==1239== at 0x1BBE23BE: sigfpe_handler (in /usr/X11R6/lib/modules/dri/mach64_dri.so)
==1239== Address 0x6E is not stack'd, malloc'd or (recently) free'd
==1239==
==1239== Process terminating with default action of signal 11 (SIGSEGV)
==1239== Access not within mapped region at address 0x6E
==1239== at 0x1BBE23BE: sigfpe_handler (in /usr/X11R6/lib/modules/dri/mach64_dri.so)
==1239==
==1239== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 36 from 1)


I think the first error (the ioctl() problem) is most immediately likely to be at fault.

But it might also be a good idea to build a copy of viz (at least) and preferably the other C++ parts of the system with GCC-3.4.x.



