climateprediction.net home page
Posts by old_user8065

1) Message boards : Number crunching : trickles of finished model (Message 41433)
Posted 4 Jan 2011 by old_user8065
Post:
I wonder if any of the researchers managed to look into this issue?
2) Message boards : Number crunching : trickles of finished model (Message 41341)
Posted 22 Dec 2010 by old_user8065
Post:
Could you tell us which model type these orphaned trickles are from?


It's a FAMOUS model. Trickle files are about 4.5kB each.
3) Message boards : Number crunching : trickles of finished model (Message 41338)
Posted 22 Dec 2010 by old_user8065
Post:
OK, I don't have any backup from right before the end of the model run.

Makes me wonder: from the BOINC core client's point of view, what would be so special about sending trickle files whose originating BOINC task no longer exists? I'd guess the scheduler should simply accept trickle files and be free to do with them whatever it wants.

A couple of trickle files have again accumulated on this machine. I manually triggered a connection to CPDN and those trickle files uploaded just fine. When I tried to upload a single stale trickle file, the connection failed again.

Can I pack the trickle files and send them somewhere via e-mail? I'd really like to deliver the data, I just don't know how.
4) Message boards : Number crunching : trickles of finished model (Message 41332)
Posted 20 Dec 2010 by old_user8065
Post:
What can one do with trickle files of a finished model?

They got stuck on one of my computers, and before I could resolve the congestion the model finished and uploaded all its files. When I started to flush the trickle heap, the finished model got reported with success.

Now I've got something like 149 trickle files that the CPDN scheduler seemingly doesn't want to accept. If a running model trickles, that one gets through without any problem. If I try to send a single trickle file of the finished model, the connection just gets dropped:

20-Dec-2010 16:19:43 [climateprediction.net] update requested by user
20-Dec-2010 16:19:44 [climateprediction.net] Sending scheduler request: Requested by user.
20-Dec-2010 16:19:44 [climateprediction.net] Not reporting or requesting tasks
20-Dec-2010 16:19:44 [---] [http] HTTP_OP::init_post(): http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
20-Dec-2010 16:19:45 [---] [http] [ID#1] Info:  About to connect() to climateapps2.oucs.ox.ac.uk port 80 (#2)
20-Dec-2010 16:19:45 [---] [http] [ID#1] Info:    Trying 163.1.13.17...
20-Dec-2010 16:19:45 [---] [http] [ID#1] Info:  Connected to climateapps2.oucs.ox.ac.uk (163.1.13.17) port 80 (#2)
20-Dec-2010 16:19:45 [---] [http] [ID#1] Sent header to server: POST /cpdnboinc_cgi/cgi HTTP/1.1
20-Dec-2010 16:19:45 [---] [http] [ID#1] Sent header to server: User-Agent: BOINC client (i686-pc-linux-gnu 6.12.8)
20-Dec-2010 16:19:45 [---] [http] [ID#1] Sent header to server: Host: climateapps2.oucs.ox.ac.uk
20-Dec-2010 16:19:45 [---] [http] [ID#1] Sent header to server: Accept: */*
20-Dec-2010 16:19:45 [---] [http] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip
20-Dec-2010 16:19:45 [---] [http] [ID#1] Sent header to server: Content-Type: application/x-www-form-urlencoded
20-Dec-2010 16:19:45 [---] [http] [ID#1] Sent header to server: Content-Length: 15533
20-Dec-2010 16:19:45 [---] [http] [ID#1] Sent header to server: Expect: 100-continue
20-Dec-2010 16:19:45 [---] [http] [ID#1] Sent header to server:
20-Dec-2010 16:19:45 [---] [http] [ID#1] Received header from server: HTTP/1.1 100 Continue
20-Dec-2010 16:19:45 [---] [http] [ID#1] Info:  Expire cleared
20-Dec-2010 16:19:45 [---] [http] [ID#1] Info:  Closing connection #2
20-Dec-2010 16:19:45 [---] [http] HTTP error: Failure when receiving data from the peer
20-Dec-2010 16:19:46 [---] Project communication failed: attempting access to reference site
20-Dec-2010 16:19:46 [---] [http] HTTP_OP::init_get(): http://www.google.com/
20-Dec-2010 16:19:46 [climateprediction.net] Scheduler request failed: Failure when receiving data from the peer


Any thoughts? Should I continue to nurse these orphaned trickle files or just trash them?
5) Message boards : Number crunching : Which upload server? (Message 35599)
Posted 25 Nov 2008 by old_user8065
Post:
Milo, thanks for the response. I'll stop worrying about past mishaps and will try to avoid them in the future.
6) Message boards : Number crunching : Which upload server? (Message 35565)
Posted 21 Nov 2008 by old_user8065
Post:
Perhaps someone else will correct me, but I'm not sure how it's possible for zip files to go to the wrong upload server. Why do you think this happened?


It's a continuation of the troubles I was complaining about in this thread. I've found a workaround which involves tunnels and port forwarding to a specific upload server, and then setting the forwarding host as a proxy in client_state.xml (I don't want to go into details, there might be some corporate IT guy lurking around). What I overlooked was that at one moment there were a couple of intermediate files waiting to be uploaded, and the back-off timer of a wrongly-targeted file expired while the workaround was in place.

As the URLs of the upload handlers are the same (except for the host names, of course), the request went through just fine. AFAIK HTTP servers are allowed to ignore the host name part of the URL (especially if they are HTTP 1.0) unless they know how to deal with it: either when they act as some kind of proxy server or when they run as several instances (virtual hosts). Somehow I don't expect the CPDN upload servers to do either.
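For the curious, the workaround is essentially a local port forward. A minimal sketch of the idea in Python (host name and ports are only examples, and this is a toy relay, not the actual tunnel I used):

```python
# Toy TCP relay: listen locally and forward every connection to one
# upload server. The real workaround used a proper tunnel; this only
# illustrates the principle. Host and ports are illustrative.
import socket
import threading

UPSTREAM = ("uploader.oerc.ox.ac.uk", 80)  # assumed upload server
LISTEN = ("127.0.0.1", 8080)               # local forwarding endpoint

def pipe(src, dst):
    """Copy bytes from src to dst until src closes, then close dst."""
    try:
        while (data := src.recv(65536)):
            dst.sendall(data)
    finally:
        dst.close()

def serve():
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(LISTEN)
    srv.listen()
    while True:  # relay each client connection in both directions
        client, _ = srv.accept()
        upstream = socket.create_connection(UPSTREAM)
        threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
        threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()
```

With something like this running, pointing the client at 127.0.0.1:8080 sends every pending upload to that one server, which is exactly how the wrong-server mix-up can happen if more than one file is waiting.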
7) Message boards : Number crunching : Which upload server? (Message 35558)
Posted 20 Nov 2008 by old_user8065
Post:
Does it create a problem for the project?


I assume it doesn't, since nobody responded in a week. Or at least it's not the biggest problem the project has.
8) Message boards : Number crunching : Which upload server? (Message 35477)
Posted 13 Nov 2008 by old_user8065
Post:
As we can see from the project's status page, there are approximately 5 upload servers. When a client receives a work unit (model) to run, the upload server for every expected output file is defined. As I can see from client_state.xml, the upload servers for different output files of the same model can differ.

Due to some problems on my side it may happen that intermediate files get uploaded to the wrong upload server (e.g. to climateapps3.oucs.ox.ac.uk instead of uploader.oerc.ox.ac.uk, or the other way around).

Does it create a problem for the project?
9) Message boards : Number crunching : Quad core phenoms (Message 35299)
Posted 17 Oct 2008 by old_user8065
Post:
My (qualified) guess is that it boils down to the memory bandwidth needed to run project work units.

If the science app uses memory extensively (as is the case with CPDN and some other memory-demanding projects), there are many cache misses, which means the CPU needs to access main RAM. This is done through the memory controller.

Here comes the difference: until very recently, Intel CPUs shared a common memory controller (on the north bridge), so adding many cores put a lot of stress on the north bridge. The number of physical CPUs vs. the number of cores per CPU doesn't matter; it's the total number of cores that counts.

Non-vintage AMD processors feature an on-chip memory controller shared by all cores on the same CPU. Adding CPU chips actually increases overall RAM bandwidth as more memory controllers get added, and with some help from the OS's process scheduler this means a smaller slowdown when running multiple science apps simultaneously[*]. Using many cores per CPU makes things similar to the Intel case.

* There's still some slowdown due to the fact that the probability of a process being executed by one CPU while its memory is physically allocated in RAM controlled by the other CPU is non-negligible. That in turn means communication overhead between the memory controllers, and thus a slowdown.

In short: in the Intel case one gets the same performance using 2 dual-core CPUs as using 1 quad-core CPU (given the same CPU frequency and the rest of the HW). In the AMD case one gets better performance using 2 dual-core CPUs than 1 quad-core, given proper OS support and proper distribution of RAM modules.

The price tag is a completely different story, though. SMP-capable main boards and processors tend to be somewhat more expensive than their non-SMP-capable counterparts.
One needs to read the previous sentence thinking about the number of chip packages, not the number of processors as seen by the OS.
10) Message boards : Number crunching : Upload troubles (Message 35160)
Posted 1 Oct 2008 by old_user8065
Post:
Huh, now I officially have another reason to curse corporate IT rules ...

As it turns out, it's a transparent proxy that is most probably blocking the uploads. I transferred the whole BOINC data directory to a host with unobstructed internet connectivity and all the uploads went through without any problem.
11) Message boards : Number crunching : Upload troubles (Message 35149)
Posted 30 Sep 2008 by old_user8065
Post:
The HTTP debug messages suggest that something is delaying or silently blocking your larger transmissions, and running a packet sniffer (I'd recommend Wireshark) on your system might be the only way to diagnose what's going wrong.


Might be. I did a trace using Wireshark and it revealed that after a certain amount of transmitted and ACKed data, the client starts to get perpetual ACKs of a single IP packet (45 times). The client responds with retransmissions of the requested packet, to no avail: after 45 ACKs of the same packet it gets a single FIN ACK packet. If I can believe Wireshark (I won't count manually), this happens after roughly 180 kB of data have been transmitted.

There might be a transparent HTTP proxy running at the border of our corporate network which could cause the observed behaviour. I'm much surprised to see it happen on only one host (5 others sitting on the same LAN, and thus sharing the same connection path, don't exhibit this behaviour while connecting to CPDN main) that otherwise has no problems connecting to other projects' servers (apart from CPDN main and CPDN beta). There are many other hosts (mine or colleagues') sitting on the same LAN that (currently) don't run CPDN projects and don't show any connectivity problems whatsoever.
12) Message boards : Number crunching : Upload troubles (Message 35142)
Posted 29 Sep 2008 by old_user8065
Post:
Reply to both of Ananas' posts ...

I've managed to upload all the trickles for Beta, so there are no pending connections to the Beta scheduler. The log messages I posted earlier are fresh, but they resemble the behaviour from before.

I cannot check traceroute, as this host is in a corporate LAN and ICMP packets get shaped. From the log it is clear that all the initial handshaking between client and server occurs within a single second, so I sincerely doubt there are long delays on the way.

[edit]
Instead of traceroute uploader.oerc.ox.ac.uk I ran time wget http://uploader.oerc.ox.ac.uk/ and the result was 0.547 seconds of wall-clock time.
[/edit]
13) Message boards : Number crunching : Upload troubles (Message 35128)
Posted 29 Sep 2008 by old_user8065
Post:
Recently one of my hosts started having trouble uploading intermediate result files. Every connection to the upload server times out:

29-Sep-2008 08:43:06 [climateprediction.net] Started upload of hadsm3fub_k0op_005965676_7_1.zip
29-Sep-2008 08:43:06 [---] [http_debug] [ID#3] info: Connection #4 seems to be dead!
29-Sep-2008 08:43:06 [---] [http_debug] [ID#3] info: Closing connection #4
29-Sep-2008 08:43:06 [---] [http_debug] [ID#3] info: About to connect() to uploader.oerc.ox.ac.uk port 80 (#4)
29-Sep-2008 08:43:06 [---] [http_debug] [ID#3] info:   Trying 163.1.124.170...
29-Sep-2008 08:43:06 [---] [http_debug] [ID#3] info: Connected to uploader.oerc.ox.ac.uk (163.1.124.170) port 80 (#4)
29-Sep-2008 08:43:06 [---] [http_debug] [ID#3] Sent header to server: POST /cpdn_cgi/file_upload_handler HTTP/1.1
User-Agent: BOINC client (x86_64-pc-linux-gnu 6.2.14)
Host: uploader.oerc.ox.ac.uk
Accept: */*
Accept-Encoding: deflate, gzip
Content-Type: application/x-www-form-urlencoded
Content-Length: 285

29-Sep-2008 08:43:06 [---] [http_debug] [ID#3] Received header from server: HTTP/1.1 200 OK
29-Sep-2008 08:43:06 [---] [http_debug] [ID#3] Received header from server: Date: Mon, 29 Sep 2008 06:43:06 GMT
29-Sep-2008 08:43:06 [---] [http_debug] [ID#3] Received header from server: Server: Apache/2.2.3 (Linux/SUSE)
29-Sep-2008 08:43:06 [---] [http_debug] [ID#3] Received header from server: Transfer-Encoding: chunked
29-Sep-2008 08:43:06 [---] [http_debug] [ID#3] Received header from server: Content-Type: text/plain
29-Sep-2008 08:43:06 [---] [http_debug] [ID#3] Received header from server:
29-Sep-2008 08:43:06 [---] [http_xfer_debug] HTTP: wrote 98 bytes
29-Sep-2008 08:43:06 [---] [http_debug] [ID#3] info: Connection #4 to host uploader.oerc.ox.ac.uk left intact
29-Sep-2008 08:43:07 [---] [http_debug] [ID#4] info: Re-using existing connection! (#4) with host uploader.oerc.ox.ac.uk
29-Sep-2008 08:43:07 [---] [http_debug] [ID#4] info: Connected to uploader.oerc.ox.ac.uk (163.1.124.170) port 80 (#4)
29-Sep-2008 08:43:07 [---] [http_debug] [ID#4] Sent header to server: POST /cpdn_cgi/file_upload_handler HTTP/1.1
User-Agent: BOINC client (x86_64-pc-linux-gnu 6.2.14)
Host: uploader.oerc.ox.ac.uk
Accept: */*
Accept-Encoding: deflate, gzip
Content-Type: application/x-www-form-urlencoded
Content-Length: 3305253
Expect: 100-continue

[color=red]29-Sep-2008 08:43:07[/color] [---] [http_debug] [ID#4] Received header from server: HTTP/1.1 100 Continue
[color=red]29-Sep-2008 08:48:15[/color] [---] [http_debug] [ID#4] info: Operation too slow. Less than 10 bytes/sec transfered the last 300 seconds
29-Sep-2008 08:48:15 [---] [http_debug] [ID#4] info: Expire cleared
29-Sep-2008 08:48:15 [---] [http_debug] [ID#4] info: Closing connection #4
29-Sep-2008 08:48:15 [---] [http_debug] HTTP error: Timeout was reached
29-Sep-2008 08:48:15 [---] Project communication failed: attempting access to reference site
29-Sep-2008 08:48:15 [---] [http_debug] HTTP_OP::init_get(): http://www.google.com
29-Sep-2008 08:48:15 [climateprediction.net] Temporarily failed upload of hadsm3fub_k0op_005965676_7_1.zip: HTTP error
29-Sep-2008 08:48:15 [climateprediction.net] Backing off 40 min 58 sec on upload of hadsm3fub_k0op_005965676_7_1.zip


All times are in UTC+02.

Anything I can do about it?
14) Message boards : Number crunching : Forced download (Message 30146)
Posted 23 Aug 2007 by old_user8065
Post:
I'm sure that I am not the only one confused by the references to alternating of apps according to the setting of General Prefs. I'm running both Seti@Home and CPDN on my C2D (nominally one per core). Because I did not have Seti running when I downloaded CPDN I got 2 models. Both Seti and CPDN are set for 100% of CPU time. I have tried leaving it crunching for a number of days and it happily continues to crunch one CPDN model on one core and the allocated Seti WU's on the other with no sign of any switching (either one app taking over both cores or between CPDN models on the "CPDN core"). To ensure that neither CPDN model be chopped for not responding I have resorted to periodically suspending the running model (usually shortly after a trickle) to allow the dormant model to kick off but I would prefer not to have to resort to manual intervention. No doubt, if I left things alone, the second model would eventually get to a state where EDF would kick it off but that would mean leaving it dormant for several months.


The wording about resource share being set as a percentage is unfortunate. For one, you can set it to a value well above 100, which wouldn't make any sense if it really were a percentage. What you actually do is set a resource share, and one should consider it relative to all projects. The relative resource share is then calculated as follows:

project_share (in percent) = this_project_setting / (sum_of_settings_for_all_projects_this_machine_participates_in) * 100%


In your case this yields 50% for S@H and 50% for CPDN.

Now, when considering what this really means for a particular machine, one also has to take into account the number of cores/CPUs BOINC is allowed to use. Normally this equals the number of cores/CPUs present in the machine, but the user can lower it as desired.

The resource share is then observed by the BOINC client scheduler (i.e. the part of the BOINC programme that decides which project and task to run next) on a long-term basis. This is done by leveraging long-term debts.

In your case, in simple words: BOINC makes sure that over a week both projects (S@H and CPDN) will each get half of the available CPU seconds. Available as in: whatever the idle 'process' would otherwise consume. BOINC tries not to steal CPU cycles from the other processes running on your machine.

So in your case, with two cores, it's only natural to run one project on one core and the second project on the other core, both all the time, with each getting 50% of the available CPU cycles.

If, for example, you decided to give CPDN more CPU cycles and set its resource share to 200, then S@H would be entitled to 33% of the CPU cycles and CPDN to 67%. One way to achieve this is to keep a CPDN WU running on one core all the time, while on the other core another CPDN WU alternates with a S@H WU every now and then (depending on the 'switch between applications every' setting). With the default value of 1 hour, the client runs two CPDN WUs for an hour, then suspends one CPDN WU and starts (resumes) a S@H WU and runs it for an hour. After two hours CPDN has accumulated 180 minutes of CPU time and S@H 60 minutes. With the ratio at 3:1 rather than the desired 2:1, it will continue running the S@H WU for another hour, after which the accumulated CPU-time ratio will be 2:1 (240 minutes vs. 120 minutes). At this point it might decide to keep the same setup for another hour, reaching a ratio of 5:3 (vs. the desired 6:3), at which point it would suspend the S@H WU and resume the previously suspended CPDN WU.
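The hour-by-hour bookkeeping above can be written down as a toy calculation (plain Python, not actual BOINC client code; the numbers follow the example of shares 200 and 100 on a dual-core box):

```python
# Toy model of per-project CPU-time accounting on a dual-core host.
# Resource shares: CPDN=200, S@H=100, i.e. a desired 2:1 split.
shares = {"CPDN": 200, "S@H": 100}
total = sum(shares.values())
target = {p: s / total for p, s in shares.items()}  # CPDN 2/3, S@H 1/3

cpu_min = {"CPDN": 0, "S@H": 0}

def run_hour(core1, core2):
    """Credit 60 CPU-minutes to the project running on each core."""
    for p in (core1, core2):
        cpu_min[p] += 60

run_hour("CPDN", "CPDN")  # hour 1: both cores crunch CPDN
run_hour("CPDN", "S@H")   # hour 2: one core switches to S@H
assert (cpu_min["CPDN"], cpu_min["S@H"]) == (180, 60)   # 3:1, too much CPDN

run_hour("CPDN", "S@H")   # hour 3: keep S@H running
assert (cpu_min["CPDN"], cpu_min["S@H"]) == (240, 120)  # 2:1, on target
```

The real client does this with long-term debt rather than raw minute counts, but the steering effect is the same.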

As you can see from the above, it really doesn't make any sense to suspend one CPDN WU in order to run the other: resource shares are observed on a per-project basis, and application switching normally occurs only to honour the resource-share division.

In the event of the other CPDN WU starting to run late (its deadline getting close), the whole BOINC client will enter Earliest Deadline First (EDF) mode. I'm not entirely sure about the behaviour of the BOINC core client in this mode, as it differs significantly between major versions of BOINC. My observation of recent versions is the following: BOINC will temporarily disobey the resource-share settings and will run the WUs (from any project) with the shortest deadlines. It will continue to do so until it decides that no deadline is in danger any more. It will also suspend work fetch while in EDF.

In the case of mixing S@H and CPDN, with S@H deadlines much shorter than CPDN's, this means that if a CPDN WU gets into danger of missing its deadline, the client will first run down the S@H WU cache, and once the S@H WUs are done it will run CPDN exclusively. At first sight this is a bit counter-intuitive, but it makes sense: if it suspended all S@H WUs and ran CPDN for a week, all the S@H WUs would also start to run late ...

To summarize: you don't need to suspend one CPDN WU in order to give the other some CPU time. The CPDN scheduler won't panic about it for a couple of months, and your 'first' CPDN WU will probably be done well before that. If BOINC on your machine doesn't get enough CPU cycles, it will enter EDF, stop fetching new S@H work for a while, and concentrate on CPDN, and it will almost certainly get itself out of trouble without any intervention on your part.

[edit] Can't tipe [/edit]
15) Message boards : Number crunching : Forced download (Message 30116)
Posted 22 Aug 2007 by old_user8065
Post:
You now have a choice:
1) Unsuspend old model to allow it to finish.
2) Leave new model running while you're away with the old model still suspended, and finish the old model when you get back, then continue with the new model.

Using the first method will mean that both models will alternate at the rate specified in your General prefs, usually about one hour.


My experience is a bit different: if the BOINC client is attached to one project and has more than one WU in the queue, it will run the one received earliest through to the end. Then it will start the next one (received second earliest), and so on. It will only run a WU received later if it hits some kind of timing problem (EDF or some such): then it will run the WU with the earliest deadline.

Alternation only occurs between different projects.

In GW3PRV's case, if he lets the new WU start, then BOINC will probably run the second WU until either BOINC gets restarted, BOINC needs to run benchmarks, or BOINC enters EDF mode. Most probably the benchmarking would happen first.

I'd generally follow Les' advice. I'd just unsuspend the old WU right after the BOINC client gets assigned the new WU, but before all the files needed for the new WU have been downloaded (which can take some time if one doesn't have broadband). This prevents BOINC from immediately starting the new WU. It's not really an issue, it just saves some disk space.
16) Message boards : Number crunching : Support for x86_64-unknown-linux-gnu (Message 24110)
Posted 28 Aug 2006 by old_user8065
Post:
Dear project management,

I know it would be just too much to natively support the 64-bit port of Linux. Most probably there wouldn't even be any gain in processing speed. However, it would be really nice to support the named platform in a backward-compatible mode: if a BOINC client requests work quoting the host platform as x86_64-unknown-linux-gnu, the CPDN server could regard it as i686-pc-linux-gnu and act accordingly.

Is it possible?
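To make the suggestion concrete, here is the kind of mapping I have in mind, as a Python sketch (the names are illustrative; this is not actual CPDN scheduler code):

```python
# Hypothetical platform aliasing: before matching application versions,
# the scheduler could translate an unsupported platform string into a
# compatible one. Purely illustrative, not actual BOINC server code.
PLATFORM_ALIASES = {
    # 64-bit Linux can run the 32-bit science apps
    "x86_64-unknown-linux-gnu": "i686-pc-linux-gnu",
}

def effective_platform(requested: str) -> str:
    """Return the platform name the scheduler should match apps against."""
    return PLATFORM_ALIASES.get(requested, requested)
```

Any platform without an alias would simply fall through unchanged, so the behaviour for already-supported clients stays the same.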
17) Message boards : Number crunching : New CPDN Software Version (5.15) On Site (Message 23901)
Posted 11 Aug 2006 by old_user8065
Post:
This version will also be able to run shorter workunits (whenever we make some!) -- i.e. the 160-year workunits are a bit much so we will probably go to 80-year (1920-2000 and 2000-2080) and even 40-year runs (starting 1960, 2000, 2040) in the future!


Will it be possible to opt for a particular length of model run? Preferably on a per-machine basis...

I've got a couple of somewhat slower machines (2GHz+ P4) which take ages to complete a full 160-year model run. I'd be more than happy to run CPDN on those, but I'd prefer WUs that take less than 3 months to complete. I'd run the SAP experiment on them, but unfortunately they tend to have less than 1 GB of RAM :(

On the other hand, I'm willing to run the long model runs on some other (faster) machines.
18) Message boards : Number crunching : CPDN client for glibc2.2 (Message 17659)
Posted 2 Dec 2005 by old_user8065
Post:

You'll just have to shoot those computers. :(


ack! looks like new OS in the new year - hmm Debian?


I'm heading for the ultimate solution: replacing the old ones (which are too slow to finish sulphur runs in decent time anyway) with new machines, installing Debian.

I'll replace one this month, but I don't know about the other one. I can't afford the downtime to re-install it from scratch. I might be forced to detach it from CPDN :(
19) Message boards : Number crunching : CPDN client for glibc2.2 (Message 17215)
Posted 16 Nov 2005 by old_user8065
Post:
Back to my problem ... this time for sulphur ...

Here's the output from ldd:

ldd sulphur_4.21_i686-pc-linux-gnu
./sulphur_4.21_i686-pc-linux-gnu: /lib/i686/libc.so.6: version `GLIBC_2.3' not found (required by ./sulphur_4.21_i686-pc-linux-gnu)
libm.so.6 => /lib/i686/libm.so.6 (0xb7fb5000)
libdl.so.2 => /lib/libdl.so.2 (0xb7fb1000)
libpthread.so.0 => /lib/i686/libpthread.so.0 (0xb7f9d000)
libc.so.6 => /lib/i686/libc.so.6 (0x42000000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0xb7fed000)

Any hope for a static build?

OTOH: is it possible to configure a box not to fetch the sulphur experiment? The problematic one is a 2-way P3@1GHz which would take forever to complete a sulphur run...
20) Questions and Answers : Getting started : BOINC client version 5 (Message 15565)
Posted 31 Aug 2005 by old_user8065
Post:
Berkeley has released a new development version of the BOINC core client, v5.1.1. Does the CPDN scheduler work with the new BOINC client, given that the major version has changed?

I'd like to test the new version, but I have to be sure that it's compatible with CPDN.



©2024 climateprediction.net