Posts by Ananas

61) Message boards : Number crunching : No Data For Result (Message 37573)
Posted 29 Jul 2009 by Profile Ananas
Post:
If it is a dlopen() error raised from within the program's source code, would ldd even report it? I think it would not, since ldd cannot know the first argument passed to the function.
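To illustrate the point about ldd: a library opened with dlopen() is named only at run time, so it does not appear in the link-time dependency list. Below is a minimal Python sketch of that behaviour, using ctypes (whose CDLL goes through dlopen() on Unix-like systems). The helper name try_dlopen is made up for illustration, and the library name is simply the one from the error log.

```python
import ctypes


def try_dlopen(path):
    """Try to load a shared library at run time; return (ok, message)."""
    try:
        # On Unix-like systems ctypes.CDLL goes through dlopen(), so the
        # library name is only known when this line executes. A tool that
        # inspects link-time dependencies never sees it.
        ctypes.CDLL(path)
        return True, "loaded"
    except OSError as err:
        # A missing or unreadable library only surfaces here, at run time.
        return False, str(err)


if __name__ == "__main__":
    print(try_dlopen("hadam3p_se_6.07_i686-apple-darwin.dylib"))
```

The failure shows up only when the call is executed, which matches the "Unable to load library" message appearing in the middle of a run.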

Does the user who runs BOINC have read/execute on all library files that come with BOINC and CPDN?
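One way to check this, sketched in Python under some assumptions: the function name and the list of suffixes are mine, and the directory to scan is whatever your BOINC data directory is. It has to be run as the user that runs BOINC, because os.access() tests the permissions of the current user.

```python
import os

# Assumed library suffixes; adjust for your platform.
LIBRARY_SUFFIXES = (".so", ".dylib")


def blocked_libraries(directory, suffixes=LIBRARY_SUFFIXES):
    """Return library files the current user cannot both read and execute."""
    blocked = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(root, name)
            # os.access() checks against the user running this script.
            if not os.access(path, os.R_OK | os.X_OK):
                blocked.append(path)
    return sorted(blocked)
```

An empty list means the permissions are not the problem, at least for files with those suffixes.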

As "only" the data trickle is missing, my guess is that some zlib library is the thing to look for.


If this is a common problem, it is even possible that everyone who has already crunched older models still has that library from earlier downloads, while those who started with HadAM3p never received it.


edit : There are quite a few results with this problem (Google finds 246 entries) :

CPDN Monitor - Quit request from BOINC...
Unable to load library hadam3p_se_6.07_i686-apple-darwin.dylib
dlopen error: 3152085
17:09:08 (180): called boinc_finish

The interesting part is that all those I have checked were on Darwin hosts (edit again ... I have now found a few dlopen errors on Linux hosts too)
62) Message boards : Number crunching : too many errors (Message 37571)
Posted 29 Jul 2009 by Profile Ananas
Post:
I'd really like to hide that page from people, but apparently it's not possible. :(


Modify the text constants :-)

Errors : "Too many total results"
=>
Errors : "not evaluated on WU level"

Validate state : "Workunit error - check skipped"
=>
Validate state : "no validator in use"


or something like that.

They will not be in the .po files, so the scripts will not translate those new constants - but that will surely cause less confusion than the wrong error messages.

63) Message boards : Number crunching : Upload problem (Message 37483)
Posted 15 Jul 2009 by Profile Ananas
Post:
I feel your pain, Cesium. I'll have 11 or 12 zip files to upload by tomorrow plus dozens of trickles. Thankfully, I suspended all BOINC activity before the models have finished. I got 4 models scheduled to finish by tomorrow. Bad timing, I guess.



The trickles should upload as soon as you re-enable network activity.

Those .zip files that go to other servers will upload too.

From what I can see, only the *_2.zip files go to the full (and disabled) server. BOINC will retry those uploads for several days, so hopefully they will find their way either to the new server or to the freed-up space on the current temporary server.

So hopefully nothing will get lost (having 20 models waiting or about to finish myself, I sure hope that this will go well). Just do not abort anything, and leave the upload "retry" button alone unless the server status page shows green, because (to my knowledge) at least some BOINC versions allow only a limited number of upload attempts.


p.s.: The simple trickle reports sent during a phase are included in a scheduler contact. They are not really uploads, just progress reports in XML format.
64) Message boards : Number crunching : HADCM3 Crashed and burnt (Message 35325)
Posted 19 Oct 2008 by Profile Ananas
Post:
You might have been hit by a very old BOINC bug that the developers unfortunately don't accept as a problem, and therefore refuse to replace with a reliable solution.

If for some reason a DNS query gets stuck, the BOINC core client can become unresponsive for several minutes and stop updating the timestamp in the shared memory.

The project application gives up waiting for this heartbeat after half a minute.

Usually the project application gets restarted - unless the problem happens a few times within a short time span.
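The timing involved can be sketched like this. It is not BOINC's actual code, only an illustration of the rule described above; the function name is made up, and the 30 second timeout is the "half a minute" from the text.

```python
import time

HEARTBEAT_TIMEOUT = 30.0  # seconds, "half a minute" as described above


def heartbeat_lost(last_heartbeat, now=None, timeout=HEARTBEAT_TIMEOUT):
    """Return True if the last heartbeat timestamp is older than the timeout."""
    if now is None:
        now = time.monotonic()
    return (now - last_heartbeat) > timeout


# A core client that hangs on DNS for three minutes stops refreshing the
# timestamp, so the application sees the heartbeat as lost long before
# the client recovers.
print(heartbeat_lost(last_heartbeat=0.0, now=180.0))
```

A stall of a few seconds is harmless with this rule, but a stall of several minutes always exceeds the timeout.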
65) Questions and Answers : Macintosh : CM3 Errors At Start (Message 35319)
Posted 19 Oct 2008 by Profile Ananas
Post:
Here is one more host that shows the same error message. It is now running 4 HadSM3 models, so the problem is obviously specific to those HadCM3volc models.


p.s.: I did some more searching and found much higher recommended values for kern.sysv.shmall; some even recommend 65536 there


One more idea (just in case ...) - did you reboot your machine after changing those kernel parameters? It requires a reboot before the new values can take effect.
66) Questions and Answers : Macintosh : CM3 Errors At Start (Message 35318)
Posted 19 Oct 2008 by Profile Ananas
Post:
... Also I don't know how to make the Ulimit change permanent.
When I do it from a terminal window, it reverts back to default
when I close the shell. ...


If it works similarly to Unix, there should be a file named ".profile" (without the quotes, but including the leading period) in the home directory of the user. This file usually contains environment and shell settings that kick in every time you start a shell.

The leading period might make that file invisible in your GUI, but you can see it with the "a" option of the "ls" command (like "ls -la")

But it might work differently on a Mac.


There are Mac-specific BOINC teams, by the way. One that comes to mind is MacNN; maybe it would be a good idea to join their forum. I guess they will do better than us "mac-illiterate" moderators ;-)

edit : I found several more Mac BOINC groups using BOINCstats' team search
67) Questions and Answers : Unix/Linux : model crash on Linux (Message 35253)
Posted 15 Oct 2008 by Profile Ananas
Post:
Graphics shouldn't matter, the command line client doesn't require that.

It might be a RAM shortage, old HadSM3 versions did run on 256MB but I'm not sure about the current ones.
68) Questions and Answers : Macintosh : CM3 Errors At Start (Message 35251)
Posted 15 Oct 2008 by Profile Ananas
Post:
Here's my current Ulimit results.

...
data seg size (kbytes, -d) 6144
...
stack size (kbytes, -s) 8192
...


I'm not familiar with Mac and I have no idea how much of which resource those volcanic models need - but from the error message, the next thing to try would be

ulimit -s 16384
and/or
ulimit -d 16384

in the startup settings (.profile ?) of the user that runs the BOINC client
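If editing the startup file does not work out, the same limits can be raised for a single child process from a small wrapper. The following Python sketch is only an assumption about how one might do it, not a tested recipe for Mac: the function names are made up, the 16384 KB values are the ones suggested above, and the command to launch is whatever starts your BOINC client. It only ever raises a soft limit and never goes above the hard limit.

```python
import resource
import subprocess


def raise_soft_limit(which, wanted_bytes):
    """Raise a soft limit to wanted_bytes without lowering it or exceeding the hard limit."""
    soft, hard = resource.getrlimit(which)
    if soft == resource.RLIM_INFINITY:
        return  # already unlimited, nothing to raise
    if hard != resource.RLIM_INFINITY:
        wanted_bytes = min(wanted_bytes, hard)
    if wanted_bytes > soft:
        resource.setrlimit(which, (wanted_bytes, hard))


def run_with_raised_limits(cmd, stack_kb=16384, data_kb=16384):
    """Run cmd with stack and data soft limits raised (like ulimit -s / ulimit -d)."""
    def apply_limits():
        # Runs in the child process just before cmd is executed.
        raise_soft_limit(resource.RLIMIT_STACK, stack_kb * 1024)
        raise_soft_limit(resource.RLIMIT_DATA, data_kb * 1024)

    return subprocess.run(cmd, preexec_fn=apply_limits,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # Show the stack limit the child actually sees.
    print(run_with_raised_limits(["sh", "-c", "ulimit -s"]).stdout.strip())
```

The limits apply only to the launched command and its children, so the change disappears when that process exits, just like a ulimit typed into a shell.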

(sorry, it's all a bit experimental as long as no Mac people jump in)
69) Questions and Answers : Macintosh : CM3 Errors At Start (Message 35245)
Posted 15 Oct 2008 by Profile Ananas
Post:
The discussion board requires a separate signup, as it has no access to the BOINC signup data. The signup link is at the top of the page there ("Register").

Have you tried "ulimit -a" from a bash or sh command line (using the user ID that runs BOINC)? What does it say?


Until a solution is found, you could choose a different model type here: just uncheck the checkbox next to "Application UK Met Office HADCM3"
70) Questions and Answers : Macintosh : CM3 Errors At Start (Message 35239)
Posted 14 Oct 2008 by Profile Ananas
Post:
Is there something like "ulimit" on Darwin?

If so, try to set "ulimit unlimited" before you start BOINC.


(maybe you could post the current limits too, i.e. before you've set "unlimited")

p.s.: ulimit without options queries or sets only the disk file size. You need to specify "-s", "-d", "-l" or similar (memory related), depending on the options Darwin offers.

p.p.s.: I found the manual entry for "ulimit"; it's a shell builtin (sh, not csh). So, for example, "ulimit -a" shows all limits.
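For a quick look at the memory-related limits without a shell, Python's resource module can report the same values. This is a minimal sketch; the labels and the selection of limits are my own choice, and the values are in bytes rather than the kbytes that "ulimit -a" prints for -d and -s.

```python
import resource

# Label -> resource constant; the selection mirrors the options named above.
LIMITS = {
    "data seg size (-d)": resource.RLIMIT_DATA,
    "stack size (-s)": resource.RLIMIT_STACK,
    "max locked memory (-l)": resource.RLIMIT_MEMLOCK,
    "file size (-f)": resource.RLIMIT_FSIZE,
}


def show(value):
    """Format one limit value, using 'unlimited' like the shell does."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {label: (soft, hard)} for the limits listed in LIMITS."""
    return {
        label: tuple(show(v) for v in resource.getrlimit(res))
        for label, res in LIMITS.items()
    }


if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print(f"{label}: soft={soft} hard={hard}")
```

Run it as the user that runs BOINC, since limits are per process and inherited from whatever started it.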
71) Questions and Answers : Windows : Task stored idle on the local disk forever? (Message 35222)
Posted 10 Oct 2008 by Profile Ananas
Post:
There are several "no heartbeat" messages too. Either BOINC didn't handle it correctly when one CPU VP was taken away and stopped communicating with one model, or they were caused by that evil bug that makes BOINC unresponsive for several minutes when a DNS server is unreachable.

That DNS bug is a real WU killer and can destroy all work across a whole network within a few hours. If you catch such a problem quickly enough, it will probably help to disable network access until the DNS server is back.


edit : If it has been a problem of the changing CPU VPs, it might help to

- reduce the "On multiprocessors, use at most" setting to the new count
- make BOINC contact the CPDN server
- stop BOINC
- change the CPU VP setting of the VM
- restart BOINC

The difference would be that the additional task is not sleeping in memory anymore - a task that doesn't run will usually not crash so easily.

I'm not sure if the first two steps are necessary - but in this case I would do them anyway, just in case.
72) Questions and Answers : Preferences : program crashed, disappeared, no new one being loaded (Message 35216)
Posted 10 Oct 2008 by Profile Ananas
Post:
forrtl: Access is denied.
CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5096, iMonCtr=1
Model crash detected, will try to restart..


I wonder if this could be a BOINC 6.x issue.

Les has some helpful links in his signature. Those forum threads might help you avoid this problem on your next model.
73) Questions and Answers : Unix/Linux : Download temporarily failed for a week now (Message 35195)
Posted 8 Oct 2008 by Profile Ananas
Post:
The same workunit downloaded correctly to 4 or 5 other computers including one with Linux like yours. ...


Don't draw the wrong conclusions from that. The files that failed could have been part of the model type, but independent of the result.

So successful downloads can also mean that those boxes already had the missing files from previous workunits of the same type.
74) Questions and Answers : Unix/Linux : Download temporarily failed for a week now (Message 35186)
Posted 6 Oct 2008 by Profile Ananas
Post:
... My computer is now humming on both cpus.


Good luck with your new pet :-)
75) Questions and Answers : Unix/Linux : Download temporarily failed for a week now (Message 35179)
Posted 6 Oct 2008 by Profile Ananas
Post:
I tried to download them from the climateapps2.oucs.ox.ac.uk download directory with wget - result is a 404 error (i.e. not found).

Does the message tab give any reason why the download failed? If it is a 404 error / not found, retrying will not help at all.
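The same check I did with wget can be sketched in Python. The function names are made up for illustration, and the URL to test is whichever file your message tab reports as failing.

```python
import urllib.error
import urllib.request


def download_status(url, timeout=10):
    """Return the HTTP status code the server gives for url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx answers; the code is still what we want.
        return err.code


def retry_can_help(status):
    """A 404 means the file is not on the server, so retrying is pointless."""
    return status != 404
```

A HEAD request is enough here because only the status matters, not the file contents.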
76) Message boards : Number crunching : Size of the WU in ClimatePrediction..... (Message 35173)
Posted 4 Oct 2008 by Profile Ananas
Post:
We love all newbies. But when you said 'I've updated my preferences to 1 model year.' what exactly did you mean?


My guess : HadAM3 => Duration = 1 model year

HadAM3 models are quite demanding though (RAM requirements), so if you have trouble running those, choose HadSM3 instead.
77) Message boards : Number crunching : Size of the WU in ClimatePrediction..... (Message 35154)
Posted 30 Sep 2008 by Profile Ananas
Post:
BOINC needs to learn the duration correction factor for the combination of this project with your computer. I guess the estimated time is set for computers with only the basic x86 instruction set, and it's better to set it a bit too high, so that computers pull a bit less work than they can handle, rather than too much.

A few models later, your BOINC client will know (and estimate) better :-)

The slowest PC I have crunched CPDN models with is a Dual Pentium IIIs/1266 btw., quite thrilling ;-)

(that computer isn't visible in my host list anymore as I recycled the Host ID)
78) Questions and Answers : Windows : "Reason: no work from project" (Message 35139)
Posted 29 Sep 2008 by Profile Ananas
Post:
http://boincview.amanheis.de/ can be used to watch and control multiple hosts. It can replace the BOINC manager or be used in addition to it (by default, it then uses port 31416).

You need to enable remote control for your computers, either through a command line switch or through the files gui_rpc_auth.cfg (containing the BOINC remote control password) and remote_hosts.cfg (containing the IPs/names of the computers that are allowed to control the client, one per line).
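As a sketch of the file-based variant, here is a small Python helper that writes both files. The helper name, the password and the host names are placeholders I made up; the only things taken from the description above are the two file names and their contents (a password in one, one allowed host per line in the other).

```python
from pathlib import Path


def write_remote_control_files(boinc_dir, password, allowed_hosts):
    """Write gui_rpc_auth.cfg and remote_hosts.cfg into the BOINC directory."""
    boinc_dir = Path(boinc_dir)
    # The remote control password, as the only content of the file.
    (boinc_dir / "gui_rpc_auth.cfg").write_text(password)
    # One allowed IP or host name per line.
    (boinc_dir / "remote_hosts.cfg").write_text(
        "".join(f"{host}\n" for host in allowed_hosts)
    )
```

For example, write_remote_control_files(boinc_dir, "secret", ["192.168.1.10", "monitor-pc"]) would allow those two (hypothetical) machines to connect with that password.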

The advantage over the BOINC manager is that you can see all hosts on one page without having to switch, different WU statuses have different colors, and it shows some information that the BOINC manager does not show. It has an optional result history too, so you can see what your computers returned while you weren't watching them.


p.s.: Some IP changes happened in the CPDN network, on the server side, not on your hosts. BOINC resolves the IPs from the names only once and then caches them. That means that after a server-side IP change, BOINC has to be restarted so that it resolves the new IPs of the project servers.
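The effect of such a cache can be sketched in a few lines of Python. This is not BOINC's code, only an illustration of "resolve once, then reuse"; the class name is made up, and the lookup function can be swapped out for a fake one.

```python
import socket


class CachingResolver:
    """Resolve each host name once and reuse the answer until restarted."""

    def __init__(self, lookup=socket.gethostbyname):
        self._lookup = lookup
        self._cache = {}

    def resolve(self, name):
        # Only the first call per name performs a real lookup.
        if name not in self._cache:
            self._cache[name] = self._lookup(name)
        return self._cache[name]

    def restart(self):
        # Stands in for restarting the client: all cached IPs are forgotten.
        self._cache.clear()
```

After the server's IP changes, resolve() keeps returning the old address until restart() is called, which is the behaviour described above.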
79) Questions and Answers : Windows : "Reason: no work from project" (Message 35136)
Posted 29 Sep 2008 by Profile Ananas
Post:
If you run several projects and CPDN has piled up a huge amount of negative Long Term Debt (LTD) in client_state.xml, this can keep your computers from requesting work for quite a long time.

I reset the LTD now and then - but that requires editing the client_state.xml file; there is no standard feature that allows resetting it (besides resetting the whole project).

CPDN in particular tends to collect negative LTD (caused by the long-running models). If you don't reset it, the only way to make your host ask for new work is to suspend all other projects for a minute.

BOINCview can show you how much LTD a project has collected over time.
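Without BOINCview, the values can also be read straight from client_state.xml. The sketch below is read-only and rests on assumptions: I am assuming each project entry is a project element with master_url and long_term_debt children, so check your own file before relying on it. The function name is made up.

```python
import xml.etree.ElementTree as ET


def long_term_debts(client_state_path):
    """Return {project URL: long term debt} read from a client_state.xml file."""
    root = ET.parse(client_state_path).getroot()
    debts = {}
    # Assumed layout: <project> with <master_url> and <long_term_debt> children.
    for project in root.iter("project"):
        url = project.findtext("master_url")
        ltd = project.findtext("long_term_debt")
        if url is not None and ltd is not None:
            debts[url.strip()] = float(ltd)
    return debts
```

It does not change the file, so it is safe to run while deciding whether a reset is worth the trouble.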


If you run only CPDN, one reason I can think of would be the IP changes in the CPDN network. BOINC caches IPs for a long time (or even forever?), but then you should have seen a message about connection problems. A BOINC restart fixes that IP cache problem - but one rarely restarts headless crunchers ;-)


p.s.: My personal opinion is that LTD is a misconception, as it can interfere with the short term debt if one of the two is negative for a project while the other is positive for the same project
80) Message boards : Number crunching : Upload troubles (Message 35133)
Posted 29 Sep 2008 by Profile Ananas
Post:
... one more thing to check :

Does your Beta project directory still contain trickle files larger than 700,000 bytes?

I am not sure what BOINC does with those trickle files after a model has crashed. It might still want to upload them.

So if any of those huge trickle_up*.xml files are there (BOINC/projects/cpdnbeta.oerc.ox.ac.uk/), you need to delete them as BOINC cannot handle them.
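To find such files without checking sizes by hand, something like the following Python sketch would do. It only lists the files and leaves the deleting to you; the function name is made up, and the 700,000 byte threshold is the one mentioned above.

```python
from pathlib import Path

SIZE_LIMIT = 700_000  # bytes, the threshold mentioned above


def oversized_trickles(project_dir, limit=SIZE_LIMIT):
    """Return trickle_up*.xml files in project_dir that are larger than limit."""
    return sorted(
        path
        for path in Path(project_dir).glob("trickle_up*.xml")
        if path.is_file() and path.stat().st_size > limit
    )
```

Call it with the project directory named above (BOINC/projects/cpdnbeta.oerc.ox.ac.uk/ under your BOINC directory) and delete whatever it reports, preferably while BOINC is stopped.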

(you didn't have a HadSM3P 6.00 model on beta, so there should not be any of those big things - but who knows)

