Posts by Ananas

21) Message boards : Number crunching : ANZ model upload problems. (Message 48578)
Posted 27 Mar 2014 by Profile Ananas
Post:
With the large upload files and the high server load here, broken uploads can easily happen - on the server, at the ISP or who knows where else those bytes can disappear on their way (maybe the NSA eats some too).

The timeout of the upload handler seems to be somewhat longer than the retry delay of the BOINC core client. I have had this a few times lately too; it always fixed itself after some time.
22) Message boards : Number crunching : Credit updates? (Message 48577)
Posted 27 Mar 2014 by Profile Ananas
Post:
... Do you think that the credit will at some stage be accounted for? ...

CPDN does not calculate the credits incrementally like other projects. Instead, they always recalculate over all models and trickles you ever returned to the project.

That's why I'm optimistic that nothing will be lost once they find the time to fix and/or reinstall the scripts that add up the credits.
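
A minimal sketch of why a full recalculation cannot lose credit, assuming (purely for illustration) that every trickle is worth a fixed amount. `CREDIT_PER_TRICKLE`, `full_recalculation` and the model names are hypothetical and not taken from the CPDN scripts:

```python
# Hypothetical illustration; this is not the CPDN credit script.
CREDIT_PER_TRICKLE = 100.0  # assumed value, for illustration only


def full_recalculation(trickles_per_model):
    """Sum credit over every trickle of every model ever returned."""
    return sum(count * CREDIT_PER_TRICKLE for count in trickles_per_model.values())


# Trickles returned per model, including those returned while the
# credit scripts were not running.
returned = {"model_a": 12, "model_b": 7, "model_c": 3}

# The total depends only on what has been returned, not on when the
# script last ran, so a delayed run still counts everything.
total = full_recalculation(returned)
print(total)
```

Because the total is derived from the complete history each time, there is no running counter that could miss an update while the scripts are down.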
23) Questions and Answers : Preferences : No work Sent (Message 48514)
Posted 24 Mar 2014 by Profile Ananas
Post:
... and the other crashed (without supplying a useful error code).
...

As the model has 2 more identical crashes, it is surely a problem with the workunit, not a problem with the computers trying to run it.
24) Message boards : Number crunching : New tasks, Charts (Message 48498)
Posted 23 Mar 2014 by Profile Ananas
Post:
In Firefox 26.0 the legend gets cut off:

data.addColumn('number', ' hadam3p (Global model only) Tasks ready to send')

shows (behind the blue square) as :
hadam3p
(Global mo...

with a trailing ellipsis, so max-width is probably somewhat too small.

p.s.: nice work, now we need the same for the beta server :-)
25) Message boards : Number crunching : Credit updates? (Message 48468)
Posted 20 Mar 2014 by Profile Ananas
Post:
Maybe some stats sites need .htaccess with "Options +Indexes" in order to check the timestamps of the files? I know that there is a file that contains the date of the last update and the filenames, but who knows how the importers work.
26) Message boards : Number crunching : Task ... exited with zero status but no 'finished' file (Message 48414)
Posted 15 Mar 2014 by Profile Ananas
Post:
Developer options (users cannot do that):

Switching off the heartbeat check: http://boinc.berkeley.edu/trac/wiki/OptionsApi

Works fine in the wrapper for RNA-World :-)
27) Message boards : Number crunching : Task ... exited with zero status but no 'finished' file (Message 48413)
Posted 15 Mar 2014 by Profile Ananas
Post:
A very nice solution would be if the BOINC user could create a file in the project folder that serves as a flag to disable the heartbeat checking in the BOINC project API for a specific project.

As this is not supported by Berkeley, it has to be redone with each update of the API version. But ... as this is not supported by Berkeley, they cannot remove it from the sources either ;-)

In this case, the user himself would be responsible for identifying dead (e.g. looping or stuck) tasks, but for people who know what they are doing it would help a lot.

p.s.: Leaving tasks in memory is always a good idea, especially for CPDN, it makes the results run smoother. It does not help much when the BOINC core client is too busy to send the heartbeat though, as the project API enforces an exit in this case.

On one of my machines, which has a slow HDD, unpacking a CPDN workunit keeps the core client busy for about 1.5 minutes, but the project API allows only 30 seconds. Trouble with name-to-IP resolution (usually an ISP problem on the client side) can keep the core client busy for several minutes too.
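
A rough sketch of that timing problem, assuming the task simply gives up once it has not heard from the core client for longer than the 30-second limit. The function names are hypothetical and this is not the BOINC API source:

```python
# Hypothetical model of a heartbeat timeout; not the BOINC API code.
HEARTBEAT_TIMEOUT_S = 30.0  # the 30-second limit mentioned above


def heartbeat_lost(last_heartbeat_s, now_s, timeout_s=HEARTBEAT_TIMEOUT_S):
    """Return True once the silence has lasted longer than the timeout."""
    return (now_s - last_heartbeat_s) > timeout_s


def first_exit_time(busy_period_s, timeout_s=HEARTBEAT_TIMEOUT_S, step_s=1.0):
    """Time at which a task would exit if the core client stays silent
    for busy_period_s seconds, or None if the silence is short enough."""
    t = 0.0
    while t <= busy_period_s:
        if heartbeat_lost(0.0, t, timeout_s):
            return t
        t += step_s
    return None


# Core client busy unpacking for about 1.5 minutes (90 s), as above.
print(first_exit_time(90.0))
# A short 20 s stall stays within the limit.
print(first_exit_time(20.0))
```

In this model a 90-second stall trips the check long before the core client is responsive again, which is why leaving tasks in memory does not help in that situation.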

p.p.s.: On Windows, you can probably improve your HDD speed by disabling the Windows indexing service for the HDD where you have your BOINC files - and don't use file system compression. And - if BOINC has its own partition - you could try FAT32 instead of NTFS. NTFS is less likely to be damaged when a power failure occurs, but it is slower.
28) Message boards : Number crunching : Microsoft Visual C++ Runtime Error (Message 48319)
Posted 7 Mar 2014 by Profile Ananas
Post:
If you run BOINC on a "headless cruncher", you could consider disabling Windows Error Reporting (completely, i.e. including critical errors!).

This setting is system-wide, so it isn't recommended for your work computers, but for a box used only for crunching the advantage is that the cores/tasks are no longer blocked until someone confirms the message.
29) Message boards : Number crunching : No Tasks Available (Message 48318)
Posted 7 Mar 2014 by Profile Ananas
Post:
...I plan to reset the project in a day or so when my current hadcm3n_obpu task completes.

I doubt that a project reset has the same effect as a BOINC client restart. If there is an outdated IP cached, it will probably survive the reset, whereas a client restart always re-evaluates all IPs.
30) Message boards : Number crunching : MORE DOWNLOAD ERRORS (Message 48298)
Posted 5 Mar 2014 by Profile Ananas
Post:
No, this is a brand new problem with a brand new batch of models for a brand new experiment.

The problem is being discussed in this thread for some reason.
It should have been in a new thread, but that's how it goes. :(
...


In the other thread, Dave Jackson has posted that a core client restart fixed his download problem, which sounds very much as if it might have been a problem of an old cached IP, so the download went to an outdated server.

... it cannot hurt to have the possible solution in the right thread too ;-)
31) Message boards : Number crunching : No Tasks Available (Message 48256)
Posted 3 Mar 2014 by Profile Ananas
Post:
One ghost WU at 15:28, one arrived properly at 16:14; both WUs are brand new, generated today.

Ghost WUs are usually a sign of server or network overload, which could explain temporary HTTP errors.

A permanent HTTP error usually means that the file actually does not exist on the server or has insufficient access permissions for web users, so this is usually not a client-side or communication problem.

Might be bad timing, if the files arrived _after_ the scheduler knew about the fresh results.

p.s.: Just in theory, another possible reason for such a permanent download error would be if the download server IP has been cached by your BOINC client some time ago but in the meantime the IP has changed and the old IP points to a still existing web server. In this case only a restart of the BOINC client would help.
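
A small sketch of how one could check for such an outdated address from outside the client, assuming you know which IP the client last used. The hostname and addresses are placeholders, `cached_ip_is_stale` is a hypothetical helper, and BOINC itself offers nothing like this as far as I know:

```python
import socket


def current_ips(hostname):
    """Resolve hostname now and return the set of addresses it maps to."""
    infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def cached_ip_is_stale(hostname, cached_ip, resolve=current_ips):
    """Return True if a fresh lookup no longer includes the cached address."""
    return cached_ip not in resolve(hostname)


# Demonstration with a fake resolver so that no network access is needed;
# the hostname and addresses are documentation placeholders.
def fake_resolver(hostname):
    return {"192.0.2.10"}


print(cached_ip_is_stale("download.example.org", "192.0.2.99", fake_resolver))
print(cached_ip_is_stale("download.example.org", "192.0.2.10", fake_resolver))
```

If the check reports a stale address, restarting the BOINC client is the remedy suggested above.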
32) Message boards : Number crunching : WU config error : Maximum CPU time exceeded (Message 47925)
Posted 3 Jan 2014 by Profile Ananas
Post:
I did such a move once when I switched that box to Ubuntu for some testing but this time everything has been just standard.

The trickle times were within the normal time/speed range for that box, even somewhat faster than the previous model on the same box.
33) Message boards : Number crunching : WU config error : Maximum CPU time exceeded (Message 47922)
Posted 2 Jan 2014 by Profile Ananas
Post:
About halfway through the model hadcm3n_4jfo_2020_40_008402337:

Maximum CPU time exceeded

Mine got further than the previous deliveries so I would have had a chance to finish it :-/

This might affect a whole series of models, as all models of a series usually have the same FPOPS_BOUND value.

p.s.: I'm not using any tweaks that influence the benchmark result so the calculated FPOPS are standard BOINC, no power saving mode either - that's why it can only be a WU configuration problem.
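
A back-of-the-envelope sketch of how such a limit comes about, assuming the client derives the maximum CPU time by dividing the workunit's FPOPS bound by the host's benchmarked speed. The numbers are made up and the function names are hypothetical:

```python
def max_cpu_seconds(fpops_bound, host_flops):
    """CPU time allowed before the task counts as having run too long."""
    return fpops_bound / host_flops


def would_abort(cpu_seconds_used, fpops_bound, host_flops):
    """True if the task has used more CPU time than the bound allows."""
    return cpu_seconds_used > max_cpu_seconds(fpops_bound, host_flops)


# Made-up values: a bound of 4e15 operations on a host benchmarked
# at 2e9 floating-point operations per second.
limit = max_cpu_seconds(4e15, 2e9)
print(limit / 86400.0)  # limit expressed in days

# A model that needs more CPU time than that is stopped part-way,
# no matter how healthy the host is.
print(would_abort(2_500_000, 4e15, 2e9))
```

With an unmodified benchmark the only free parameter is the bound itself, which is why a limit that is too tight points at the workunit configuration.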
34) Questions and Answers : Windows : Windows 8.1, a caution... (Message 47546)
Posted 12 Nov 2013 by Profile Ananas
Post:
If it is just an installer problem, maybe just copy the stuff. CC 6.x might need a registry entry with the data path, which you could try to export with regedit and import with a double-click on the exported .reg file.

My prehistoric 5.10.28 (which I use with BoincView) doesn't even require a registry key so I usually just copy what's needed or unzip it from the install archive.
35) Message boards : Number crunching : Uploading issues - RAPIT tasks (Message 47216)
Posted 30 Sep 2013 by Profile Ananas
Post:
...
Or it could be an Apache configuration issue on the new server - a timeout, perhaps?

An Apache with his calumet full of THC, what do you expect?


Sorry, I couldn't resist
36) Message boards : Number crunching : Compute Errors / Bad Work Units? (Message 47214)
Posted 30 Sep 2013 by Profile Ananas
Post:
...
Make sure you have "leave applications in memory when suspended" OFF.
...

Not in all cases - a box with 10GB physical RAM that runs 24/7 can afford to leave the stuff in memory, and a restart from a checkpoint is always more risky than just resuming a suspended task.
37) Message boards : Number crunching : Compute Errors / Bad Work Units? (Message 47212)
Posted 30 Sep 2013 by Profile Ananas
Post:
My computer, 1281635, has not completed a cpdn workunit in recent memory. ...

Refreshing your memory ... the project has been happy with this one :-)

p.s.: CPDN results are sometimes hard stuff, so a (basically) reliable host that does not trash workunit after workunit within a really short time should always have a chance to complete some results. If I were you, I would keep trying (I trashed way more than a handful too btw.).
38) Message boards : Number crunching : still don't get credits since last breakdown (Message 47055)
Posted 16 Sep 2013 by Profile Ananas
Post:
Delayed credits have been seen in this project before. As soon as they have their scripts working again, they will be able to roll up all the information, either from the start or from the point where the delay began. It always worked like that and I'm optimistic that this time will not be different.

I highly doubt that anything will get lost.

p.s.: If my information is still valid, the credit calculation in this project is a full run over all models and all trickles ever crunched anyway, so a normal standard call will collect everything from the beginning of (CPDN) time up to now. This is why (unlike in other projects) a team change always moves all of a member's credits to the new team btw.
39) Message boards : Number crunching : Upload Failure (Message 43995)
Posted 11 Apr 2012 by Profile Ananas
Post:
I think that's a server side problem, possibly because the server load was too high at the time. In which case it'll fix itself after a while.


It is a server side problem: the scheduler (parser) tries to read 256 bytes from sched_request.xml and doesn't get them.

There is no syntax or sanity check yet, just reading stuff into the buffer fails.

It already happened on 3 boxes for me, two of which got work in the meantime, one still struggling.

The file handle is most likely not null because it does check that (a bunch of statements before trying fgets() though).

Unfortunately they don't report errno so it's not so easy to tell the exact reason.

p.s.: The upload error and the scheduler error are not necessarily related (2 different programs), but the chance is high that the same thing causes them. The fgets() problem has been reported in a bunch of other projects like lhc, simap and seti.
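
To show what reporting errno would buy, here is a small Python analogue of such a read. It is only an illustration, not the scheduler's C code; `read_request_head` is a hypothetical helper and the path in the example is a placeholder:

```python
import errno


def read_request_head(path, size=256):
    """Try to read the first `size` bytes of a file.

    Returns (data, None) on success or (None, message) on failure,
    where the message names the errno instead of hiding it.
    """
    try:
        with open(path, "rb") as handle:
            data = handle.read(size)
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, "unknown")
        return None, f"read failed: errno={exc.errno} ({name}): {exc.strerror}"
    if not data:
        # Opening worked but nothing came back, e.g. an empty file.
        return None, "read failed: no data"
    return data, None


# Placeholder path that is not expected to exist.
data, error = read_request_head("/nonexistent/sched_request.xml")
print(error)
```

With the errno in the log, a missing file, a permission problem and an empty file could be told apart at a glance.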
40) Message boards : Number crunching : No resubmission (Message 43975)
Posted 6 Apr 2012 by Profile Ananas
Post:
I received the resubmission of two results that have the status "No resubmission" - sounds somehow unplanned ;-)


Btw.: Why does "Status" translate as "Rang" in the German language setting? "Rang" is "Rank", not "Status". "Status" should just translate as "Status".


©2024 climateprediction.net