Posts by old_user40785

1) Questions and Answers : Unix/Linux : boinc deferring communication with project for 11 hours..... (Message 12048)
Posted 22 Apr 2005 by old_user40785
Post:
> In fact, as far as I know (but I'm not sure), when you have a new Wu, the
> information about the old Wu is simply deleted from the xml files.

ouch

> This is why I was asking you if you had a backup (the only way to have the xml
> files of the previous Wu)

no backup. I didn't foresee this event, and had no idea it was possible for it to fall over in such a way. If the boinc site had mentioned this, I could have prepared for it in advance. The data files are still there, ~550MB in a directory under ~/.boinc/projects/climateprediction.net/
and xml files with the same name in ~/.boinc/projects/

> It is not possible to restart a Wu if the information in the xml files is not
> present, because there are MD5 signatures attached to the files and if these
> signatures are not OK, the model will be rejected by the upload servers.

bummer

> Even if you could restart your model, you wouldn't be able to upload it.
> For me, your Wu is lost.

should the data from the crashed Wu be wiped, or kept in case a way is found to restore it? I'll wait....

> For your present Wu, do a backup once a week, or just before a change of phase,
> so that your problem doesn't happen again.

ok, will set up an rsync of the .boinc directory to another disk every 24hrs
thanks
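
A minimal sketch of the daily backup described above. The helper name backup_boinc and the example paths are assumptions for illustration, not anything BOINC provides. It mirrors the directory with rsync and falls back to a plain copy if rsync is not installed. Copying while the client is running may catch files mid-write, so stopping the client first is probably safer.

```shell
#!/bin/sh
# Sketch: mirror a BOINC data directory to a backup location.
# backup_boinc is a hypothetical helper; the caller supplies both paths.
backup_boinc() {
    src=$1    # e.g. "$HOME/.boinc"
    dest=$2   # e.g. a directory on another disk
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    if command -v rsync >/dev/null 2>&1; then
        # -a preserves permissions and timestamps; --delete removes files
        # from the backup that no longer exist in the source.
        rsync -a --delete "$src"/ "$dest"/
    else
        # Fallback when rsync is not installed: plain recursive copy.
        cp -a "$src"/. "$dest"/
    fi
}
```

Usage might look like `backup_boinc "$HOME/.boinc" /mnt/backup/boinc`, run once a day from cron. Note that a single mirror is overwritten on every run, so a backup taken after the xml files have already lost the old Wu would not help; keeping a few dated copies would guard against that.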

> Bye
>
2) Questions and Answers : Unix/Linux : boinc deferring communication with project for 11 hours..... (Message 12032)
Posted 22 Apr 2005 by old_user40785
Post:
> Hi,
> The first Wu is marked as "Outcome: Client error" on the web site.
> Boinc will not start this wu again, except if you have a backup of the whole
> BOINC directory made before the crash of the wu.
> This is because all the information about the first Wu is contained in the
> XML files especially client_state.xml.
> As long as the first Wu is marked as error in the XML files, boinc will not
> crunch it.
>
>

so how can the Wu be restored to health from where it left off, e.g. by editing the xml file?
or does it have to be restarted from 1810?

this seems too shortsighted, to have all this work just stopped because it ran out of disk space... phase 3 is almost finished, probably about 10% to go.

it's not like a disk crash, which is understandably difficult to recover from
(the disk was reporting 1.6Gb free before it failed, but that's another issue I need to take up with the ext3 forum)

3) Questions and Answers : Unix/Linux : boinc deferring communication with project for 11 hours..... (Message 12018)
Posted 21 Apr 2005 by old_user40785
Post:
> Well, go in your account and check that you have enough disk space: at least 1
> GB / CPDN Wu

it now has 7.5Gb

> Did you try a reboot, sometimes it solves problems with BOINC when it is stuck
> in a bad loop.

rebooted and restarted BOINC...
BOINC stopped running the first Wu and is now running a new one.
The first Wu was about 90% complete and I'm keen to get it finished...

here is the url of first Wu, maybe that can help to restart it
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=503724

> If nothing works, I would suggest that you remove BOINC from your machine,
> reboot and install BOINC again: it is a drastic solution but I see no other
> one.

later...


> Or you can wait until someone else gives you better advice :o)
>
> Note that the CPDN servers are not working very well at present and a lot of
> users have difficulty contacting the schedulers and uploading their models.
> 4.12 models are known to be unstable on Linux machines too.

this is using BOINC 4.19 / hadsm3 version 4.13
output from this morning after starting new Wu:

russell@athlonbox:~/.boinc$ boinc
2005-04-22 07:56:08 [---] Starting BOINC client version 4.19 for i686-pc-linux-gnu
2005-04-22 07:56:08 [climateprediction.net] Project prefs: no separate prefs for home; using your defaults
2005-04-22 07:56:08 [climateprediction.net] Host ID is 93024
2005-04-22 07:56:08 [---] General prefs: from climateprediction.net (last modified 2005-01-28 02:16:23)
2005-04-22 07:56:08 [---] General prefs: using separate prefs for home
2005-04-22 07:56:08 [climateprediction.net] Resuming computation for result 39br_200173586_0 using hadsm3 version 4.13
Starting model in /mnt/hdd8/boinc/projects/climateprediction.net...
Created shared memory region key = 26015
Env Used=LD_LIBRARY_PATH=/mnt/hdd8/boinc/projects/climateprediction.net:/usr/local/lib:/usr/lib:/lib
Starting model ID 39br_200173586 Phase 1
Stack size=48.00 MB
Waiting for model startup, this may take a minute...
39br_200173586 - PH 1 TS 000289 - 07/12/1810 00:30 - H:M:S=0000:26:51 AVG= 5.57 DLT= 0.00

> I have no other ideas :o(
>

thanks for your help. BOINC is now working, just not on the 1st Wu; it restarted with the 2nd. It would be nice to know how to complete the 1st.
4) Questions and Answers : Unix/Linux : boinc deferring communication with project for 11 hours..... (Message 12008)
Posted 21 Apr 2005 by old_user40785
Post:
> I meant ./boinc -update_prefs http://climateprediction.net

that gave the same result, and boinc went into hibernation.

this is the rest of the error message:

No heartbeat from core client - exiting
zip I/O error: No space left on device

zip error: Could not create output file (../3vjp_000202668_0_1.zip)



boinc ran out of disk space and I spent last night sorting it out
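
Since the failure came down to "No space left on device", a quick free-space check before starting the client can flag the problem early. This is only a sketch: free_mb and enough_space are hypothetical helper names, and the 1024 MB threshold in the usage line is an assumption taken from the "at least 1 GB / CPDN Wu" advice quoted elsewhere in this thread.

```shell
#!/bin/sh
# free_mb and enough_space are hypothetical helpers, not BOINC commands.

# Print the free space, in whole megabytes, on the filesystem holding $1.
free_mb() {
    # -P forces the POSIX one-line-per-filesystem layout and -k reports
    # 1024-byte blocks; column 4 of the data line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeed only if at least $2 megabytes are free under directory $1.
enough_space() {
    avail=$(free_mb "$1") || return 1
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}
```

For example, `enough_space "$HOME/.boinc" 1024 || echo "less than 1 GB free for BOINC" >&2` prints a warning when the disk is nearly full.
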
5) Questions and Answers : Unix/Linux : boinc deferring communication with project for 11 hours..... (Message 12001)
Posted 21 Apr 2005 by old_user40785
Post:
> Hi,
> Did you try to stop BOINC and start it again with:
>
> ./boinc -return_results_immediately or
> ./boinc -update_prefs [URL of the project]
>
>
>

none of these commands worked.
what do you mean by the URL of the project?
I tried the URLs of the host, result, workunit and account. None of them worked.
I found this in the stderr output on the workunit page:

4.19
process got signal 11

3
11

No heartbeat from core client - exiting
No heartbeat from core client - exiting
No heartbeat from core client - exiting
......

Each time this is attempted, boinc shortens the wait before it will restart, which is now down to ~7 hrs, so it seems it will restart eventually.
6) Questions and Answers : Unix/Linux : boinc deferring communication with project for 11 hours..... (Message 11984)
Posted 21 Apr 2005 by old_user40785
Post:
This computer was disconnected from the ISP for ~12hrs and boinc went into hibernation.
After reconnecting to the ISP, boinc still hibernates.
How can boinc be woken up to resume calculations?
boinc was running well up until this point

System:
Linux 2.6.6, Debian/testing, athlon 1.2 GHz

the output:
:~/.boinc$boinc
2005-04-21 19:05:20 [---] Starting BOINC client version 4.19 for i686-pc-linux-gnu
2005-04-21 19:05:20 [climateprediction.net] Project prefs: no separate prefs for home; using your defaults
2005-04-21 19:05:20 [climateprediction.net] Host ID is 93024
2005-04-21 19:05:20 [---] General prefs: from climateprediction.net (last modified 2005-01-28 02:16:23)
2005-04-21 19:05:20 [---] General prefs: using separate prefs for home
2005-04-21 19:05:20 [climateprediction.net] Deferring communication with project for 11 hours, 19 minutes, and 55 seconds
2005-04-21 19:05:20 [climateprediction.net] Deferring communication with project for 11 hours, 19 minutes, and 55 seconds
7) Questions and Answers : Unix/Linux : There was work but you don't have enough disk space allocated (Message 7763)
Posted 27 Jan 2005 by old_user40785
Post:
I used a symbolic link to get around the space limit on /

There is only 337M in /, so I made a directory called boinc on the mounted drive /mnt/hdd8, which has 27G free, and copied the binary there
e.g. mkdir /mnt/hdd8/boinc
then made a symbolic link from the home directory to /mnt/hdd8/boinc
e.g. ln -s /mnt/hdd8/boinc .boinc
then cd to .boinc
then ran the binary and it's up and running!
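
The steps above can be collected into one small helper. This is a sketch only: link_boinc_dir is a hypothetical name, and the check that refuses to replace an existing path is an added safety assumption, not part of the original steps.

```shell
#!/bin/sh
# link_boinc_dir is a hypothetical helper wrapping the mkdir + ln -s steps.
link_boinc_dir() {
    target=$1   # directory on the larger disk, e.g. /mnt/hdd8/boinc
    link=$2     # path in the home directory, e.g. "$HOME/.boinc"
    mkdir -p "$target" || return 1
    # Refuse to overwrite anything that already exists at the link path.
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "$link already exists" >&2
        return 1
    fi
    ln -s "$target" "$link"
}
```

With the paths from this post, the call would be `link_boinc_dir /mnt/hdd8/boinc "$HOME/.boinc"`, after which the binary can be copied to /mnt/hdd8/boinc and run from ~/.boinc.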



