Forever repeating "Communication deferred"

old_user535979

Joined: 10 Sep 08
Posts: 3
Credit: 3,184,211
RAC: 0
Message 37957 - Posted: 7 Sep 2009, 7:36:58 UTC

I upgraded to BOINC 6.6.36 and ran it briefly. The new version downloaded four new ClimatePrediction tasks; it had lost the four half-finished tasks from the old BOINC.

Soon I decided to switch back to BOINC 6.2.18, because the new version didn't do what I wanted it to do (run gpugrid.net) and also didn't start automatically on reboot like the old version.

After I had switched back, the Tasks tab showed my old ClimatePrediction tasks, from before the first switch. But after a few seconds all four of them had status "Computation error".

Ever since, the Projects tab shows status "Communication deferred" counting down. When it reaches zero, it shows "---" for a couple of seconds, and then goes back to "Communication deferred" with a new countdown. The Messages tab does not show any communication attempt; it doesn't add any message at all below the messages already there. The tracing function of my firewall reveals that there is no communication attempt. All I get is this countdown, the brief pause, and back to the countdown.

How can I get it to start doing something useful again? I suppose the best thing is if I can get it to start all four tasks from the beginning, so that the computations aren't lost. Presumably I should also make the four tasks that were downloaded by the new BOINC visible to the old BOINC so they can be processed. I have backup copies of both versions' "projects" folders.

I run 64-bit Ubuntu on an Intel quad core.

When "Computation error" first appeared, the Messages tab did show messages about the problem:

[error] Failed to open init file slots/0/init_data.xml
[error] Failed to open init file slots/3/init_data.xml
[error] Failed to open init file slots/1/init_data.xml
[error] Failed to open init file slots/2/init_data.xml
Computation for task hadsm3mh_ksz9_006315793_1 finished
Output file hadsm3mh_ksz9_006315793_1_3.zip for task hadsm3mh_ksz9_006315793_1 absent
Output file hadsm3mh_ksz9_006315793_1_4.zip for task hadsm3mh_ksz9_006315793_1 absent
Computation for task hadsm3mh_kqe9_006312445_8 finished
Output file hadsm3mh_kqe9_006312445_8_2.zip for task hadsm3mh_kqe9_006312445_8 absent
Output file hadsm3mh_kqe9_006312445_8_3.zip for task hadsm3mh_kqe9_006312445_8 absent
Output file hadsm3mh_kqe9_006312445_8_4.zip for task hadsm3mh_kqe9_006312445_8 absent
Computation for task hadam3p_me9b_1981_2_1006339641_2 finished
Output file hadam3p_me9b_1981_2_1006339641_2_1.zip for task hadam3p_me9b_1981_2_1006339641_2 absent
Output file hadam3p_me9b_1981_2_1006339641_2_2.zip for task hadam3p_me9b_1981_2_1006339641_2 absent
Output file hadam3p_me9b_1981_2_1006339641_2_3.zip for task hadam3p_me9b_1981_2_1006339641_2 absent
Computation for task hadsm3mh_kt1j_006315875_5 finished
Output file hadsm3mh_kt1j_006315875_5_1.zip for task hadsm3mh_kt1j_006315875_5 absent
Output file hadsm3mh_kt1j_006315875_5_2.zip for task hadsm3mh_kt1j_006315875_5 absent
Output file hadsm3mh_kt1j_006315875_5_3.zip for task hadsm3mh_kt1j_006315875_5 absent
Output file hadsm3mh_kt1j_006315875_5_4.zip for task hadsm3mh_kt1j_006315875_5 absent
ID: 37957
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 37958 - Posted: 7 Sep 2009, 8:35:48 UTC

There are a couple of problems here.

1) When 6.2.18 (and a couple more versions) were released, the default setting was for the protected (service) mode. But this didn't allow the CUDA apps to run, so the next big change was to make the default setting "standard" (non-service) mode. I'm not sure which version started this, but it WAS in version 6.6.*

So, to get 6.6.36 to start automatically, you need to look during install for the screen with the Advanced button. Clicking this will take you to an extra step (screen) with options to select where the 2 parts (programs) & (writeable data) are installed, as well as the option to install as standard or service mode.

The correct way to change a lot of things with version 6, is to un-install, (which moves all of the data back to a default location), and then re-install with the new options/version. Not doing this will cause BOINC to forget where the models are/were.

2) I'm not sure about the error messages. It's possible that something got corrupted when you were swapping things around.
Don't worry about messages saying that output (zip) files can't be found.
These only get created at the end of the model, so if something goes wrong before then, they won't exist yet. BOINC only knows their names, not when they get created.

As to using backups, you could try this and see what happens.
Or you could Reset the project (which erases all the project stuff), and start again.


ID: 37958
old_user535979
Joined: 10 Sep 08
Posts: 3
Credit: 3,184,211
RAC: 0
Message 37968 - Posted: 9 Sep 2009, 4:06:27 UTC - in response to Message 37958.  

Thanks for the answer! And sorry about taking so long before thanking you.

Unfortunately I can't follow your advice. The installer for 6.6.36 on 64-bit Linux doesn't have any screens, so there's no place where I might look for an Advanced button.

With the regular Ubuntu installer for the old BOINC (the same as for any program), installing and uninstalling is impressively easy; even Windows can't compete with Ubuntu in this regard. By contrast, installing 6.6.36 on 64-bit Linux is sort of shrouded in mystery and enigma... (*)

Consequently I spent a few hours struggling with both versions of BOINC. In the end I hunted down every single file and folder everywhere on my computer that had "boinc" in its name and removed them all (keeping copies in a backup folder). After this, two clicks on Ubuntu's beautifully simple default installer gave me a fully working old BOINC 6.2.18.

I'll use this old version until the Ubuntu repository offers me an upgraded one through its very nice click-click-done installer.

During my struggles, twice more my BOINC fetched new work units, and now the web page for my computer lists 13 unfinished work units labelled Server state Over and Outcome Client detached.

I understand that lost work units are a problem, so I'll try to get my BOINC to finish these 13 work units if this is useful.

Do you know if ClimatePrediction can still receive a work unit despite the label Server state Over?

One way to restore from backup is to forbid new tasks and wait for the current batch to finish, and then copy a backup version of boinc-client/projects/climateprediction.net/ into the active folder, and then wait for those work units to finish, and repeat. But this gets inefficient because my four processor cores won't finish all at the same time.

Suppose I don't wait for the current batch to finish. Suppose today I copy the entire contents of the backup boinc-client/projects/climateprediction.net/ into the active boinc-client/projects/climateprediction.net/. Would this cause confusion? Or would it allow BOINC to finish all the work units?

Of course I could just try this and see what happens, but I don't know if I risk destroying the work units that I have working now.

Again, thanks for helping!

------------------------

(*) Just in case you're interested in the details of the installation mysteries: The download page gives no instructions on how to install. The installer is a .sh file (like a Windows .BAT file), but if you try to inspect it, you get a message saying that it contains binary rather than text. Yet it turns out that you can run it like a regular .sh file. (Maybe I'm alone in finding this strange, maybe it's a Linux convention that I'm unaware of.) The program is installed wherever you happen to call the .sh file (but maybe this is again some Linux convention that I'm unaware of). The installer gives you a single line of text and ends, it says "use /home/username/BOINC/run_manager to start BOINC", but if you follow this instruction, the BOINC manager starts and then waits silently forever and doesn't work. That's because you have to start the client too. If you install BOINC as the root user, or if you ever (even once) run the client as the root user, later any attempt to run the client as the regular user will fail mysteriously. When you need to uninstall there's no uninstaller, and all the instructions that I found on how to uninstall BOINC are about antique versions.

In my various attempts with various strange problems, probably I inadvertently mixed old and new versions of manager and client, and/or mixed root and regular files, and/or stumbled upon other combinations of mistakes, before I learned about the various possible problems. With what I learned, I'm sure I could now install 6.6.36 without problems, but it just isn't worth more time.
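
One way to look for the root/regular-user mix-up described in the footnote is to list anything under the BOINC folder that belongs to a different user. This is only a sketch: list_foreign_files is a made-up helper name, and the folder in the example comment is the one from the installer message, which may not match every setup.

```shell
#!/bin/sh
# Sketch: print every file or folder under a BOINC directory that is NOT
# owned by the user running this script. list_foreign_files is a
# hypothetical helper name, not something shipped with BOINC.
list_foreign_files() {
    # find accepts a numeric user ID for -user, so id -u is enough here.
    find "$1" ! -user "$(id -u)" -print
}

# Example (assumed path, taken from the installer message):
#   list_foreign_files "$HOME/BOINC"
# An empty result means everything belongs to the current user.
```

If root-owned files do show up, handing them back to the regular user (for example with chown) before starting the client again may be worth trying, though that is an assumption rather than something confirmed in this thread.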
ID: 37968
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 37970 - Posted: 9 Sep 2009, 4:34:24 UTC

The messages on the page for the list of models, and on the pages of the individual models, are BOINC default messages for various situations. This project doesn't use any of the BOINC software for processing returned data, so the messages can be ignored.

*************
Suppose I don't wait for the current batch to finish. Suppose today I copy the entire contents of the backup boinc-client/projects/climateprediction.net/ into the active boinc-client/projects/climateprediction.net/. Would this cause confusion? Or would it allow BOINC to finish all the work units?

A good way to waste time and destroy the running models.
Make a backup of the CURRENT work, (the BOINC data folder and sub folders), before trying it.
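
A minimal sketch of such a backup, assuming the client is stopped first so that files are not changing while they are copied. The helper name backup_boinc_data is made up, and the data folder is passed in as an argument because its location differs between installs (the path in the comment is an assumption for Debian/Ubuntu packages).

```shell
#!/bin/sh
# Sketch: archive a whole BOINC data folder (including sub folders) into a
# timestamped tar.gz. backup_boinc_data is a hypothetical helper name.
backup_boinc_data() {
    data_dir=$1   # e.g. /var/lib/boinc-client (assumed package default)
    dest_dir=$2   # folder that will receive the archive
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest_dir/boinc-backup-$stamp.tar.gz"
    # -C makes the paths inside the archive relative to the parent folder.
    tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")" || return 1
    # Print the archive name so the caller can note where the backup went.
    printf '%s\n' "$archive"
}

# Example (paths are illustrative):
#   backup_boinc_data /var/lib/boinc-client "$HOME/boinc-backups"
```

Archiving the whole folder in one go keeps client_state.xml together with the project and slot folders it refers to.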

The file client_state.xml contains lots of sections for each model. Basically, it's a "To Do" list for the models that are waiting to run/running/just finished and waiting to report.

If you try your idea, you'll end up with the current folders, plus those folders that aren't common to both the running and backup areas, and then BOINC won't be able to match all of the project folders with its To Do list.
e.g. Which of the 2 client_state.xml files ends up in the working area? Etc.
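
To see what is on that To Do list in each copy before mixing anything, something like the following could help. It is a sketch only: it assumes each task appears in client_state.xml as a <result> block with a <name> line inside it, which may differ between BOINC versions, so check your own file first. list_results is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print the task names found inside <result> blocks of a
# client_state.xml file. The XML layout is an assumption; list_results
# is a hypothetical helper, not a BOINC command.
list_results() {
    awk '
        /<result>/   { in_result = 1 }
        /<\/result>/ { in_result = 0 }
        in_result && /<name>/ {
            line = $0
            sub(/.*<name>/, "", line)     # drop everything up to the tag
            sub(/<\/name>.*/, "", line)   # drop the closing tag and the rest
            print line
        }
    ' "$1"
}

# Example (paths are illustrative):
#   list_results /var/lib/boinc-client/client_state.xml > current.txt
#   list_results backup/client_state.xml                > backup.txt
#   diff current.txt backup.txt
```

Comparing the two lists shows which models each client_state.xml knows about, which is exactly the information that gets lost if the folders are merged by hand.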

*************

One common mistake is accidentally upgrading a 32 bit version of BOINC with a 64 bit version, or vice versa.


ID: 37970
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 37971 - Posted: 9 Sep 2009, 5:10:26 UTC

Ingvar, don't worry about models reclassified by Boinc as 'Client detached'. Boinc reclassifies models this way whenever we restore a backup (and in certain other circumstances). It doesn't matter and the models can still be completed.

The 'Over' classification can also be ignored. For example, if a model crashes it is classified as Over but if the person restores a backup from before the crash the model can still be continued. The Over word remains there even after successful completion!

So you can ignore these distracting classifications. If you complete the models the researchers will receive the files from you that they need and you will get your credits.

Some Boinc task classifications are a complete mystery to me. Sometimes we see a model that crashed and was never completed classified as Over - Success - Done.
ID: 37971
old_user535979
Joined: 10 Sep 08
Posts: 3
Credit: 3,184,211
RAC: 0
Message 37981 - Posted: 10 Sep 2009, 5:48:01 UTC

Thank you both for the useful information.
ID: 37981
Belfry
Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 37985 - Posted: 11 Sep 2009, 23:12:33 UTC

Ingvar, I've also gotten computation errors on an Ubuntu 64-bit machine running 64-bit BOINC 6.2.18 when I run HADSM3MH models to 6~10% completion. I reinstalled ia32-libs, libstdc++6, freeglut3 and upgraded to 6.4.5. It's been running well for 41 hours under a test account. 6.4.5 is the version which will be released with Ubuntu 9.10, and it installs fine under 9.04. Download the client and manager here or from any other Ubuntu package mirror. Uninstall 6.2.18 first and ignore the version warning.

If I encounter any more problems I'll post them here and then try a 32 bit version of BOINC.
ID: 37985
WynX
Joined: 16 Oct 04
Posts: 7
Credit: 17,862,385
RAC: 308,272
Message 38298 - Posted: 11 Nov 2009, 10:35:00 UTC
Last modified: 11 Nov 2009, 10:37:14 UTC

I've encountered more or less the same symptoms after running 1 successful batch of models (dual core E8400). When requesting new models the same thing occurred with the "Output file XXX absent" message.
I am running the x64 version of 6.2.14 on Debian here.

Changing the client path in the /etc/init.d/boinc-client file seems to have solved it for me.

Changed:
BOINC_CLIENT=/usr/bin/boinc

to

BOINC_CLIENT=/usr/bin/boinc_client

(since I noticed there were 2 versions of the client)
I don't know how the second one got there or when (because it ran fine for 1 run), but maybe this info helps somebody.
ID: 38298
WynX
Joined: 16 Oct 04
Posts: 7
Credit: 17,862,385
RAC: 308,272
Message 38304 - Posted: 12 Nov 2009, 9:39:21 UTC

Update: no, that doesn't fix the problem (I didn't know yet because of the climateprediction daily limit).
ID: 38304
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 38305 - Posted: 12 Nov 2009, 16:22:15 UTC

Output files are absent because the model has crashed before it gets to the point where they are created, so that message is meaningless.
You've got your computer hidden, so you'll have to look at the error messages on each model to find out the real reason for the crash.

The daily limit is to stop "faulty" computers wasting lots of models, so if you're getting this message, you need to find and fix the problem.


ID: 38305
WynX
Joined: 16 Oct 04
Posts: 7
Credit: 17,862,385
RAC: 308,272
Message 38307 - Posted: 13 Nov 2009, 7:43:37 UTC

Thx for the reply. Yes, I understand the model crashes, but I don't think it's during computation (it has 0 seconds runtime).

Runtime error: 127 (0x7f)

Found this info: http://www.boinc-wiki.info/Unrecoverable_error_for_result_%27(result)%27_(process_exited_with_code_127_(0x7f))
Basically this confirms that it doesn't happen in the computation itself but before the files are unpacked. Haven't found a solution though...
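
For what it's worth, 127 is also the exit status a POSIX shell reports when the command it was asked to run cannot be found, so a 127 with 0 seconds of runtime is at least consistent with the program never starting. A quick illustration; the command name is deliberately made up, and this shows only the shell convention, not how BOINC itself produces the code:

```shell
#!/bin/sh
# Sketch: show the exit status a shell reports for a command it cannot find.
# "no_such_model_binary" is a made-up name used only for illustration.
status_of_missing_command() {
    sh -c 'no_such_model_binary' 2>/dev/null
    echo $?
}

status_of_missing_command   # prints 127
```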
ID: 38307
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 38308 - Posted: 13 Nov 2009, 7:48:30 UTC

Are you running 64bit OS and BOINC without having the 32bit binaries installed as well?


ID: 38308
WynX
Joined: 16 Oct 04
Posts: 7
Credit: 17,862,385
RAC: 308,272
Message 38311 - Posted: 14 Nov 2009, 18:44:39 UTC

I just installed this from the Debian repos (lenny). I also reinstalled (after purging the earlier install) but the same thing occurs...
ID: 38311
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 38312 - Posted: 14 Nov 2009, 19:05:56 UTC

Not all 64 bit distros also install the 32 bit files, which is why I asked.
These files are needed for a 64 bit system to be able to run these 32 bit models.
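
A rough way to check both halves of this, offered as a sketch rather than the project's official procedure: read the class byte in the ELF header of the model executable (1 means 32 bit, 2 means 64 bit), then see whether the 32 bit loader is present. The helper name elf_class is made up, and /lib/ld-linux.so.2 is the usual location of the 32 bit loader on x86 Linux but may differ on your distro.

```shell
#!/bin/sh
# Sketch: report whether a file is a 32-bit or 64-bit ELF binary by reading
# its header bytes with od. elf_class is a hypothetical helper name.
elf_class() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo not-elf
        return 1
    fi
    # The byte at offset 4 is EI_CLASS: 1 = 32-bit, 2 = 64-bit.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case $class in
        1) echo 32-bit ;;
        2) echo 64-bit ;;
        *) echo unknown ;;
    esac
}

# Example (the executable name is left for you to fill in):
#   elf_class projects/climateprediction.net/<model executable>
#   ls -l /lib/ld-linux.so.2   # assumed path of the 32-bit loader
```

If the model reports 32-bit and the loader file is missing, the 32 bit compatibility packages are the first thing to install.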

Please read the stickied post at the top of this Linux section.


ID: 38312
WynX
Joined: 16 Oct 04
Posts: 7
Credit: 17,862,385
RAC: 308,272
Message 38317 - Posted: 16 Nov 2009, 10:33:25 UTC

Thx Bayliss, this solved the problem!
ID: 38317

©2024 climateprediction.net