Unable to communicate with project server...
Author | Message |
---|---|
Send message Joined: 19 Jan 07 Posts: 9 Credit: 2,233,821 RAC: 0 |
I'm getting the following set of messages on two of my machines (all the others are fine). Any ideas what may be happening? For instance, which machine is it actually trying to contact?
05/07/2007 13:36:22|climateprediction.net|Sending scheduler request: Requested by user
05/07/2007 13:36:22|climateprediction.net|(not requesting new work or reporting completed tasks)
05/07/2007 13:36:23||Project communication failed: attempting access to reference site
05/07/2007 13:36:24||Access to reference site succeeded - project servers may be temporarily down.
05/07/2007 13:36:25|climateprediction.net|Scheduler request failed: failed sending data to the peer
05/07/2007 13:36:25|climateprediction.net|Deferring communication for 24 min 21 sec
05/07/2007 13:36:25|climateprediction.net|Reason: scheduler request failed
Any help gratefully appreciated! --Richard |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Hi Richard. Whichever server it is, trickle or upload (upload if you have a zip file in the Transfers tab), one of them is down at present. You can see the status by clicking on "Server Stats" in the menu to the left of here. |
Send message Joined: 19 Jan 07 Posts: 9 Credit: 2,233,821 RAC: 0 |
This would be trickle, no files to upload. Hmm, all servers SAY they're up. Wonder if I have got something corrupt that's directing me to an invalid server or something.... --Richard (edit) Hmm, netstat tells me that I tried to contact a machine called 'targhee.open.ac.uk'. Is this expected? |
Send message Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
"Hmm, netstat tells me that I tried to contact a machine called 'targhee.open.ac.uk'. Is this expected?"
Definitely not. Scheduler requests should be going to climateapps2.oucs.ox.ac.uk. Check the file master_climateprediction.net.xml in your BOINC directory. Line 14 in the file should read as follows: <scheduler> http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi </scheduler> BOINC requests a reload of the file after 10 scheduler request failures, so you could always force a reload by doing a few manual project updates. "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
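For anyone who would rather not count lines by hand, here is a minimal Python sketch of the check described above. It assumes only that the master file contains one or more `<scheduler>` tags somewhere in its text; the function name `scheduler_urls` and the sample text are made up for illustration, and a regular expression is used because the file is not guaranteed to be well-formed XML.

```python
import re
from typing import List

# Matches <scheduler> ... </scheduler>, tolerating whitespace around the URL.
SCHEDULER_TAG = re.compile(r"<scheduler>\s*(.*?)\s*</scheduler>",
                           re.IGNORECASE | re.DOTALL)


def scheduler_urls(master_text: str) -> List[str]:
    """Return every scheduler URL found in the text of a master file."""
    return [url.strip() for url in SCHEDULER_TAG.findall(master_text)]


# Hypothetical stand-in for the contents of master_climateprediction.net.xml.
sample = """<html>
<!--
<scheduler> http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi </scheduler>
-->
</html>"""

print(scheduler_urls(sample))
```

To run it against a real file, read the file into a string first, for example with `open(path).read()`, where `path` points at the master file in your own BOINC directory.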
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
An upload server certainly isn't running at the moment: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/server_status.php The 'targhee.open.ac.uk' server will be the one at the Open Uni in Milton Keynes that hosts the independent forum. You must have looked in there. Just suspend network activity and wait...... Cpdn news |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The server that was down (uploadatm) is now working again. Just to be quite clear about it, THIS is the page where you should end up after clicking on the Server Stats link on the left. |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
It should really say Server Status. I have a list somewhere of typos that members have pointed out on the cpdn website but the guys in Oxford seem so busy that I've never dared send it. Cpdn news |
Send message Joined: 19 Jan 07 Posts: 9 Credit: 2,233,821 RAC: 0 |
Definitely a mystery. The scheduler listed in the master file is indeed climateapps2, and if I ping it, the ping works. The server status for it seems to be green, but still I can't connect :(. The machine next to it works just fine. I'm stumped, I'm afraid :( --Richard |
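A successful ping only shows that the host answers ICMP; the scheduler URL in this thread is plain `http://`, so the request itself goes over TCP port 80. A rough Python sketch of a closer test follows. The helper name `can_connect` is hypothetical, and this only checks that a TCP connection can be opened, not that BOINC's request would succeed.

```python
import socket


def can_connect(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and connects, or raises OSError
        # (this covers DNS failures, refusals, and timeouts).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_connect("climateapps2.oucs.ox.ac.uk")` on both the working and the failing machine would show whether the failing one is blocked at the TCP level, which is what a firewall or proxy difference would look like.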
Send message Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
Do the two machines have the same firewall and firewall settings? (The other thing to check is proxy servers, but most home machines don't use them). I'm a volunteer and my views are my own. News and Announcements and FAQ |
Send message Joined: 5 Nov 06 Posts: 24 Credit: 548,923 RAC: 0 |
Same problem here:
9-7-2007 15:09:27|climateprediction.net|Sending scheduler request: Requested by user
9-7-2007 15:09:27|climateprediction.net|(not requesting new work or reporting completed tasks)
9-7-2007 15:09:50||Project communication failed: attempting access to reference site
9-7-2007 15:09:51||Access to reference site succeeded - project servers may be temporarily down.
9-7-2007 15:09:53|climateprediction.net|Scheduler request failed: couldn't connect to server
9-7-2007 15:09:53|climateprediction.net|Deferring communication for 1 min 0 sec
9-7-2007 15:09:53|climateprediction.net|Reason: scheduler request failed
Even after 20 manual tries and all the automatic attempts to connect: nothing. I was already wondering why I was suddenly getting much lower credits all the time. The only thing that has changed firewall-, settings- or program-wise is that I installed 5.10.13 because the lower versions were buggy. |
Send message Joined: 19 Jan 07 Posts: 9 Credit: 2,233,821 RAC: 0 |
Well, I fixed my problem, somehow :). It was indeed (trying) to communicate with targhee, not climateapps2; why, I have no idea, since in the master_climateprediction.net.xml file the scheduler was specified as climateapps2. However, it had something to do with the client_state.xml file. Just to try, I restored a backup from before the problem started, and sure enough the problem went away. Now when I did an update it tried to connect to climateapps2 and all was fine. So I restored the latest backup, but overlaid that with the client_state.xml file from the working backup, with a little hand editing to update fields within it. Now when I fired it up, it was back to its latest point, and when I did an update it connected fine and uploaded the 4 queued trickles it had by that time. An unfortunate side effect, though, is that my server status is now Over Unknown New (but that's no worse than restoring from a backup anyway). But how did 'targhee' get embedded in that file? And why was BOINC stubbornly trying to use that instead of climateapps2? That's still a mystery, but at least I'm running again now! --Richard |
Send message Joined: 5 Nov 06 Posts: 24 Credit: 548,923 RAC: 0 |
Well, I checked your solution, but no go. In this file the link is:
<scheduler_url>http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi</scheduler_url>
If I copy this URL into my browser I have no problem at all connecting to that server and page, and it shows me the following:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<scheduler_reply>
<scheduler_version>509</scheduler_version>
<master_url>http://climateprediction.net/</master_url>
<message priority="low">Incomplete request received.</message>
</scheduler_reply>
So I guess something else is buggy on cpdn, because my other machine has now contacted this server as well. Does anyone else have some bright ideas besides firewall or proxy (not the problem anyway)? I stopped this project because it has already been running for more than 100 hours and I see no point in running it until the problem is fixed. This is the 5th time I have had trouble running cpdn. |
Send message Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
"But how did 'targhee' get embedded in that file? And why was BOINC stubbornly trying to use that instead of climateapps2."
When the master file is downloaded, BOINC extracts all the <scheduler> tags and writes them to the <project> section of client_state.xml (as <scheduler_url> tags, and after clearing out the current tags). So the files should always have the same set of scheduler names. BOINC writes to client_state.xml in many places (e.g. after every checkpoint), and the in-memory copy of the project scheduler set will be written to the file. This suggests that something caused the real scheduler value to be overwritten. What's recorded in stdoutdae.txt between the start of the last successful scheduler request and the first failure? "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
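Since the two files should carry the same set of scheduler names, a quick way to spot a stray entry is to compare them directly. The following Python sketch does that, assuming only the tag names mentioned above (`<scheduler>` in the master file, `<scheduler_url>` in client_state.xml). The function names, the sample strings, and the stale URL are hypothetical; it uses a regular expression instead of an XML parser because neither file is guaranteed to parse cleanly.

```python
import re
from typing import List, Set, Tuple


def tag_values(text: str, tag: str) -> Set[str]:
    """Collect the trimmed contents of every <tag>...</tag> pair in text."""
    pattern = re.compile(r"<{0}>\s*(.*?)\s*</{0}>".format(re.escape(tag)),
                         re.DOTALL)
    return set(pattern.findall(text))


def scheduler_mismatch(master_text: str,
                       client_state_text: str) -> Tuple[List[str], List[str]]:
    """Return (URLs only in client_state.xml, URLs only in the master file)."""
    master = tag_values(master_text, "scheduler")
    state = tag_values(client_state_text, "scheduler_url")
    return sorted(state - master), sorted(master - state)


# Hypothetical file contents; the stale URL is invented for the example.
master_sample = ("<scheduler> http://climateapps2.oucs.ox.ac.uk"
                 "/cpdnboinc_cgi/cgi </scheduler>")
state_sample = ("<project><scheduler_url>http://stale.example.invalid/cgi"
                "</scheduler_url></project>")

print(scheduler_mismatch(master_sample, state_sample))
```

Two empty lists mean the files agree. Anything in the first list is a scheduler that client_state.xml knows about but the master file does not, which is the situation described in this thread.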
Send message Joined: 5 Nov 06 Posts: 24 Credit: 548,923 RAC: 0 |
I'm amazed. I have no clue what happened, but my machine needed a reboot for an update to my burning software. After the reboot, cpdn finally found the right server again. I don't see any significant changes in the XML files; maybe it got messed up because I have cpdn seasonal running as well. Seems the problem solved itself. |
Send message Joined: 19 Jan 07 Posts: 9 Credit: 2,233,821 RAC: 0 |
Sorry about the slow reply, been sick for the last few days. Can't see anything odd in stdoutdae.txt between the last good trickle and the first fail:
2007-07-02 13:59:50 [climateprediction.net] Restarting task hadcm3inct_cmmd_1920_160_25869521_2 using hadcm3i version 540
2007-07-03 00:08:55 [climateprediction.net] Sending scheduler request: To send trickle-up message
2007-07-03 00:08:55 [climateprediction.net] (not requesting new work or reporting completed tasks)
2007-07-03 00:09:00 [climateprediction.net] Scheduler RPC succeeded [server version 509]
2007-07-03 02:12:19 [---] Exit requested by user
To pause/resume tasks hit CTRL-C, to exit hit CTRL-BREAK
2007-07-03 02:12:31 [---] Starting BOINC client version 5.8.16 for windows_intelx86
2007-07-03 02:12:31 [---] log flags: task, file_xfer, sched_ops
2007-07-03 02:12:31 [---] Libraries: libcurl/7.16.0 OpenSSL/0.9.8a zlib/1.2.3
2007-07-03 02:12:31 [---] Data directory: C:\Program Files\BOINC
2007-07-03 02:12:31 [---] Processor: 1 GenuineIntel Intel(R) Pentium(R) 4 CPU 2.40GHz [x86 Family 15 Model 2 Stepping 7] [fpu tsc sse mmx]
2007-07-03 02:12:31 [---] Memory: 1021.99 MB physical, 2.40 GB virtual
2007-07-03 02:12:31 [---] Disk: 74.50 GB total, 14.00 GB free
2007-07-03 02:12:31 [climateprediction.net] URL: http://climateprediction.net/; Computer ID: 675496; location: (none); project prefs: default
2007-07-03 02:12:31 [---] General prefs: from http://bbc.cpdn.org/ (last modified 2006-03-23 11:26:33)
2007-07-03 02:12:31 [---] Host location: none
2007-07-03 02:12:31 [---] General prefs: using your defaults
2007-07-03 02:12:31 [climateprediction.net] Restarting task hadcm3inct_cmmd_1920_160_25869521_2 using hadcm3i version 540
2007-07-03 14:23:33 [---] Exit requested by user
To pause/resume tasks hit CTRL-C, to exit hit CTRL-BREAK
2007-07-03 14:23:47 [---] Starting BOINC client version 5.8.16 for windows_intelx86
2007-07-03 14:23:47 [---] log flags: task, file_xfer, sched_ops
2007-07-03 14:23:47 [---] Libraries: libcurl/7.16.0 OpenSSL/0.9.8a zlib/1.2.3
2007-07-03 14:23:47 [---] Data directory: C:\Program Files\BOINC
2007-07-03 14:23:47 [---] Processor: 1 GenuineIntel Intel(R) Pentium(R) 4 CPU 2.40GHz [x86 Family 15 Model 2 Stepping 7] [fpu tsc sse mmx]
2007-07-03 14:23:47 [---] Memory: 1021.99 MB physical, 2.40 GB virtual
2007-07-03 14:23:47 [---] Disk: 74.50 GB total, 13.84 GB free
2007-07-03 14:23:47 [climateprediction.net] URL: http://climateprediction.net/; Computer ID: 675496; location: (none); project prefs: default
2007-07-03 14:23:47 [---] General prefs: from http://bbc.cpdn.org/ (last modified 2006-03-23 11:26:33)
2007-07-03 14:23:47 [---] Host location: none
2007-07-03 14:23:47 [---] General prefs: using your defaults
2007-07-03 14:23:47 [climateprediction.net] Restarting task hadcm3inct_cmmd_1920_160_25869521_2 using hadcm3i version 540
2007-07-04 02:37:41 [---] Exit requested by user
To pause/resume tasks hit CTRL-C, to exit hit CTRL-BREAK
2007-07-04 02:37:54 [---] Starting BOINC client version 5.8.16 for windows_intelx86
2007-07-04 02:37:54 [---] log flags: task, file_xfer, sched_ops
2007-07-04 02:37:54 [---] Libraries: libcurl/7.16.0 OpenSSL/0.9.8a zlib/1.2.3
2007-07-04 02:37:54 [---] Data directory: C:\Program Files\BOINC
2007-07-04 02:37:54 [---] Processor: 1 GenuineIntel Intel(R) Pentium(R) 4 CPU 2.40GHz [x86 Family 15 Model 2 Stepping 7] [fpu tsc sse mmx]
2007-07-04 02:37:54 [---] Memory: 1021.99 MB physical, 2.40 GB virtual
2007-07-04 02:37:54 [---] Disk: 74.50 GB total, 13.72 GB free
2007-07-04 02:37:54 [climateprediction.net] URL: http://climateprediction.net/; Computer ID: 675496; location: (none); project prefs: default
2007-07-04 02:37:54 [---] General prefs: from http://bbc.cpdn.org/ (last modified 2006-03-23 11:26:33)
2007-07-04 02:37:54 [---] Host location: none
2007-07-04 02:37:54 [---] General prefs: using your defaults
2007-07-04 02:37:54 [climateprediction.net] Restarting task hadcm3inct_cmmd_1920_160_25869521_2 using hadcm3i version 540
2007-07-04 05:27:21 [climateprediction.net] Sending scheduler request: To send trickle-up message
2007-07-04 05:27:21 [climateprediction.net] (not requesting new work or reporting completed tasks)
2007-07-04 05:27:22 [---] Project communication failed: attempting access to reference site
2007-07-04 05:27:23 [---] Access to reference site succeeded - project servers may be temporarily down.
2007-07-04 05:27:26 [climateprediction.net] Scheduler request failed: failed sending data to the peer
2007-07-04 05:27:26 [climateprediction.net] Deferring communication for 1 min 0 sec
Note that the exits are caused by backups (which happen every 12 hours). --Richard |
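Picking the relevant stretch out of a long stdoutdae.txt by eye is tedious, so here is a small Python sketch that does it. It assumes the two message texts seen in this thread ("Scheduler RPC succeeded" and "Scheduler request failed") and that the log has one entry per line; the function name `window_before_first_failure` is hypothetical, and the sample lines are a shortened copy of the log above.

```python
from typing import List


def window_before_first_failure(lines: List[str]) -> List[str]:
    """Return the lines from the most recent success (or the start of the
    log, if there was none) through the first scheduler failure."""
    last_ok = None
    for i, line in enumerate(lines):
        if "Scheduler RPC succeeded" in line:
            last_ok = i
        elif "Scheduler request failed" in line:
            start = 0 if last_ok is None else last_ok
            return lines[start:i + 1]
    return []  # no failure found in the log


sample_log = [
    "2007-07-03 00:08:55 [climateprediction.net] Sending scheduler request: To send trickle-up message",
    "2007-07-03 00:09:00 [climateprediction.net] Scheduler RPC succeeded [server version 509]",
    "2007-07-03 02:12:19 [---] Exit requested by user",
    "2007-07-04 05:27:21 [climateprediction.net] Sending scheduler request: To send trickle-up message",
    "2007-07-04 05:27:26 [climateprediction.net] Scheduler request failed: failed sending data to the peer",
    "2007-07-04 05:27:26 [climateprediction.net] Deferring communication for 1 min 0 sec",
]

for entry in window_before_first_failure(sample_log):
    print(entry)
```

On a real log, the list of lines could come from `open(path).read().splitlines()`, with `path` pointing at stdoutdae.txt in your own BOINC data directory.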
©2024 climateprediction.net