Project communication failed: attempting access to reference site
Profile Thyme Lawn
Volunteer moderator
Send message
Joined: 5 Aug 04
Posts: 1276
Credit: 15,490,488
RAC: 4,728
Message 54808 - Posted: 21 Sep 2016, 10:22:50 UTC - in response to Message 54807.

The message logs show that 4,540,690 bytes are being transferred for Vitalii's wah2_mex25_c0fh_199012_13_410_010609033_1_5.zip file and 69,646,991 bytes for Lockley's wah2_mex25_c0il_199112_13_410_010609145_0_12.zip file. What are their sizes showing as on BOINC Manager's Transfers tab? If they're larger it'll indicate that the server has partially accepted them.
____________
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer

Profile Byron Leigh Hatch @ team Carl Sagan
Avatar
Send message
Joined: 17 Aug 04
Posts: 280
Credit: 43,277,847
RAC: 5,991
Message 54810 - Posted: 21 Sep 2016, 14:20:45 UTC



My computer has just successfully uploaded a mex25 zip file.

I'm posting this in the hope that my BOINC logs contain something that helps.

Here's my event log for the mex25 upload, with the http_debug and http_xfer_debug flags enabled:

http://pastebin.com/W8RRa1hF

The HTTP 1.0 option is on.

Click here to see my <cc_config>

my computer:
Win 10 pro x64 Edition Build 1511 (10.00.10586.00)
BOINC client version 7.6.22

wah2_mex25_c0k5_199212_13_410_010609201_2




Lockleys
Send message
Joined: 13 Jan 07
Posts: 183
Credit: 9,541,689
RAC: 4,256
Message 54811 - Posted: 21 Sep 2016, 14:44:55 UTC - in response to Message 54808.

Sadly, I'm now away from the PC for a couple of days. I'll have to look at the zip size when I get back. Probably Friday, UK time.

Following on from Byron Leigh's post, I should say that I have run a number of MEX models to successful conclusion and upload. This is the only one to have got stuck, and I have kept it, rather than aborting it, purely out of unreasonable hope.

Profile Byron Leigh Hatch @ team Carl Sagan
Avatar
Send message
Joined: 17 Aug 04
Posts: 280
Credit: 43,277,847
RAC: 5,991
Message 54814 - Posted: 21 Sep 2016, 16:56:24 UTC
Last modified: 21 Sep 2016, 17:05:28 UTC

Some more information that may help:

My event log for mex25, with http_debug and http_xfer_debug on:

http://pastebin.com/0ZfJ0155


Profile Thyme Lawn
Volunteer moderator
Send message
Joined: 5 Aug 04
Posts: 1276
Credit: 15,490,488
RAC: 4,728
Message 54816 - Posted: 21 Sep 2016, 18:21:05 UTC

Andy has increased the HTTP timeout on upload6. Could anyone who has a stuck mex25 upload please check whether that allows it to complete?
____________
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer

Vitalii Koshura
Send message
Joined: 25 Mar 09
Posts: 11
Credit: 2,277,510
RAC: 2,521
Message 54818 - Posted: 21 Sep 2016, 21:16:16 UTC

Hello.

I've tried increasing the timeout:

<http_transfer_timeout>3000</http_transfer_timeout>
<http_transfer_timeout_bps>100</http_transfer_timeout_bps>

The result is the same: http://pastebin.com/PvbJ04HC
Here's the unfinished transfer:
https://yadi.sk/i/Wir4FQBUvWaNC
The unsent file was last modified on August 10th 2016, but the trickle events show that there were transfers after that date (I do not know whether those transfers included the unfinished one):
http://pastebin.com/BNphiX1n
Maybe this can shed some light on the whole situation.
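
For anyone who wants to reproduce these settings, here is a minimal sketch of a cc_config.xml carrying them, generated with Python. It assumes the two timeout elements belong under <options> and the two debug flags mentioned earlier in the thread belong under <log_flags>, which is how I understand the BOINC client configuration documentation; the function name build_cc_config is only illustrative. The values 3000 and 100 are the ones I tried, not recommendations.

```python
import xml.etree.ElementTree as ET


def build_cc_config(timeout_s: int, min_bps: int, debug: bool = True) -> str:
    """Return the text of a cc_config.xml with HTTP timeout settings.

    Assumption: <http_transfer_timeout> and <http_transfer_timeout_bps>
    go under <options>; <http_debug> and <http_xfer_debug> go under
    <log_flags>. Check this against the BOINC documentation before use.
    """
    root = ET.Element("cc_config")

    if debug:
        # Extra HTTP logging, as used for the event logs in this thread.
        flags = ET.SubElement(root, "log_flags")
        ET.SubElement(flags, "http_debug").text = "1"
        ET.SubElement(flags, "http_xfer_debug").text = "1"

    options = ET.SubElement(root, "options")
    ET.SubElement(options, "http_transfer_timeout").text = str(timeout_s)
    ET.SubElement(options, "http_transfer_timeout_bps").text = str(min_bps)

    # ET.indent only exists on Python 3.9+; skip pretty-printing otherwise.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the XML; save it as cc_config.xml in the BOINC data directory.
    print(build_cc_config(3000, 100))
```

In my case these client-side settings made no difference.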

Thank you all for your help!

Profile Thyme Lawn
Volunteer moderator
Send message
Joined: 5 Aug 04
Posts: 1276
Credit: 15,490,488
RAC: 4,728
Message 54819 - Posted: 21 Sep 2016, 22:31:17 UTC - in response to Message 54818.

Thanks a lot Vitalii, your image makes everything much clearer.

Your stuck file is 105.26MB, the server has already received 100.99MB and BOINC can't send the last 4,540,690 bytes of the file. This points very strongly towards a file size limit on the server being the cause. The project team have been notified and it should hopefully be sorted out tomorrow.
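
As a rough check of that arithmetic, here is a small sketch. It assumes the sizes shown on the Transfers tab are binary megabytes (1,048,576 bytes each), which is my assumption rather than something confirmed here; the function name remaining_bytes is only illustrative.

```python
MIB = 1024 * 1024  # assumption: displayed "MB" are binary megabytes


def remaining_bytes(total_mb: float, received_mb: float) -> int:
    """Estimate the bytes still to send from two rounded MB figures."""
    return round((total_mb - received_mb) * MIB)


if __name__ == "__main__":
    estimate = remaining_bytes(105.26, 100.99)  # figures from the image
    logged = 4_540_690                          # figure from the message log
    difference = abs(estimate - logged) / logged
    print(f"estimated: {estimate:,} bytes")
    print(f"logged:    {logged:,} bytes")
    print(f"relative difference: {difference:.1%}")
```

The two figures agree to within a couple of percent, which is about as close as sizes rounded to two decimal places and taken at different times can be expected to get.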
____________
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer

Vitalii Koshura
Send message
Joined: 25 Mar 09
Posts: 11
Credit: 2,277,510
RAC: 2,521
Message 54820 - Posted: 21 Sep 2016, 22:41:57 UTC - in response to Message 54819.

Thank you!

Lockleys
Send message
Joined: 13 Jan 07
Posts: 183
Credit: 9,541,689
RAC: 4,256
Message 54823 - Posted: 23 Sep 2016, 15:06:47 UTC

Now that I have returned from roaming and am with my PC, I can confirm that my MEX stuck upload is 105.67MB and has frozen after 37.64% has previously uploaded.

Lockleys
Send message
Joined: 13 Jan 07
Posts: 183
Credit: 9,541,689
RAC: 4,256
Message 54824 - Posted: 23 Sep 2016, 17:18:44 UTC - in response to Message 54823.

Now that I have returned from roaming and am with my PC, I can confirm that my MEX stuck upload is 105.67MB and has frozen after 37.64% has previously uploaded.

To clarify. It is still stuck at 37.64%.

Les Bayliss
Volunteer moderator
Send message
Joined: 5 Sep 04
Posts: 6866
Credit: 20,843,205
RAC: 216
Message 54825 - Posted: 23 Sep 2016, 21:06:25 UTC - in response to Message 54824.

Starting to have difficulty finding straws to clutch at.
Maybe Suspend BOINC, Exit, then restart BOINC.

Profile astroWX
Volunteer moderator
Send message
Joined: 5 Aug 04
Posts: 1494
Credit: 92,619,120
RAC: 34,303
Message 54826 - Posted: 23 Sep 2016, 22:11:17 UTC - in response to Message 54825.

There may not be useful straws available, Les. My experience with MEX25 was that failed uploads tended to remain failed uploads regardless of suggested workarounds. My "solution" (perhaps mistaken) was to abort partially-uploaded files after multiple failures, on the assumption that the receiving end has software to identify aborted partially-uploaded files and that staff at the receiving server would look into possible reasons.

Fortunately (for me) recent tasks are concentrated in 'sas' and 'sam' batches with the occasional 'eu' retread thrown in.

I think we have to hope Andy's communication with techs in Mexico will solve the problems at the receiver's end.
____________
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

Lockleys
Send message
Joined: 13 Jan 07
Posts: 183
Credit: 9,541,689
RAC: 4,256
Message 54828 - Posted: 24 Sep 2016, 4:17:57 UTC - in response to Message 54825.

I've done that (suspend BOINC, exit, restart) several times, including Windows reboot, since the stuck zip appeared. I have even taken a backup and moved it to a different PC in a different town and tried the upload there. No benefit.

Vitalii Koshura
Send message
Joined: 25 Mar 09
Posts: 11
Credit: 2,277,510
RAC: 2,521
Message 54830 - Posted: 24 Sep 2016, 10:19:53 UTC - in response to Message 54828.

This is definitely a server issue, because I have also tried restarting my PC and the result is still the same.

Les Bayliss
Volunteer moderator
Send message
Joined: 5 Sep 04
Posts: 6866
Credit: 20,843,205
RAC: 216
Message 54834 - Posted: 25 Sep 2016, 2:16:24 UTC

My interest in the restart is due to this message, just after Andy had increased the server time out.
I had hoped that the restart would cause BOINC to try again from the start of the file.

As this hasn't had any effect, I have one last idea, and then it's back to: multiple problems in multiple places, starting at the user's computer and going all the way to the server. Which is probably unsolvable.

John Eric Hopkinson
Send message
Joined: 27 Jan 05
Posts: 74
Credit: 1,028,011
RAC: 0
Message 54835 - Posted: 25 Sep 2016, 13:24:45 UTC - in response to Message 54834.

Les et al:
I may have a partial answer from a very recent similar situation in which my computer was shut down by Windows 10 in the wee hours of a morning last week.
On a previous occasion, about two weeks before the latest incident, Win10 did the same thing, causing the failure of two WUs, and apoplexy for me.
Not understanding how to control Win10 has caused me a lot of grief. However, on the most recent occasion, two of three WUs restarted, but one hung up in the transfer processor. If I knew how to post photos I would send a copy of a picture showing "restart_sas50_s01z_1991_-1201_rd00.....etc?"
It hesitated several times and I forced the restart, and all three WUs from that issue are progressing well, except that the one quoted above is lagging behind, and I am not sure what will happen when its partners complete.
On two previous notifications from Win10, I set a scheduled time, or suspended the program during upgrades etc., with no reaction from the tasks.
So, Les, you may be correct in assuming that several factors are involved in the failures, and that maybe we need to accommodate ourselves to the idio(t)syncrasies of our masters at M$.

Vitalii Koshura
Send message
Joined: 25 Mar 09
Posts: 11
Credit: 2,277,510
RAC: 2,521
Message 54895 - Posted: 9 Oct 2016, 8:46:26 UTC

Hello. Two days ago this file was successfully uploaded. I do not know what happened; I changed nothing on my side. It seems this was a server-related issue and it is now resolved.
Thanks

Lockleys
Send message
Joined: 13 Jan 07
Posts: 183
Credit: 9,541,689
RAC: 4,256
Message 54896 - Posted: 9 Oct 2016, 10:31:36 UTC

Unlike Vitalii's, my outstanding upload is still stuck. Curiouser and curiouser.

Profile Byron Leigh Hatch @ team Carl Sagan
Avatar
Send message
Joined: 17 Aug 04
Posts: 280
Credit: 43,277,847
RAC: 5,991
Message 54897 - Posted: 9 Oct 2016, 15:56:24 UTC

Hello Vitalii Koshura, I'm happy that your file was successfully uploaded. Hello Lockleys, I'm sorry to hear that your outstanding upload is still stuck. I hope someone will be able to figure this out. Like you say it gets curiouser and curiouser. My wah2_mex25 successfully uploaded.

Les Bayliss
Volunteer moderator
Send message
Joined: 5 Sep 04
Posts: 6866
Credit: 20,843,205
RAC: 216
Message 54900 - Posted: 9 Oct 2016, 23:54:03 UTC - in response to Message 54896.

Lockleys

Could you provide a link to the task that hasn't "completed", please?
And is there a zip stuck, or is it not "reporting"?

(The latter is a new problem, which has been reported.)



Copyright © 2019 climateprediction.net