climateprediction.net (CPDN) home page
Thread 'Stuck upload issue'

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 57766 - Posted: 2 Feb 2018, 8:11:47 UTC - in response to Message 57764.  

I have had a stuck upload from batch 691 for so many weeks I have lost track.


My stuck upload (cam25) is also from batch 691. It's the restart.zip upload and it's stuck at 27.64%. Do we abort these uploads or continue to have patience?
ID: 57766

BetelgeuseFive
Joined: 31 Aug 04
Posts: 10
Credit: 2,538,005
RAC: 0
Message 57785 - Posted: 17 Feb 2018, 9:18:38 UTC

I still have two stuck uploads from the following tasks:

https://www.cpdn.org/cpdnboinc/result.php?resultid=20919737
https://www.cpdn.org/cpdnboinc/result.php?resultid=20919722

Both uploads are approx. 105 MB in size.
One of them is stuck at 53 MB, the other at 46 MB.

As no one seems willing to look into this or tell us what to do about it, I have set "No new tasks" for CPDN until this is resolved.

Tom

17/02/2018 10:15:07 | climateprediction.net | Started upload of wah2_cam25_a03s_200405_18_689_011368580_0_r616315434_17.zip
17/02/2018 10:15:07 | climateprediction.net | Started upload of wah2_cam25_a047_200405_18_689_011368595_0_r2068114942_7.zip
17/02/2018 10:15:29 | | Project communication failed: attempting access to reference site

17/02/2018 10:15:29 | climateprediction.net | Temporarily failed upload of wah2_cam25_a03s_200405_18_689_011368580_0_r616315434_17.zip: transient HTTP error
17/02/2018 10:15:29 | climateprediction.net | Backing off 05:16:01 on upload of wah2_cam25_a03s_200405_18_689_011368580_0_r616315434_17.zip
17/02/2018 10:15:30 | | Internet access OK - project servers may be temporarily down.
17/02/2018 10:15:34 | | Project communication failed: attempting access to reference site
17/02/2018 10:15:34 | climateprediction.net | Temporarily failed upload of wah2_cam25_a047_200405_18_689_011368595_0_r2068114942_7.zip: transient HTTP error
17/02/2018 10:15:34 | climateprediction.net | Backing off 03:50:57 on upload of wah2_cam25_a047_200405_18_689_011368595_0_r2068114942_7.zip
17/02/2018 10:15:35 | | Internet access OK - project servers may be temporarily down.
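
For anyone who wants to keep track of which uploads are backing off and for how long, here is a rough Python sketch that pulls the back-off times out of Event Log text. The function name `upload_backoffs` and the regular expression are my own, inferred only from the log lines quoted above; other BOINC client versions may word the message differently.

```python
import re
from datetime import timedelta

# Pattern inferred from the "Backing off HH:MM:SS on upload of <file>" lines
# quoted in this thread; treat it as an assumption, not a documented format.
BACKOFF_RE = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on upload of (\S+)")


def upload_backoffs(log_text):
    """Return {file name: timedelta} holding the last back-off seen per file."""
    backoffs = {}
    for line in log_text.splitlines():
        match = BACKOFF_RE.search(line)
        if match:
            hours, minutes, seconds, filename = match.groups()
            backoffs[filename] = timedelta(
                hours=int(hours), minutes=int(minutes), seconds=int(seconds)
            )
    return backoffs


# Two lines copied from the log above.
sample_log = """\
17/02/2018 10:15:29 | climateprediction.net | Backing off 05:16:01 on upload of wah2_cam25_a03s_200405_18_689_011368580_0_r616315434_17.zip
17/02/2018 10:15:34 | climateprediction.net | Backing off 03:50:57 on upload of wah2_cam25_a047_200405_18_689_011368595_0_r2068114942_7.zip
"""

for name, delay in upload_backoffs(sample_log).items():
    print(name, delay)
```

Only the most recent back-off per file is kept, since that is the one the client is currently waiting on.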
ID: 57785

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 57786 - Posted: 17 Feb 2018, 10:57:12 UTC

Yes, I would like to add my frustration with these CAM25 stuck uploads. Does anybody have a clue about these?

I know from earlier posts that these are uploaded to a server that is distant from Oxford, so it isn't in itself an Oxford problem. Previous posts have also said that Oxford doesn't have the capacity to divert the CAM25 uploads to its own servers. I am tempted to abort the transfers to clear them from my end, but I am reluctant to abort if the problem can be solved.

Any advice would be welcome.
ID: 57786

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 57791 - Posted: 17 Feb 2018, 12:40:31 UTC - in response to Message 57786.  

I don't know if any of the current problems are down to the recent outage of the virtual machine at Oxford, and I probably won't be able to find out till Monday. The only tasks I have at the moment are SAS50s, which seem to be fine. I will post the question to the Oxford team, however.
ID: 57791

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 57794 - Posted: 17 Feb 2018, 13:01:44 UTC - in response to Message 57791.  

Dave, my stuck uploads have been stuck since before the recent outage and are still stuck.
ID: 57794

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 57795 - Posted: 17 Feb 2018, 14:01:17 UTC - in response to Message 57794.  

I will request that someone nudges the people in Mexico. Don't know how much effect it will have though.
ID: 57795

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 57797 - Posted: 17 Feb 2018, 18:53:14 UTC
Last modified: 17 Feb 2018, 18:58:18 UTC

A more recent CAM25 has completed and fully uploaded successfully. The successful task was batch 694. The stuck upload task is batch 691. That may help.

I note that BetelgeuseFive's stuck uploads are from batch 689. Perhaps someone in Mexico has gone to sleep regarding the earlier batches.

EDIT. I also have a later CAM25 model from the same batch, 691, which has finished and fully uploaded successfully.
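
To compare batches across everyone's stuck files, the batch number can be read straight off the upload file name. This is only a sketch: the helper `batch_from_upload_name` is hypothetical, and the assumption that the batch is the sixth underscore-separated field is inferred from the file names posted in this thread (the batch 689 files carry `_689_` in that position), not from any project documentation.

```python
def batch_from_upload_name(filename):
    """Return the batch number embedded in a wah2 upload file name.

    Assumption (inferred from file names in this thread): the batch is the
    sixth underscore-separated field, e.g. "689" in
    wah2_cam25_a03s_200405_18_689_011368580_0_r616315434_17.zip
    """
    fields = filename.split("_")
    if len(fields) < 6 or not fields[5].isdigit():
        raise ValueError(f"unexpected upload file name: {filename!r}")
    return int(fields[5])


# File names copied from the log posted earlier in the thread.
stuck_files = [
    "wah2_cam25_a03s_200405_18_689_011368580_0_r616315434_17.zip",
    "wah2_cam25_a047_200405_18_689_011368595_0_r2068114942_7.zip",
]

for name in stuck_files:
    print(batch_from_upload_name(name), name)
```

Names that do not fit the assumed layout raise `ValueError` rather than returning a guess.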
ID: 57797

Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,151,719
RAC: 15,407
Message 57798 - Posted: 18 Feb 2018, 0:03:27 UTC - in response to Message 57797.  

Mine from batch 691 is still stuck - since Jan 6th!!
ID: 57798

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 57806 - Posted: 19 Feb 2018, 10:28:50 UTC

For anyone happy to play with their config files, it might be worth looking at this post, which deals with stuck uploads on another Mexican batch: https://www.cpdn.org/cpdnboinc/forum_thread.php?id=8251#54582

I reported that the issue is still unresolved, and a couple of ideas came out of the discussion that followed. On one previous batch, which uploaded to a different server, what got the stuck uploads going again was the possibly extreme option of rebooting the server, which caused the uploads to start again from scratch. It may also have something to do with the size of the uploads. Andy is going to liaise with those in Mexico, presumably this afternoon, seeing as they are six hours behind the UK.
ID: 57806

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 57807 - Posted: 19 Feb 2018, 12:59:47 UTC - in response to Message 57806.  

<quote> For anyone happy with playing with their config files, might be worth looking at this post which is to do with stuck uploads on another Mexican batch https://www.cpdn.org/cpdnboinc/forum_thread.php?id=8251#54582 <unquote>

I have been running with that fix installed since Les Bayliss first published it and have never taken it out again. So it can't be that.
ID: 57807

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 57819 - Posted: 19 Feb 2018, 21:35:46 UTC
Last modified: 19 Feb 2018, 21:37:19 UTC

@WB8ILI, @Lockleys, @Alan K, Does part of the message/event log about this have "locked by file_upload_handler PID=" in the output? Just trying to make sure this is the same problem as other cam25 upload problems here and on the dev site. Thanks.
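
If the Event Log has been saved to a text file, a quick way to check is to count the two kinds of failure line. The sketch below is only that: the function name `count_upload_failure_kinds` is made up for illustration, and it simply looks for the two strings mentioned in this thread ("locked by file_upload_handler PID=" and "transient HTTP error") without assuming anything else about the line format.

```python
LOCKED_MARKER = "locked by file_upload_handler PID="
TRANSIENT_MARKER = "transient HTTP error"


def count_upload_failure_kinds(log_text):
    """Count log lines containing each of the two failure strings."""
    counts = {"locked": 0, "transient": 0}
    for line in log_text.splitlines():
        if LOCKED_MARKER in line:
            counts["locked"] += 1
        elif TRANSIENT_MARKER in line:
            counts["transient"] += 1
    return counts


# A line copied from the log posted earlier in this thread.
quoted_line = (
    "17/02/2018 10:15:29 | climateprediction.net | Temporarily failed upload of "
    "wah2_cam25_a03s_200405_18_689_011368580_0_r616315434_17.zip: transient HTTP error"
)

print(count_upload_failure_kinds(quoted_line))
```

A non-zero "locked" count would suggest the same problem as the other cam25 cases. A zero count is not conclusive if the log no longer reaches back to the first failed attempt.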
ID: 57819

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,860,147
RAC: 4,891
Message 57820 - Posted: 19 Feb 2018, 21:50:49 UTC - in response to Message 57819.  

@WB8ILI, @Lockleys, @Alan K, Does part of the message/event log about this have "locked by file_upload_handler PID=" in the output? Just trying to make sure this is the same problem as other cam25 upload problems here and on the dev site. Thanks.

Note that for my stuck CAM25, which showed that error on the first upload failure, the error messages are now only “transient HTTP error” etc. The rest of that model’s uploads cleared without difficulty.
ID: 57820

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 57823 - Posted: 19 Feb 2018, 23:24:35 UTC - in response to Message 57819.  

@WB8ILI, @Lockleys, @Alan K, Does part of the message/event log about this have "locked by file_upload_handler PID=" in the output? Just trying to make sure this is the same problem as other cam25 upload problems here and on the dev site. Thanks.

I haven't seen this message in the Event Log. Just:

19/02/2018 21:28:22 | climateprediction.net | Temporarily failed upload of wah2_cam25_a05e_200405_18_691_011369743_0_r1614740798_restart.zip: transient HTTP error
ID: 57823

Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,151,719
RAC: 15,407
Message 57832 - Posted: 20 Feb 2018, 17:27:17 UTC - in response to Message 57819.  
Last modified: 20 Feb 2018, 17:28:06 UTC

Don't know, as my log file doesn't go back to 6th Jan, when the problem first occurred. Just getting the transient HTTP messages now.
ID: 57832

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,860,147
RAC: 4,891
Message 57895 - Posted: 5 Mar 2018, 23:18:55 UTC

I lost patience with my stuck CAM25 Zip file and aborted the upload. The model is now available for download here so someone can now get it right ...
ID: 57895

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 57964 - Posted: 21 Mar 2018, 7:13:35 UTC

Zips for both 708 and 709 are not uploading for me. With http debug enabled, all I get is:

21/03/2018 07:08:51 | climateprediction.net | Started upload of wah2_eu25_qi1y_200712_13_709_011481271_1_r1628680715_1.zip
21/03/2018 07:08:52 | | [http_xfer] [ID#18] HTTP: wrote 93 bytes
21/03/2018 07:09:14 | | [http_xfer] [ID#18] HTTP: wrote 221 bytes
21/03/2018 07:09:15 | climateprediction.net | Backing off 00:24:36 on upload of wah2_eu25_qi1y_200712_13_709_011481271_1_r1628680715_1.zip
ID: 57964

Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 57965 - Posted: 21 Mar 2018, 7:37:55 UTC - in response to Message 57964.  

Zips for both 708 and 709 are not uploading for me. With http debug enabled, all I get is:

21/03/2018 07:08:51 | climateprediction.net | Started upload of wah2_eu25_qi1y_200712_13_709_011481271_1_r1628680715_1.zip
21/03/2018 07:08:52 | | [http_xfer] [ID#18] HTTP: wrote 93 bytes
21/03/2018 07:09:14 | | [http_xfer] [ID#18] HTTP: wrote 221 bytes
21/03/2018 07:09:15 | climateprediction.net | Backing off 00:24:36 on upload of wah2_eu25_qi1y_200712_13_709_011481271_1_r1628680715_1.zip


My 708 and 709 zips uploaded as normal during the UK night.
ID: 57965

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 57966 - Posted: 21 Mar 2018, 14:54:42 UTC

My 708 and 709 zips uploaded as normal during the UK night.


I am still getting the same on 709. On 708 I am now getting the "Internet access OK - project servers may be temporarily down" message.
ID: 57966

PDW
Joined: 29 Nov 17
Posts: 82
Credit: 15,317,720
RAC: 80,270
Message 58007 - Posted: 29 Mar 2018, 7:24:17 UTC - in response to Message 57595.  

I still have one task that is stuck uploading to upload6, it has sent 6.66/104.92MB so far...

06/01/2018 08:54:02 | climateprediction.net | Started upload of wah2_cam25_a03y_200405_18_689_011368586_0_r102364835_9.zip

This one is still stuck, I'll post when it has gone rather than keep saying it hasn't.

This one has finally uploaded successfully !
ID: 58007

Chairmanmeow
Joined: 8 Dec 05
Posts: 3
Credit: 732,203
RAC: 951
Message 58538 - Posted: 4 Aug 2018, 20:24:13 UTC

These two have been stuck for a couple of weeks trying to upload. I doubt they ever sent a single trickle either. Looks like it hasn't worked since May! Meow


General
URL: http://ithaqua.oerc.ox.ac.uk/cpdnboinc/
User name: Chairmanmeow
Team name: Project Blue Book
Resource share: 100
Scheduler RPC deferred for: 23:59:06
Disk usage: 240.62 MB
Computer ID: 1460825
Suspended via GUI: no
Don't request tasks: no
Trickle-up pending: yes
Host location: home
Tasks completed: 2
Tasks failed: 0

Credit
User: 443,499 total, 326.90 average
Host: 0 total, 0.00 average

Scheduling
Scheduling priority: -0.00
Last scheduler reply: 5/23/2018 11:16:16 PM
ID: 58538

©2024 cpdn.org