climateprediction.net home page
The uploads are stuck

Message boards : Number crunching : The uploads are stuck

Stony666

Joined: 9 Feb 21
Posts: 9
Credit: 10,334,808
RAC: 880,522
Message 67699 - Posted: 14 Jan 2023, 12:05:33 UTC - in response to Message 67695.  
Last modified: 14 Jan 2023, 12:05:57 UTC

I have changed the count of uploads from 2 to 1 now. Maybe I can see something...
ID: 67699
ncoded.com

Joined: 16 Aug 16
Posts: 73
Credit: 52,932,477
RAC: 8,823
Message 67701 - Posted: 14 Jan 2023, 13:04:48 UTC
Last modified: 14 Jan 2023, 13:14:01 UTC

I have changed the count of uploads from 2 to 1 now. Maybe I can see something...


I didn't change anything, I just occasionally clicked the 'retry pending transfers' option in Menu->Tools.

Not requesting tasks: too many uploads in progress


Anyone know what the uploads limit is? I guess it must be a few thousand files.
ID: 67701
Richard Haselgrove

Joined: 1 Jan 07
Posts: 925
Credit: 34,100,818
RAC: 11,270
Message 67703 - Posted: 14 Jan 2023, 13:27:24 UTC - in response to Message 67701.  

Anyone know what the uploads limit is? I guess it must be a few thousand files.
It's not a file limit, it's a task limit. Twice as many tasks uploading as you have CPU cores.
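
As a rough illustration of the rule described here, a minimal sketch follows. It is based only on the description in this post, not on the BOINC source; the function names are hypothetical, and the exact comparison the client uses at the boundary is an assumption.

```python
def upload_task_limit(cpu_cores: int) -> int:
    """Limit as described in the post: twice the number of CPU cores."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return 2 * cpu_cores


def can_request_work(tasks_uploading: int, cpu_cores: int) -> bool:
    # Assumption: new work is refused only once the count of uploading
    # tasks exceeds the limit; the real client may treat the boundary
    # case differently.
    return tasks_uploading <= upload_task_limit(cpu_cores)
```

Under these assumptions, a 4-core host would stop receiving work once more than 8 tasks are stuck uploading, no matter how many individual files those tasks contain.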

Watch out for tasks which are held up because just one or two files got stuck while uploading, and haven't retried yet. I can't work out what system BOINC uses to decide what to upload yet, but I've found a technique which seems to help. Go through this sequence:

  • Suspend network activity (BOINC Manager, Advanced view, Activity menu)
  • Retry all transfers (Tools Menu)
  • Allow network activity

That just cleared the last six tasks due by 25 January, on one machine.
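
The three steps above could also be scripted with `boinccmd`, the command-line tool shipped with the BOINC client. This is only a sketch under assumptions: it assumes `boinccmd` is on the PATH and allowed to talk to the local client, that `--network_available` has the same effect as the Manager's retry action, and that the `auto` mode (network activity based on preferences) is what "Allow network activity" means here. The name `nudge_uploads` is hypothetical.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# Assumption: boinccmd is installed and can reach the local BOINC client.
NUDGE_SEQUENCE: List[List[str]] = [
    ["boinccmd", "--set_network_mode", "never"],  # suspend network activity
    ["boinccmd", "--network_available"],          # retry deferred transfers
    ["boinccmd", "--set_network_mode", "auto"],   # allow network activity again
]


def nudge_uploads(run: Optional[Callable[[Sequence[str]], object]] = None) -> int:
    """Run the suspend / retry / allow sequence and return the step count."""
    if run is None:
        # Default runner: execute each command and stop on the first failure.
        run = lambda cmd: subprocess.run(cmd, check=True)
    for cmd in NUDGE_SEQUENCE:
        run(cmd)
    return len(NUDGE_SEQUENCE)
```

Passing a custom `run` callable (for example `print`) shows the commands without executing them, which is a safe way to check the sequence first.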

ID: 67703
ncoded.com

Joined: 16 Aug 16
Posts: 73
Credit: 52,932,477
RAC: 8,823
Message 67704 - Posted: 14 Jan 2023, 13:42:35 UTC - in response to Message 67703.  

Okay thanks.
ID: 67704
Stony666

Joined: 9 Feb 21
Posts: 9
Credit: 10,334,808
RAC: 880,522
Message 67705 - Posted: 14 Jan 2023, 14:08:24 UTC
Last modified: 14 Jan 2023, 14:08:36 UTC

Does somebody know when full bandwidth will be available again?

What is the status of workload transfer away from the upload server?
ID: 67705
Landjunge
Joined: 17 Aug 07
Posts: 8
Credit: 35,353,184
RAC: 1,966,142
Message 67708 - Posted: 14 Jan 2023, 15:08:45 UTC - in response to Message 67705.  

Does somebody know when full bandwidth will be available again?

What is the status of workload transfer away from the upload server?


That would be my question too. Still 250 completed WUs to upload.
ID: 67708
nairb

Joined: 3 Sep 04
Posts: 105
Credit: 5,646,090
RAC: 102,785
Message 67710 - Posted: 14 Jan 2023, 16:00:38 UTC

For some reason I seem to be blessed with endless uploads. It worked all through the night and now keeps up with the output of 4 running w/u.
It does seem that once a connection is made it keeps uploading the zips until they are all gone... lucky me.
ID: 67710
Yeti
Joined: 5 Aug 04
Posts: 170
Credit: 9,709,240
RAC: 13,737
Message 67712 - Posted: 14 Jan 2023, 16:33:12 UTC

I have stopped testing on my machines, will only crunch what has already been downloaded and wait until the backlog has cleared.

Afterwards it will be time for a new try


Supporting BOINC, a great concept !
ID: 67712
Richard Haselgrove

Joined: 1 Jan 07
Posts: 925
Credit: 34,100,818
RAC: 11,270
Message 67716 - Posted: 14 Jan 2023, 17:17:11 UTC - in response to Message 67703.  

I've found a technique which seems to help. Go through this sequence:

  • Suspend network activity (BOINC Manager, Advanced view, Activity menu)
  • Retry all transfers (Tools Menu)
  • Allow network activity

That just cleared the last six tasks due by 25 January, on one machine.

And it's just worked on my second machine as well - tied off the loose ends from 14 tasks in a single hour.

14/01/2023 17:12:27 | climateprediction.net | Reporting 14 completed tasks
I usually wait until both connections have stalled, and the queue has gone into 'project backoff'. Not sure if that's a significant part of the procedure, but it can't spoil it.
ID: 67716
wujj123456

Joined: 14 Sep 08
Posts: 87
Credit: 32,677,418
RAC: 29,639
Message 67718 - Posted: 14 Jan 2023, 17:41:02 UTC - in response to Message 67705.  

I would also like to know whether there is any estimate of when the full backlog will clear at whatever rate the upload servers can take. The past few times the upload servers were up, a few people were consistently able to upload while others were not. This likely depends on latency and routes to the server. The lucky ones would start fetching, crunching and uploading more, while others, like me, can't upload at all until enough of the backlog is cleared to leave more bandwidth available. So far I've seen a few trickles go up, but that's it. Since no scheduling is involved in competing for upload slots, it is possible that the recovery will simply not be fast enough to get to the bottom of the list, not just within the deadline but even long after it.
ID: 67718
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1045
Credit: 16,506,818
RAC: 5,001
Message 67721 - Posted: 14 Jan 2023, 18:09:46 UTC - in response to Message 67718.  

This likely depends on latency and routes to the server. The lucky ones would start fetching, crunching and uploading more, while others, like me, can't upload at all until enough of the backlog is cleared to leave more bandwidth available. So far I've seen a few trickles go up, but that's it.


You might be right.

I am getting pretty good response from the upload server, though not as good as it was about 10 (?) days ago. I have a high-speed (75 megabit/second) fiber-optic Internet connection, but I am in the USA and the server is in England. Right now, traceroute does not make it all the way to the server. It did recently.

But notice the big delay from New York to London, from step 8 to step 9. This is usual and unchanged. IIRC, the server is at about step 22.

$ traceroute upload11.cpdn.org
traceroute to upload11.cpdn.org (192.171.169.187), 30 hops max, 60 byte packets
 1  Fios_Quantum_Gateway.fios-router.home (192.168.0.1)  0.311 ms  0.444 ms  1.175 ms
 2  lo0-100.NWRKNJ-VFTTP-309.verizon-gni.net (71.127.205.1)  5.103 ms  4.998 ms  7.423 ms
 3  at-0-0-0-1717.ALT2-CORE-RTR2.verizon-gni.net (100.41.5.70)  9.942 ms at-0-0-0-1716.ALT2-CORE-RTR1.verizon-gni.net (100.41.5.68)  10.210 ms at-0-0-0-1717.ALT2-CORE-RTR2.verizon-gni.net (100.41.5.70)  10.022 ms
 4  0.csi1.NWRKNJ02-MSE01-BB-SU1.ALTER.NET (140.222.4.104)  10.105 ms 0.csi1.NBWKNJNB-MSE01-BB-SU1.ALTER.NET (140.222.4.106)  10.360 ms  10.248 ms
 5  * * *
 6  * * *
 7  * nyk-b2-link.ip.twelve99.net (80.239.192.36)  8.351 ms  6.766 ms
 8  nyk-bb1-link.ip.twelve99.net (62.115.135.160)  11.306 ms  7.742 ms *
 9  * ldn-bb4-link.ip.twelve99.net (62.115.112.245)  82.895 ms  82.778 ms
10  ldn-b2-link.ip.twelve99.net (62.115.122.189)  82.672 ms ldn-b2-link.ip.twelve99.net (62.115.120.239)  77.529 ms  78.626 ms
11  jisc-ic345131-ldn-b2.ip.twelve99-cust.net (62.115.175.131)  75.521 ms  77.428 ms  75.628 ms
12  ae24.londhx-sbr1.ja.net (146.97.35.197)  75.679 ms  78.389 ms  78.327 ms
13  ae29.londpg-sbr2.ja.net (146.97.33.2)  77.522 ms  77.470 ms  74.948 ms
14  ae31.erdiss-sbr2.ja.net (146.97.33.22)  82.243 ms  87.314 ms  82.204 ms
15  * * *
16  ral-r26.ja.net (146.97.41.34)  80.929 ms  83.553 ms  79.803 ms
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *
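
To find the largest latency step in a trace like the one above without reading it by eye, a small parser can help. This is a hedged sketch: it assumes the Linux `traceroute` text layout shown here, takes the lowest round-trip time per hop, and skips hops that returned only `* * *`. The function names are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Matches round-trip times such as "82.895 ms".
RTT_PATTERN = re.compile(r"(\d+(?:\.\d+)?) ms")
# Matches a hop line: optional spaces, hop number, then the rest of the line.
HOP_PATTERN = re.compile(r"\s*(\d+)\s+(.*)")


def hop_rtts(traceroute_output: str) -> Dict[int, float]:
    """Map hop number to its lowest round-trip time in milliseconds."""
    hops: Dict[int, float] = {}
    for line in traceroute_output.splitlines():
        match = HOP_PATTERN.match(line)
        if not match:
            continue  # header line or blank line
        times = [float(t) for t in RTT_PATTERN.findall(match.group(2))]
        if times:  # hops that only show "* * *" have no times
            hops[int(match.group(1))] = min(times)
    return hops


def largest_jump(hops: Dict[int, float]) -> Optional[Tuple[int, int, float]]:
    """Return (from_hop, to_hop, increase_ms) for the biggest increase."""
    ordered = sorted(hops.items())
    best: Optional[Tuple[int, int, float]] = None
    for (h1, t1), (h2, t2) in zip(ordered, ordered[1:]):
        delta = t2 - t1
        if best is None or delta > best[2]:
            best = (h1, h2, delta)
    return best
```

Fed the trace above, this picks out the step from hop 8 to hop 9 (New York to London) as the largest increase, which agrees with the observation in the post.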

ID: 67721
wujj123456

Joined: 14 Sep 08
Posts: 87
Credit: 32,677,418
RAC: 29,639
Message 67725 - Posted: 14 Jan 2023, 23:52:13 UTC - in response to Message 67721.  
Last modified: 15 Jan 2023, 0:05:17 UTC

Thanks for the traceroute output. The server doesn't respond to ICMP packets, which are probably blocked for security. ral-r26.ja.net seems to be the last hop everyone sees from traceroute. Your latency is pretty low compared to the 140-150 ms it takes me to reach ral-r26.ja.net.
I have a loop running to kick off a retry every 30 minutes in case the BOINC client backs off for hours. So far the success rate is one trickle file per hour. That's not going to finish uploading a single WU by the deadline unless the pipe gets magically unclogged. On the other hand, that's almost double the rate I had yesterday, so perhaps there is still some hope. ¯\_(ツ)_/¯
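
A retry loop of this kind might look like the sketch below. It is not the loop actually used in this post, which is not shown; it assumes `boinccmd` is on the PATH and that `--network_available` makes the client retry deferred transfers. The name `retry_loop` and its parameters are hypothetical.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence

RETRY_INTERVAL_SECONDS = 30 * 60  # every 30 minutes, as in the post


def retry_loop(
    rounds: int,
    interval: float = RETRY_INTERVAL_SECONDS,
    run: Optional[Callable[[Sequence[str]], object]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Ask the client to retry transfers `rounds` times, pausing in between."""
    if run is None:
        # check=False: a failed call should not end the loop.
        run = lambda cmd: subprocess.run(cmd, check=False)
    for i in range(rounds):
        run(["boinccmd", "--network_available"])
        if i < rounds - 1:
            sleep(interval)  # no pause after the final round
    return rounds
```

The number of rounds is bounded on purpose so the script cannot run unattended forever; `retry_loop(48)` would cover roughly a day.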
ID: 67725
zombie67 [MM]
Joined: 2 Oct 06
Posts: 52
Credit: 26,209,214
RAC: 3,355
Message 67727 - Posted: 15 Jan 2023, 4:14:00 UTC

I have yet to have a single task upload since this began back in December. It's frustrating to see so many others get at least SOME uploads to complete, if not ALL. Also, it's not clear to me that the project has the resources and management to ever get in front of this. Again, frustrating.
ID: 67727
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1045
Credit: 16,506,818
RAC: 5,001
Message 67728 - Posted: 15 Jan 2023, 6:47:11 UTC - in response to Message 67727.  

I have yet to have a single task upload since this began back in December. It's frustrating to see so many others get at least SOME uploads to complete, if not ALL. Also, it's not clear to me that the project has the resources and management to ever get in front of this. Again, frustrating.


I am not an expert, but I am not having much trouble uploading those 14-megabyte zip files. I run five of the oifs_43r3_ps tasks at a time, so I produce a lot of them. Right now, they do not go up smoothly as they are produced, but a bunch go up at about the same time often enough that the number remaining is not growing with time. I have a high-speed (fiber-optic) link to the Internet here near New York City, but the hop to England seems to cost 70 to 80 milliseconds to cross the ocean.

If you cannot get any uploads to complete in two weeks, something is wrong with your BOINC settings, your system configuration, or the version of your BOINC client. (My BOINC client is 7.20.2, and yours should not be older than that.)

If I am not mistaken, you should not just wait for it to start working. It seems to me something else is responsible for your (non-)results.
ID: 67728
AndreyOR

Joined: 12 Apr 21
Posts: 243
Credit: 11,411,542
RAC: 26,913
Message 67729 - Posted: 15 Jan 2023, 8:39:57 UTC - in response to Message 67727.  
Last modified: 15 Jan 2023, 8:42:17 UTC

I have yet to have a single task upload since this began back in December. ...


Actually, it looks like several of your PCs were able to upload and report a number of tasks back on Jan 3 and 4, which I believe was the first time the upload server was up after the holidays. It just doesn't seem like you've been able to upload since then (or at least report; I can't tell about the upload part). Possibly the main reason is that most of your PCs have made no contact with the project since the end of December or beginning of January. Only one of them contacted the project recently, on Jan 13, and it has 12 tasks in progress, which is not bad. I'd expect them not to take long to upload once you get a connection slot.

If those machines are turned on, I'd start by checking their internet connection as well as the various BOINC network settings. In BOINC Manager, try Update in Projects and Retry Now in Transfers, and check the Event Log; maybe even turn on some debug flags to see more detail if needed. The way things have been going recently, once BOINC has a constant internet connection you should be able to keep uploading files, though progress will be very intermittent.
ID: 67729
ncoded.com

Joined: 16 Aug 16
Posts: 73
Credit: 52,932,477
RAC: 8,823
Message 67730 - Posted: 15 Jan 2023, 8:41:34 UTC
Last modified: 15 Jan 2023, 8:44:15 UTC

Zombie, we were in a similar situation just yesterday, with no uploads since the end of the year.

What changed things for us, I believe, is that we got militant about clicking the Retry option.

Every 5 minutes, if the status was 'backing off..', we hit 'Retry pending transfers' in Menu->Tools.

After an hour or so of constant retrying we got a connection, and since then all hosts have had a connection much of the time.
ID: 67730
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4314
Credit: 16,379,331
RAC: 3,596
Message 67732 - Posted: 15 Jan 2023, 9:17:54 UTC

I suspect I am not the only one with this issue. I have now cleared half my backlog, so I am down to 12 tasks to upload over a slow connection. Looking at the uploads, I will often see fifty or more zips going through sequentially, finishing with 122.zip, but the task still doesn't report because a few zips got missed and don't go through until another two or three tasks have uploaded. So while the number of zips going through might suggest I should have cleared four or five tasks, none are ready to report. Other times, four or five finish uploading within the same hour and get reported at once. So if zips are uploading but you don't see tasks reporting as quickly as you expect, just give them a bit more time. I suspect the foibles of different connections, on top of normal statistical variation, mean some will see this more and some less than I do.
ID: 67732
AndreyOR

Joined: 12 Apr 21
Posts: 243
Credit: 11,411,542
RAC: 26,913
Message 67736 - Posted: 15 Jan 2023, 10:53:53 UTC - in response to Message 67732.  

The upload pattern seems to be as follows. Uploads are done for tasks in due date order, and files are uploaded in sequential order. If any of the files don't go through for some reason, they're moved to the back of the queue and will be attempted again after everything else gets an attempt. This might make it look like the uploads are somewhat random. At some point the client will likely lose its connection slot. When it regains it, the process starts from the beginning, i.e. with files for tasks that are due the soonest. At this point it looks as if a "clean-up" is happening, but that is not deliberate; it is simply the upload pattern starting again from the beginning.
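
The pattern described above can be modelled with a simple queue. This is a toy sketch of the observed behaviour, not BOINC's actual transfer logic; the function name, the task tuples and the rule that a failed file succeeds on its second attempt are all assumptions made for illustration.

```python
from collections import deque
from typing import Iterable, List, Set, Tuple

Task = Tuple[str, str, List[str]]  # (due date as ISO string, task name, files)
FileRef = Tuple[str, str]          # (task name, file name)


def simulate_upload_pass(tasks: Iterable[Task], fails: Set[FileRef]) -> List[FileRef]:
    """Return the order in which files finish uploading.

    Files are queued by task due date, then in the order given per task.
    A file listed in `fails` fails once and is moved to the back of the
    queue; it is assumed to succeed on its second attempt.
    """
    queue = deque(
        (name, file_name)
        for _, name, files in sorted(tasks, key=lambda t: t[0])
        for file_name in files
    )
    retried: Set[FileRef] = set()
    finished: List[FileRef] = []
    while queue:
        item = queue.popleft()
        if item in fails and item not in retried:
            retried.add(item)
            queue.append(item)  # failed: try again after everything else
            continue
        finished.append(item)
    return finished
```

In this model a single failed zip from the earliest task only completes after every later file has had its turn, which would explain why a task can appear fully uploaded and still not report for some time.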

So far it's been good progress for me, under 90GB of files left.
ID: 67736
Stony666

Joined: 9 Feb 21
Posts: 9
Credit: 10,334,808
RAC: 880,522
Message 67738 - Posted: 15 Jan 2023, 11:19:09 UTC - in response to Message 67736.  

Hi.

Does somebody know when full bandwidth will be available again?

After updating and rebooting every part of my chain, from box to router, nothing has changed.

At the beginning of January I was able to upload some WUs. Since then, not a single file has found its way to the server.
I have reconfigured BOINC on my boxes on a daily basis with settings from the forum posts that were supposed to help. Now, enough is enough.

I will wait until the first WUs are out of time. Then I will remove all finished WUs from my boxes, including this project.
I hope that some real information from the responsible team, and not just expectations, finds its way into this thread before I do this.

And not to forget...

Does somebody know when full bandwidth will be available again?

Regards Jörg
ID: 67738
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4314
Credit: 16,379,331
RAC: 3,596
Message 67740 - Posted: 15 Jan 2023, 11:35:34 UTC - in response to Message 67738.  

Hi.

Does somebody know when full bandwidth will be available again?

After updating and rebooting every part of my chain, from box to router, nothing has changed.

At the beginning of January I was able to upload some WUs. Since then, not a single file has found its way to the server.
I have reconfigured BOINC on my boxes on a daily basis with settings from the forum posts that were supposed to help. Now, enough is enough.

I will wait until the first WUs are out of time. Then I will remove all finished WUs from my boxes, including this project.
I hope that some real information from the responsible team, and not just expectations, finds its way into this thread before I do this.

And not to forget...

Does somebody know when full bandwidth will be available again?

Regards Jörg

Last time, it took about 48 hours to clear the backlog of data on the server. I have a very slow upload connection and am over halfway through my backlog of 24 tasks' worth of zips. I would guess that most, if not all, of those with fast connections will have cleared their backlog by now, so slots to connect to the server should be getting relatively easy to capture.
ID: 67740

©2024 climateprediction.net