climateprediction.net (CPDN) home page
Thread 'The uploads are stuck'

gemini8

Joined: 4 Dec 15
Posts: 52
Credit: 2,496,117
RAC: 1,410
Message 67537 - Posted: 11 Jan 2023, 11:45:42 UTC

Mine go up at about 1.5 MiB/s, which works out to around twelve seconds per file.
Nice that it's working again!
- - - - - - - - - -
Greetings, Jens
ID: 67537
Conan
Joined: 6 Jul 06
Posts: 147
Credit: 3,615,496
RAC: 420
Message 67538 - Posted: 11 Jan 2023, 11:56:52 UTC - in response to Message 67525.  
Last modified: 11 Jan 2023, 12:13:06 UTC

Yes I am still seeing "connect(): failed" messages on all upload tries.

But I still have 4 work units running and I am nowhere near filling up any disks, so no problem here.

Conan


It has changed to "transient HTTP error" now so still not working here yet (Australia).

Server Status has not changed yet, still showing nothing.

Conan

PS: Some files are now moving. Possibly because of the load, some fail and must retry later while others go through, at speeds from as low as 17 kB/s to as high as 1,700 kB/s.
ID: 67538
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 67539 - Posted: 11 Jan 2023, 12:29:59 UTC

Right now I cannot ping them ...

$ ping -c 5 upload11.cpdn.org
PING upload11.cpdn.org (192.171.169.187) 56(84) bytes of data.

--- upload11.cpdn.org ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4116ms
$ ping -c 5 upload11.cpdn.org
PING upload11.cpdn.org (192.171.169.187) 56(84) bytes of data.

--- upload11.cpdn.org ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4116ms
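A failed ping only shows that ICMP echo replies are not coming back; many servers and firewalls drop ICMP while still accepting TCP connections. A minimal Python sketch of a TCP-level check follows. The function name `tcp_reachable` is hypothetical, and which port the upload server actually listens on is an assumption (443 for HTTPS or 80 for HTTP would be the usual candidates).

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False
```

For example, `tcp_reachable("upload11.cpdn.org", 443)` would report whether a connection can be opened at that moment, assuming port 443 is the right one. A `True` result still says nothing about whether an upload would succeed.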


Time to get up, get dressed, make breakfast.
ID: 67539
leloft
Joined: 7 Jun 17
Posts: 23
Credit: 44,434,789
RAC: 2,600,991
Message 67540 - Posted: 11 Jan 2023, 13:02:08 UTC

Hi. I'm seeing an error message that there is insufficient space on one of my hosts from the project update process, but df, boinccmd and boinctui all report that there is over 17GB available. No movement on all four hosts, three of which are in the 'too many uploads' loop.

update requested by user
11-Jan-2023 12:27:39 [climateprediction.net] Sending scheduler request: Requested by user.
11-Jan-2023 12:27:39 [climateprediction.net] Requesting new tasks for CPU
11-Jan-2023 12:27:41 [climateprediction.net] Scheduler request completed: got 0 new tasks
11-Jan-2023 12:27:41 [climateprediction.net] No tasks sent
11-Jan-2023 12:27:41 [climateprediction.net] OpenIFS 43r3 Perturbed Surface needs 38146.97MB more disk space. You currently have 0.00 MB available and it needs 38146.97 MB.
11-Jan-2023 12:27:41 [climateprediction.net] OpenIFS 43r3 Perturbed Surface needs 7168.00MB more disk space. You currently have 0.00 MB available and it needs 7168.00 MB.
11-Jan-2023 12:27:41 [climateprediction.net] Project requested delay of 3636 seconds


boinccmd --get_disk_usage
======== Disk usage ========
total: 47000.71MB
free: 18054.40MB
1) -----------
master URL: https://climateprediction.net/
disk usage: 26511.11MB
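For anyone who wants to track these figures over time, here is a rough Python sketch that parses text in the format shown above. It assumes the output always looks like this sample; `parse_disk_usage` is a hypothetical helper, not part of BOINC.

```python
import re

# Sample copied from the boinccmd output above.
SAMPLE = """\
======== Disk usage ========
total: 47000.71MB
free: 18054.40MB
1) -----------
master URL: https://climateprediction.net/
disk usage: 26511.11MB
"""


def parse_disk_usage(text: str) -> dict:
    """Extract total, free and per-project usage (in MB) from the text."""
    result = {"total_mb": None, "free_mb": None, "projects": {}}
    current_url = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"(total|free):\s*([\d.]+)MB$", line)
        if m:
            result[f"{m.group(1)}_mb"] = float(m.group(2))
            continue
        m = re.match(r"master URL:\s*(\S+)$", line)
        if m:
            current_url = m.group(1)
            continue
        m = re.match(r"disk usage:\s*([\d.]+)MB$", line)
        if m and current_url is not None:
            result["projects"][current_url] = float(m.group(1))
    return result


usage = parse_disk_usage(SAMPLE)
# Space on the partition that is neither free nor attributed to a listed project.
other_mb = usage["total_mb"] - usage["free_mb"] - sum(usage["projects"].values())
print(usage, round(other_mb, 2))
```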

Any ideas?

fraser
ID: 67540
bullschuck
Joined: 22 May 21
Posts: 39
Credit: 1,225,873
RAC: 3,852
Message 67541 - Posted: 11 Jan 2023, 13:44:28 UTC - in response to Message 67539.  

Jean-David Beyer wrote:
Right now I cannot ping them ...

$ ping -c 5 upload11.cpdn.org
PING upload11.cpdn.org (192.171.169.187) 56(84) bytes of data.

--- upload11.cpdn.org ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4116ms
$ ping -c 5 upload11.cpdn.org
PING upload11.cpdn.org (192.171.169.187) 56(84) bytes of data.

--- upload11.cpdn.org ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4116ms


Yup. Same here. No uploads. No pings.

Bull
ID: 67541
nairb
Joined: 3 Sep 04
Posts: 105
Credit: 5,646,090
RAC: 102,785
Message 67542 - Posted: 11 Jan 2023, 13:53:52 UTC

Well, here in slow land, one zip file at a time is being uploaded....... Huuurrray.
It's happening as I write this.
ID: 67542
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,580,037
RAC: 14,757
Message 67543 - Posted: 11 Jan 2023, 14:20:11 UTC - in response to Message 67540.  

Hi Fraser,
I suggest removing any boinc limits on disk space (temporarily if need be). In the boincmgr app (or equiv for boinccmd), untick to remove any disk limits for: 'Use no more than', 'Leave at least', & 'Use no more than'. If those are all disabled, the messages about insufficient disk should disappear.

I'm puzzled boinc gave you the tasks if there wasn't enough memory. Did you by any chance change your disk limits lately?

If that doesn't work, let us know.
leloft wrote:
Hi. I'm seeing an error message that there is insufficient space on one of my hosts from the project update process, but df, boinccmd and boinctui all report that there is over 17GB available. No movement on all four hosts, three of which are in the 'too many uploads' loop.

update requested by user
11-Jan-2023 12:27:39 [climateprediction.net] Sending scheduler request: Requested by user.
11-Jan-2023 12:27:39 [climateprediction.net] Requesting new tasks for CPU
11-Jan-2023 12:27:41 [climateprediction.net] Scheduler request completed: got 0 new tasks
11-Jan-2023 12:27:41 [climateprediction.net] No tasks sent
11-Jan-2023 12:27:41 [climateprediction.net] OpenIFS 43r3 Perturbed Surface needs 38146.97MB more disk space. You currently have 0.00 MB available and it needs 38146.97 MB.
11-Jan-2023 12:27:41 [climateprediction.net] OpenIFS 43r3 Perturbed Surface needs 7168.00MB more disk space. You currently have 0.00 MB available and it needs 7168.00 MB.
11-Jan-2023 12:27:41 [climateprediction.net] Project requested delay of 3636 seconds


boinccmd --get_disk_usage
======== Disk usage ========
total: 47000.71MB
free: 18054.40MB
1) -----------
master URL: https://climateprediction.net/
disk usage: 26511.11MB

Any ideas?

fraser
ID: 67543
leloft
Joined: 7 Jun 17
Posts: 23
Credit: 44,434,789
RAC: 2,600,991
Message 67544 - Posted: 11 Jan 2023, 15:02:26 UTC - in response to Message 67543.  

Thanks for your reply.

Hi Fraser,
I suggest removing any boinc limits on disk space (temporarily if need be). In the boincmgr app (or equiv for boinccmd), untick to remove any disk limits for: 'Use no more than', 'Leave at least', & 'Use no more than'. If those are all disabled, the messages about insufficient disk should disappear.

There are no limits on disk space: /var/lib/boinc-client has its own 46G partition. These restrictions have been 'unticked' in the account preferences for all 'locations' for a while (days/weeks) since the upload issues went long term.

I'm puzzled boinc gave you the tasks if there wasn't enough memory. Did you by any chance change your disk limits lately?

It's not a memory issue, the refusal was based on disk space. I've checked to see if it was a swap issue but swap is at 0.45% (of 12G). Host has 16G RAM, of which 10.5% (1.7G) in use.

If that doesn't work, let us know.

It's not working, but I haven't changed anything, so no surprises there. The host is ID: 1523000. If you want any logs, let me know and I'll send you the last 12 hours' worth. I'll report any changes if it clears itself.

Best
fraser
ID: 67544
zombie67 [MM]
Joined: 2 Oct 06
Posts: 54
Credit: 27,309,613
RAC: 28,128
Message 67546 - Posted: 11 Jan 2023, 15:33:35 UTC - in response to Message 67541.  

Yup. Same here. No uploads. No pings.

Bull


Same here. Not sure if it is the same problem or a new problem. But whatever the case, it is not fixed.
ID: 67546
gemini8
Joined: 4 Dec 15
Posts: 52
Credit: 2,496,117
RAC: 1,410
Message 67547 - Posted: 11 Jan 2023, 16:03:34 UTC

The uploads seem to behave like an on/off relationship between my clients and the server.
When they do upload they seem fine.
But sometimes they just won't. -shrug-

I'll baby-sit one of the two machines with tasks as I want to shut it down soon, and it's ok by me if the other just takes its time.
- - - - - - - - - -
Greetings, Jens
ID: 67547
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 67549 - Posted: 11 Jan 2023, 16:46:15 UTC

I have uploaded and reported one task from my VM Ubuntu guest under Ubuntu host. It has three more to finish uploading. I suspended uploads from the host machine till these are cleared to reduce the number of connections to the server. Changing to internet access always, the host machine has only managed to get one out of four (the maximum I have allowed) uploads going. This suggests to me that there is still a problem with congestion and the number of machines trying to upload zips and once the backlog has cleared a bit things should improve. (Something over 1,000 tasks have reported since it started working again but there are still a lot to go!)
ID: 67549
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,580,037
RAC: 14,757
Message 67551 - Posted: 11 Jan 2023, 17:47:05 UTC - in response to Message 67547.  

The uploads seem to behave like an on/off relationship between my clients and the server.
When they do upload they seem fine.
But sometimes they just won't. -shrug-
It's going to take time. I have 20,000 files to upload; scale that up to more than 700 clients and so on.

The upload server seems stable. I've not heard of any issues from CPDN, and I'm guessing Dave and the other moderators haven't either. So, all good.
ID: 67551
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,580,037
RAC: 14,757
Message 67552 - Posted: 11 Jan 2023, 17:57:25 UTC - in response to Message 67544.  

leloft wrote:
There are no limits on disk space: /var/lib/boinc-client has its own 46G partition. These restrictions have been 'unticked' in the account preferences for all 'locations' for a while (days/weeks) since the upload issues went long term.
Fraser, my brain wasn't quite in gear this morning from the excitement of my uploads starting again.

So here's the issue (going back to your original post):
11-Jan-2023 12:27:39 [climateprediction.net] Requesting new tasks for CPU
11-Jan-2023 12:27:41 [climateprediction.net] Scheduler request completed: got 0 new tasks
11-Jan-2023 12:27:41 [climateprediction.net] No tasks sent
11-Jan-2023 12:27:41 [climateprediction.net] OpenIFS 43r3 Perturbed Surface needs 38146.97MB more disk space. You currently have 0.00 MB available and it needs 38146.97 MB.
11-Jan-2023 12:27:41 [climateprediction.net] OpenIFS 43r3 Perturbed Surface needs 7168.00MB more disk space. You currently have 0.00 MB available and it needs 7168.00 MB.
The client requested more tasks but the server said no because there's not enough space.

Note how the first message says the task needs another ~38 GB, while the second says only 7 GB. In your first message the free disk space is ~18 GB, which is clearly not enough for the first task, which claims it wants 38 GB. It should be enough for the second task, which wants 7 GB, but you don't get that either. My guess here is that the server tried to send you both, added up their total space, and then said it couldn't send either.

The first message, 'needs 38 GB', tells me that's a resent task from batch 950, because this batch had a mistake in the disk size requirement: it was ~9-10 times too high. This was corrected for later batches to be ~7 GB.

I think you were just unlucky you got a resend from the first batch. I suspect if you try again, you might get a couple of 'corrected' tasks from the other batches. Try it?

Cheers, Glenn
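The arithmetic behind that guess can be sketched in a few lines of Python, using only the numbers from the log and the `boinccmd --get_disk_usage` output quoted earlier in the thread. This illustrates the reasoning only; it is not how the scheduler actually decides, and the log itself reports 0.00 MB available, so the server's own accounting evidently differs from the raw free space.

```python
# Free space reported by `boinccmd --get_disk_usage` in the original post.
FREE_MB = 18054.40
# The two "needs ... more disk space" values from the scheduler log.
REQUESTS_MB = [38146.97, 7168.00]


def fits(free_mb: float, needed_mb: float) -> bool:
    """True if the requested space is no larger than the free space."""
    return needed_mb <= free_mb


# Checked one at a time, only the 7168 MB task would fit.
individual = [fits(FREE_MB, r) for r in REQUESTS_MB]
# Checked as a combined total, neither can be sent.
combined = fits(FREE_MB, sum(REQUESTS_MB))

print(individual)                  # [False, True]
print(round(sum(REQUESTS_MB), 2))  # 45314.97
print(combined)                    # False
```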
ID: 67552
Stony666
Joined: 9 Feb 21
Posts: 9
Credit: 10,689,509
RAC: 3,567
Message 67554 - Posted: 11 Jan 2023, 18:25:47 UTC - in response to Message 67552.  

Hi,

Just for information...

I waited nearly two weeks before posting here because I hoped it would be fixed within a few days.

My hosts reside in Germany. They are at 3 different locations with 3 different providers.

On all boxes:
69972 climateprediction.net 11.01.2023 19:11:41 Temporarily failed upload of oifs_43r3_ps_0932_2013050100_123_982_12199576_0_r933785131_43.zip: transient HTTP error

traceroute on all boxes shows:

  traceroute upload11.cpdn.org
traceroute to upload11.cpdn.org (192.171.169.187), 30 hops max, 60 byte packets
 1  45.84.199.3 (45.84.199.3)  0.444 ms  0.421 ms  0.413 ms
 2  45.135.200.25 (45.135.200.25)  0.407 ms  0.527 ms  0.392 ms
 3  unn-84-17-33-58.cdn77.com (84.17.33.58)  0.385 ms unn-84-17-33-62.cdn77.com (84.17.33.62)  0.505 ms unn-84-17-33-60.cdn77.com (84.17.33.60)  0.497 ms
 4  ae12-460.fra20.core-backbone.com (5.56.19.81)  0.489 ms  0.678 ms ae22-449.fra10.core-backbone.com (5.56.19.237)  0.547 ms
 5  ae3-2072.lon10.core-backbone.com (80.255.15.166)  10.697 ms  10.815 ms  10.675 ms
 6  linx-gw1.ja.net (195.66.224.15)  10.661 ms  10.698 ms  10.676 ms
 7  ae23.londtt-sbr1.ja.net (146.97.35.169)  13.399 ms  13.390 ms  13.382 ms
 8  ae27.erdiss-sbr2.ja.net (146.97.33.14)  18.783 ms  18.594 ms  18.572 ms
 9  * * *
10  ral-r26.ja.net (146.97.41.34)  19.316 ms  19.311 ms  19.179 ms
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *
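A run of `* * *` hops at the end of a traceroute means no replies came back from those hops. That can be caused by a firewall dropping the probes just as easily as by an outage, so it is not proof on its own that the server is down. As a rough sketch, the last hop that did answer can be picked out of such output with a few lines of Python. `last_responding_hop` is a hypothetical helper, and the sample is an excerpt of the output above.

```python
import re

# Excerpt of the traceroute output above (hops 8 to 12).
SAMPLE = """\
traceroute to upload11.cpdn.org (192.171.169.187), 30 hops max, 60 byte packets
 8  ae27.erdiss-sbr2.ja.net (146.97.33.14)  18.783 ms  18.594 ms  18.572 ms
 9  * * *
10  ral-r26.ja.net (146.97.41.34)  19.316 ms  19.311 ms  19.179 ms
11  * * *
12  * * *
"""


def last_responding_hop(traceroute_text: str):
    """Return (hop_number, first_host) for the last hop that replied, or None."""
    last = None
    for line in traceroute_text.splitlines():
        m = re.match(r"\s*(\d+)\s+(.*)$", line)
        if not m:
            continue  # header or command line, not a hop
        hop, tokens = int(m.group(1)), m.group(2).split()
        # A hop with no replies at all is printed as "* * *".
        responders = [tok for tok in tokens if tok != "*"]
        if responders:
            last = (hop, responders[0])
    return last


print(last_responding_hop(SAMPLE))  # (10, 'ral-r26.ja.net')
```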


All boxes try no more than four transmissions at the same time. No box triggers the server via a script.

From my point of view, nothing is ok!
Considering how long we have had this problem... what has support been doing all this time?
I have more than 400 WUs to upload. Should I send them on a USB stick, or is there a chance to upload before the 20th of January, when they all run out? :(

Cheers
ID: 67554
leloft
Joined: 7 Jun 17
Posts: 23
Credit: 44,434,789
RAC: 2,600,991
Message 67555 - Posted: 11 Jan 2023, 18:34:49 UTC - in response to Message 67552.  


I think you were just unlucky you got a resend from the first batch. I suspect if you try again, you might get a couple of 'corrected' tasks from the other batches. Try it?

Doubly unlucky: I've just had the same refusal from both the first machine and now a second one, both refer to the same value 7168.00 MB. The good news is that one of the hosts has managed to upload 8 tasks.
I'll keep trying, but I'm limited by the 3636 seconds rule.

Best
fraser
ID: 67555
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 67556 - Posted: 11 Jan 2023, 18:41:40 UTC

I am now getting "Project servers may be temporarily down" again. Shame as I was making use of the fact that I have over 15GB of my 20GB allowance on my phone to upload ten times faster than my bored band can manage.
ID: 67556
wujj123456
Joined: 14 Sep 08
Posts: 127
Credit: 42,638,592
RAC: 76,985
Message 67557 - Posted: 11 Jan 2023, 18:44:31 UTC

Best case: The server is saturated and we just need to be patient and wait our turn.
Worst case: It was not the storage in the first place and the actual issue is still unknown.

TBH, it would be nice if the server status page were more useful, e.g. showing bandwidth usage. Then it would be much easier to know if it's making progress.
ID: 67557
WB8ILI
Joined: 1 Sep 04
Posts: 161
Credit: 81,522,141
RAC: 1,164
Message 67558 - Posted: 11 Jan 2023, 18:44:32 UTC

Fix for - Need more disk space. You currently have 0.00 MB available.

In the BOINC Manager, Options -> Computing Preferences -> Disk and memory -

Check the box "Use no more than" and put a number in the number box equal to about 3/4 of your disk size (or some other number you are comfortable with).

If you leave this box UNCHECKED, it is the same as having it checked with 100 (GB) in the number box.

At least that is how it works for me.
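On a headless host without BOINC Manager, the same limits can usually be set through a `global_prefs_override.xml` file in the BOINC data directory. Below is a rough Python sketch that builds such a file. The tag names `disk_max_used_gb`, `disk_min_free_gb` and `disk_max_used_pct` are given from memory of the BOINC preferences format and should be checked against the client documentation. The 46 GB figure is the partition size mentioned earlier in the thread, 3/4 is the fraction suggested above, and the other two values are purely illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def build_override(max_used_gb: float, min_free_gb: float, max_used_pct: float) -> str:
    """Build the text of a global_prefs_override.xml with disk limits only."""
    root = ET.Element("global_preferences")
    ET.SubElement(root, "disk_max_used_gb").text = f"{max_used_gb:.2f}"
    ET.SubElement(root, "disk_min_free_gb").text = f"{min_free_gb:.2f}"
    ET.SubElement(root, "disk_max_used_pct").text = f"{max_used_pct:.2f}"
    return ET.tostring(root, encoding="unicode")


# 3/4 of a 46 GB partition; the 1 GB floor and 90% cap are assumed values.
xml_text = build_override(0.75 * 46, 1.0, 90.0)
print(xml_text)
```

The resulting text would be saved as `global_prefs_override.xml` in the data directory (e.g. `/var/lib/boinc-client` on the host discussed above), after which the client has to be told to re-read it, for instance with `boinccmd --read_global_prefs_override`.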
ID: 67558
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 67559 - Posted: 11 Jan 2023, 18:51:05 UTC - in response to Message 67556.  

I am now getting "Project servers may be temporarily down" again. Shame as I was making use of the fact that I have over 15GB of my 20GB allowance on my phone to upload ten times faster than my bored band can manage.


Must have been saturation. Now working again. I have only one task left on VM. Once they have gone I can concentrate on the host machine.
ID: 67559
gemini8
Joined: 4 Dec 15
Posts: 52
Credit: 2,496,117
RAC: 1,410
Message 67561 - Posted: 11 Jan 2023, 19:03:57 UTC

And my first machine is done.
Second one seems to have made a lot of ground, so maybe it'll be done when I sit down in the living-room.
- - - - - - - - - -
Greetings, Jens
ID: 67561