The uploads are stuck

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4523
Credit: 18,537,237
RAC: 7,960
Message 67392 - Posted: 6 Jan 2023, 12:04:40 UTC

Latest update from JASMIN Support:

'We are anticipating that it will take until Monday to get the machine back now. Sorry for the inconvenience.'


Latest communication to Andy. Sorry folks.
ID: 67392
Glenn Carver
Joined: 29 Oct 17
Posts: 1036
Credit: 16,134,123
RAC: 12,670
Message 67393 - Posted: 6 Jan 2023, 12:13:17 UTC - in response to Message 67391.  
Last modified: 6 Jan 2023, 12:16:45 UTC

Well, my new 2 TB SSD arrived by post this morning - that's probably a faster data transmission rate than the internet, just at the moment.

The question is - dare I attempt to install and mount it as BOINC's data drive, while I'm still hosting around 90 results waiting to upload and report? On balance, I think probably not.
Create a second client instance on the same host and put its data directory on the new SSD. That will also get you past the 'no more tasks due to too many uploads in progress' problem if we have to wait until (at least) Monday for uploads to start.
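As a rough illustration of the suggestion above, the following sketch only builds the command line for a second client; it does not start anything. The data directory `/mnt/ssd/boinc2` and the port 31418 are hypothetical values, and the sketch assumes the `boinc` binary is on the PATH and that the first client still uses the default GUI RPC port 31416.

```python
import shlex


def second_client_command(data_dir, rpc_port=31418):
    """Build the command line for an additional BOINC client instance."""
    return [
        "boinc",
        "--allow_multiple_clients",       # permit more than one client on this host
        "--dir", data_dir,                # separate data directory, e.g. on the new SSD
        "--gui_rpc_port", str(rpc_port),  # must differ from the first client's port
    ]


if __name__ == "__main__":
    # Hypothetical mount point for the new SSD.
    print(shlex.join(second_client_command("/mnt/ssd/boinc2")))
```

The second instance would then be managed by pointing the manager or boinccmd at the chosen port rather than the default one.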
ID: 67393
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1058
Credit: 36,476,152
RAC: 12,739
Message 67394 - Posted: 6 Jan 2023, 12:27:39 UTC - in response to Message 67393.  

Create a second client instance on the same host and put its data directory on the new SSD. That will also get you past the 'no more tasks due to too many uploads in progress' problem if we have to wait until (at least) Monday for uploads to start.
That's a thought. Just been doing the sums: some 1.8+ GB per task, 5 tasks every 14 hours, 3+ days to Monday evening. I make that about 50 GB: disk space free, 66.37 GB. (I've already circumvented the 'too many uploads' gotcha - disk will fill up long before that kicks in again)

I think I can limp through until Monday daytime, and start making plans for - at worst - an orderly shutdown then.
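The sums above can be reproduced with a few lines of Python. The per-task size, batch size, batch duration and free space are the figures quoted in the post; the 78-hour horizon is an assumption standing in for "3+ days to Monday evening".

```python
def projected_output_gb(gb_per_task, tasks_per_batch, batch_hours, horizon_hours):
    """Estimate the disk space that finished tasks will occupy over a time horizon."""
    batches = horizon_hours / batch_hours
    return gb_per_task * tasks_per_batch * batches


# Figures from the post: 1.8 GB per task, 5 tasks every 14 hours, 66.37 GB free.
# The 78-hour horizon is an assumed value for "3+ days".
needed = projected_output_gb(1.8, 5, 14, 78)
free = 66.37
print(f"projected: {needed:.1f} GB, free: {free} GB, margin: {free - needed:.1f} GB")
```

With these inputs the projection comes out at roughly 50 GB, which matches the estimate in the post and leaves some margin against the free space.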
ID: 67394
Glenn Carver
Joined: 29 Oct 17
Posts: 1036
Credit: 16,134,123
RAC: 12,670
Message 67395 - Posted: 6 Jan 2023, 13:27:26 UTC - in response to Message 67394.  
Last modified: 6 Jan 2023, 13:28:35 UTC

Create a second client instance on the same host and put its data directory on the new SSD. That will also get you past the 'no more tasks due to too many uploads in progress' problem if we have to wait until (at least) Monday for uploads to start.
That's a thought. Just been doing the sums: some 1.8+ GB per task, 5 tasks every 14 hours, 3+ days to Monday evening. I make that about 50 GB: disk space free, 66.37 GB. (I've already circumvented the 'too many uploads' gotcha - disk will fill up long before that kicks in again)
How did you circumvent the 'no tasks coz too many uploads'? I got around it by using a 2nd boinc client (the original one is now just trying to upload). Is there an easier or better way?
ID: 67395
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1058
Credit: 36,476,152
RAC: 12,739
Message 67396 - Posted: 6 Jan 2023, 14:18:24 UTC - in response to Message 67395.  
Last modified: 6 Jan 2023, 14:45:06 UTC

I'll send you a PM.

Sent.
ID: 67396
xii5ku
Joined: 27 Mar 21
Posts: 79
Credit: 78,302,757
RAC: 1,077
Message 67397 - Posted: 6 Jan 2023, 16:30:03 UTC - in response to Message 67395.  
Last modified: 6 Jan 2023, 16:33:10 UTC

Glenn Carver wrote:
How did you circumvent the 'no tasks coz too many uploads'?
Since "too many" means 2 * "number of logical CPUs usable by BOINC", the workaround is to increase the latter. Documentation: Client configuration, 1.1.2 Options. (Before increasing this, it is certainly desirable to limit the number of simultaneously running tasks via app_config.xml's project_max_concurrent or the respective per-application parameters. Also, perhaps depending on client version, keep an eye on the work buffer size so that the client does not then download too much work.)
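A minimal sketch of what this could look like, written as Python that prints the two configuration fragments. The post does not name the client option; the sketch assumes it is the `<ncpus>` element under `<options>` in cc_config.xml. The values 64 and 4 are purely illustrative.

```python
import xml.etree.ElementTree as ET


def upload_limit(ncpus):
    """Pending-upload threshold as described in the post: 2 x usable logical CPUs."""
    return 2 * ncpus


def cc_config(ncpus):
    """cc_config.xml fragment overriding the CPU count (assumed option: <ncpus>)."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(ncpus)
    return ET.tostring(root, encoding="unicode")


def app_config(max_concurrent):
    """app_config.xml fragment capping simultaneously running tasks for the project."""
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


# Illustrative values: pretend to have 64 CPUs, but run at most 4 tasks at once.
print("upload limit:", upload_limit(64))
print(cc_config(64))
print(app_config(4))
```

The app_config.xml cap goes in first, as the post advises, so that raising the CPU count does not also raise the number of running tasks.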
ID: 67397
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1118
Credit: 17,177,237
RAC: 2,478
Message 67398 - Posted: 6 Jan 2023, 19:14:22 UTC - in response to Message 67391.  

Well, my new 2 TB SSD arrived by post this morning - that's probably a faster data transmission rate than the internet, just at the moment.


Reminds me of the early days of e-mail, when the stuff went by UNIX uucp dial-up connection to another nearby location. And e-mail from the USA went to Europe, Japan, Australia, etc. on rolls of magnetic tape handed to a stewardess on an airplane going to the right place. Same with Usenet.

I do not wish for those good old days.
ID: 67398
Glenn Carver
Joined: 29 Oct 17
Posts: 1036
Credit: 16,134,123
RAC: 12,670
Message 67400 - Posted: 6 Jan 2023, 22:01:18 UTC - in response to Message 67397.  
Last modified: 6 Jan 2023, 22:01:33 UTC

Glenn Carver wrote:
How did you circumvent the 'no tasks coz too many uploads'?
Since "too many" means 2 * "number of logical CPUs usable by BOINC", the workaround is to increase the latter. Documentation: Client configuration, 1.1.2 Options. (Before increasing this, it is certainly desirable to limit the number of simultaneously running tasks via app_config.xml's project_max_concurrent or the respective per-application parameters. Also, perhaps depending on client version, keep an eye on the work buffer size so that the client does not then download too much work.)
Yes, thanks.

I think I'll stick to running multiple client instances on the same host; it's cleaner and has the advantage that I can put them on different disks.
ID: 67400
rjs5
Joined: 16 Jun 05
Posts: 16
Credit: 19,274,928
RAC: 2,960
Message 67410 - Posted: 7 Jan 2023, 8:55:40 UTC - in response to Message 67392.  

Latest update from JASMIN Support:

'We are anticipating that it will take until Monday to get the machine back now. Sorry for the inconvenience.'


Latest communication to Andy. Sorry folks.


The uploads went fine for a day or so, and now the behavior is the same "transient HTTP error" as before.

Sat 07 Jan 2023 12:53:46 AM PST | climateprediction.net | Temporarily failed upload of oifs_43r3_ps_0547_1993050100_123_962_12179191_1_r1371948884_42.zip: transient HTTP error
ID: 67410
Helmer Bryd
Joined: 16 Aug 04
Posts: 156
Credit: 9,035,872
RAC: 2,928
Message 67415 - Posted: 7 Jan 2023, 18:25:09 UTC - in response to Message 67392.  

Latest update from JASMIN Support:

'We are anticipating that it will take until Monday to get the machine back now. Sorry for the inconvenience.'


Latest communication to Andy. Sorry folks.


Monday... what week, what month?
It was a mistake trying this again. I'll see if I bother uploading my 30-40 results; other things need my bandwidth.
/out
ID: 67415
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4523
Credit: 18,537,237
RAC: 7,960
Message 67416 - Posted: 7 Jan 2023, 18:42:05 UTC - in response to Message 67415.  

Monday... what week, what month?
It was a mistake trying this again. I'll see if I bother uploading my 30-40 results; other things need my bandwidth.
/out
I fully expect uploads to be going again by 17:00 Monday at the latest.
ID: 67416
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1118
Credit: 17,177,237
RAC: 2,478
Message 67417 - Posted: 7 Jan 2023, 18:48:26 UTC - in response to Message 67415.  

Latest update from JASMIN Support:

'We are anticipating that it will take until Monday to get the machine back now. Sorry for the inconvenience.'



Latest communication to Andy. Sorry folks.



Monday... what week, what month?
It was a mistake trying this again. I'll see if I bother uploading my 30-40 results; other things need my bandwidth.


Probably Monday, January 9, 2023.

You do not have to bother uploading. The boinc client will try from time to time unless you find a way to stop it. It will fail until they get the upload server running again.

In my experience, it ran very very fast during the two periods a few days ago when it was up. I was sending one or two of those 14 Megabyte files every six seconds or so. But I do have a 75 Megabit/second fiber-optic Internet connection. YMMV.
ID: 67417
wujj123456
Joined: 14 Sep 08
Posts: 125
Credit: 40,593,080
RAC: 57,050
Message 67418 - Posted: 8 Jan 2023, 8:23:42 UTC - in response to Message 67417.  

I hope the upload server will finally stick this time. I only got a few trickle files uploaded that day, not even completing a single WU. Most of the time it was not able to connect at all, even as others reported success. Assuming the same thing is going to happen, the server had better be up for long enough to deplete others' uploads before it (hopefully) comes to my turn. :-(
ID: 67418
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1118
Credit: 17,177,237
RAC: 2,478
Message 67421 - Posted: 8 Jan 2023, 11:02:26 UTC - in response to Message 67418.  

I hope the upload server will finally stick this time. I only got a few trickle files uploaded that day, not even completing a single WU. Most of the time it was not able to connect at all, even as others reported success. Assuming the same thing is going to happen, the server had better be up for long enough to deplete others' uploads before it (hopefully) comes to my turn. :-(


There are probably a few thousand of us all hoping the same thing. It has been a bad coupla weeks for all of us, both the people running the tasks, and especially the people involved on the CPDN team. The fact that the customer support people in charge of the real computer system at the server keep only banker's hours is a big negative. They do not work holidays, nights, or weekends, work a short day on Fridays, ...

One consolation is that when it is really up, it works really well and fast. I have a 9375.00 KByte/second fiber-optic connection to the Internet, and it sometimes runs a little faster than even that. I was uploading a 14 megabyte zip file every six seconds when the system was working well that day, when it came up and worked for a few hours at a time. The averages here reflect the fact that uploading has been zero to very bad lately.
Average upload rate 	115.21 KB/sec
Average download rate 	7376.9 KB/sec
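For comparison, the figures quoted above can be put on a common scale. This is only a sketch: it assumes decimal units (1 Mbit = 1000 kbit, 1 MB = 1000 KB) and treats "a 14 megabyte file every six seconds" as a steady rate.

```python
def mbit_to_kbyte_per_s(mbit_per_s):
    """Convert megabits/s to kilobytes/s using decimal units."""
    return mbit_per_s * 1000 / 8


def observed_rate_kbyte_per_s(file_mbyte, seconds):
    """Throughput in kilobytes/s when one file of the given size completes per interval."""
    return file_mbyte * 1000 / seconds


link = mbit_to_kbyte_per_s(75)            # nominal 75 Mbit/s line
burst = observed_rate_kbyte_per_s(14, 6)  # one 14 MB zip every six seconds
average = 115.21                          # long-run average reported by the client

print(f"link capacity:  {link:.0f} KB/s")
print(f"good-day rate:  {burst:.0f} KB/s ({burst / link:.0%} of the link)")
print(f"recent average: {average} KB/s ({average / link:.1%} of the link)")
```

The 75 Mbit/s line works out to the 9375 KB/s quoted in the post, so even on the good day the upload used only a fraction of the link.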

Do not frustrate yourself today: there will be no one there to help. Customer support should start about 9 AM UTC tomorrow and may get it really going this time. They think they really know what the problem is and have to replace a big RAID disk subsystem. They must copy all the data from the flaky one onto the new one, then install the new one, check it all out and, if all works out, put the thing on line and turn on their part of the Internet again.

It is tough for everyone.
ID: 67421
wujj123456
Joined: 14 Sep 08
Posts: 125
Credit: 40,593,080
RAC: 57,050
Message 67437 - Posted: 8 Jan 2023, 19:16:47 UTC - in response to Message 67421.  

Oh, I am not that frustrated at this point. Since I've hit the upload limit, the machines have been crunching other projects. CPDN doesn't always have WUs to send out anyway. I just want to communicate that not everyone got lucky during those brief periods. That has some implications.
1) There are completed WUs approaching their deadlines soon. I have WUs due in 11 days, and 11 days no longer looks that long given the upload server has been down for two weeks. If somehow the new storage array again has problems, or some new issues show up, which is not that uncommon for new systems, we will probably need the server to extend deadlines so as not to waste work.
2) The admins shouldn't assume once the upload server is up, everyone is good. It might take a while to fully drain pending uploads before things go back to normal.
ID: 67437
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4523
Credit: 18,537,237
RAC: 7,960
Message 67438 - Posted: 8 Jan 2023, 20:06:54 UTC

2) The admins shouldn't assume once the upload server is up, everyone is good. It might take a while to fully drain pending uploads before things go back to normal.


Unless the support people at JASMIN are wrong, the reconfiguration should be fairly straightforward at 0900 UK time tomorrow and it does have a capacity that outstrips the old system by a factor of a thousand or more. As long as it stays up, I would expect that there will be far more delays due to slow broadband connections like mine than there will be from the project end. But I have seen predictions from CPDN and other projects about things being fixed pushed back in time more often than I care to remember so yes, not assuming anything yet though I am hopeful.
ID: 67438
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1118
Credit: 17,177,237
RAC: 2,478
Message 67440 - Posted: 8 Jan 2023, 20:24:38 UTC - in response to Message 67438.  

As long as it stays up, I would expect that there will be far more delays due to slow broadband connections like mine than there will be from the project end.


Oh! Goody! Verizon FiOS Internet access says it is good for 75 MegaBits/second up and down, but today it is delivering better.
I had suspended running my CPDN stuff, but I resumed two in anticipation...
Timestamp           Download    Upload      Latency  Jitter  Quality Score  Test Server
1/8/2023 15:11:15   78.52 Mbps  89.82 Mbps  4 ms     1 ms    Excellent      speedgauge2.optonline.net.prod.hosts.ooklaserver.net

ID: 67440
leloft
Joined: 7 Jun 17
Posts: 23
Credit: 44,434,789
RAC: 2,600,991
Message 67451 - Posted: 9 Jan 2023, 10:15:56 UTC - in response to Message 67438.  

2) The admins shouldn't assume once the upload server is up, everyone is good. It might take a while to fully drain pending uploads before things go back to normal.


Hello. A related issue: On one of my hosts, the boinc-client partition is full and so the state file cannot be written and boinc exits. There are over 120 WU waiting to upload (174G), but if the state file cannot be written, I am concerned that these uploads will not get initiated when the upload server comes back online. Can anyone suggest a workaround for this? My first instinct is to move a chunk of the data to an adjacent partition, but I do not know how to ensure that the data structures will remain intact. In other words: How do I move exactly 100% of half of the completed WU from the data directory to a holding directory? I have done this several times (including twice this week) to move an entire data partition to a new, bigger one using rsync -a which works reliably, but as I do not know how to move these completed WU in their entirety, I'd appreciate some feedback.

Many thanks
ID: 67451
Glenn Carver
Joined: 29 Oct 17
Posts: 1036
Credit: 16,134,123
RAC: 12,670
Message 67453 - Posted: 9 Jan 2023, 10:48:24 UTC - in response to Message 67451.  
Last modified: 9 Jan 2023, 10:48:42 UTC

Hello. A related issue: On one of my hosts, the boinc-client partition is full and so the state file cannot be written and boinc exits. There are over 120 WU waiting to upload (174G), but if the state file cannot be written, I am concerned that these uploads will not get initiated when the upload server comes back online. Can anyone suggest a workaround for this? My first instinct is to move a chunk of the data to an adjacent partition, but I do not know how to ensure that the data structures will remain intact. In other words: How do I move exactly 100% of half of the completed WU from the data directory to a holding directory? I have done this several times (including twice this week) to move an entire data partition to a new, bigger one using rsync -a which works reliably, but as I do not know how to move these completed WU in their entirety, I'd appreciate some feedback.
My first thought would be to use 'rsync -a' to move the directory contents to another partition (or attach an external drive) and then just soft link all the workunit files back to their original location. I am not sure whether boinc cares if the files are links, though. It could check, but my guess is that it doesn't. Maybe someone more knowledgeable about boinc would know.
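A minimal sketch of the move-and-link idea, in Python rather than rsync so that each file is handled as a unit. It carries the same caveat as above: whether the client accepts symlinked result files is an open question, so this is best tried with the client stopped and on a small number of files first. The function name and all paths are hypothetical.

```python
import os
import shutil
from pathlib import Path


def relocate_and_link(files, holding_dir):
    """Move each regular file to holding_dir and leave a symlink at its old path."""
    holding = Path(holding_dir)
    holding.mkdir(parents=True, exist_ok=True)
    moved = []
    for src in map(Path, files):
        if src.is_symlink() or not src.is_file():
            continue  # skip existing links and anything that is not a regular file
        dest = holding / src.name
        if dest.exists():
            raise FileExistsError(f"{dest} already exists in the holding directory")
        shutil.move(str(src), str(dest))  # copies across filesystems, then removes the source
        os.symlink(dest.resolve(), src)   # absolute link at the original location
        moved.append(dest)
    return moved
```

To move half of the pending result files, one might sort the matching zip files in the project directory (for example those named like the `oifs_43r3_ps_...zip` file in the log line earlier in the thread), pass the first half of that list to `relocate_and_link`, and point `holding_dir` at the partition with free space.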
ID: 67453
Glenn Carver
Joined: 29 Oct 17
Posts: 1036
Credit: 16,134,123
RAC: 12,670
Message 67454 - Posted: 9 Jan 2023, 10:53:12 UTC

Upload server update, 9 Jan 2023, 10:49 GMT
From a meeting this morning with CPDN: they do not expect the upload server to be available until 17:00 GMT TOMORROW (10th) at the earliest. The server itself is running, but they have to move many TB of data and also want to monitor the newly configured server to check it is stable. As already said, these are issues caused by the cloud provider, not CPDN themselves.
ID: 67454