The uploads are stuck
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944
Latest update from JASMIN Support: Latest communication to Andy. Sorry folks.
Joined: 29 Oct 17 Posts: 1049 Credit: 16,580,037 RAC: 14,757
[quote]Well, my new 2 TB SSD arrived by post this morning - that's probably a faster data transmission rate than the internet, just at the moment.[/quote]
Create a second client instance on the same host and put its directory on the new SSD. That will also get you past the 'no more tasks due to too many uploads in progress' problem if we have to wait until (at least) Monday for uploads to start.
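A minimal sketch of starting such a second client instance, assuming a Linux host where the BOINC client binary is on the PATH as `boinc`. The data directory `/mnt/ssd/boinc2` and GUI RPC port 31418 are hypothetical placeholders; the port only needs to differ from the first instance's (31416 is the usual default). The flags used (`--allow_multiple_clients`, `--dir`, `--gui_rpc_port`) are BOINC client command-line options, but check `boinc --help` on your own version before relying on them.

```python
import subprocess
from pathlib import Path
from typing import List


def second_client_argv(data_dir: Path, rpc_port: int = 31418) -> List[str]:
    """Build the command line for an extra BOINC client instance.

    data_dir and rpc_port are illustrative assumptions: the directory should
    live on the new disk, and the port must not clash with the first client.
    """
    return [
        "boinc",
        "--allow_multiple_clients",        # permit a second client on this host
        "--dir", str(data_dir),            # separate data directory (new SSD)
        "--gui_rpc_port", str(rpc_port),   # separate GUI RPC port
    ]


def start_second_client(data_dir: Path, rpc_port: int = 31418) -> subprocess.Popen:
    """Create the data directory if needed and launch the client in the background."""
    data_dir.mkdir(parents=True, exist_ok=True)
    return subprocess.Popen(second_client_argv(data_dir, rpc_port))


if __name__ == "__main__":
    # Hypothetical mount point for the new 2 TB SSD; this only prints the command.
    print(" ".join(second_client_argv(Path("/mnt/ssd/boinc2"))))
```

The new instance starts with an empty data directory, so it still has to be attached to the project before it will fetch any work.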
Joined: 1 Jan 07 Posts: 1061 Credit: 36,746,403 RAC: 5,877
[quote]Create a second client instance on the same host and put its directory on the new SSD. That will also get you past the 'no more tasks due to too many uploads in progress' problem if we have to wait until (at least) Monday for uploads to start.[/quote]
That's a thought. I've just been doing the sums: some 1.8+ GB per task, 5 tasks every 14 hours, 3+ days to Monday evening. I make that about 50 GB, and I have 66.37 GB of disk space free. (I've already circumvented the 'too many uploads' gotcha; the disk will fill up long before that kicks in again.) I think I can limp through until Monday daytime and then start making plans for, at worst, an orderly shutdown.
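The disk-space arithmetic in this post can be checked with a few lines of Python. The figures are the ones quoted (1.8 GB per task, 5 tasks per 14 hours, 66.37 GB free); taking exactly 72 hours for "3+ days" is an assumption, and the poster's "about 50 GB" allows for the "+" on each figure.

```python
GB_PER_TASK = 1.8        # "some 1.8+ GB per task"
TASKS_PER_BATCH = 5      # "5 tasks every 14 hours"
HOURS_PER_BATCH = 14
FREE_GB = 66.37          # free disk space reported in the post


def upload_gb_per_hour() -> float:
    """Rate at which completed-but-unuploaded results accumulate on disk."""
    return TASKS_PER_BATCH * GB_PER_TASK / HOURS_PER_BATCH


def disk_needed_gb(hours: float) -> float:
    """Disk consumed by pending uploads after `hours` of crunching."""
    return hours * upload_gb_per_hour()


def hours_until_full(free_gb: float = FREE_GB) -> float:
    """How long the free space lasts if nothing uploads at all."""
    return free_gb / upload_gb_per_hour()


if __name__ == "__main__":
    needed = disk_needed_gb(72)   # assumption: "3+ days" taken as 72 hours
    print(f"{needed:.1f} GB needed over 72 h; "
          f"disk full after about {hours_until_full():.0f} h")
```

With these inputs the backlog grows by about 0.64 GB per hour, so 72 hours needs roughly 46 GB and the free space lasts a little over four days.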
Joined: 29 Oct 17 Posts: 1049 Credit: 16,580,037 RAC: 14,757
[quote][quote]Create a second client instance on the same host and put its directory on the new SSD. That will also get you past the 'no more tasks due to too many uploads in progress' problem if we have to wait until (at least) Monday for uploads to start.[/quote]
That's a thought. I've just been doing the sums: some 1.8+ GB per task, 5 tasks every 14 hours, 3+ days to Monday evening. I make that about 50 GB, and I have 66.37 GB of disk space free. (I've already circumvented the 'too many uploads' gotcha; the disk will fill up long before that kicks in again.)[/quote]
How did you circumvent the 'no tasks coz too many uploads'? I got around it by using a second BOINC client (the original one is now just trying to upload). Is there an easier or better way?
Joined: 1 Jan 07 Posts: 1061 Credit: 36,746,403 RAC: 5,877
I'll send you a PM. Sent.
Joined: 27 Mar 21 Posts: 79 Credit: 78,317,688 RAC: 914
Glenn Carver wrote:
[quote]How did you circumvent the 'no tasks coz too many uploads'?[/quote]
Since "too many" means 2 * "number of logical CPUs usable by BOINC", the workaround is to increase the latter. Documentation: Client configuration, 1.1.2 Options. (Before increasing this, it's certainly desirable to limit the number of simultaneously running tasks via app_config.xml's project_max_concurrent or the respective per-application parameters. Also, perhaps depending on the client version, keep an eye on the work buffer size so that the client does not then download too much work.)
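A sketch of the two configuration files involved, generated with Python's standard `xml.etree.ElementTree`. It assumes the "number of logical CPUs usable by BOINC" is set by the `<ncpus>` element in the `<options>` section of `cc_config.xml` (in the BOINC data directory), and that the concurrency cap goes in an `app_config.xml` in the CPDN project directory; the values 8, 32 and 5 are purely illustrative. The 2 × CPUs threshold comes from the post above, not from the client source.

```python
import xml.etree.ElementTree as ET


def upload_backlog_limit(ncpus: int) -> int:
    """Uploads in progress at which work fetch stops, per the post: 2 * CPUs."""
    return 2 * ncpus


def cc_config_xml(ncpus: int) -> str:
    """cc_config.xml telling the client to act as if it had `ncpus` CPUs."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(ncpus)
    return ET.tostring(root, encoding="unicode")


def app_config_xml(max_concurrent: int) -> str:
    """app_config.xml capping how many of this project's tasks run at once."""
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative: an 8-thread host pretends to have 32 CPUs (backlog limit
    # 16 -> 64) while still running at most 5 CPDN tasks at a time.
    print(upload_backlog_limit(8), "->", upload_backlog_limit(32))
    print(cc_config_xml(32))
    print(app_config_xml(5))
```

Writing the concurrency cap first matters because, once `<ncpus>` is raised, the client would otherwise try to run one task per pretend CPU.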
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154
[quote]Well, my new 2 TB SSD arrived by post this morning - that's probably a faster data transmission rate than the internet, just at the moment.[/quote]
It reminds me of the early days of e-mail, when messages went by UNIX uucp dial-up connections to another nearby site, and e-mail from the USA went to Europe, Japan, Australia, etc. on reels of magnetic tape handed to a stewardess on an airplane going to the right place. Same with Usenet. I do not wish for those good old days.
Joined: 29 Oct 17 Posts: 1049 Credit: 16,580,037 RAC: 14,757
[quote]Glenn Carver wrote:
[quote]How did you circumvent the 'no tasks coz too many uploads'?[/quote]
Since "too many" means 2 * "number of logical CPUs usable by BOINC", the workaround is to increase the latter. Documentation: Client configuration, 1.1.2 Options. (Before increasing this, it's certainly desirable to limit the number of simultaneously running tasks via app_config.xml's project_max_concurrent or the respective per-application parameters. Also, perhaps depending on the client version, keep an eye on the work buffer size so that the client does not then download too much work.)[/quote]
Yes, thanks. I think I'll stick to running multiple client instances on the same host; it's cleaner and has the advantage that I can put them on different disks.
Joined: 16 Jun 05 Posts: 16 Credit: 19,502,063 RAC: 8,108
[quote]Latest update from JASMIN Support:[/quote]
The uploads went fine for a day or so, and now the behavior is the same "transient HTTP error" as before:
Sat 07 Jan 2023 12:53:46 AM PST | climateprediction.net | Temporarily failed upload of oifs_43r3_ps_0547_1993050100_123_962_12179191_1_r1371948884_42.zip: transient HTTP error
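If you want to see how many files are stuck this way, a small parser for event-log lines in the format shown above can help. The regular expression is written against this single example line only; other BOINC message formats are an assumption to check against a copy of your own Event Log.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Matches lines like:
# <timestamp> | <project> | Temporarily failed upload of <file>: <reason>
FAILED_UPLOAD = re.compile(
    r"^(?P<when>.+?) \| (?P<project>[^|]+?) \| "
    r"Temporarily failed upload of (?P<file>\S+): (?P<reason>.+)$"
)


def failed_uploads(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (file name, reason) for each failed-upload line."""
    for line in lines:
        match = FAILED_UPLOAD.match(line.strip())
        if match:
            yield match.group("file"), match.group("reason")


def failures_by_reason(lines: Iterable[str]) -> Counter:
    """Count failed-upload messages per reason (e.g. 'transient HTTP error')."""
    return Counter(reason for _, reason in failed_uploads(lines))


if __name__ == "__main__":
    sample = ("Sat 07 Jan 2023 12:53:46 AM PST | climateprediction.net | "
              "Temporarily failed upload of oifs_43r3_ps_0547_1993050100_123_962_"
              "12179191_1_r1371948884_42.zip: transient HTTP error")
    print(list(failed_uploads([sample])))
```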
Joined: 16 Aug 04 Posts: 156 Credit: 9,035,872 RAC: 2,928
[quote]Latest update from JASMIN Support:[/quote]
Monday... which week, which month? It was a mistake trying this again. We'll see whether I bother uploading my 30-40 results; other things need my bandwidth. /out
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944
[quote]Monday... which week, which month?[/quote]
I fully expect uploads to be going again by 17:00 Monday at the latest.
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154
[quote]Latest update from JASMIN Support:[/quote]
Probably Monday, January 9, 2023. You do not have to bother uploading: the BOINC client will retry from time to time unless you find a way to stop it, and it will keep failing until they get the upload server running again. In my experience, it ran very, very fast during the two periods a few days ago when it was up. I was sending one or two of those 14-megabyte files every six seconds or so. But I do have a 75 megabit/second fiber-optic Internet connection. YMMV.
Joined: 14 Sep 08 Posts: 127 Credit: 42,636,107 RAC: 77,060
I hope the upload server will finally stick this time. I only got a few trickle files uploaded that day, not even completing a single WU. Most of the time it was not able to connect at all, even as others reported success. Assuming the same thing is going to happen, the server had better be up long enough to clear everyone else's uploads before it (hopefully) comes to my turn. :-(
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154
[quote]I hope the upload server will finally stick this time. I only got a few trickle files uploaded that day, not even completing a single WU. Most of the time it was not able to connect at all, even as others reported success. Assuming the same thing is going to happen, the server had better be up long enough to clear everyone else's uploads before it (hopefully) comes to my turn. :-([/quote]
There are probably a few thousand of us all hoping the same thing. It has been a bad couple of weeks for all of us: both the people running the tasks and, especially, the people on the CPDN team. The fact that the customer support people in charge of the actual computer system at the server end keep only banker's hours is a big negative. They do not work holidays, nights, or weekends, and they work a short day on Fridays, ...
One consolation is that when it is really up, it works really well and fast. I have a 9375.00 KByte/second fiber-optic connection to the Internet, and it sometimes runs a little faster than even that. On the day the system came up and worked for a few hours at a time, I was uploading a 14-megabyte zip file every six seconds while it was working well.
My transfer statistics here reflect how uploading has been lately, anywhere from zero to very bad:
Average upload rate: 115.21 KB/sec
Average download rate: 7376.9 KB/sec
Do not frustrate yourself today: there will be no one there to help. Customer support should start at about 9 AM UK time tomorrow and may get it really going this time. They think they really know what the problem is and have to replace a big RAID disk subsystem. They must copy all the data from the flaky one onto the new one, then install the new one, check it all out and, if all works out, put the thing online and turn on their part of the Internet again.
It is tough for everyone.
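The unit conversions behind these figures, as a quick sketch. It assumes decimal units (1 KB = 1,000 bytes, 1 Mbit = 1,000,000 bits), which is what makes 75 Mbit/s come out at exactly 9375 KB/s; if BOINC's statistics use 1,024-byte kilobytes, the results shift by about 2.4%.

```python
def mbit_to_kbyte_per_s(mbit_per_s: float) -> float:
    """Convert megabits/second to (decimal) kilobytes/second."""
    return mbit_per_s * 1000 / 8


def mbit_per_s_for(megabytes: float, seconds: float) -> float:
    """Line rate needed to move `megabytes` in `seconds`."""
    return megabytes * 8 / seconds


def seconds_to_send(megabytes: float, kbyte_per_s: float) -> float:
    """Time to send one file at a given average rate."""
    return megabytes * 1000 / kbyte_per_s


if __name__ == "__main__":
    print(mbit_to_kbyte_per_s(75))              # the 75 Mbit/s fiber line
    print(round(mbit_per_s_for(14, 6), 1))      # one 14 MB zip every 6 s
    print(round(seconds_to_send(14, 115.21)))   # at the 115.21 KB/s average
```

So one 14 MB file every six seconds needs only about 18.7 Mbit/s, roughly a quarter of the line's capacity, while at the recent average upload rate the same file would take about two minutes.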
Joined: 14 Sep 08 Posts: 127 Credit: 42,636,107 RAC: 77,060
Oh, I am not that frustrated at this point. Since I hit the upload limit, the machines have been crunching other projects; CPDN doesn't always have WUs to send out anyway. I just want to point out that not everyone got lucky during those brief periods. That has some implications:
1) There are completed WUs approaching their deadlines. I have WUs due in 11 days, and 11 days no longer looks that long given that the upload server has been down for two weeks. If the new storage array also has problems, or some new issue shows up (not that uncommon for new systems), the server will probably need to extend deadlines so that work is not wasted.
2) The admins shouldn't assume once the upload server is up, everyone is good. It might take a while to fully drain pending uploads before things go back to normal.
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944
[quote]2) The admins shouldn't assume once the upload server is up, everyone is good. It might take a while to fully drain pending uploads before things go back to normal.[/quote]
Unless the support people at JASMIN are wrong, the reconfiguration should be fairly straightforward at 0900 UK time tomorrow, and the new system has a capacity that outstrips the old one by a factor of a thousand or more. As long as it stays up, I would expect that there will be far more delays due to slow broadband connections like mine than there will be from the project end. But I have seen predictions from CPDN and other projects about things being fixed pushed back more often than I care to remember, so, yes, I am not assuming anything yet, though I am hopeful.
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154
[quote]As long as it stays up, I would expect that there will be far more delays due to slow broadband connections like mine than there will be from the project end.[/quote]
Oh! Goody! Verizon FiOS Internet access says it is good for 75 megabits/second up and down, but today it is delivering better. I had suspended running my CPDN tasks, but I resumed two in anticipation...
Timestamp | Download | Upload | Latency | Jitter | Quality Score | Test Server
---|---|---|---|---|---|---
1/8/2023 15:11:15 | 78.52 Mbps | 89.82 Mbps | 4 ms | 1 ms | Excellent | speedgauge2.optonline.net.prod.hosts.ooklaserver.net
Joined: 7 Jun 17 Posts: 23 Credit: 44,434,789 RAC: 2,600,991
[quote]2) The admins shouldn't assume once the upload server is up, everyone is good. It might take a while to fully drain pending uploads before things go back to normal.[/quote]
Hello. A related issue: on one of my hosts, the boinc-client partition is full, so the state file cannot be written and BOINC exits. There are over 120 WUs waiting to upload (174 GB), but if the state file cannot be written, I am concerned that these uploads will not be initiated when the upload server comes back online. Can anyone suggest a workaround? My first instinct is to move a chunk of the data to an adjacent partition, but I do not know how to ensure that the data structures will remain intact. In other words: how do I move half of the completed WUs, with every one of their files, from the data directory to a holding directory? I have done this several times (including twice this week) to move an entire data partition to a new, bigger one using rsync -a, which works reliably, but as I do not know how to move these completed WUs in their entirety, I'd appreciate some feedback. Many thanks
Joined: 29 Oct 17 Posts: 1049 Credit: 16,580,037 RAC: 14,757
[quote]Hello. A related issue: on one of my hosts, the boinc-client partition is full, so the state file cannot be written and BOINC exits. There are over 120 WUs waiting to upload (174 GB), but if the state file cannot be written, I am concerned that these uploads will not be initiated when the upload server comes back online. Can anyone suggest a workaround? My first instinct is to move a chunk of the data to an adjacent partition, but I do not know how to ensure that the data structures will remain intact. In other words: how do I move half of the completed WUs, with every one of their files, from the data directory to a holding directory? I have done this several times (including twice this week) to move an entire data partition to a new, bigger one using rsync -a, which works reliably, but as I do not know how to move these completed WUs in their entirety, I'd appreciate some feedback.[/quote]
My first thought would be to use 'rsync -a' to move the directory contents to another partition (or an attached external drive) and then just soft-link all the workunit files back to their original locations. I am not sure whether BOINC cares if the files are links; it could check, but my guess is it doesn't. Maybe someone more knowledgeable about BOINC would know.
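A minimal sketch of that idea in Python, assuming the client has already stopped (stop it first if it is still running). It moves a chosen number of upload files from the project directory to another partition and leaves a symbolic link at each original path, so the file names the client expects still resolve. Because every file keeps its original path, it should not matter that one workunit's files may end up split across the two partitions. The paths and the `*.zip` pattern are hypothetical, and, as noted above, whether the BOINC client is happy to upload through symlinks has not been confirmed; try it on a few files first.

```python
import os
import shutil
from pathlib import Path
from typing import List, Optional


def offload_and_link(src_dir: Path, dest_dir: Path, pattern: str = "*.zip",
                     limit: Optional[int] = None) -> List[Path]:
    """Move up to `limit` files matching `pattern` from src_dir to dest_dir,
    leaving a symlink at each original path. Returns the original paths."""
    dest_dir = dest_dir.resolve()          # absolute paths for link targets
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Skip anything that is already a symlink (e.g. from an earlier run).
    candidates = sorted(p for p in src_dir.glob(pattern)
                        if p.is_file() and not p.is_symlink())
    moved = []
    for original in candidates[:limit]:
        target = dest_dir / original.name
        shutil.move(str(original), str(target))   # copies across partitions
        os.symlink(target, original)              # original path -> new home
        moved.append(original)
    return moved


if __name__ == "__main__":
    # Hypothetical locations: adjust to your BOINC data dir and spare partition.
    project = Path("/var/lib/boinc-client/projects/climateprediction.net")
    holding = Path("/mnt/spare/cpdn_holding")
    if project.is_dir():
        pending = sorted(project.glob("*.zip"))
        moved = offload_and_link(project, holding, limit=len(pending) // 2)
        print(f"moved {len(moved)} of {len(pending)} files")
```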
Joined: 29 Oct 17 Posts: 1049 Credit: 16,580,037 RAC: 14,757
Upload server update 9/1/23 10:49 GMT
From a meeting with CPDN this morning: they do not expect the upload server to be available until 17:00 GMT TOMORROW (10th) at the earliest. The server itself is running, but they have to move many TB of data and also want to monitor the newly configured server to check that it is stable. As already said, these are issues caused by the cloud provider, not by CPDN themselves.
©2024 cpdn.org