Message boards : Number crunching : Upload server is out of disk space
Author | Message |
---|---|
Send message Joined: 2 Oct 19 Posts: 21 Credit: 47,674,094 RAC: 24,265 |
File uploads were going along quite nicely until this appeared in the boinc log. Wed 11 Jan 2023 07:27:19 PM EST | climateprediction.net | [error] Error reported by file upload server: Server is out of disk space |
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
Seeing the same thing. Oh well. I'll shut the machines back down. |
Send message Joined: 14 Sep 08 Posts: 127 Credit: 42,957,144 RAC: 78,404 |
It's kinda funny I was not able to upload anything due to transient HTTP error, but can see these messages like everyone else. ¯\_(ツ)_/¯ |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,748,059 RAC: 5,647 |
Woke up to this. I'm also seeing that many uploads have reached 100%, but failed to complete. That suggests that the upload server may have failed to forward the files to backing cloud storage (or may have not done so quickly enough). |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,614,827 RAC: 12,088 |
Waiting for an update from CPDN. My guess is the transfer server has stopped moving files off the upload server. We'll see. Hopefully most people uploaded enough they can start downloading tasks again. Woke up to this. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,748,059 RAC: 5,647 |
Thanks - please continue to keep us updated as and when. I've suspended networking on the machine which has more disk space available - it can carry on crunching at least until tomorrow without pestering the upload server (and save me money, because I'm not using the GPUs while concentrating on IFS). The machine with restricted disk space is doing GPU work (quick in and out, no long-term build up on disk), so will only contact the servers sporadically as the backoffs expire. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,614,827 RAC: 12,088 |
I'll post updates if I get them to the 'Uploads are stuck' thread, am busy with other things. I'm sure Dave will update when he hears anything too. Thanks - please continue to keep us updated as and when. |
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
It's kinda funny I was not able to upload anything due to transient HTTP error, but can see these messages like everyone else. ¯\_(ツ)_/¯ It makes sense. Each upload takes up a HTTP slot on the server for some long while (minutes, in my case). When the server is out of connection slots, things just time out - it can't get your connection serviced. When it's returning errors, that's a quick (milliseconds) sort of response. So it can service far, far more clients when it simply has to say, "I'm full, go away," than when it's processing a lot of long running uploads. |
Send message Joined: 27 Mar 21 Posts: 79 Credit: 78,322,658 RAC: 1,085 |
wujj123456 wrote: It's kinda funny I was not able to upload anything due to transient HTTP error, but can see these messages like everyone else. ¯\_(ツ)_/¯ The web server, scheduler, feeder, validator, transitioner, download file handler… are on www.cpdn.org (status), but the upload file handler for the current OIFS work is on upload11.cpdn.org. They are physically different. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
The web server, scheduler, feeder, validator, transitioner, download file handler… are on www.cpdn.org (status), but the upload file handler for the current OIFS work is on upload11.cpdn.org. They are physically different. I could not upload a UK Met Office HadSM4 at N144 resolution v8.02-i686-pc-linux-gnu task result until the upload11.cpdn.org server started working again (before it quit again). |
Send message Joined: 18 Nov 18 Posts: 21 Credit: 6,635,794 RAC: 2,524 |
The web server, scheduler, feeder, validator, transitioner, download file handler… are on www.cpdn.org (status), but the upload file handler for the current OIFS work is on upload11.cpdn.org. They are physically different. So YOU broke it this time, LOL!!! I too am stuck trying to upload completed tasks and have actually suspended the Project on several pc's to stop the crunching and constant back and forth stuff and let it settle down so everyone can get their stuff thru. |
Send message Joined: 20 Dec 20 Posts: 13 Credit: 40,069,885 RAC: 6,860 |
Hello, I could not upload Windows task Weather At Home 2. I have more than ten tasks with an error on upload: 14/01/2023 08:42:09 | climateprediction.net | Temporarily failed upload of wah2_nz25_a0d2_198905_25_936_012150232_0_r951897616_16.zip: transient HTTP error 14/01/2023 08:42:09 | climateprediction.net | Temporarily failed upload of wah2_nz25_a0d2_198905_25_936_012150232_0_r951897616_18.zip: transient HTTP error Can you help me? Kali. |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
Hello, If going to upload11, this should resolve when the backlog of OIFS tasks has cleared. If you enable http debug in Options > Event log options, you should be able to see whether that is the case. The XML file for that batch isn't on the Trello board the project uses for me to check from here. The other way to find out is looking at client_state.xml where each task should have a line saying what the upload handler is. |
Send message Joined: 7 Jun 17 Posts: 23 Credit: 44,434,789 RAC: 2,600,991 |
I'll post updates if I get them to the 'Uploads are stuck' thread, am busy with other things. I'm sure Dave will update when he hears anything too. Here is an observation: I have five hosts with WU in uploading status. Of these five, three of them are successfully uploading files and as they are disgorging their backlog, they are able to download new WU, process and upload them. The two other hosts that are failing to secure an upload slot are blocked from downloading as they are up to capacity and therefore idle. Can anyone confirm that actively crunching machines are more successful at elbowing their way in to an upload slot? If so, it seems that it would be a shame that these machines are uploading 20 hours into a 28 day deadline, while backlog-enforced idling hosts are unable to fight their way onto the server. Just an observation, but it feels that it is more than just a sampling error. fraser |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,748,059 RAC: 5,647 |
Yes, that probably is true. BOINC has an extensive system of 'backoffs': if something isn't working, it'll pause and wait - for longer and longer. But it will try a newly created upload, just once, as soon as it's been created. If that single upload gets through, then the backoffs are cleared, and everything starts moving again. You can try and clear things, by using the 'retry' tools in BOINC Manager, but it gets very tedious, very quickly. Might be worth having a look, and giving things a prod, when you happen to be passing the machine. Otherwise, simply wait until the rush has died down - BOINC will retry periodically, just not very often. |
Send message Joined: 16 Jan 18 Posts: 2 Credit: 121,919,969 RAC: 2,111 |
I have about 2.5 TB of result files and can upload about 10 GB a day. This means resolving the backlog would take 250 days. |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
I am now down to 16 tasks uploading. I think I will be clear by the end of play tomorrow. Keeping to just one task running till backlog is cleared. |
Send message Joined: 20 Dec 20 Posts: 13 Credit: 40,069,885 RAC: 6,860 |
Hello, If going to upload11, this should resolve when the backlog of OIFS tasks has cleared. If you enable http debug in Options > Event log options, you should be able to see whether that is the case. The XML file for that batch isn't on the Trello board the project uses for me to check from here. The other way to find out is looking at client_state.xml where each task should have a line saying what the upload handler is. Thank you Dave. Here is what is in the XML file: <file> <name>wah2_nz25_a0d2_198905_25_936_012150232_0_r951897616_18.zip</name> <nbytes>90031062.000000</nbytes> <max_nbytes>150000000.000000</max_nbytes> <md5_cksum>e20a8b248529e2d3f15e277a2a530f41</md5_cksum> <status>1</status> <upload_url>http://upload4.cpdn.org/cgi-bin/file_upload_handler</upload_url> <persistent_file_xfer> <num_retries>56</num_retries> <first_request_time>1671650199.948561</first_request_time> <next_request_time>1673693268.434832</next_request_time> <time_so_far>46278.530403</time_so_far> <last_bytes_xferred>0.000000</last_bytes_xferred> <is_upload>1</is_upload> </persistent_file_xfer> </file> Kali. |
Send message Joined: 16 Jan 18 Posts: 2 Credit: 121,919,969 RAC: 2,111 |
I have 1400 tasks to upload, which is 2.5 TB. Unless there is a miracle, the backlog will last forever. |
Send message Joined: 7 Jun 17 Posts: 23 Credit: 44,434,789 RAC: 2,600,991 |
You can try and clear things, by using the 'retry' tools in BOINC Manager What would that be in boinccmd? --network_available seems to do nothing, I assumed it was a toggle; --file_transfer requires a filename and doesn't work with wildcards. I was hoping to set up a cron job to try and improve the chances of getting a slot. It seems to be a case of giving to those who already have. Is there some way the backing off time period could be reduced to a few minutes for those machines that have failed to upload and a few tens of minutes for those that succeeded? If the question is simply a correlation between number of attempts and successful uploads, then to allow unsuccessful attempts shorter times between tries would stand a better chance of clearing some of these 'too many uploads' errors, at least enough to allow the stalled hosts to resume active duty. Just a thought. fraser |
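
On the boinccmd question above, here is a minimal sketch rather than a tested recipe. It assumes that `boinccmd --get_file_transfers` prints one `name:` line per pending transfer (the sample text in the code is made up to match that assumed layout), and it builds one `boinccmd --file_transfer URL filename retry` command per file, which works around the lack of wildcard support. `retry_commands`, `SAMPLE_OUTPUT` and the `PROJECT_URL` placeholder are names invented for this example; the real project URL should be whatever `boinccmd --get_project_status` reports on the host.

```python
import re
import shlex

# Hypothetical excerpt of `boinccmd --get_file_transfers` output.
# The real layout may differ; only the "name:" lines matter here.
SAMPLE_OUTPUT = """\
======== File transfers ========
1) -----------
   name: wah2_nz25_a0d2_198905_25_936_012150232_0_r951897616_16.zip
   direction: upload
2) -----------
   name: wah2_nz25_a0d2_198905_25_936_012150232_0_r951897616_18.zip
   direction: upload
"""


def retry_commands(transfers_text, project_url):
    """Build one boinccmd retry command (as an argument list) per 'name:' line."""
    names = re.findall(r"^\s*name:\s*(\S+)\s*$", transfers_text, flags=re.MULTILINE)
    return [
        ["boinccmd", "--file_transfer", project_url, name, "retry"]
        for name in names
    ]


if __name__ == "__main__":
    # PROJECT_URL is a placeholder, not the real CPDN master URL.
    for cmd in retry_commands(SAMPLE_OUTPUT, "PROJECT_URL"):
        print(shlex.join(cmd))
```

To act on the commands instead of printing them, the text could come from `subprocess.run(["boinccmd", "--get_file_transfers"], capture_output=True, text=True).stdout`, and each argument list could be passed to `subprocess.run` in turn from a cron job.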
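
The `client_state.xml` check suggested in the thread can also be scripted. This is a minimal sketch, assuming each `<file>` entry has the shape of the excerpt Kali posted (`<name>`, `<upload_url>`, and a `<persistent_file_xfer>` block for a transfer in progress). The `<client_state>` wrapper, the function name `pending_uploads` and the trimmed inline sample are assumptions made for illustration; a real file is much larger and may need more careful handling.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from urllib.parse import urlparse

# Trimmed copy of the <file> entry posted in the thread, wrapped in an
# assumed <client_state> root element so that it parses on its own.
SAMPLE = """<client_state>
<file>
    <name>wah2_nz25_a0d2_198905_25_936_012150232_0_r951897616_18.zip</name>
    <nbytes>90031062.000000</nbytes>
    <upload_url>http://upload4.cpdn.org/cgi-bin/file_upload_handler</upload_url>
    <persistent_file_xfer>
        <num_retries>56</num_retries>
        <next_request_time>1673693268.434832</next_request_time>
        <is_upload>1</is_upload>
    </persistent_file_xfer>
</file>
</client_state>"""


def pending_uploads(xml_text):
    """List files that have an upload in progress, with handler host and retry info."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.iter("file"):
        xfer = entry.find("persistent_file_xfer")
        # Skip files with no transfer in progress, and downloads.
        if xfer is None or xfer.findtext("is_upload") != "1":
            continue
        url = entry.findtext("upload_url", default="")
        next_try = float(xfer.findtext("next_request_time", default="0"))
        rows.append({
            "name": entry.findtext("name", default=""),
            "upload_url": url,
            "host": urlparse(url).hostname,
            "retries": int(xfer.findtext("num_retries", default="0")),
            "next_try_utc": datetime.fromtimestamp(next_try, tz=timezone.utc),
        })
    return rows


if __name__ == "__main__":
    for row in pending_uploads(SAMPLE):
        print(row["name"], row["host"], row["retries"], row["next_try_utc"].isoformat())
```

For the posted entry this reports the handler host as `upload4.cpdn.org`, which is the quickest way to tell whether a stuck file is waiting on upload11 or on a different upload server.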