climateprediction.net home page
Upload failures

Mephist0

Joined: 21 Feb 08
Posts: 45
Credit: 7,929,915
RAC: 201
Message 60566 - Posted: 3 Jul 2019, 8:45:42 UTC - in response to Message 60565.  
Last modified: 3 Jul 2019, 8:46:22 UTC

ok thanks!
I have 75 files to upload (around 5GB of data). Good thing the time limit is long then ;)

I also had some problems with bad proxy software that seemed to transfer the files even when the server would not accept them, or something like that. I have changed proxy software now. It no longer starts transferring anything and gives up after a few minutes with "transient HTTP error", so it seems to be OK now :)

The old proxy software worked fine while transfers were working, but now that the project has issues, the problem with the old proxy shows :)

The problem is that I cannot leave the proxy software running for too long, since it's not allowed on the network. I will have to start it once in a while and see whether it transfers or not :)
ID: 60566
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7182
Credit: 22,883,095
RAC: 2,783
Message 60567 - Posted: 3 Jul 2019, 9:08:19 UTC - in response to Message 60564.  

Has anyone been able to upload anything to Jasmin recently?

I have some anz50 tasks that are not using the Jasmin server, I guess. Any idea how I could get them to upload instead? It seems to try the same files over and over again.


ANZ stands for Australia/New Zealand.
These files go to a data center in the south of Tasmania, which is an island state in the SE corner of Australia.

To get zip files to go there, just allow access to the internet.
There are no problems with their servers.

However ...
If you have lots of zips going to various places, then there IS a problem - files are uploaded in the order in which they were created.
So, first you have to wade through all of the files that were created before the ANZ files, THEN the ANZ files get a turn.

And that means waiting for each file going to jasmin to either upload or time out.
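
The ordering described above can be sketched in a few lines. This is only an illustration, assuming strict oldest-first ordering as stated in this post; the file names, the `created` counter and the `server` labels are hypothetical and are not taken from BOINC itself.

```python
from dataclasses import dataclass


@dataclass
class PendingUpload:
    name: str      # hypothetical file name
    created: int   # hypothetical creation order, smaller means older
    server: str    # hypothetical label for the destination server


def uploads_ahead_of(queue, server):
    """Count the files tried before the first file bound for `server`,
    assuming the client works strictly from oldest to newest."""
    ordered = sorted(queue, key=lambda f: f.created)
    for position, pending in enumerate(ordered):
        if pending.server == server:
            return position
    return None  # nothing is queued for that server


queue = [
    PendingUpload("sam50_a.zip", 1, "jasmin"),
    PendingUpload("sam50_b.zip", 2, "jasmin"),
    PendingUpload("anz50_a.zip", 3, "anz"),
    PendingUpload("sam50_c.zip", 4, "jasmin"),
]

# Two older Jasmin files have to upload or time out before the ANZ file.
print(uploads_ahead_of(queue, "anz"))
```

With this made-up queue, two Jasmin-bound files sit ahead of the ANZ file, which is why a healthy ANZ server can still look stuck.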

Part of the reason that the magic word for this project is Patience.
ID: 60567
Mephist0

Joined: 21 Feb 08
Posts: 45
Credit: 7,929,915
RAC: 201
Message 60568 - Posted: 3 Jul 2019, 9:47:30 UTC - in response to Message 60567.  

OK thanks, now I know. I am in no rush. It's only that I had problems before the disk space problems started. But the issues seem to have been resolved now by changing proxy software. I will try to send the work in 3 weeks when I get back from my vacation :)
ID: 60568
Mephist0

Joined: 21 Feb 08
Posts: 45
Credit: 7,929,915
RAC: 201
Message 60569 - Posted: 3 Jul 2019, 9:59:39 UTC

Damn it. It seems I still have issues uploading ANZ50 files. I don't get it. I have tried 3 different proxy programs now, and I changed the project from HTTP to HTTPS, deleting all my file transfers for that one. But it seems I still cannot upload.

2019-07-03 11:57:22 | climateprediction.net | Temporarily failed upload of wah2_anz50_a0k6_201612_20_793_011761372_1_r261502111_3.zip: transient HTTP error

The ANZ50 server should accept files without problems, I guess? I don't get why I have issues here.
ID: 60569
Mephist0

Joined: 21 Feb 08
Posts: 45
Credit: 7,929,915
RAC: 201
Message 60570 - Posted: 3 Jul 2019, 10:07:06 UTC

Here is the log for this attempt with an ANZ50 file.

I don't know whether it is possible to see anything there.

2019-07-03 12:05:11 | climateprediction.net | Started upload of wah2_anz50_a0k6_201612_20_793_011761372_1_r261502111_3.zip
2019-07-03 12:05:11 | climateprediction.net | [file_xfer] URL: http://upload4.cpdn.org/cpdn_cgi/file_upload_handler
2019-07-03 12:05:12 | | [http_xfer] [ID#0] HTTP: wrote 1210 bytes
2019-07-03 12:05:12 | | [http_xfer] [ID#0] HTTP: wrote 2819 bytes
2019-07-03 12:05:12 | | [http_xfer] [ID#0] HTTP: wrote 3523 bytes
2019-07-03 12:05:12 | | [http_xfer] [ID#0] HTTP: wrote 3596 bytes
2019-07-03 12:05:12 | | [http_xfer] [ID#0] HTTP: wrote 1734 bytes
2019-07-03 12:05:12 | | Internet access OK - project servers may be temporarily down.
2019-07-03 12:05:12 | | [http_xfer] [ID#232] HTTP: wrote 98 bytes
2019-07-03 12:05:13 | climateprediction.net | [file_xfer] http op done; retval 0 (Success)
2019-07-03 12:05:13 | climateprediction.net | [file_xfer] parsing upload response: <data_server_reply> <status>0</status> <file_size>262144</file_size></data_server_reply>
2019-07-03 12:05:13 | climateprediction.net | [file_xfer] parsing status: 0
2019-07-03 12:05:13 | climateprediction.net | [fxd] starting upload, upload_offset 262144
2019-07-03 12:05:15 | | Project communication failed: attempting access to reference site
2019-07-03 12:05:15 | climateprediction.net | [file_xfer] http op done; retval -184 (transient HTTP error)
2019-07-03 12:05:15 | climateprediction.net | [file_xfer] file transfer status -184 (transient HTTP error)
2019-07-03 12:05:15 | climateprediction.net | Temporarily failed upload of wah2_anz50_a0k6_201612_20_793_011761372_1_r261502111_3.zip: transient HTTP error
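
For anyone who wants to pull the useful bits out of a log like this, here is a minimal sketch. It assumes the "timestamp | project | message" layout seen in the lines above; the function name `summarize` and the two regular expressions are my own and are not part of BOINC. One reading of this log, offered tentatively, is that the server reply with status 0 and file_size 262144 is followed by the client resuming at upload_offset 262144, so the failure appears to happen during the data transfer itself rather than on first contact.

```python
import re

# Matches the server reply shown in the "parsing upload response" line.
REPLY_RE = re.compile(r"<status>(-?\d+)</status>\s*<file_size>(\d+)</file_size>")
# Matches the final "Temporarily failed upload of <file>: <reason>" line.
FAILED_RE = re.compile(r"Temporarily failed upload of (\S+): (.+)")


def summarize(log_text):
    """Return (file_size_reported_by_server, [(filename, reason), ...])."""
    file_size = None
    failures = []
    for line in log_text.splitlines():
        # Keep only the message part after "timestamp | project |".
        message = line.split("|", 2)[-1].strip()
        reply = REPLY_RE.search(message)
        if reply and reply.group(1) == "0":
            file_size = int(reply.group(2))
        failed = FAILED_RE.search(message)
        if failed:
            failures.append((failed.group(1), failed.group(2)))
    return file_size, failures


sample = """\
2019-07-03 12:05:13 | climateprediction.net | [file_xfer] parsing upload response: <data_server_reply> <status>0</status> <file_size>262144</file_size></data_server_reply>
2019-07-03 12:05:13 | climateprediction.net | [fxd] starting upload, upload_offset 262144
2019-07-03 12:05:15 | | Project communication failed: attempting access to reference site
2019-07-03 12:05:15 | climateprediction.net | Temporarily failed upload of wah2_anz50_a0k6_201612_20_793_011761372_1_r261502111_3.zip: transient HTTP error
"""

size, failed = summarize(sample)
print(size, failed)
```

The same function also copes with the "7/3/2019 8:04:28 AM" timestamp style, since it only splits on the "|" separators.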
ID: 60570
Art Masson

Joined: 16 Oct 11
Posts: 250
Credit: 15,728,685
RAC: 4,287
Message 60571 - Posted: 3 Jul 2019, 12:10:15 UTC

Continuing to get this error:

7/3/2019 8:04:28 AM | climateprediction.net | Started upload of wah2_sam50_n6hw_201612_25_822_011884425_0_r639342217_2.zip
7/3/2019 8:04:30 AM | | Project communication failed: attempting access to reference site
7/3/2019 8:04:31 AM | | Internet access OK - project servers may be temporarily down.
7/3/2019 8:04:52 AM | climateprediction.net | Temporarily failed upload of wah2_sam50_n6hw_201612_25_822_011884425_0_r639342217_2.zip: transient HTTP error
ID: 60571
Byron Leigh Hatch @ team Carl Sagan

Joined: 17 Aug 04
Posts: 289
Credit: 44,059,568
RAC: 335
Message 60572 - Posted: 3 Jul 2019, 14:50:14 UTC

same here:

2019-07-03 07:33:21 | climateprediction.net | Temporarily failed upload of wah2_sam50_n6uw_201612_25_822_011884293_0_r566787151_13.zip: transient HTTP error
ID: 60572
Paul

Joined: 14 Feb 06
Posts: 17
Credit: 3,602,581
RAC: 104
Message 60573 - Posted: 3 Jul 2019, 16:04:30 UTC - in response to Message 60571.  

Continuing to get this error:

7/3/2019 8:04:28 AM | climateprediction.net | Started upload of wah2_sam50_n6hw_201612_25_822_011884425_0_r639342217_2.zip
7/3/2019 8:04:30 AM | | Project communication failed: attempting access to reference site
7/3/2019 8:04:31 AM | | Internet access OK - project servers may be temporarily down.
7/3/2019 8:04:52 AM | climateprediction.net | Temporarily failed upload of wah2_sam50_n6hw_201612_25_822_011884425_0_r639342217_2.zip: transient HTTP error


That's the error that we're all getting due to the problems that the project is having.

Just let BOINC keep trying. Eventually it will upload.
ID: 60573
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7182
Credit: 22,883,095
RAC: 2,783
Message 60574 - Posted: 3 Jul 2019, 21:16:51 UTC

Yes, it could take another week to clear everything.

I DID suggest that people suspend running models until the problem was fixed, but it looks like no one listens anymore.
ID: 60574
mmonnin

Joined: 28 May 17
Posts: 41
Credit: 4,852,919
RAC: 2,149
Message 60576 - Posted: 3 Jul 2019, 23:38:04 UTC

CPDN has a track record of computation errors when suspending tasks, so I sure ignored that part. I don't blame anyone else for it either.

I left the 2 that had not started yet suspended, but I let the ones that had started complete, even if the uploads will take a while.
ID: 60576
Wilgard

Joined: 30 Mar 10
Posts: 12
Credit: 2,392,057
RAC: 953
Message 60577 - Posted: 4 Jul 2019, 7:03:30 UTC

Personally, I suspended new CPDN tasks :)
ID: 60577
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 2676
Credit: 3,245,797
RAC: 1,678
Message 60578 - Posted: 4 Jul 2019, 8:34:14 UTC - in response to Message 60576.  

CPDN has a track record of computation errors when suspending tasks, so I sure ignored that part. I don't blame anyone else for it either.

I left the 2 that had not started yet suspended, but I let the ones that had started complete, even if the uploads will take a while.


I have never seen a problem from suspending tasks if BOINC isn't stopped and restarted. It is also a long time since I lost a Windows task even when doing that.

Just to reiterate, so the information stays near the top of the thread: clearing the data, plus the backlog of people still uploading data (some of whom have several hundred gigabytes), means that it could easily be a week or more before the problems stop completely. Also, there is no need to suspend any tasks other than sam50s, as the others go to different servers.
ID: 60578
Dave Roberts

Joined: 15 Jan 11
Posts: 167
Credit: 5,744,442
RAC: 694
Message 60579 - Posted: 4 Jul 2019, 8:36:52 UTC - in response to Message 60576.  
Last modified: 4 Jul 2019, 9:03:07 UTC

CPDN has a track record of computation errors when suspending tasks, so I sure ignored that part. I don't blame anyone else for it either.


I have two dual-booted Mac systems and a Dell with Windows 7, all of which are running CPDN. The two Macs have Fusion virtual machines running CPDN under Windows 7 (when there are no native Mac jobs available).

I have to swap between the OSX versions on the Macs relatively frequently and have never had any problems with errors of any sort except once at the very beginning when I started to do this. I've now been doing this for years whilst testing variations of RCA software.

The one problem I had at the beginning occurred when I shut down Fusion before suspending CPDN, which was in the middle of a zip upload.

Since then I've always made sure there are no uploads running before suspending CPDN, and only then suspending Fusion. (I always suspend the tasks before suspending the project, although I don't have any obvious reason to think that this is strictly necessary.)

Also, I've never had a problem on the Dell with suspending, after taking similar suitable precautions.
ID: 60579
Paul

Joined: 14 Feb 06
Posts: 17
Credit: 3,602,581
RAC: 104
Message 60580 - Posted: 4 Jul 2019, 8:55:27 UTC - in response to Message 60574.  

I DID suggest that people suspend running models until the problem was fixed, but it looks like no one listens anymore.


Do we have any idea how many of the 8,945 users with recent credit visit the message boards, and so would see your message?

I assumed that it was going to be a very small proportion, so I didn't see the point in suspending things.

So not so much not listening, just not seeing the point.
ID: 60580
Dave Roberts

Joined: 15 Jan 11
Posts: 167
Credit: 5,744,442
RAC: 694
Message 60581 - Posted: 4 Jul 2019, 9:08:23 UTC - in response to Message 60580.  
Last modified: 4 Jul 2019, 9:14:07 UTC

I do believe that Les understands the figures. It seems to me that he was simply replying to those who do come onto the boards but ignore his advice and continue to complain.
Actually, there have been 5410 visits to this post at this point, which is a lot more than the number of replies, so it would appear that many people probably have seen the post and may very well have acted accordingly.
ID: 60581
Art Masson

Joined: 16 Oct 11
Posts: 250
Credit: 15,728,685
RAC: 4,287
Message 60582 - Posted: 4 Jul 2019, 11:04:13 UTC

I have a safr50 job stuck as well. Should I also suspend safr50 tasks as well as the sam50 tasks?
ID: 60582
bernard_ivo

Joined: 18 Jul 13
Posts: 374
Credit: 12,901,862
RAC: 8,898
Message 60583 - Posted: 4 Jul 2019, 11:13:54 UTC - in response to Message 60581.  

I do check this thread numerous times a day, which I guess, combined with other regulars, also contributes to the high number of visits. I suggested that more channels be used to spread the message, but I haven't seen it elsewhere - so no one listened. I did suspend the ones going to jasmin for almost a week, but again we have no clear info on how things are going and what is to be expected. Once the queues cleared I started a few, and uploads are failing again.

Additionally, CPDN has started to require micromanagement because of numerous issues, and yet info is scarce and we have to be patient.

I think many of us are patient and persistent, perhaps above average. Yes, CPDN is the most demanding project, but I have started to feel I should not complain or ask for info... because things are being dealt with and they get fixed eventually.
ID: 60583
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 2676
Credit: 3,245,797
RAC: 1,678
Message 60584 - Posted: 4 Jul 2019, 12:07:33 UTC - in response to Message 60582.  

I have a safr50 job stuck as well. Should I also suspend safr50 tasks as well as the sam50 tasks?


Just checked in client_state.xml as I have an safr50 that is suspended to allow testing tasks to run.

I can confirm that safr50s also go to Jasmin, so it is worth suspending those as well.

And it isn't just clearing space on the servers that needs to happen before problems stop.

Given that one person has posted saying they have 290GB to upload, an amount of data that would take over 10 days from my connection, there will be many computers competing for the limited number of connections on which the servers can take uploads. Even once all the backlog of data is cleared from the servers it arrives on at the data centre, I don't see the problems clearing in under a week.
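
As a rough check on those figures, the sustained upload speed implied by "290GB in 10 days" can be worked out as below. This is only a back-of-the-envelope sketch: it assumes 1 GB = 10^9 bytes and a link that is busy around the clock, and the function name is made up for the example.

```python
def implied_uplink_mbit_per_s(gigabytes, days):
    """Sustained rate needed to move `gigabytes` in `days` (1 GB = 1e9 bytes)."""
    bits = gigabytes * 1e9 * 8
    seconds = days * 24 * 60 * 60
    return bits / seconds / 1e6


rate = implied_uplink_mbit_per_s(290, 10)
print(f"{rate:.2f} Mbit/s")  # about 2.69 Mbit/s
```

So "over 10 days" corresponds to an uplink sustaining somewhat less than about 2.7 Mbit/s, before allowing for any failed or repeated transfers.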

To check whether a task goes to Jasmin, search for the string "upload_url" in your client_state.xml file and go through the matches until you find the one for the task in question. You should find something like the following:

wah2_safr50_n0ym_198912_13_820_011866056_0_r5511092_1.zip</name><nbytes>0.000000</nbytes><max_nbytes>150000000.000000</max_nbytes><status>0</status><upload_url>http://jasmin-upload.cpdn.org/cgi-bin/file_upload_handler


Anything other than Jasmin is either OK or has a different problem.
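
The manual search described above could be automated along these lines. This is a sketch under assumptions: it treats the file as plain text and relies only on the `<name>` and `<upload_url>` tags visible in the fragment above, the helper names are hypothetical, and the sample text is stitched together from that fragment and the upload4 URL in the log earlier in the thread (with the opening and closing tags filled in).

```python
import re

# A zip file name, then anything that is not the start of another <name>,
# then the upload URL belonging to that file.
ENTRY_RE = re.compile(
    r"<name>([^<]+\.zip)</name>(?:(?!<name>).)*?<upload_url>([^<]+)",
    re.DOTALL,
)


def upload_targets(state_text):
    """Map each zip file name found in the text to its upload URL."""
    return {name: url.strip() for name, url in ENTRY_RE.findall(state_text)}


def goes_to_jasmin(url):
    """True if the upload URL points at a host with 'jasmin' in its name."""
    return "jasmin" in url.lower()


sample = (
    "<name>wah2_safr50_n0ym_198912_13_820_011866056_0_r5511092_1.zip</name>"
    "<nbytes>0.000000</nbytes><max_nbytes>150000000.000000</max_nbytes>"
    "<status>0</status>"
    "<upload_url>http://jasmin-upload.cpdn.org/cgi-bin/file_upload_handler</upload_url>\n"
    "<name>wah2_anz50_a0k6_201612_20_793_011761372_1_r261502111_3.zip</name>"
    "<status>0</status>"
    "<upload_url>http://upload4.cpdn.org/cpdn_cgi/file_upload_handler</upload_url>\n"
)

targets = upload_targets(sample)
for name, url in targets.items():
    print("Jasmin" if goes_to_jasmin(url) else "other ", name)
```

To run it against a real installation, read your own client_state.xml into a string and pass that to `upload_targets`; the location of the BOINC data directory varies by operating system, so no path is assumed here. Work on a copy, or at least only read the file, since BOINC rewrites it while running.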
ID: 60584
mmonnin

Joined: 28 May 17
Posts: 41
Credit: 4,852,919
RAC: 2,149
Message 60585 - Posted: 4 Jul 2019, 12:27:30 UTC - in response to Message 60578.  

CPDN has a track record of computation errors when suspending tasks, so I sure ignored that part. I don't blame anyone else for it either.

I left the 2 that had not started yet suspended, but I let the ones that had started complete, even if the uploads will take a while.


I have never seen a problem from suspending tasks if BOINC isn't stopped and restarted. It is also a long time since I lost a Windows task even when doing that.

Just to reiterate, so the information stays near the top of the thread: clearing the data, plus the backlog of people still uploading data (some of whom have several hundred gigabytes), means that it could easily be a week or more before the problems stop completely. Also, there is no need to suspend any tasks other than sam50s, as the others go to different servers.


How soon you forget. You started this thread.
https://www.cpdn.org/cpdnboinc/forum_thread.php?id=8701#59554

CPDN tasks are some of the most fragile tasks of all the BOINC projects. Most have no issues suspending, or at least going back to the last checkpoint. Even if they did go back to the last checkpoint, no one wants to lose several days of work. There's a higher chance of losing work from suspending than from a task trickle upload being lost.
ID: 60585
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 2676
Credit: 3,245,797
RAC: 1,678
Message 60586 - Posted: 4 Jul 2019, 13:17:46 UTC - in response to Message 60585.  

Reminder, it is likely to take at least a week till things are back to normal!

How soon you forget. You started this thread.
https://www.cpdn.org/cpdnboinc/forum_thread.php?id=8701#59554


Not forgetting anything. That thread is specifically to do with a Linux batch. I know the problem with suspending and restarting tasks when BOINC has been exited has not been resolved on the hadcm3 Linux tasks. It remains to be seen how much of a problem it is with the HADAM4 and openifs tasks, which will at some point be coming to Linux boxen.
ID: 60586

©2020 climateprediction.net