climateprediction.net home page
ANZ model upload problems.

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 48515 - Posted: 25 Mar 2014, 2:30:24 UTC

Error reported by file upload server: [hadam3p_anz_a48n_2012_1_008561644_0_6.zip] locked by file_upload_handler PID=-1

Of the 3 completed ANZ models on this host, all 3 are reporting this error on 1 or 2 upload files. All files are going to
<upload_url>http://rwah0.rdsi.tpac.org.au/cgi-bin/file_upload_handler</upload_url>

Of the 39 upload files from these 3 ANZ models, 5 are stuck with this ongoing error. All the rest uploaded OK. No pattern as to which files fail upload by sequence number.
Might be that upload handler fails to recover after a transient error on a particular file?

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 48516 - Posted: 25 Mar 2014, 3:23:21 UTC - in response to Message 48515.  

I'll email the project. They'll need to get this sorted fast.

Hmmm. 3.14 am there, so it's going to be a long wait.

MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 48519 - Posted: 25 Mar 2014, 9:31:28 UTC - in response to Message 48516.  

I had 6 complete OK on the 24th.

old_user651284
Joined: 28 Mar 11
Posts: 35
Credit: 82,588
RAC: 0
Message 48539 - Posted: 26 Mar 2014, 12:07:17 UTC

This appears to be an NFS file locking issue.
It currently affects about 1% of the files that have uploaded.

The solution would be to disable file locks on the NFS-mounted storage device on the ANZ server, but I am not yet sure what implications this would have. I am guessing the effect would be minimal, but I am checking with the server's admin.

Jonathan

CPDN-sysadmin
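
For readers unfamiliar with advisory file locks, the following is a minimal sketch of the kind of non-blocking lock attempt an upload handler might make before writing a file. It is not the project's handler code; the function name `try_lock` and its three return values are hypothetical. One plausible reading of `PID=-1`, assuming the handler reports the lock holder's process ID, is that the lock request itself failed (which can happen on NFS mounts without a working lock service) rather than that another handler actually held the file.

```python
import errno
import fcntl
import tempfile


def try_lock(fd: int) -> str:
    """Try to take an exclusive, non-blocking advisory lock on fd.

    Returns "acquired" on success, "held" if another process holds a
    conflicting lock, or "error" if the lock call itself failed
    (for example ENOLCK on an NFS mount with no lock service).
    """
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return "acquired"
    except OSError as exc:
        # POSIX allows either EACCES or EAGAIN for a conflicting lock.
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return "held"
        return "error"


if __name__ == "__main__":
    # On a local filesystem with no competing process this should succeed.
    with tempfile.NamedTemporaryFile() as f:
        print(try_lock(f.fileno()))
```

Distinguishing "held" from "error" matters here: a lock that is genuinely held clears when its holder exits, whereas a failing lock call keeps failing until the storage configuration changes.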

jimbo
Joined: 11 Dec 05
Posts: 5
Credit: 714,983
RAC: 0
Message 48547 - Posted: 26 Mar 2014, 19:29:26 UTC

3/26/2014 2:19:44 PM | climateprediction.net | Started upload of hadam3p_anz_a4qy_2012_1_008562303_0_1.zip
3/26/2014 2:19:46 PM | climateprediction.net | [error] Error reported by file upload server: [hadam3p_anz_a4qx_2012_1_008562302_0_7.zip] locked by file_upload_handler PID=-1
3/26/2014 2:19:46 PM | climateprediction.net | Temporarily failed upload of hadam3p_anz_a4qx_2012_1_008562302_0_7.zip: transient upload error
3/26/2014 2:19:46 PM | climateprediction.net | Backing off 00:18:18 on upload of hadam3p_anz_a4qx_2012_1_008562302_0_7.zip
3/26/2014 2:19:58 PM | climateprediction.net | [error] Error reported by file upload server: [hadam3p_anz_a4qy_2012_1_008562303_0_1.zip] locked by file_upload_handler PID=-1
3/26/2014 2:19:58 PM | climateprediction.net | Temporarily failed upload of hadam3p_anz_a4qy_2012_1_008562303_0_1.zip: transient upload error
3/26/2014 2:19:58 PM | climateprediction.net | Backing off 04:03:14 on upload of hadam3p_anz_a4qy_2012_1_008562303_0_1.zip
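
When many files are affected, it can help to count the lock errors per file instead of reading the log by eye. A small sketch, assuming the message format shown in the lines above; the function name `count_lock_errors` is hypothetical:

```python
import re
from collections import Counter

# Matches the server error text as it appears in the client log above.
LOCK_ERROR = re.compile(
    r"Error reported by file upload server: "
    r"\[(?P<file>[^\]]+)\] locked by file_upload_handler PID=(?P<pid>-?\d+)"
)


def count_lock_errors(lines):
    """Count 'locked by file_upload_handler' errors per upload file."""
    counts = Counter()
    for line in lines:
        match = LOCK_ERROR.search(line)
        if match:
            counts[match.group("file")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: paste or pipe log lines into the script on standard input.
    for name, n in count_lock_errors(sys.stdin).most_common():
        print(n, name)
```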

jimbo
Joined: 11 Dec 05
Posts: 5
Credit: 714,983
RAC: 0
Message 48548 - Posted: 26 Mar 2014, 19:29:29 UTC
Last modified: 26 Mar 2014, 19:33:48 UTC

Can you check whether a server is down, and whether that relates to my previous message, please?

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 48553 - Posted: 26 Mar 2014, 21:52:37 UTC

The preceding 2 posts have been moved here from a thread in the Science section.

MarkJ
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 48564 - Posted: 27 Mar 2014, 9:12:01 UTC
Last modified: 27 Mar 2014, 9:16:48 UTC

I also have 2 stuck with the same error getting reported from different machines:

526 climateprediction.net 27-03-2014 04:39 PM [error] Error reported by file upload server: [hadam3p_anz_n7dq_2012_1_008583254_0_1.zip] locked by file_upload_handler PID=-1

This one got a "transient upload error" at 05:17 (UTC + 11 hours) and has been getting the locked file error since 05:30.

and

604 climateprediction.net 27-03-2014 07:30 AM [error] Error reported by file upload server: [hadam3p_anz_n7ot_2012_1_008583653_1_1.zip] locked by file_upload_handler PID=-1

This one appeared after a "transient upload error" at 07:26 (UTC + 11) and from 07:30 it's been getting the locked file error.

It looks like a common theme: a file gets an upload error and then isn't released. Running BOINC 7.2.42 on one machine and 7.3.11 on the other.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 48567 - Posted: 27 Mar 2014, 9:42:42 UTC

One of mine is now past 20 minutes of back off.

theUnkownCoder
Joined: 4 Mar 14
Posts: 5
Credit: 1,459,572
RAC: 0
Message 48568 - Posted: 27 Mar 2014, 10:31:41 UTC - in response to Message 48567.  

Just to add my voice - getting the same. I did start another thread (sorry), but Lee pointed me to this one.

theUnkownCoder
Joined: 4 Mar 14
Posts: 5
Credit: 1,459,572
RAC: 0
Message 48575 - Posted: 27 Mar 2014, 14:54:47 UTC - in response to Message 48568.  

And now, mine has just uploaded (14:00 GMT). Thanks to whoever did whatever was necessary to fix it :)

Ananas
Volunteer moderator
Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 48578 - Posted: 27 Mar 2014, 20:33:24 UTC

With the large upload files and the high server load here, broken uploads can easily happen: on the server, at the ISP, or who knows where else those bytes can disappear on their way (maybe the NSA eats some too).

The timeout of the upload handler seems to be somewhat longer than the retry delay of the BOINC core client. I've had it a few times lately too; it always fixed itself after some time.
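
The timing argument can be made concrete with a toy model. Everything here is an assumption for illustration: the server is taken to hold a stale lock for a fixed `lock_timeout` after a broken upload, the client retries after each delay in `retry_delays`, and the function name and all numbers are hypothetical rather than taken from BOINC.

```python
from itertools import accumulate


def first_successful_retry(lock_timeout, retry_delays):
    """Return the 0-based index of the first retry made after the
    server-side lock has expired, or None if every retry is too early.

    All times are in seconds, measured from the failed upload at t = 0.
    """
    for index, attempt_time in enumerate(accumulate(retry_delays)):
        if attempt_time >= lock_timeout:
            return index
    return None


if __name__ == "__main__":
    # Hypothetical values: a 2-hour stale lock and four growing retry delays.
    print(first_successful_retry(2 * 3600, [60, 300, 1098, 14594]))
```

Under this model, every retry that arrives before the lock expires sees the "locked" error, and the upload goes through on the first retry after expiry, which matches the "fixed itself after some time" behaviour described above.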

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 48580 - Posted: 27 Mar 2014, 20:59:44 UTC

Jonathan re-booted the server some hours ago, and also removed the file locking.
There are still some transient failures, but that will be due to the large influx of data.

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 48581 - Posted: 27 Mar 2014, 21:49:45 UTC - in response to Message 48580.  

All uploads from here finished ok, including one that was stuck for several days.
Thanks Jonathan.

Byron Leigh Hatch @ team Carl ...
Joined: 17 Aug 04
Posts: 289
Credit: 44,103,664
RAC: 0
Message 48602 - Posted: 28 Mar 2014, 15:57:19 UTC - in response to Message 48580.  

Les thank you very much for that information :)

MarkJ
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 48611 - Posted: 29 Mar 2014, 4:04:33 UTC

After the file locks disappeared I still had transient upload failures. I had to dig out the old proxy server and hook it up to the dial-up to clear them. Personally I think it's an issue with my ISP and they don't have a clue. Anyway, all files cleared as of 2 hours ago. Work units are progressing, and I added 2 more machines to help out.

Albert H.
Joined: 18 Feb 06
Posts: 72
Credit: 54,382,603
RAC: 34,957
Message 48617 - Posted: 29 Mar 2014, 19:07:47 UTC

Hello,
As you can see below, none of my ANZ files upload, while the EU and PNW files do!

Currently only ANZ models are running on my 2 machines. Shall I carry on and hope they upload in the future, or should I change something, or abort and wait for better times?

Thanks

29/03/2014 10:22:56 | climateprediction.net | Started upload of hadam3p_eu_e1nf_2013_1_008547582_1_7.zip
29/03/2014 10:33:23 | climateprediction.net | Finished upload of hadam3p_eu_e1nf_2013_1_008547582_1_7.zip
29/03/2014 11:27:55 | climateprediction.net | Started upload of hadam3p_anz_nb4h_2012_1_008588105_0_1.zip
29/03/2014 11:28:36 | climateprediction.net | Temporarily failed upload of hadam3p_anz_nb4h_2012_1_008588105_0_1.zip: transient HTTP error
29/03/2014 11:28:36 | climateprediction.net | Backing off 05:24:11 on upload of hadam3p_anz_nb4h_2012_1_008588105_0_1.zip
29/03/2014 11:28:39 | | Project communication failed: attempting access to reference site
29/03/2014 11:28:41 | | Internet access OK - project servers may be temporarily down.
29/03/2014 11:41:42 | climateprediction.net | Started upload of hadam3p_anz_a46k_2012_1_008561569_0_2.zip
29/03/2014 11:42:21 | climateprediction.net | Temporarily failed upload of hadam3p_anz_a46k_2012_1_008561569_0_2.zip: transient HTTP error
29/03/2014 11:42:21 | climateprediction.net | Backing off 05:51:47 on upload of hadam3p_anz_a46k_2012_1_008561569_0_2.zip
29/03/2014 11:42:35 | | Project communication failed: attempting access to reference site
29/03/2014 11:42:38 | | Internet access OK - project servers may be temporarily down.
29/03/2014 13:40:01 | climateprediction.net | Started upload of hadam3p_anz_nb4i_2012_1_008588106_0_1.zip
29/03/2014 13:41:00 | climateprediction.net | Temporarily failed upload of hadam3p_anz_nb4i_2012_1_008588106_0_1.zip: transient HTTP error
29/03/2014 13:41:00 | climateprediction.net | Backing off 03:08:42 on upload of hadam3p_anz_nb4i_2012_1_008588106_0_1.zip
29/03/2014 13:41:14 | | Project communication failed: attempting access to reference site
29/03/2014 13:41:17 | | Internet access OK - project servers may be temporarily down.

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 48618 - Posted: 29 Mar 2014, 20:28:47 UTC

Please don't abort any of the models. The files will very probably upload eventually and everything you've crunched will be of use to the project. The uploads will retry automatically after specified backoff delays and the files will come to no harm waiting to upload. Some people sometimes have files waiting to upload for more than a week or two. Having files waiting to upload shouldn't prevent your computer from receiving new work.

Some of these delays occur because a large number of files are queued for upload at once and the server can't handle the simultaneous volume.
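
The automatic retries follow the general pattern of a randomized, capped backoff: each failure lengthens the wait, and a random factor keeps many clients from retrying at the same moment. The sketch below shows that general technique only. It is not BOINC's actual formula, and the function name `backoff_delay`, the base, and the cap are all assumed values.

```python
import random


def backoff_delay(attempt, base=60.0, cap=6 * 3600.0, rng=random):
    """Return a randomized delay in seconds before retry number `attempt`.

    The ceiling doubles with each attempt up to `cap`; the delay is drawn
    uniformly from the upper half of that ceiling so clients spread out.
    """
    exponent = min(attempt, 32)  # keep 2 ** exponent small enough for floats
    ceiling = min(cap, base * 2 ** exponent)
    return rng.uniform(0.5, 1.0) * ceiling


if __name__ == "__main__":
    for attempt in range(8):
        print(attempt, round(backoff_delay(attempt)))
```

Because the delays are random, two files that fail at the same time can end up with very different waits, so uneven backoff times in the log are expected and do not by themselves indicate a problem.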

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 48628 - Posted: 30 Mar 2014, 8:00:33 UTC
Last modified: 30 Mar 2014, 8:15:34 UTC

Upload speed here ranges from dialup speed to full available bandwidth.
Sometimes have multiple transient http errors, most uploads no problems.
Could be server sometimes overloaded, could be ISP or local problem.
Last few weeks, all uploads get there sooner or later.
If you see anything other than "transient http", please report it.
Long experience leads me to say "give it a couple of days", but if you get an error message that hasn't been reported yet, please report it immediately.

<edit>

Also, have seen uploads that get temp http errors restart where they left off, rather than restarting from 0. Is this the newer better CPDN version?

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 48629 - Posted: 30 Mar 2014, 8:19:24 UTC

I get the occasional transient failure, and I'm only a thousand miles or so north of the server. And often by the time I notice, it's already well on its way again.

So, no worries.
