climateprediction.net home page
Upload Failure

Message boards : Number crunching : Upload Failure

Ananas
Volunteer moderator
Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 43995 - Posted: 11 Apr 2012, 18:46:45 UTC - in response to Message 43929.  
Last modified: 11 Apr 2012, 19:44:00 UTC

I think that's a server side problem, possibly because the server load was too high at the time. In which case it'll fix itself after a while.


It is a server-side problem: the scheduler (parser) tries to read 256 bytes from sched_request.xml and doesn't get them.

This happens before any syntax or sanity checks; simply reading the data into the buffer fails.

It has already happened on three of my machines. Two of them have received work in the meantime; one is still struggling.

The file handle is most likely not NULL, because the code does check for that (although the check comes several statements before the fgets() call).

Unfortunately, errno is not reported, so it is hard to tell the exact cause.

P.S.: The upload error and the scheduler error are not necessarily related (they come from two different programs), but there is a good chance the same thing causes both. The fgets() problem has also been reported on several other projects, such as LHC, SIMAP and SETI.
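In C, fgets() returns NULL both when it reaches end of file before reading anything and when a read error occurs; the caller has to check feof() or ferror() and log errno (for example via strerror()) to tell the two apart, and that is the information missing here. Below is a minimal Python analogue of that distinction, for illustration only: the helper name read_first_line is hypothetical and is not the scheduler's actual code, and the 256-byte limit simply mirrors the post.

```python
import errno


def read_first_line(path: str, max_bytes: int = 256) -> bytes:
    """Read up to max_bytes of the first line of path, like fgets(buf, 256, f).

    Hypothetical helper for illustration only; it is not BOINC code.
    """
    try:
        with open(path, "rb") as f:
            line = f.readline(max_bytes)
    except OSError as exc:
        # A genuine open or I/O error: errno says why it failed.
        name = errno.errorcode.get(exc.errno, "unknown")
        raise RuntimeError(
            f"reading {path} failed: errno {exc.errno} ({name}): {exc.strerror}"
        ) from exc
    if not line:
        # No error, just nothing to read. The C equivalent is fgets()
        # returning NULL with feof() true, where errno would not help.
        raise RuntimeError(f"reading {path} hit end of file before any data")
    return line
```

Including the symbolic errno name (ENOENT, EIO, EBADF, ...) in the log message is what would make failures like this one diagnosable from the outside.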
Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 44046 - Posted: 19 Apr 2012, 12:51:16 UTC

I'm getting upload failures on zip file 13 from AM3P EU models today.
Richard Haselgrove
Joined: 1 Jan 07
Posts: 925
Credit: 34,100,818
RAC: 11,270
Message 44047 - Posted: 19 Apr 2012, 13:44:14 UTC - in response to Message 44046.  

I'm getting upload failures on zip file 13 from AM3P EU models today.

Me too. Specifically,

19/04/2012 14:39:35 | climateprediction.net | [error] Error reported by file upload server: can't open file
old_user423
Joined: 7 Aug 04
Posts: 1
Credit: 316,399
RAC: 12
Message 44050 - Posted: 19 Apr 2012, 21:42:52 UTC - in response to Message 44046.  

I'm getting upload failures on zip file 13 from AM3P EU models today.

Same here.

4/19/2012 1:40:54 PM | climateprediction.net | Temporarily failed upload of hadam3p_eu_ag6v_1990_1_007841613_1_13.zip: transient upload error

marpes
Joined: 11 Nov 04
Posts: 8
Credit: 15,267,364
RAC: 0
Message 44055 - Posted: 20 Apr 2012, 10:35:03 UTC

One of mine:
20. 4. 2012 9:33:18 | climateprediction.net | Started upload of hadam3p_saf_0shh_1964_1_006860493_1_13.zip
20. 4. 2012 9:33:20 | climateprediction.net | Temporarily failed upload of hadam3p_saf_0shh_1964_1_006860493_1_13.zip: transient HTTP error
20. 4. 2012 9:33:20 | climateprediction.net | Backing off 4 hr 18 min 58 sec on upload of hadam3p_saf_0shh_1964_1_006860493_1_13.zip

The error has persisted for more than 24 hours.
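The "Backing off" lines appear to show the BOINC client waiting a randomized, growing interval before retrying a failed upload; spreading retries out this way also stops every client from hitting a recovering server at the same moment. A minimal sketch of exponential backoff with jitter follows. The base and cap values and the "full jitter" choice are assumptions for illustration, not BOINC's actual constants or formula.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base: float = 60.0, cap: float = 12 * 3600.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0 = first retry).

    Exponential backoff with "full jitter": a random delay between 0 and
    min(cap, base * 2**attempt). base and cap are illustrative values only.
    """
    rng = rng or random.Random()
    # Clamp the exponent so very large attempt counts cannot overflow a float.
    ceiling = min(cap, base * 2 ** min(attempt, 60))
    return rng.uniform(0.0, ceiling)


def format_hms(seconds: float) -> str:
    """Render a delay in the same 'hr min sec' style as the log lines above."""
    s = int(seconds)
    return f"{s // 3600} hr {s % 3600 // 60} min {s % 60} sec"


if __name__ == "__main__":
    rng = random.Random(42)
    for attempt in range(8):
        print(attempt, format_hms(backoff_delay(attempt, rng=rng)))
```

Full jitter trades a slightly longer average wait for much less synchronized retrying. format_hms(15538) gives "4 hr 18 min 58 sec", the interval in the log above.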
Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 44056 - Posted: 20 Apr 2012, 11:10:37 UTC - in response to Message 44046.  

I'm getting upload failures on zip file 13 from AM3P EU models today.


My zip files 13 are now uploading.
Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 44058 - Posted: 21 Apr 2012, 11:58:45 UTC - in response to Message 44056.  
Last modified: 21 Apr 2012, 12:10:04 UTC

Now it's an upload problem with hadam3p_saf_28zu_1975_1_007240600_1_3.zip
since about 10:19 Zulu.
It reports "transient upload problem".
Only on the final SAF zip (.13).

At present:
<url>http://cpdn-upload2.oerc.ox.ac.uk/cgi-bin/file_upload_handler</url>

Sorry, I usually wait a day or two before reporting upload problems, and I should have waited until Monday in any case. The staff always fix these things, and the SAF models aren't in my script. Apologies; this can wait a few days.

EDIT: Not a big problem at all.
old_user131271
Joined: 6 Dec 05
Posts: 1
Credit: 250,722
RAC: 0
Message 44059 - Posted: 21 Apr 2012, 16:11:44 UTC
Last modified: 21 Apr 2012, 16:12:22 UTC

I have the same problem:

Sa 21 Apr 18:06:34 2012 | climateprediction.net | Started upload of hadam3p_eu_94qe_1966_1_007726190_0_13.zip
Sa 21 Apr 18:06:35 2012 | climateprediction.net | [error] Error reported by file upload server: can't open file
Sa 21 Apr 18:06:35 2012 | climateprediction.net | Temporarily failed upload of hadam3p_eu_94qe_1966_1_007726190_0_13.zip: transient upload error
Sa 21 Apr 18:06:35 2012 | climateprediction.net | Backing off 3 hr 17 min 32 sec on upload of hadam3p_eu_94qe_1966_1_007726190_0_13.zip


The error has persisted for more than 24 hours.
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4314
Credit: 16,377,675
RAC: 3,657
Message 44060 - Posted: 21 Apr 2012, 16:43:56 UTC - in response to Message 44059.  

Most likely the server is full. Whatever the problem, it is unlikely that anyone will get a chance to look at the server until after 0900 UK time on Monday.

Dave
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 44061 - Posted: 21 Apr 2012, 19:49:21 UTC

There is a problem with the server which can't be fixed remotely, so the project people need to get physical access to it.
This won't happen until Monday morning UK time.
And then it may take a while for them to find out what's wrong, and even more time to fix it. Especially if replacement parts are needed.


Backups: Here
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 44062 - Posted: 22 Apr 2012, 19:54:21 UTC - in response to Message 44061.  

Andy went in on Sunday and gave the server a good talking to.
It's now working again.

Backups: Here
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4314
Credit: 16,377,675
RAC: 3,657
Message 44063 - Posted: 22 Apr 2012, 20:28:52 UTC - in response to Message 44062.  

I never thought of trying that when my machine is misbehaving; I must try it next time I have a problem.
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4314
Credit: 16,377,675
RAC: 3,657
Message 44071 - Posted: 23 Apr 2012, 10:01:35 UTC - in response to Message 44063.  

Looks like some more talking is needed.

Mon 23 Apr 2012 10:55:31 BST | climateprediction.net | Started upload of hadam3p_eu_98iv_1963_1_007852746_0_11.zip
Mon 23 Apr 2012 10:55:32 BST | climateprediction.net | [error] Error reported by file upload server: can't open log file '../log_uploader1/file_upload_handler.log' (errno: 9)
Mon 23 Apr 2012 10:55:32 BST | climateprediction.net | Temporarily failed upload of hadam3p_eu_98iv_1963_1_007852746_0_11.zip: transient upload error
Mon 23 Apr 2012 10:55:32 BST | climateprediction.net | Backing off 1 hr 50 min 55 sec on upload of hadam3p_eu_98iv_1963_1_007852746_0_11.zip


Dave
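Errno numbers are platform-specific. Assuming the upload server runs Linux (an assumption; the thread does not say), errno 9 is EBADF, "Bad file descriptor", rather than, for example, ENOENT (missing file or directory) or ENOSPC (disk full). A small sketch using Python's standard errno and os modules to decode such a number; the helper name describe_errno is hypothetical:

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Turn a numeric errno, as printed in the upload server's message,
    into its symbolic name and the platform's description."""
    name = errno.errorcode.get(code, "unknown")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # The value from "can't open log file ... (errno: 9)".
    print(describe_errno(9))
```

On Linux this prints "EBADF: Bad file descriptor". Because the mapping differs between operating systems, the decoded name is only meaningful when run on the same platform as the server that reported it.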
old_user480295
Joined: 2 Nov 07
Posts: 1
Credit: 332,900
RAC: 0
Message 44073 - Posted: 23 Apr 2012, 10:30:48 UTC

I found a simpler way: abort and disconnect.
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4314
Credit: 16,377,675
RAC: 3,657
Message 44074 - Posted: 23 Apr 2012, 10:42:00 UTC - in response to Message 44073.  

Once the relevant server is working again it will accept the zip files. Why bother running the project only to waste the computation time by aborting?

Dave
old_user651284
Joined: 28 Mar 11
Posts: 35
Credit: 82,588
RAC: 0
Message 44075 - Posted: 23 Apr 2012, 15:54:17 UTC

Hi everyone,

We are currently suffering two server failures, both serious hard disk issues, so I am configuring another server to take over their roles before I get around to sorting out those problems.

I will let you know how things proceed, but it will be at least 24 hours before we can consider ourselves back online.

Please accept my apologies.

Jonathan

CPDN Sys-Admin
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4314
Credit: 16,377,675
RAC: 3,657
Message 44077 - Posted: 23 Apr 2012, 19:57:38 UTC - in response to Message 44075.  

OK, network activity is suspended until tomorrow at least. That way my client won't keep retrying while everyone else is also trying.

Dave
Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 44081 - Posted: 24 Apr 2012, 8:31:08 UTC - in response to Message 44077.  

Yeah, just wait a while. Some files are uploading now, some not.
Patience, patience. It won't be long, so don't waste any work units.
Don't kill any processes. This happens sometimes, and the support team at Oxford will fix it so nothing gets wasted.
They've done it before and will do it again.
Donating another few terabytes might be welcome, but who can afford another 200 TB? Or an EB? Or whatever the Bigabytes are now?
Patience.
They do get it right.
Wait a day or two and all will be well.

Really.
MarkJ
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 44086 - Posted: 25 Apr 2012, 8:01:35 UTC
Last modified: 25 Apr 2012, 8:20:24 UTC

Looking at the server status page just now, 3 of the 7 upload servers are offline. There must be some fairly major failures going on.

I know Jonathan wrote that two of them had hard disk problems, so it looks like they may need to replace a lot of the drives with new ones. Maybe we need a fund-raising drive to help things along? The donations page can be found here.

I don't know what a 2 TB server-grade HDD costs in the UK; I would guess around 70 pounds. Anyway, I've made a donation to get things started.
BOINC blog
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4314
Credit: 16,377,675
RAC: 3,657
Message 44087 - Posted: 25 Apr 2012, 14:47:32 UTC - in response to Message 44086.  

Back down to two servers out now. At least there are still plenty of work units available. I only have two cores, so the transfer backlog isn't taking up much disk space.

Dave

©2024 climateprediction.net