climateprediction.net (CPDN) home page
Thread 'Upload Failure'

marpes

Joined: 11 Nov 04
Posts: 8
Credit: 15,267,364
RAC: 0
Message 44146 - Posted: 4 May 2012, 18:16:16 UTC

From yesterday I can see:
3. 5. 2012 20:31:16 | climateprediction.net | [error] Error reported by file upload server: can't open log file '../log_uploader1/file_upload_handler.log' (errno: 9)
3. 5. 2012 20:31:16 | climateprediction.net | Temporarily failed upload of hadam3p_eu_a3mc_1997_1_007861165_2_11.zip: transient upload error
.
4. 5. 2012 19:11:00 | climateprediction.net | [error] Error reported by file upload server: can't open log file '../log_uploader1/file_upload_handler.log' (errno: 9)
4. 5. 2012 19:11:00 | climateprediction.net | Temporarily failed upload of hadam3p_eu_a3mc_1997_1_007861165_2_12.zip: transient upload error
.
4. 5. 2012 19:53:29 | climateprediction.net | [error] Error reported by file upload server: can't open log file '../log_uploader1/file_upload_handler.log' (errno: 9)
4. 5. 2012 19:53:29 | climateprediction.net | Temporarily failed upload of hadam3p_eu_a3mc_1997_1_007861165_2_11.zip: transient upload error

The Server Status page shows no problem.
Where is the problem?

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 44147 - Posted: 4 May 2012, 20:27:21 UTC - in response to Message 44146.  

It should just be temporary, (transient), at a time when the server was overloaded with computers wanting to upload files.

I've been having no problems.


MarkJ
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 44148 - Posted: 5 May 2012, 4:01:11 UTC - in response to Message 44147.  

It should just be temporary, (transient), at a time when the server was overloaded with computers wanting to upload files.

I've been having no problems.


Nope. I am getting them too on some eu work units for zip files 11 and 12. I think the admins need to move the log file or something. It's probably run out of disk space (with the backlog of uploads that wouldn't be surprising). Hopefully Jonathan or one of the other guys will notice and fix it soon.
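
One way to check the disk-space guess against the "(errno: 9)" in the log lines is to decode the number. This is a minimal Python sketch, assuming the upload server uses the usual POSIX errno numbering (the thread does not say what the server runs); `describe_errno` is a made-up helper name. On such systems 9 is EBADF ("bad file descriptor"), whereas a full disk would normally be reported as ENOSPC.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message text) for an errno value on this machine."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# The value seen in the upload error, and the one a full disk would usually give.
print(describe_errno(9))
print(describe_errno(errno.ENOSPC))
```

The message text varies by platform, so only the symbolic names are worth comparing.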

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 44149 - Posted: 5 May 2012, 4:10:04 UTC - in response to Message 44148.  

I've just sent them a note, but it's 5am on Saturday morning.


Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,748,059
RAC: 5,647
Message 44152 - Posted: 5 May 2012, 8:42:33 UTC - in response to Message 44148.  

It should just be temporary, (transient), at a time when the server was overloaded with computers wanting to upload files.

I've been having no problems.

Nope. I am getting them too on some eu work units for zip files 11 and 12. I think the admins need to move the log file or something. It's probably run out of disk space (with the backlog of uploads that wouldn't be surprising). Hopefully Jonathan or one of the other guys will notice and fix it soon.

If it's either a log file problem or a lack of disk space, the reason for the upload failure would appear in your local BOINC client message/event log (which the admins can't see directly).

Why not help them find the cause of the problem by quoting the error message which you can see?

Nigel Garvey
Joined: 5 May 10
Posts: 69
Credit: 1,169,103
RAC: 2,258
Message 44153 - Posted: 5 May 2012, 9:08:50 UTC - in response to Message 44152.  

Why not help them find the cause of the problem by quoting the error message which you can see?


Hi.

I'm getting the same thing with files 10 to 13 of a HADAM3P EU. The error's: "[error] Error reported by upload server: can't open log file '../log_uploader1/file_upload_handler.log' (errno: 9)".


NG

old_user557471
Joined: 18 Feb 09
Posts: 4
Credit: 97,447
RAC: 0
Message 44162 - Posted: 6 May 2012, 9:12:51 UTC - in response to Message 44153.  
Last modified: 6 May 2012, 9:13:31 UTC

I'm getting the same thing with files 10 to 13 of a HADAM3P EU. The error's: "[error] Error reported by upload server: can't open log file '../log_uploader1/file_upload_handler.log' (errno: 9)".



Same problem for me too.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 44171 - Posted: 7 May 2012, 8:31:39 UTC - in response to Message 44162.  

A server always seems to go down on bank holiday weekend. Is rain getting into the system?

[AF>Le_Pommier] Jerome_C2005
Joined: 21 Oct 10
Posts: 53
Credit: 2,101,753
RAC: 3,985
Message 44172 - Posted: 7 May 2012, 17:51:01 UTC

I get an HTTP error while trying to upload a zip file, and one of the upload servers is down (http://climateapps2.oerc.ox.ac.uk/cpdnboinc/server_status.html), but there are others online...

The last time I tried to upload was 1 or 2 weeks ago, and it was down then too. (I crunch on an offline machine; when WUs finish, I move them by USB key to a Windows VM on my home Mac, and I request new work from the VM.) I now have 4 WUs waiting for upload... the good thing is that the deadline is faaaaar away ;)

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 44173 - Posted: 7 May 2012, 20:05:25 UTC
Last modified: 7 May 2012, 20:14:20 UTC

Seems that the problem is only with the eu uploads -- pnw uploads work here, eu ones don't; even the _01 uploads fail.
One server is reported as down.

@Dave -- yeah Murphy's law says that all system failures will happen at the worst time.

And on a reassuring note -- in the last 8 years of crunching for climateprediction, the staff have always fixed problems without losing data, usually in a day or 2; once it took almost 2 weeks, when they had a serious hardware failure. They got mirrors, they got logs, they got backups.
I've learned to trust their backups. Wish my own home backup system was as good.

Please keep posting any upload problems here.

And keep on crunching.

Eric

[AF>Le_Pommier] Jerome_C2005
Joined: 21 Oct 10
Posts: 53
Credit: 2,101,753
RAC: 3,985
Message 44174 - Posted: 7 May 2012, 20:31:06 UTC

As I said, since there is such a long deadline it's not a real problem, only a pain :D

Duzzie
Joined: 31 Aug 04
Posts: 1
Credit: 1,083,806
RAC: 0
Message 44176 - Posted: 14 May 2012, 9:15:06 UTC

Is there any update on this? I have a few EU uploads waiting and I have WUs still working, but I don't want to waste time crunching the WUs if the issue isn't going to be resolved and the data is lost.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 44177 - Posted: 14 May 2012, 10:22:11 UTC - in response to Message 44176.  

Jonathan and the team will get this sorted. I know it has taken a bit longer than usual on this occasion; maybe he is on holiday or something. In the years I have been with the project, these problems have always been resolved without data loss: all the data is stored on a mirrored RAID array and regularly backed up externally as well. Hopefully one day the project will get the money it deserves, which would allow newer, better hardware and more staff to look after it.

The only advice I can offer is a phrase I heard regularly when in the forces, "Hurry up and wait!"

Dave

MarkJ
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 44178 - Posted: 15 May 2012, 11:56:15 UTC

uploader1.atm seems to be back on-line, and one of my eu work units that's been trying to upload for a week has finally gone through. A big thank you to the guys for fixing it up.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 44179 - Posted: 15 May 2012, 13:58:20 UTC - in response to Message 44178.  

Good to see the server back on line. Have I been unobservant or is srv1.cpdn.psu.edu new in the past week or so? I must get back in the habit of backing up my work units; I just lost a couple due to a power outage. "The chance of a power or disk failure is proportional to at least the square of the time since the last backup."

Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 44184 - Posted: 17 May 2012, 8:31:59 UTC - in response to Message 44179.  

The project team believe they have resolved all of the upload server issues. If anyone is still having problems uploading please let us know.
have I been unobservant or is srv1.cpdn.psu.edu new in the past week or so?

That's an upload server which hasn't been used for some time, Dave. I don't pay much attention to the green on the server status page, but it may well have been returned to active service.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer

[AF>Le_Pommier] Jerome_C2005
Joined: 21 Oct 10
Posts: 53
Credit: 2,101,753
RAC: 3,985
Message 44202 - Posted: 18 May 2012, 20:12:45 UTC

Wow, I can upload!

Good news. It's been so long since I was able to upload that I now have almost 800 MB waiting, and it's going to take a while with my DSL 110 KBPS upload...

Thanks for fixing this !
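
For a rough idea of how long "a while" is, here is a back-of-envelope sketch in Python. It assumes "almost 800 MB" means about 800 million bytes and ignores protocol overhead and retries; because "110 KBPS" could mean kilobytes or kilobits per second, both readings are shown. `upload_hours` is a made-up helper name.

```python
def upload_hours(size_bytes, rate_bytes_per_s):
    """Ideal transfer time in hours, ignoring protocol overhead and retries."""
    return size_bytes / rate_bytes_per_s / 3600


SIZE = 800 * 10**6  # "almost 800 MB", taken here as 800 million bytes

print(round(upload_hours(SIZE, 110_000), 1))      # reading the rate as 110 kilobytes/s
print(round(upload_hours(SIZE, 110_000 / 8), 1))  # reading the rate as 110 kilobits/s
```

Under these assumptions the backlog clears in roughly 2 hours at 110 kilobytes per second, or roughly 16 hours at 110 kilobits per second.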

old_user633787
Joined: 14 Sep 10
Posts: 11
Credit: 1,812,972
RAC: 0
Message 44205 - Posted: 20 May 2012, 1:44:32 UTC - in response to Message 44202.  
Last modified: 20 May 2012, 1:47:38 UTC

It's still broken here. I get logs full of:

[file_xfer_debug] URL: http://cpdn-restarts.oerc.ox.ac.uk/cgi-bin/file_upload_handler
[file_xfer_debug] FILE_XFER_SET::poll(): http op done, retval -107
[file_xfer_debug] file transfer status -107

when attempting to upload _13 hada* results.

I can ping http://cpdn-restarts.oerc.ox.ac.uk successfully from the machines running cpdn.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 44206 - Posted: 20 May 2012, 2:32:12 UTC - in response to Message 44205.  
Last modified: 20 May 2012, 2:33:08 UTC

Welcome to the Twilight Zone.
Don't adjust your horizontal, your vertical, or your mind, just some spellings on your computer.

There's a problem with some flavours of Linux that causes a character string to become corrupted when it's stored on the computer in question.

This has been discussed in several threads, (probably under several Topics), on our php board. This is one thread.

The cure is to **carefully** edit client_state.xml, and correct the corruption. One spelling is hnndler, but there are others.

Do a search on the 4 character model name, then check each line until you find the upload section for zip 13, then look at the spelling.
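
The manual search described above can be narrowed down with a small script. What follows is only a sketch, not a project-supplied tool: it assumes the corruption only affects the handler name `file_upload_handler` (as in the `hnndler` example), and it flags any word that is close to that name but not identical. The function name, the 0.8 similarity cutoff, and the `<url>` tags in the sample text are made up for illustration. It only reports line numbers; the edit itself is still done by hand.

```python
import re
from difflib import SequenceMatcher

EXPECTED = "file_upload_handler"


def find_suspect_handlers(text, expected=EXPECTED, cutoff=0.8):
    """Return (line_number, word) pairs for near-miss spellings of `expected`."""
    suspects = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[A-Za-z_]+", line):
            if word == expected:
                continue  # correctly spelled, nothing to report
            if SequenceMatcher(None, word, expected).ratio() >= cutoff:
                suspects.append((line_no, word))
    return suspects


# Made-up sample standing in for a fragment of client_state.xml.
sample = (
    "<url>http://cpdn-restarts.oerc.ox.ac.uk/cgi-bin/file_upload_handler</url>\n"
    "<url>http://cpdn-restarts.oerc.ox.ac.uk/cgi-bin/file_upload_hnndler</url>\n"
)
print(find_suspect_handlers(sample))
```

To use it on a real file, read a copy of client_state.xml into a string and pass it in; the reported line numbers show where to look in an editor.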
Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 44207 - Posted: 20 May 2012, 6:35:23 UTC

Oh dear -- I thought this problem had gone away.
Follow the advice in the thread Les referred to.
And please follow the advice about stopping BOINC and the CPDN models and making a backup before editing client_state.xml.
If you follow the procedure to correct the misspellings, uploads will start working.
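
The backup step can be as simple as copying the file under a new name once BOINC has been stopped. A minimal Python sketch follows; `backup_file` is a made-up helper, and the location of client_state.xml depends on the installation, so the path is passed on the command line rather than assumed.

```python
import shutil
import sys
import time
from pathlib import Path


def backup_file(path, stamp=None):
    """Copy `path` to a timestamped sibling file and return the new path."""
    src = Path(path)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.{stamp}.bak")
    shutil.copy2(src, dst)  # copy2 also tries to preserve file timestamps
    return dst


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python backup_state.py /path/to/client_state.xml
    print(backup_file(sys.argv[1]))
```

If an edit goes wrong, copying the .bak file back over client_state.xml (again with BOINC stopped) restores the previous state.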

Keep on keeping on.



©2024 cpdn.org