ANOTHER UPLOAD PROBLEM

Message boards : Number crunching : ANOTHER UPLOAD PROBLEM
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7100
Credit: 21,637,230
RAC: 9,439
Message 48118 - Posted: 7 Feb 2014, 22:25:35 UTC

OK, slow again. :)
Going off to collect some info.

..
..
..


For the record:

2 models on this machine (Q6600), so as to leave resources for my own use.
Windows XP Pro
BOINC 6.2.18

9 hours 10 minutes and still OK. No zips yet.

**********

4 models on the other Q6600.
Windows XP Pro
BOINC 6.10.18

Just over 9 hours and still OK. No zips yet.

**********

i7-3770K

Linux Mint 15, 32 bit
BOINC 7.2.33

4 models at 6 hours 30 minutes and running OK.

4 zips ready to upload.
From a bit of maths, it took just on 5 hours to get to that point.


MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 48120 - Posted: 7 Feb 2014, 22:59:09 UTC - in response to Message 48117.  

Hi Les and everyone,

> Did you notice how long the models ran before the message showed up?
Afraid not. The Event Log only starts at 07:28 this morning (NZ time); it looks like it rolls over once it exceeds a size limit.

Looking at my tasks on the website, each typically shows 6-8 seconds of CPU time, but I assume that covers the multiple starts.

Like Thyme Lawn, all my tasks have the heartbeat error in their stderr logs, and the Event Log also shows each task getting repeatedly hammered. I've extracted some of the entries for one task.

8/02/2014 7:28:31 a.m. | climateprediction.net | Restarting task hadam3p_pnw_uau9_1999_1_008507101_1 using hadam3p_pnw version 722 in slot 7
8/02/2014 7:28:42 a.m. | climateprediction.net | Task hadam3p_pnw_uau9_1999_1_008507101_1 exited with zero status but no 'finished' file
8/02/2014 7:28:42 a.m. | climateprediction.net | Restarting task hadam3p_pnw_uau9_1999_1_008507101_1 using hadam3p_pnw version 722 in slot 7
8/02/2014 7:28:52 a.m. | climateprediction.net | Task hadam3p_pnw_uau9_1999_1_008507101_1 exited with zero status but no 'finished' file
8/02/2014 7:28:52 a.m. | climateprediction.net | Restarting task hadam3p_pnw_uau9_1999_1_008507101_1 using hadam3p_pnw version 722 in slot 7
8/02/2014 7:29:03 a.m. | climateprediction.net | Task hadam3p_pnw_uau9_1999_1_008507101_1 exited with zero status but no 'finished' file

etc
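A rough way to spot this restart loop in a long Event Log is to count, per task, the "exited with zero status but no 'finished' file" messages. This is a minimal sketch with a hypothetical helper (not part of BOINC), assuming the log has been copied out as plain text in the `timestamp | project | message` layout shown above.

```python
import re
from collections import Counter

# The client message seen in the Event Log excerpt above.
EXIT_NO_FINISH = re.compile(r"Task (\S+) exited with zero status but no 'finished' file")


def count_restart_loops(log_text):
    """Count, per task, how often it exited without writing its 'finished' file."""
    counts = Counter()
    for line in log_text.splitlines():
        # Event Log lines look like "timestamp | project | message".
        parts = line.split(" | ", 2)
        if len(parts) < 3:
            continue
        match = EXIT_NO_FINISH.search(parts[2])
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = """\
8/02/2014 7:28:31 a.m. | climateprediction.net | Restarting task hadam3p_pnw_uau9_1999_1_008507101_1 using hadam3p_pnw version 722 in slot 7
8/02/2014 7:28:42 a.m. | climateprediction.net | Task hadam3p_pnw_uau9_1999_1_008507101_1 exited with zero status but no 'finished' file
8/02/2014 7:28:42 a.m. | climateprediction.net | Restarting task hadam3p_pnw_uau9_1999_1_008507101_1 using hadam3p_pnw version 722 in slot 7
8/02/2014 7:28:52 a.m. | climateprediction.net | Task hadam3p_pnw_uau9_1999_1_008507101_1 exited with zero status but no 'finished' file
8/02/2014 7:28:52 a.m. | climateprediction.net | Restarting task hadam3p_pnw_uau9_1999_1_008507101_1 using hadam3p_pnw version 722 in slot 7
8/02/2014 7:29:03 a.m. | climateprediction.net | Task hadam3p_pnw_uau9_1999_1_008507101_1 exited with zero status but no 'finished' file
"""

for task, n in count_restart_loops(SAMPLE).items():
    # prints "hadam3p_pnw_uau9_1999_1_008507101_1: 3 exits without 'finished' file"
    print(f"{task}: {n} exits without 'finished' file")
```

A task that keeps climbing in this count while its CPU time stays at a few seconds is the pattern described in this post.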
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7100
Credit: 21,637,230
RAC: 9,439
Message 48121 - Posted: 7 Feb 2014, 23:30:42 UTC - in response to Message 48120.  

OK, I guess that's close enough.
They're having problems right from the start, so mine, at several hours in, are a different case. Phew!

Searching for and analysing the failures will keep Andy busy for a while. :(

JIM
Joined: 31 Dec 07
Posts: 1093
Credit: 19,728,822
RAC: 3,589
Message 48123 - Posted: 8 Feb 2014, 0:29:25 UTC


geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 1863
Credit: 37,946,922
RAC: 27,000
Message 48126 - Posted: 8 Feb 2014, 5:28:49 UTC - in response to Message 48116.  

> Does anyone have a HadAM3P PNW v7.22 task running successfully?
>
> Note that Martin is running BOINC v7.2.33 on Windows 7 and I'm running v7.0.73 on Vista, so the combination of a BOINC v7 client on a Windows system might be significant.

I'm running two on an i5-2500K under Win7 64-bit with 64-bit BOINC 7.0.64, and each model has sent up 4 zips. There are no status messages with errors in the event log.

No problems noticed with 6 linux tasks either.
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 93,590,762
RAC: 16,921
Message 48127 - Posted: 8 Feb 2014, 6:53:41 UTC

A flock flies in my boxes under Windows Vista/7/8, including two retreads (one each of _1 and _2). Some .zip files have been generated and uploaded okay. All seems in order.
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7100
Credit: 21,637,230
RAC: 9,439
Message 48128 - Posted: 8 Feb 2014, 8:41:05 UTC - in response to Message 48126.  

I seem to be having multiple senior moments at present.

First, I was composing a lengthy reply this morning and posted it just after Thyme Lawn's. I then didn't read all of his post, so I missed the bit at the end asking whether anyone else had some running.

I also remember collecting info about what I had running and emailing it back to Andy, but now I can't find any copy of that. I've either clicked a wrong button, or imagined the whole thing.

And now my mouse is playing up, and won't "let go" of text. :(

**********

So:

2 models running on this Q6600 with Windows XP Pro, 32 bit. BOINC 6.2.18
2nd lot of zips waiting to upload.

4 models running on the other Q6600 with Windows XP Pro, 32 bit. BOINC 6.10.18
1st lot of zips uploaded. 2nd lot shouldn't be far away now.

4 running on my i7-3770K, with Linux Mint 15, 32 bit. BOINC 7.2.33
3rd lot of zips waiting to upload.


Backups: Here
Lockleys
Joined: 13 Jan 07
Posts: 194
Credit: 9,970,489
RAC: 4,950
Message 48129 - Posted: 8 Feb 2014, 9:35:58 UTC - in response to Message 48116.  

> Does anyone have a HadAM3P PNW v7.22 task running successfully?


I have 4 running just fine right now.
MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 48131 - Posted: 9 Feb 2014, 8:31:10 UTC - in response to Message 48129.  

Just to let you know, I've gone back to BOINC v7.2.28. To avoid too many bad runs, I've changed the preferences so that only 4 tasks (previously 12) run at once.

It will be interesting to see what happens when the next lot comes along.
MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 48146 - Posted: 11 Feb 2014, 2:42:59 UTC - in response to Message 48131.  

HADAM3P_EU models are running OK on BOINC v7.2.28, but they might have been OK on the later version as well; who knows. I'll leave it with only 4 tasks running, as I won't be able to watch the PC over the next 10 days. It would be nice to know what was causing the PNW issues.
JIM
Joined: 31 Dec 07
Posts: 1093
Credit: 19,728,822
RAC: 3,589
Message 48188 - Posted: 18 Feb 2014, 1:28:21 UTC

We seem to have that zip file upload problem again. Hadam3p_pnw zip files seem to upload normally until they reach 100% and then hang. Time to remount the server or something.

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 1863
Credit: 37,946,922
RAC: 27,000
Message 48190 - Posted: 18 Feb 2014, 3:01:36 UTC - in response to Message 48188.  

I agree with JIM. This is what my upload attempts result in:

Mon 17 Feb 2014 07:25:53 PM CST climateprediction.net [error] Error reported by file upload server: can't open file /storage/incoming/uploader_main/hadam3p_pnw_ubeu_2003_1_008507842_1_10.zip: Read-only file system
Mon 17 Feb 2014 07:25:53 PM CST climateprediction.net Temporarily failed upload of hadam3p_pnw_ubeu_2003_1_008507842_1_10.zip: transient upload error
Mon 17 Feb 2014 07:25:53 PM CST climateprediction.net Backing off 3 hr 49 min 39 sec on upload of hadam3p_pnw_ubeu_2003_1_008507842_1_10.zip
Mon 17 Feb 2014 08:48:01 PM CST climateprediction.net Started upload of hadam3p_pnw_ubeu_2003_1_008507842_1_11.zip
Mon 17 Feb 2014 08:48:18 PM CST climateprediction.net [error] Error reported by file upload server: can't open file /storage/incoming/uploader_main/hadam3p_pnw_ubeu_2003_1_008507842_1_11.zip: Read-only file system
Mon 17 Feb 2014 08:48:18 PM CST climateprediction.net Temporarily failed upload of hadam3p_pnw_ubeu_2003_1_008507842_1_11.zip: transient upload error
Mon 17 Feb 2014 08:48:18 PM CST climateprediction.net Backing off 1 min 0 sec on upload of hadam3p_pnw_ubeu_2003_1_008507842_1_11.zip
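In this log (and in Waldmeister's later one in this thread) the failure is reported by the upload server itself, with the reason at the end of the line ("Read-only file system" here). A minimal sketch, using a hypothetical helper name, that pulls the file name and the server's reason out of such lines so different server-side failures can be told apart; it assumes the English message text exactly as it appears in these logs.

```python
import posixpath
import re

# Server-side error text as it appears in the BOINC logs quoted in this thread.
SERVER_ERROR = re.compile(
    r"Error reported by file upload server: can't open file (\S+): (.+)$"
)


def upload_server_errors(log_text):
    """Return (zip file name, reason reported by the upload server) pairs."""
    errors = []
    for line in log_text.splitlines():
        match = SERVER_ERROR.search(line.strip())
        if match:
            path, reason = match.groups()
            errors.append((posixpath.basename(path), reason))
    return errors


LOG = """\
Mon 17 Feb 2014 07:25:53 PM CST climateprediction.net [error] Error reported by file upload server: can't open file /storage/incoming/uploader_main/hadam3p_pnw_ubeu_2003_1_008507842_1_10.zip: Read-only file system
Mon 17 Feb 2014 07:25:53 PM CST climateprediction.net Temporarily failed upload of hadam3p_pnw_ubeu_2003_1_008507842_1_10.zip: transient upload error
Mon 17 Feb 2014 08:48:18 PM CST climateprediction.net [error] Error reported by file upload server: can't open file /storage/incoming/uploader_main/hadam3p_pnw_ubeu_2003_1_008507842_1_11.zip: Read-only file system
"""

for name, reason in upload_server_errors(LOG):
    print(name, "->", reason)
```

The regular expression works the same on the pipe-separated Windows-style lines, since it only looks at the message text.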
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2585
Credit: 3,137,178
RAC: 481
Message 48191 - Posted: 18 Feb 2014, 9:12:48 UTC

> HADAM3P_EU models running OK on Boinc v7.2.28, but they might have been OK on the later version as well - who knows.


Certainly my one PNW model seems OK on 7.2.39.
Richard Haselgrove
Joined: 1 Jan 07
Posts: 377
Credit: 7,127,507
RAC: 0
Message 48196 - Posted: 18 Feb 2014, 11:37:57 UTC

Apparently the "Read-only file system" upload problem relates to the PNW researchers' server in Oregon. They've been alerted, but the email probably won't be acted on until the start of office hours in the Pacific time zone, five or six hours from now.

Because the whole file is re-sent at each retry, volunteers with limited upload bandwidth might want to disable BOINC's networking until then.
Bonsai911
Joined: 9 Sep 04
Posts: 214
Credit: 29,115,536
RAC: 15,245
Message 48231 - Posted: 25 Feb 2014, 15:43:03 UTC

Has this been reported?


25.02.2014 16:36:57 | climateprediction.net | Requesting new tasks for CPU
25.02.2014 16:37:58 | climateprediction.net | Scheduler request failed: HTTP gateway timeout


2 finished WUs had stuck uploads; the uploads are done now, but the tasks haven't been reported.
Bonsai911
Joined: 9 Sep 04
Posts: 214
Credit: 29,115,536
RAC: 15,245
Message 48233 - Posted: 25 Feb 2014, 17:17:26 UTC

Resolved

MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 48784 - Posted: 13 Apr 2014, 6:25:10 UTC - in response to Message 48233.  

I'm now having similar issues with EU models, though I'm back on BOINC v7.2.28. Three tasks have crashed after 9 seconds of run time and 0 seconds of CPU time, and all the other tasks in those work units have also crashed. Just when things were running pretty sweetly :-(

One of the tasks is here

Std Err:
Model crashed: INITTIME: Atmosphere basis time mismatch

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>hadam3p_eu_aasr_2013_1_008605345_2_1.zip</file_name> <error_code>-161</error_code>

Typical event log:
13/04/2014 2:17:09 p.m. | climateprediction.net | Starting task hadam3p_eu_aasr_2013_1_008605345_2 using hadam3p_eu version 609 in slot 0
13/04/2014 2:17:20 p.m. | climateprediction.net | Computation for task hadam3p_eu_aasr_2013_1_008605345_2 finished
13/04/2014 2:17:20 p.m. | climateprediction.net | Output file hadam3p_eu_aasr_2013_1_008605345_2_1.zip for task hadam3p_eu_aasr_2013_1_008605345_2 absent
.....
.....
13/04/2014 2:17:20 p.m. | climateprediction.net | Output file hadam3p_eu_aasr_2013_1_008605345_2_13.zip for task hadam3p_eu_aasr_2013_1_008605345_2 absent

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7100
Credit: 21,637,230
RAC: 9,439
Message 48785 - Posted: 13 Apr 2014, 6:48:50 UTC - in response to Message 48784.  

The INITTIME error is caused by the number of "things" in one file not matching the number of "things" in another file.

So, just bad luck.

Waldmeister
Joined: 13 Jun 11
Posts: 34
Credit: 745,983
RAC: 1,080
Message 49329 - Posted: 10 Jun 2014, 12:26:57 UTC

Hello folks!

I have to report the same problem as in the first post of this thread again: zip file No. 13 uploads to 100% and then doesn't finish; instead, the upload restarts.

The task itself seemed to work really well over the past few days, with no visible problems of any kind. Only the very last upload, zip file No. 13, doesn't complete.
I've tried it three times now.

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=16656981


Here are the logs from the message menu ("..." lines denote messages from other projects):

10.06.2014 12:44:05 | climateprediction.net | Computation for task hadam3p_eu_r858_2013_1_008755442_0 finished
...
10.06.2014 12:50:05 | | Resuming network activity
10.06.2014 12:50:05 | climateprediction.net | Started upload of hadam3p_eu_r858_2013_1_008755442_0_9.zip
...
10.06.2014 12:50:05 | climateprediction.net | Started upload of hadam3p_eu_r858_2013_1_008755442_0_10.zip
...
10.06.2014 12:50:15 | climateprediction.net | Sending scheduler request: To send trickle-up message.
10.06.2014 12:50:15 | climateprediction.net | Not requesting tasks: "no new tasks" requested via Manager
...
10.06.2014 12:50:18 | climateprediction.net | Scheduler request completed
...
10.06.2014 12:59:14 | climateprediction.net | Finished upload of hadam3p_eu_r858_2013_1_008755442_0_9.zip
10.06.2014 12:59:14 | climateprediction.net | Started upload of hadam3p_eu_r858_2013_1_008755442_0_11.zip
10.06.2014 12:59:45 | climateprediction.net | Finished upload of hadam3p_eu_r858_2013_1_008755442_0_10.zip
10.06.2014 12:59:45 | climateprediction.net | Started upload of hadam3p_eu_r858_2013_1_008755442_0_12.zip
10.06.2014 13:07:28 | climateprediction.net | Finished upload of hadam3p_eu_r858_2013_1_008755442_0_11.zip
10.06.2014 13:07:28 | climateprediction.net | Started upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip
...
10.06.2014 13:08:22 | climateprediction.net | Finished upload of hadam3p_eu_r858_2013_1_008755442_0_12.zip
...
10.06.2014 13:15:45 | climateprediction.net | [error] Error reported by file upload server: can't open file /storage/cpdn-restarts/incoming/uploader/hadam3p_eu_r858_2013_1_008755442_0_13.zip: No such file or directory
10.06.2014 13:15:45 | climateprediction.net | Temporarily failed upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip: transient upload error
10.06.2014 13:15:45 | climateprediction.net | Backing off 00:02:57 on upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip
10.06.2014 13:18:43 | climateprediction.net | Started upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip
10.06.2014 13:26:21 | climateprediction.net | [error] Error reported by file upload server: can't open file /storage/cpdn-restarts/incoming/uploader/hadam3p_eu_r858_2013_1_008755442_0_13.zip: No such file or directory
10.06.2014 13:26:21 | climateprediction.net | Temporarily failed upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip: transient upload error
10.06.2014 13:26:21 | climateprediction.net | Backing off 00:05:38 on upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip
10.06.2014 13:26:31 | | Suspending network activity - user request
10.06.2014 13:52:48 | | Resuming network activity
10.06.2014 13:52:48 | climateprediction.net | Started upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip
...
10.06.2014 14:00:17 | climateprediction.net | [error] Error reported by file upload server: can't open file /storage/cpdn-restarts/incoming/uploader/hadam3p_eu_r858_2013_1_008755442_0_13.zip: No such file or directory
10.06.2014 14:00:17 | climateprediction.net | Temporarily failed upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip: transient upload error
10.06.2014 14:00:17 | climateprediction.net | Backing off 00:10:37 on upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip
10.06.2014 14:00:24 | | Suspending network activity - user request
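The three back-offs on zip No. 13 in this log (2:57, 5:38 and 10:37) grow by a factor of a little under two each time, while geophi's earlier log in this thread writes the delay as "3 hr 49 min 39 sec". The sketch below (hypothetical helper; the regular expressions are written only against the lines quoted in this thread) converts both formats to seconds so successive delays can be compared. It measures what the client reported; it does not reproduce BOINC's own back-off algorithm.

```python
import re

# "Backing off 00:02:57 on upload ..." (HH:MM:SS, as in the pipe-separated log)
HMS = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on upload")
# "Backing off 3 hr 49 min 39 sec on upload ..." (as in the Linux-style log)
WORDS = re.compile(r"Backing off ((?:\d+ (?:hr|min|sec) ?)+)on upload")
UNIT_SECONDS = {"hr": 3600, "min": 60, "sec": 1}


def backoff_seconds(line):
    """Return the back-off delay in a BOINC log line as seconds, or None if absent."""
    m = HMS.search(line)
    if m:
        hours, minutes, seconds = (int(g) for g in m.groups())
        return hours * 3600 + minutes * 60 + seconds
    m = WORDS.search(line)
    if m:
        return sum(int(amount) * UNIT_SECONDS[unit]
                   for amount, unit in re.findall(r"(\d+) (hr|min|sec)", m.group(1)))
    return None


LINES = [
    "10.06.2014 13:15:45 | climateprediction.net | Backing off 00:02:57 on upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip",
    "10.06.2014 13:26:21 | climateprediction.net | Backing off 00:05:38 on upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip",
    "10.06.2014 14:00:17 | climateprediction.net | Backing off 00:10:37 on upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip",
]

delays = [backoff_seconds(line) for line in LINES]
print(delays)  # [177, 338, 637]
ratios = [round(later / earlier, 2) for earlier, later in zip(delays, delays[1:])]
print(ratios)  # [1.91, 1.88]
```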


The upload section for this zip file in client_state.xml looks like this:

<file>
<name>hadam3p_eu_r858_2013_1_008755442_0_13.zip</name>
<nbytes>36821677.000000</nbytes>
<max_nbytes>150000000.000000</max_nbytes>
<md5_cksum>b4d07615d3c72c2219551c58bb075fba</md5_cksum>
<status>1</status>
<upload_url>http://cpdn-restarts.oerc.ox.ac.uk/cgi-bin/file_upload_handler</upload_url>
<persistent_file_xfer>
<num_retries>3</num_retries>
<first_request_time>1402397041.847410</first_request_time>
<next_request_time>1402402255.478571</next_request_time>
<time_so_far>1403.103088</time_so_far>
<last_bytes_xferred>36821891.000000</last_bytes_xferred>
<is_upload>1</is_upload>
</persistent_file_xfer>
</file>
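To read an entry like this programmatically, here is a sketch using Python's standard `xml.etree.ElementTree` on the `<file>` fragment above. The helper name and the `reached_full_size` interpretation are assumptions for illustration, not BOINC definitions; the request times are treated as Unix seconds, which fits since they convert to 10 June 2014.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

FILE_XML = """\
<file>
<name>hadam3p_eu_r858_2013_1_008755442_0_13.zip</name>
<nbytes>36821677.000000</nbytes>
<max_nbytes>150000000.000000</max_nbytes>
<md5_cksum>b4d07615d3c72c2219551c58bb075fba</md5_cksum>
<status>1</status>
<upload_url>http://cpdn-restarts.oerc.ox.ac.uk/cgi-bin/file_upload_handler</upload_url>
<persistent_file_xfer>
<num_retries>3</num_retries>
<first_request_time>1402397041.847410</first_request_time>
<next_request_time>1402402255.478571</next_request_time>
<time_so_far>1403.103088</time_so_far>
<last_bytes_xferred>36821891.000000</last_bytes_xferred>
<is_upload>1</is_upload>
</persistent_file_xfer>
</file>
"""


def summarize_upload(file_xml):
    """Summarize one <file> entry copied from client_state.xml."""
    root = ET.fromstring(file_xml)
    xfer = root.find("persistent_file_xfer")
    if xfer is None:
        raise ValueError("no <persistent_file_xfer> element: not an active transfer")
    nbytes = float(root.findtext("nbytes"))
    sent = float(xfer.findtext("last_bytes_xferred"))

    def as_utc(tag):
        # Assumption: these values are Unix timestamps in seconds.
        return datetime.fromtimestamp(float(xfer.findtext(tag)), tz=timezone.utc)

    return {
        "name": root.findtext("name"),
        "nbytes": int(nbytes),
        "last_bytes_xferred": int(sent),
        # Assumption: >= nbytes suggests the whole file went up on the last attempt.
        "reached_full_size": sent >= nbytes,
        "num_retries": int(xfer.findtext("num_retries")),
        "first_request_utc": as_utc("first_request_time"),
        "next_request_utc": as_utc("next_request_time"),
    }


for key, value in summarize_upload(FILE_XML).items():
    print(f"{key}: {value}")
```

On this entry it reports 3 retries and a transfer at least as large as the file, consistent with uploads reaching 100% before the server error. next_request_time comes out at 12:10:55 UTC, which lines up with the last back-off in the log (14:00:17 plus 10:37) if that log is in UTC+2.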



Could someone please look into this? Does it have anything to do with this announcement: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=7846#49312 ?

Greetings Waldmeister
JIM
Joined: 31 Dec 07
Posts: 1093
Credit: 19,728,822
RAC: 3,589
Message 49332 - Posted: 10 Jun 2014, 15:14:54 UTC - in response to Message 49329.  



©2019 climateprediction.net