climateprediction.net home page
Upload failures

Wilgard

Joined: 30 Mar 10
Posts: 12
Credit: 2,607,453
RAC: 110
Message 60587 - Posted: 4 Jul 2019, 13:27:16 UTC

A few tasks have uploaded successfully :) but not all of them. Patience is a virtue.
ID: 60587
JIM

Joined: 31 Dec 07
Posts: 1152
Credit: 22,279,907
RAC: 5,765
Message 60588 - Posted: 4 Jul 2019, 16:21:30 UTC - in response to Message 60578.  
Last modified: 4 Jul 2019, 16:24:05 UTC

Never seen a problem from suspending tasks if BOINC isn't stopped and restarted. Also a long time since even doing that I have lost a Windows task.

I have. It’s rare, but I have occasionally had a WU crash after going through the suspend, wait a minute, exit the BOINC Manager process.

FINALLY! My zips are uploading.
ID: 60588
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 60592 - Posted: 4 Jul 2019, 21:42:27 UTC - in response to Message 60583.  

Bernard said:
I suggested that more channels are used to spread the message but I haven't seen elsewhere - so no one listened.


This has been talked about, although not recently.

No way I'm going to touch any "social media" stuff, and the BOINC Notices system will only get picked up by people who've not turned them off, and who bother to look at BOINC anyway. (Do they even appear in the simple view?)

My post was intended as a "heads up" to those that read this board.
And I meant by it, to stop running models so as to not accumulate lots of zips that would have to join the fight to get back to the server after things got fixed.

People that crash hundreds of tasks without even wondering why they aren't getting any credit don't count.

I haven't asked for an update on the jasmin situation, because I know that the Profs and Drs are busy with other matters.
But I guess I should before the weekend sets in. (In my case, it looks like being a wet one.)
ID: 60592
Alan K

Joined: 22 Feb 06
Posts: 487
Credit: 30,493,229
RAC: 6,415
Message 60593 - Posted: 4 Jul 2019, 22:09:12 UTC - in response to Message 60584.  
Last modified: 4 Jul 2019, 22:11:07 UTC



To check whether a task goes to jasmin, search for the string "upload_url" in your client_state.xml file and go through the matches till you find the one for the task in question. You should find something like the following:

wah2_safr50_n0ym_198912_13_820_011866056_0_r5511092_1.zip</name><nbytes>0.000000</nbytes><max_nbytes>150000000.000000</max_nbytes><status>0</status><upload_url>http://jasmin-upload.cpdn.org/cgi-bin/file_upload_handler
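
For anyone who would rather not scroll through the file by hand, here is a rough Python sketch of the same search. It assumes only what the fragment above shows, namely that each file's `<name>` is followed by its `<upload_url>` before the next `<name>` appears. The function names `upload_targets` and `files_for_host` are made up for this example and are not part of BOINC.

```python
import re
import sys
from pathlib import Path

# A <name>...</name> followed by an <upload_url> before the next <name> starts.
# Assumption: this ordering holds throughout the file, as in the fragment above.
PAIR = re.compile(
    r"<name>([^<]+)</name>(?:(?!<name>).)*?<upload_url>\s*([^<\s]+)",
    re.DOTALL,
)


def upload_targets(xml_text):
    """Return {file name: upload URL} for every entry that lists an upload URL."""
    return {name.strip(): url for name, url in PAIR.findall(xml_text)}


def files_for_host(xml_text, host):
    """Return the sorted file names whose upload URL contains the given host."""
    return sorted(n for n, u in upload_targets(xml_text).items() if host in u)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py /path/to/client_state.xml
    text = Path(sys.argv[1]).read_text(errors="replace")
    for name in files_for_host(text, "jasmin-upload.cpdn.org"):
        print(name)
```

The script only reads the file. It is probably still safest to run it against a copy rather than the live file in the BOINC data directory.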


Or check the event log in the tools tab for the file transfer and you will get something like:

04/07/2019 22:15:49 | climateprediction.net | Started upload of wah2_safr50_n06x_201112_13_818_011860496_1_r430858283_5.zip
04/07/2019 22:15:49 | climateprediction.net | [file_xfer] URL: http://jasmin-upload.cpdn.org/cgi-bin/file_upload_handler
04/07/2019 22:15:50 | climateprediction.net | [http] [ID#100] Info: Trying 192.171.139.103...
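
If you copy the event log into a text file, a few lines of Python can count how often each zip has been started and how often it has failed. This is only a sketch: it assumes the `timestamp | project | message` layout shown above and recognises just the two messages that appear in this thread ("Started upload of" and "Temporarily failed upload of"). The name `tally_uploads` is invented for the example.

```python
import re
from collections import Counter

# Only the two message types quoted in this thread are recognised.
MSG = re.compile(r"^(Started|Temporarily failed) upload of (\S+?)(?::\s+(.*))?$")


def tally_uploads(lines, project="climateprediction.net"):
    """Count upload starts and temporary failures per file name."""
    started, failed = Counter(), Counter()
    for line in lines:
        # Expected layout: timestamp | project | message
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3 or parts[1] != project:
            continue
        m = MSG.match(parts[2])
        if not m:
            continue  # e.g. [file_xfer] or [http] debug lines
        kind, filename = m.group(1), m.group(2)
        (started if kind == "Started" else failed)[filename] += 1
    return started, failed
```

A file that shows many starts and as many failures is one that keeps retrying without getting through.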

Patiently waiting for uploads to get going again!
ID: 60593
rob

Joined: 5 Jun 09
Posts: 90
Credit: 3,556,000
RAC: 6,458
Message 60594 - Posted: 5 Jul 2019, 6:46:25 UTC
Last modified: 5 Jul 2019, 6:48:08 UTC

No way I'm going to touch any "social media" stuff, and the BOINC Notices system will only get picked up by people who've not turned them off, and who bother to look at BOINC anyway. (Do they even appear in the simple view?)


The answer to your question is "sort of" - the notices button gains a red border when you hover over it, not very obvious, and not every user will even bother with looking at the simple view to see that.

And last night something stirred in the land of Jasmin - the mountain of my backed-up zip files was transferred, starting round about 10pm BST.
ID: 60594
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 60595 - Posted: 5 Jul 2019, 6:57:20 UTC

Thanks for the bit about the Notices.

Glad to hear that about your zips.
I've only just sent an email asking for an update, and I said that people were still having trouble getting uploads to go, so I guess this is where it tries to make a liar out of me.
ID: 60595
bernard_ivo

Joined: 18 Jul 13
Posts: 438
Credit: 25,481,198
RAC: 8,897
Message 60596 - Posted: 5 Jul 2019, 7:50:43 UTC - in response to Message 60595.  

Thanks Les,

I do appreciate all the efforts of people posting here, and especially of all the moderators. However, I really think project people should be more proactive, especially in times of trouble, and we shouldn't have to plead for updates. It is just a few lines that need to be written in an e-mail, not typed on a typewriter and sent by horse.

I did check the CPDN Twitter account recently (during the upload failures) and all I saw was info on the new OpenIFS project and how thousands of WUs were sent to crunchers. Well, they could also have sent an alert, or posted on the CPDN website, and used BOINC. Of course some people will never get the message, but the idea is to try to reach as many as possible.

It is great that the queues are clearing up; let's hope all goes well during the summer.
ID: 60596
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 60597 - Posted: 5 Jul 2019, 8:55:52 UTC

OK, update:

The issue is one of bandwidth onto the system. We are constantly running 146 upload processes at the moment and have been since I announced on the board that we were restarting jasmin. This is one of the issues with a backlog building up while the server is not running.
We are investigating how we can set up another upload server within the infrastructure that can be run as part of a load-balancing pair behind the jasmin-upload name.


David
ID: 60597
Billy Ewell 1931

Joined: 14 Aug 06
Posts: 22
Credit: 6,155,220
RAC: 4,616
Message 60598 - Posted: 5 Jul 2019, 15:15:31 UTC

Confirm please: You want us to suspend those CPDN tasks identified as safr 50 and sam 50. That is obviously easy to do on BOINC by simply hitting the SUSPEND button. However, that does not stop the transfer actions on those tasks that are already completed and in upload status. Regrets but this almost 88 year old brain is not understanding while trying to comply with instructions. Bill
ID: 60598
KWSN Sir Clark

Joined: 8 Jul 05
Posts: 33
Credit: 1,274,211
RAC: 0
Message 60599 - Posted: 5 Jul 2019, 16:37:06 UTC - in response to Message 60598.  

It seems to be working now

I'm sure Les will clarify but my backlog cleared.
ID: 60599
_Ryle_

Joined: 17 Aug 05
Posts: 22
Credit: 16,057,688
RAC: 15,434
Message 60600 - Posted: 5 Jul 2019, 19:30:37 UTC

Billy Ewell: You can suspend network activity, but only in BOINC Manager's Advanced view, I think - I don't think there is that option in the Simple view.

But no communication is possible while network is suspended.

I've suspended my uploads until further notice, but I try now and then to see if I can upload. So far no luck here - I've got a few uploads waiting, but no running work units now. I think I will wait until the weekend is over.
ID: 60600
Vicki

Joined: 28 Nov 15
Posts: 50
Credit: 4,099,809
RAC: 0
Message 60601 - Posted: 5 Jul 2019, 20:32:13 UTC - in response to Message 59957.  

I have had stuck uploads for the better part of a week from these 2 types of wah:
5/07/2019 7:50:05 a.m. | climateprediction.net | Started upload of wah2_sam50_a4z3_201112_24_815_011854387_0_r799897574_20.zip
5/07/2019 7:55:13 a.m. | climateprediction.net | Temporarily failed upload of wah2_sam50_a4z3_201112_24_815_011854387_0_r799897574_20.zip: transient HTTP error
5/07/2019 7:50:05 a.m. | climateprediction.net | Started upload of wah2_safr50_n7cq_201512_13_820_011874340_0_r1435939452_8.zip
5/07/2019 7:55:13 a.m. | climateprediction.net | Temporarily failed upload of wah2_safr50_n7cq_201512_13_820_011874340_0_r1435939452_8.zip: transient HTTP error
I have more trickles of the same series also stuck on another computer.
I hope they can fix the servers soon, I am running out of space to store unsent trickles.
Kind regards Vicki.
ID: 60601
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 60602 - Posted: 5 Jul 2019, 21:09:12 UTC

Vicki

First of all, they're zip files, not trickles. (Which are RPCs, and don't show up in the Transfers tab.)

Second, if you're running out of space then Suspend either each model in the Tasks tab, or the project in the Projects tab.
I said this a week ago.

Then wait for however long it takes to clear.

There are posts from the project's technical manager in several places in this thread explaining the situation.
ID: 60602
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 60603 - Posted: 5 Jul 2019, 21:13:49 UTC - in response to Message 60598.  

Billy

The reason for Suspending the models is so that they don't keep creating zips, and therefore making more problems for you, the individual.

Leaving the computer connected to the internet all the time will allow whatever zips you already have to upload eventually.

Which may be another week. Or two. Who knows.
ID: 60603
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 60604 - Posted: 5 Jul 2019, 21:16:32 UTC

KWSN Sir Clark

Thanks for that. Nice to have some good news.
ID: 60604
Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 60605 - Posted: 6 Jul 2019, 16:52:15 UTC

All of my sam50 zips (around a hundred) have gone. But I have a couple of cam25 zips that have been hanging for over a week. They have been retried 50 to 62 times.

Do they go to a different server?
ID: 60605
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4472
Credit: 18,448,326
RAC: 22,385
Message 60606 - Posted: 6 Jul 2019, 17:07:10 UTC - in response to Message 60605.  

All of my sam50 zips (around a hundred) have gone. But I have a couple of cam25 zips that have been hanging for over a week. They have been retried 50 to 62 times.

Do they go to a different server?


Yes, in Mexico I believe.
ID: 60606
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 60609 - Posted: 6 Jul 2019, 21:30:06 UTC - in response to Message 60605.  

Hi Jim

I had 3 cam25s a year or 2 back.
One uploaded OK, one repeated zips 3 and 6 each time a new zip was created, and the third did the same, but I've forgotten how many it had trouble with.
In the end I just Aborted those two after all zips were created and given a chance to upload.

I don't know what it is with those cams, but I wouldn't waste too much time on them. Let someone else have a try.

And it's good to hear that your others have gone.
That's two people with good news.
ID: 60609
Lockleys

Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 60610 - Posted: 6 Jul 2019, 22:15:03 UTC

Like other posters, my zips for SAM50 and SAFR50 models have all cleared. But one ZIP for a CAM25 model repeatedly sticks at 13.47 percent of the upload. I have a folk memory that we have had this issue before but cannot remember whether there was a solution or whether we all just aborted them as Les suggests.
ID: 60610
Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 60611 - Posted: 6 Jul 2019, 22:42:24 UTC
Last modified: 6 Jul 2019, 22:46:24 UTC

Thanks for the history on the cam25's. I found the one in question, and it has returned 18 zips.
https://www.cpdn.org/cpdnboinc/result.php?resultid=21709022

However, the ones that are stuck are #12 and #13. So it looks like they got lost in the shuffle.
If they have not uploaded by the time my other work has finished tomorrow, I will just can them
(as in trash can; I just realized that may not be clear to non-native English speakers).
ID: 60611

©2024 cpdn.org