climateprediction.net home page
Upload failures

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2111
Credit: 58,047,187
RAC: 692
Message 61281 - Posted: 19 Oct 2019, 21:38:57 UTC - in response to Message 61280.  

@Speedy The error message in stderr on the task page says "The system cannot find the drive specified." This is an error that crops up occasionally. No one knows the cause. It's not typically reproduced in the other tasks in the work unit. It may be some kind of timing issue when the model tries to write to, or read from, the disk.

The error listings you pasted into your post appear just because the model crashed before those monthly upload files were created. It was expecting to upload them and they were never generated. Unfortunately, they're not useful for finding the cause of the crash.
ID: 61281
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7609
Credit: 24,240,330
RAC: 2,564
Message 61285 - Posted: 20 Oct 2019, 0:30:14 UTC

Climate models have lots of files open, which all need saving at checkpoints.
With your computer having so many processors, it will need a VERY fast HD to keep up with all that saving when it occurs at the same time.
ID: 61285
Speedy
Joined: 20 Jul 05
Posts: 25
Credit: 409,712
RAC: 0
Message 61286 - Posted: 20 Oct 2019, 0:53:50 UTC - in response to Message 61285.  

> Climate models have lots of files open, which all need saving at checkpoints.
> With your computer having so many processors, it will need a VERY fast HD to keep up with all that saving when it occurs at the same time.

Thank you for pointing this out. I have cut it down to working on three tasks at a time. I'm not sure, but maybe when I turned my machine off last night it was trying to upload a trickle message.

> @Speedy The error message in stderr on the task page says "The system cannot find the drive specified." This is an error that crops up occasionally. No one knows the cause. It's not typically reproduced in the other tasks in the work unit. It may be some kind of timing issue when the model tries to write to, or read from, the disk.
>
> The error listings you pasted into your post appear just because the model crashed before those monthly upload files were created. It was expecting to upload them and they were never generated. Unfortunately, they're not useful for finding the cause of the crash.

Thank you for explaining the error message; it makes complete sense. Hopefully I will be able to complete other tasks without them crashing.
ID: 61286
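
Note for anyone wanting to cap the number of running tasks as described above: one way is an app_config.xml file using BOINC's `project_max_concurrent` option. The sketch below is only an assumption about how such a limit could be set up; the post does not say which method was actually used, and the same effect can be had from the computing preferences. `build_app_config` is a hypothetical helper name, and the limit of 3 simply mirrors the "three tasks at a time" mentioned in the post.

```python
import xml.etree.ElementTree as ET


def build_app_config(max_tasks: int) -> str:
    """Return the text of an app_config.xml that caps concurrent tasks for one project."""
    if max_tasks < 1:
        raise ValueError("max_tasks must be at least 1")
    root = ET.Element("app_config")
    # project_max_concurrent limits how many tasks of this project run at once.
    limit = ET.SubElement(root, "project_max_concurrent")
    limit.text = str(max_tasks)
    return ET.tostring(root, encoding="unicode")


# Hypothetical limit taken from the post: three tasks at a time.
config_text = build_app_config(3)
print(config_text)
```

The resulting text would be saved as app_config.xml in the project's folder inside the BOINC data directory, and the client then needs to re-read its config files (or be restarted) before the limit applies.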
Ivorget
Joined: 23 Feb 05
Posts: 4
Credit: 1,291,717
RAC: 1,166
Message 61530 - Posted: 14 Nov 2019, 5:19:58 UTC - in response to Message 61239.  

I've had one cam25 zip transfer failing with 'transient HTTP error' since about Oct 26:
https://www.cpdn.org/result.php?resultid=21743279

Can it be fixed or is the advice still to abort?
ID: 61530
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7609
Credit: 24,240,330
RAC: 2,564
Message 61533 - Posted: 14 Nov 2019, 13:19:45 UTC - in response to Message 61530.  

Might as well abort; I can't see them finding what's wrong any time soon.
ID: 61533
Ivorget
Joined: 23 Feb 05
Posts: 4
Credit: 1,291,717
RAC: 1,166
Message 61539 - Posted: 15 Nov 2019, 2:46:32 UTC - in response to Message 61533.  

I logged onto the machine now intending to abort it and funnily enough the transfer had finally succeeded just a couple of hours ago. ¯\_(ツ)_/¯
ID: 61539
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7609
Credit: 24,240,330
RAC: 2,564
Message 61540 - Posted: 15 Nov 2019, 3:52:05 UTC - in response to Message 61539.  

That's good.
After I posted here, I put a message on our project board, and the researcher in Mexico picked it up soon after and posted:

> Hi! there was a brief shutdown the 26th of October (for maintenance), so that might be the cause.

They're going to look into it when they're back in the office.
I thought: Great. Too late now.

I'll post another internal message.
ID: 61540
Ivorget
Joined: 23 Feb 05
Posts: 4
Credit: 1,291,717
RAC: 1,166
Message 61557 - Posted: 17 Nov 2019, 4:40:33 UTC - in response to Message 61540.  

Very good, thanks for sorting it out then.
ID: 61557
Merowig
Joined: 27 May 15
Posts: 3
Credit: 109,766
RAC: 0
Message 63492 - Posted: 5 Feb 2021, 16:13:27 UTC - in response to Message 61557.  

Hi

I can't get WUs to upload either
Upload: Pending (project backoff, and then a timer)
ID: 63492
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7609
Credit: 24,240,330
RAC: 2,564
Message 63493 - Posted: 5 Feb 2021, 19:08:02 UTC - in response to Message 63492.  

If you're talking about batch 869, PNW, then it's been closed.
Sorry, but you took too long.

Just abort them.
ID: 63493
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3426
Credit: 10,438,930
RAC: 9,029
Message 63494 - Posted: 5 Feb 2021, 21:24:20 UTC - in response to Message 60096.  

> So out of curiosity, how much internet should I be using *per* task vs. drive space? Is it 2 gb per task which progressively gets smaller on the drive as the task moves along? Additionally I'm guessing that if an upload fails it would show up either a) in the tasks list on the website or b) in the boinc transfers window?
> I have my Toshiba laptop connected to wireless internet and sometimes it can be a bit far from the router if I'm working on it. Will this be an issue?
>
> thanks

Currently I have ten tasks on my Ryzen. Disk usage for CPDN is 25.5GB. My laptop, with just one task, is currently using 3GB of disk space. From memory, uploads are about 100MB for the zips on my Linux boxes. The only time this is a problem for me, with bored as opposed to broad band, is during Zoom calls, when I suspend internet activity for BOINC. I think there is also the facility to restrict the bandwidth used, but as it is on the same machine I use for Zoom, I just make sure BOINC can't grab my bandwidth.
Please do not private message myself or other moderators for help. This limits the number of people who are able to help and deprives others who may benefit from the answer.
ID: 63494
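
Note: the figures in the post above can be turned into rough per-task numbers. The sketch below is only a back-of-envelope calculation. The 25.5GB, ten tasks, and 100MB values come from the post; the 1 Mbit/s uplink is a hypothetical value chosen for illustration, and the function names are made up for this example.

```python
def per_task_disk_gb(total_gb: float, tasks: int) -> float:
    """Average disk space per task, given total project usage and task count."""
    return total_gb / tasks


def upload_seconds(size_mb: float, uplink_mbit_per_s: float) -> float:
    """Time to upload a file: megabytes -> megabits, divided by the uplink rate."""
    return size_mb * 8 / uplink_mbit_per_s


# From the post: ten tasks using 25.5GB in total.
ryzen_per_task = per_task_disk_gb(25.5, 10)

# From the post: zips of about 100MB. The 1 Mbit/s uplink is an assumption.
zip_upload_time = upload_seconds(100, 1.0)

print(f"about {ryzen_per_task:.2f} GB of disk per task")
print(f"about {zip_upload_time / 60:.1f} minutes per 100MB zip at 1 Mbit/s")
```

On these assumptions the average is about 2.5GB of disk per task, in the same range as the 3GB reported for the single-task laptop, and a slow uplink spends several minutes on each zip.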
Merowig
Joined: 27 May 15
Posts: 3
Credit: 109,766
RAC: 0
Message 63504 - Posted: 7 Feb 2021, 22:09:48 UTC - in response to Message 63493.  
Last modified: 7 Feb 2021, 22:18:44 UTC

> If you're talking about batch 869, PNW, then it's been closed.
> Sorry, but you took too long.
>
> Just abort them.

Thanks.
The deadline of the work units was the end of April this year. I didn't know they become invalid if someone is quicker. Good to know.
ID: 63504
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7609
Credit: 24,240,330
RAC: 2,564
Message 63506 - Posted: 7 Feb 2021, 22:29:39 UTC - in response to Message 63504.  

The "deadline" that's often quoted is just a very long time, set to stop BOINC from causing problems when shorter tasks from other projects are run at the same time.

The ones that I mentioned are the real deadlines here.

And whether or not the long BOINC time is needed anymore is the subject of much discussion.
ID: 63506
KAMasud
Joined: 6 Oct 06
Posts: 177
Credit: 7,162,291
RAC: 11,114
Message 63508 - Posted: 8 Feb 2021, 4:30:37 UTC - in response to Message 59955.  

We (Pakistan, South Asia) have had the warmest winter that I have seen in the past 65 years. It might become a record of sorts. Let us say, no winter at all. Just Autumn, Spring type temps.
ID: 63508
Peter Hucker of the Scottish B...
Joined: 9 Oct 20
Posts: 221
Credit: 1,789,184
RAC: 2,530
Message 63517 - Posted: 8 Feb 2021, 18:52:27 UTC - in response to Message 63506.  
Last modified: 8 Feb 2021, 18:53:56 UTC

> The "deadline" that's often quoted, is just a very long time to stop BOINC from causing problems when shorter tasks from other projects are run at the same time.
>
> The ones that I mentioned, are the real deadlines here.
>
> And whether or not the long BOINC time is needed anymore is the subject of much discussion.

At the risk of repeating what's already been said, if the server lies to the client and user about when the work must be done, it won't be done on time. If it has to be done in, say, 2 months, then set that as the deadline. Then the client will do it more urgently if it's nearing that time, and the user can know if their computer is too slow (or not switched on often enough) and wasting time doing these big tasks. Because of this nonsense, Merowig has just pointlessly wasted electricity running tasks that are now cancelled.
ID: 63517
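
Note: the point about slow or rarely switched-on machines can be illustrated with a small feasibility check. The sketch below is not how the BOINC client schedules work; it is a simplified estimate, and every number in it (20 days of CPU time, a machine that is on 25% of the time, the dates) is hypothetical.

```python
from datetime import date


def can_finish(cpu_days_needed: float, on_fraction: float,
               start: date, deadline: date) -> bool:
    """Estimate whether a task completes before the deadline.

    on_fraction is the share of wall-clock time the machine spends
    crunching (between 0 and 1, exclusive of 0).
    """
    if not 0 < on_fraction <= 1:
        raise ValueError("on_fraction must be in (0, 1]")
    wall_days_needed = cpu_days_needed / on_fraction
    return wall_days_needed <= (deadline - start).days


start = date(2021, 1, 1)

# Hypothetical task: 20 CPU-days on a machine that crunches 25% of the time,
# which works out to 80 wall-clock days.
fits_two_months = can_finish(20, 0.25, start, date(2021, 3, 2))   # 60 days available
fits_one_year = can_finish(20, 0.25, start, date(2022, 1, 1))     # 365 days available

print(fits_two_months, fits_one_year)
```

With a two-month deadline the same task would be flagged as unlikely to finish, while a year-long deadline hides the problem until the batch is closed.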
Peter Hucker of the Scottish B...
Joined: 9 Oct 20
Posts: 221
Credit: 1,789,184
RAC: 2,530
Message 63518 - Posted: 8 Feb 2021, 18:53:06 UTC - in response to Message 63508.  

> We (Pakistan, South Asia) have had the warmest winter that I have seen in the past 65 years. It might become a record of sorts. Let us say, no winter at all. Just Autumn, Spring type temps.

It's snowing here. The scientists obviously haven't told the clouds about the new regulations.
ID: 63518
Alan K
Joined: 22 Feb 06
Posts: 434
Credit: 19,406,456
RAC: 6,903
Message 63519 - Posted: 8 Feb 2021, 23:38:03 UTC - in response to Message 63493.  

Would it be possible to post when batches are closed because enough results are in for the researchers to work on?
ID: 63519
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7609
Credit: 24,240,330
RAC: 2,564
Message 63520 - Posted: 9 Feb 2021, 0:04:54 UTC - in response to Message 63519.  

Usually they're closed a long time after they're issued, to free up storage space.

Several (all?) of the AFLAME batches have just been closed. These started in April 2019.

More than enough time to crunch them.
ID: 63520
KAMasud
Joined: 6 Oct 06
Posts: 177
Credit: 7,162,291
RAC: 11,114
Message 63521 - Posted: 9 Feb 2021, 4:30:56 UTC - in response to Message 63518.  

> We (Pakistan, South Asia) have had the warmest winter that I have seen in the past 65 years. It might become a record of sorts. Let us say, no winter at all. Just Autumn, Spring type temps.
> It's snowing here. The scientists obviously haven't told the clouds about the new regulations.

Peter, what do scientists have to do with clouds? Just return us our snow. Himalayas, Hindu Kush, Pamirs, etc. A sort of drought or to confuse drouth, you can take whichever word you can understand. ;) p) This year will be the warmest on record.
ID: 63521
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7609
Credit: 24,240,330
RAC: 2,564
Message 63522 - Posted: 9 Feb 2021, 4:53:03 UTC - in response to Message 63521.  

And this thread is not about weather.
Personal observations about this should be in the Cafe section.
ID: 63522

©2022 climateprediction.net