wah tasks failed

Message boards : Number crunching : wah tasks failed

Andrew Sanchez
Joined: 28 May 14
Posts: 34
Credit: 705,936
RAC: 0
Message 52516 - Posted: 10 Sep 2015, 19:19:48 UTC
Last modified: 10 Sep 2015, 19:26:57 UTC

The first WAH task on each of my PCs errored out. Is this normal for these tasks? I do have a WAH task on each PC that seems to be running all right for now, but it was weird that the first task failed on every computer.

I remember when I first joined the project we would have a lot of errors because the models weren't accurate. Is that what is happening with these WAH tasks, or is it my PCs or settings that are screwed up?

NOTE: I'm just talking about the WAH tasks. The errors on my tasks page from last month are from a bad BOINC installation on the new PC and from tasks interrupted by a BOINC upgrade on the old PC.
ID: 52516
ritterm
Joined: 29 May 08
Posts: 128
Credit: 6,289,876
RAC: 0
Message 52517 - Posted: 10 Sep 2015, 20:30:46 UTC - in response to Message 52516.  

Andrew Sanchez wrote:
First wah task on each of my pcs erred...

Interesting. The first one on one of my PCs did the same thing (the other two were okay). I had the same kind of error as you had -- a series of what look like upload failures similar to the following:

upload failure: <file_xfer_error>
<file_name>wah2_eu2_g79c_1967_1_010161324_0_1.zip</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>


ID: 52517
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 52518 - Posted: 10 Sep 2015, 20:46:35 UTC

Most of mine are running but a few failed. I'm working up an email to staff.

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 52518
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 52519 - Posted: 10 Sep 2015, 20:50:24 UTC

<error_code>-161 (not found)</error_code>

That's not a failure to upload. It's a failure (by BOINC) to find the zip file in order to upload it.
The usual reason for this is that the model crashed before that zip could be created.
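
For anyone wanting to check their own failed tasks, here is a minimal Python sketch that counts the `-161 (not found)` entries in a task's stderr text. It assumes the log uses exactly the `<file_xfer_error>` layout quoted earlier in this thread; the function name `summarize_xfer_errors` is made up for illustration and is not part of BOINC.

```python
import re
from collections import Counter

# Matches one <file_xfer_error> block in the layout quoted in this thread.
# Assumption: other BOINC versions may format these blocks differently.
ERROR_BLOCK = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)\s*\((?P<reason>[^)]*)\)</error_code>\s*"
    r"</file_xfer_error>"
)


def summarize_xfer_errors(stderr_text):
    """Return (Counter of error codes, list of files reported as not found)."""
    codes = Counter()
    missing = []
    for match in ERROR_BLOCK.finditer(stderr_text):
        code = int(match.group("code"))
        codes[code] += 1
        if code == -161:  # "not found": the zip was never created
            missing.append(match.group("name").strip())
    return codes, missing


# Sample copied from the error quoted above.
sample = """upload failure: <file_xfer_error>
<file_name>wah2_eu2_g79c_1967_1_010161324_0_1.zip</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>"""

codes, missing = summarize_xfer_errors(sample)
print(codes)
print(missing)
```

If every zip a task was meant to produce shows up in the missing list, that would fit the explanation above that the model crashed before any zip was written.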

First guess in the case of both posters is that it's a computer problem.

We'll know more when the first zips/trickles arrive, or more fail to do so.

************

While previewing this, I see that astroWX has posted. So more info coming in now.


ID: 52519
Andrew Sanchez
Joined: 28 May 14
Posts: 34
Credit: 705,936
RAC: 0
Message 52520 - Posted: 10 Sep 2015, 21:32:26 UTC
Last modified: 10 Sep 2015, 21:44:24 UTC

Just got another failure on my older laptop. That is 2 failed tasks on that laptop (computer name "Andy") and 1 fail so far on the new laptop ("Beats").
Seems to be a problem with the zip file each time.
Watching a movie on Beats right now so task is suspended at 1.189% for now.

Downloaded 3 more tasks on Andy and 2 of them are running. We'll see how they go...
ID: 52520
Alan K
Joined: 22 Feb 06
Posts: 484
Credit: 29,579,234
RAC: 4,572
Message 52521 - Posted: 10 Sep 2015, 22:51:10 UTC - in response to Message 52520.  

4 WAH tasks running at the moment but no trickles/zips uploaded yet.
ID: 52521
Richard Haselgrove
Joined: 1 Jan 07
Posts: 925
Credit: 34,100,818
RAC: 11,270
Message 52522 - Posted: 10 Sep 2015, 23:06:07 UTC - in response to Message 52520.  

Andrew Sanchez wrote:
Just got another failure on my older laptop. That is 2 failed tasks on that laptop (computer name "Andy") and 1 fail so far on the new laptop ("Beats").
Seems to be a problem with the zip file each time.
Watching a movie on Beats right now so task is suspended at 1.189% for now.

Downloaded 3 more tasks on Andy and 2 of them are running. We'll see how they go...

We can't see your computer names, so we'll have to guess which is which. But I see that all computers on your account have been upgraded to Windows 10 - this might possibly be significant. The project staff are going to check in the morning whether there is a significant correlation across the database between running Windows 10 and these new task failures.
ID: 52522
candido
Joined: 15 Nov 10
Posts: 43
Credit: 6,118,949
RAC: 0
Message 52523 - Posted: 10 Sep 2015, 23:32:24 UTC - in response to Message 52522.  

All of the tasks I ran failed after 3 or 4 minutes of running.
It's a Windows 10 laptop.


ID: 52523
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 52524 - Posted: 11 Sep 2015, 0:02:25 UTC - in response to Message 52523.  

Thanks for that. Knowing how far into the work they were is useful.

ID: 52524
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 52525 - Posted: 11 Sep 2015, 0:32:09 UTC

A look ahead:

Not only do these tasks have a covey of large downloads, my first .zip upload is 69.04 MB! (This won't do my DSL link any good . . .)


"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 52525
ritterm
Joined: 29 May 08
Posts: 128
Credit: 6,289,876
RAC: 0
Message 52526 - Posted: 11 Sep 2015, 0:59:11 UTC - in response to Message 52519.  
Last modified: 11 Sep 2015, 1:02:00 UTC

Les Bayliss wrote:
...First guess in the case of both posters is that it's a computer problem...

I'm willing to concede that the error my PC generated was computer related -- perhaps some transient condition or just bad luck. This same host is 8-10 hours into 4 other WAH tasks with no apparent problems (that I know of). My lone Win10 PC seems to be blissfully grinding away, 6-7 hours into two of its WAHs.
ID: 52526
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,053,321
RAC: 4,417
Message 52527 - Posted: 11 Sep 2015, 1:42:30 UTC
Last modified: 11 Sep 2015, 1:45:56 UTC

The new WAH2 tasks seem to be running fine on my Win7 machine. Three of them are at 4.4% and 16+ hours. No problems. No graphics so no data on s/TS. No trickles yet.

Does anyone know how many zip files these tasks produce?
ID: 52527
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 52528 - Posted: 11 Sep 2015, 2:12:44 UTC - in response to Message 52527.  

Jim

The failed tasks say 13 zips.

*****************

ritterm

There are now failures on Windows 7 and Windows 10.
Also a "signal 11" and a "model crashed".
So, it looks random. For now.

As Richard said, the project people are going to dig into it tomorrow.
And then we have the weekend. (Surprise!)

*****************

astro

Linux uploads for hadam3prm3pm2t_eu variety are ~72 Megs, and ~92 for zip 13.

Ahhh, the joys of big stash files. Computing power is sort of keeping up with the researchers' wants, but cheap 'net speeds are not.

Although, while my phone line was dead (wires broken in 2 spots), I got hold of a gadget with WiFi input, and output to the nearest 4G phone tower.
Now THAT's fast. But expensive. I'm only using it now for my day to day activity, which means I don't get held up because BOINC is hogging the land line connection with zip uploads.
Which it's doing right now, and will be for another 45 minutes or so.
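
To put a rough number on the upload burden, here is a small Python sketch using only figures from this thread. It assumes 13 zips per task (the count reported by the failed tasks) and applies the Linux sizes quoted above (~72 MB each, ~92 MB for zip 13) to all of them, which may not hold for every task type. The 1 Mbit/s uplink is a purely hypothetical value, not anyone's measured speed.

```python
# Figures taken from this thread; combining them is an assumption.
ZIP_COUNT = 13          # zips per task, as reported by the failed tasks
REGULAR_ZIP_MB = 72.0   # approximate size of zips 1-12 (Linux figure)
FINAL_ZIP_MB = 92.0     # approximate size of zip 13 (Linux figure)


def total_upload_mb(count=ZIP_COUNT, regular=REGULAR_ZIP_MB, final=FINAL_ZIP_MB):
    """Total upload volume per task in megabytes."""
    return (count - 1) * regular + final


def upload_hours(size_mb, uplink_mbit_per_s):
    """Hours to upload size_mb at a given uplink speed.

    Uses decimal units (1 MB = 8 Mbit) and ignores protocol overhead.
    """
    return size_mb * 8.0 / uplink_mbit_per_s / 3600.0


total = total_upload_mb()
hours = upload_hours(total, 1.0)  # hypothetical 1 Mbit/s uplink
print(f"{total:.0f} MB per task, about {hours:.1f} h at 1 Mbit/s")
```

Swap in your own uplink speed to see how long one task would tie up the line.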


ID: 52528
KWSN Sir Clark
Joined: 8 Jul 05
Posts: 33
Credit: 1,274,211
RAC: 0
Message 52529 - Posted: 11 Sep 2015, 2:22:35 UTC

Just had the one WU, but it errored out on Win 7 64-bit with the 13 absent zip file errors.


ID: 52529
ed2353
Joined: 15 Feb 06
Posts: 137
Credit: 33,347,857
RAC: 0
Message 52530 - Posted: 11 Sep 2015, 9:34:24 UTC - in response to Message 52529.  

Running 3 WAH2 WUs at present on Windows 10 64-bit.
All sent their first ZIP with no problems.
ID: 52530
ritterm
Joined: 29 May 08
Posts: 128
Credit: 6,289,876
RAC: 0
Message 52531 - Posted: 11 Sep 2015, 10:38:55 UTC

How often should trickles be uploaded with the WAHs? I have 9 tasks across three hosts that have been running for 15-20 hours and I see no trickles logged. And, I see no upload attempts in the BOINC message logs.
ID: 52531
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4314
Credit: 16,380,160
RAC: 3,563
Message 52532 - Posted: 11 Sep 2015, 10:45:34 UTC - in response to Message 52531.  

Not sure, but for me a lot of tasks only trickle once a day, or twice at the most, these days. I am sure that someone will have an answer soon and be able to say how long it took them with their hardware for comparison.
ID: 52532
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4314
Credit: 16,380,160
RAC: 3,563
Message 52533 - Posted: 11 Sep 2015, 10:47:34 UTC

I notice these tasks have all gone and the number of tasks in progress hasn't gone up enough to account for this. Have they been recalled?
ID: 52533
Richard Haselgrove
Joined: 1 Jan 07
Posts: 925
Credit: 34,100,818
RAC: 11,270
Message 52534 - Posted: 11 Sep 2015, 11:14:03 UTC - in response to Message 52531.  

ritterm wrote:
How often should trickles be uploaded with the WAHs? I have 9 tasks across three hosts that have been running for 15-20 hours and I see no trickles logged. And, I see no upload attempts in the BOINC message logs.

With the great variation in computer speeds, it's probably best to answer that in terms of progress made, rather than absolute time.

With 12 'simulation months' to be completed by each model, the trickle+upload pair should be generated around 8.3%, 16.6%, 25% ... progress. My leader is still only at 4.064% (after 21 hours), so it would be some time before I could fill in the third decimal place for the actual moment when it happens.
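
The arithmetic above can be sketched in a few lines of Python. This assumes one trickle+upload pair per simulation month at evenly spaced progress values, and a constant crunching rate for the time estimate. Both are simplifications, and the helper names are invented for illustration.

```python
SIM_MONTHS = 12  # simulation months per model, as stated above


def trickle_points(months=SIM_MONTHS):
    """Progress percentages at which a trickle+upload pair is expected."""
    return [100.0 * k / months for k in range(1, months + 1)]


def hours_to_progress(target_pct, current_pct, elapsed_hours):
    """Linear extrapolation of run time; assumes a constant rate."""
    return elapsed_hours * target_pct / current_pct


points = trickle_points()
# Figures from the post: 4.064% done after 21 hours.
eta_first = hours_to_progress(points[0], 4.064, 21.0)

print([round(p, 1) for p in points])
print(f"first trickle expected after roughly {eta_first:.0f} hours in total")
```

On those figures the first trickle would arrive a little under two days into the run, so seeing nothing in the logs after 15-20 hours is not alarming by itself.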
ID: 52534
Andrew Sanchez
Joined: 28 May 14
Posts: 34
Credit: 705,936
RAC: 0
Message 52535 - Posted: 11 Sep 2015, 12:09:38 UTC - in response to Message 52522.  
Last modified: 11 Sep 2015, 12:10:10 UTC


Richard Haselgrove wrote:
We can't see your computer names, so we'll have to guess which is which. But I see that all computers on your account have been upgraded to Windows 10 - this might possibly be significant. The project staff are going to check in the morning whether there is a significant correlation across the database between running Windows 10 and these new task failures.


I didn't have any failures overnight. Got one task on the AMD laptop running at 3.368% and one task on the Intel laptop running at 1.822%.
I think the failures yesterday all happened below 2% completion.
ID: 52535

©2024 climateprediction.net