Server Problem Fixed

JIM
Joined: 31 Dec 07
Posts: 1071
Credit: 18,636,429
RAC: 5,368
Message 56120 - Posted: 2 May 2017, 14:33:40 UTC

Yes! Trickles are now uploading again and finished tasks can report and clear. Good work.
____________

Byron Leigh Hatch @ team Carl Sagan
Joined: 17 Aug 04
Posts: 279
Credit: 43,022,075
RAC: 0
Message 56121 - Posted: 2 May 2017, 16:29:54 UTC

Downloads are failing.
I followed the advice given: do not detach and reattach.
I'm getting the following message:

2017-05-02 6:50:02 AM | cpdnboinc | Requesting new tasks for CPU
2017-05-02 6:50:03 AM | | Project communication failed: attempting access to reference site
2017-05-02 6:50:03 AM | cpdnboinc | Scheduler request failed: Couldn't resolve host name
2017-05-02 6:50:05 AM | | Internet access OK - project servers may be temporarily down.
2017-05-02 6:59:17 AM | cpdnboinc | Sending scheduler request: To fetch work.
2017-05-02 6:59:17 AM | cpdnboinc | Requesting new tasks for CPU
2017-05-02 6:59:19 AM | cpdnboinc | Scheduler request failed: Couldn't resolve host name
2017-05-02 6:59:20 AM | | Project communication failed: attempting access to reference site
2017-05-02 6:59:22 AM | | Internet access OK - project servers may be temporarily down.
2017-05-02 7:28:22 AM | cpdnboinc | Sending scheduler request: To fetch work.
2017-05-02 7:28:22 AM | cpdnboinc | Requesting new tasks for CPU
2017-05-02 7:28:24 AM | climateprediction.net | Scheduler request completed: got 0 new tasks
2017-05-02 7:28:24 AM | climateprediction.net | Project has no tasks available
2017-05-02 8:29:03 AM | climateprediction.net | Sending scheduler request: To fetch work.
2017-05-02 8:29:03 AM | climateprediction.net | Requesting new tasks for CPU
2017-05-02 8:29:07 AM | climateprediction.net | Scheduler request completed: got 2 new tasks
2017-05-02 8:29:09 AM | climateprediction.net | Started download of hadcm3s_8142_201412_120_564_011003749.zip
2017-05-02 8:29:09 AM | climateprediction.net | Started download of 71rs_2014.ostart.gz
2017-05-02 8:29:31 AM | climateprediction.net | Temporarily failed download of hadcm3s_8142_201412_120_564_011003749.zip: connect() failed
2017-05-02 8:29:31 AM | climateprediction.net | Backing off 00:03:14 on download of hadcm3s_8142_201412_120_564_011003749.zip
2017-05-02 8:29:31 AM | climateprediction.net | Temporarily failed download of 71rs_2014.ostart.gz: connect() failed
2017-05-02 8:29:31 AM | climateprediction.net | Backing off 00:02:43 on download of 71rs_2014.ostart.gz
2017-05-02 8:29:31 AM | climateprediction.net | Started download of 71rs_2014.astart.gz
2017-05-02 8:29:31 AM | climateprediction.net | Started download of spec3a_sw_3_asol2c_hadcm3.gz
2017-05-02 8:29:32 AM | | Project communication failed: attempting access to reference site
2017-05-02 8:29:34 AM | | Internet access OK - project servers may be temporarily down.
2017-05-02 8:29:53 AM | climateprediction.net | Temporarily failed download of 71rs_2014.astart.gz: connect() failed
2017-05-02 8:29:53 AM | climateprediction.net | Backing off 00:02:16 on download of 71rs_2014.astart.gz
2017-05-02 8:29:53 AM | climateprediction.net | Temporarily failed download of spec3a_sw_3_asol2c_hadcm3.gz: connect() failed
2017-05-02 8:29:53 AM | climateprediction.net | Backing off 00:02:41 on download of spec3a_sw_3_asol2c_hadcm3.gz
2017-05-02 8:29:53 AM | climateprediction.net | Started download of spec3a_lw_3_asol2c_hadcm3.gz
2017-05-02 8:29:53 AM | climateprediction.net | Started download of waterfix.ancil.be.32.gz
2017-05-02 8:29:54 AM | | Project communication failed: attempting access to reference site
2017-05-02 8:29:55 AM | | Internet access OK - project servers may be temporarily down.
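
A log like the one above can be summarised per file rather than read line by line. The following is only a rough sketch: it assumes each event log line has the form "timestamp | project | message" and relies solely on the two message wordings visible in this excerpt ("Temporarily failed download of ..." and "Backing off HH:MM:SS on download of ..."). The function name summarize_downloads is made up for illustration and is not part of BOINC.

```python
import re
from datetime import timedelta

# Message wordings copied from the excerpt above; other BOINC client
# versions may phrase these lines differently (assumption).
FAILED_RE = re.compile(r"^Temporarily failed download of (?P<file>\S+): (?P<reason>.+)$")
BACKOFF_RE = re.compile(
    r"^Backing off (?P<h>\d+):(?P<m>\d+):(?P<s>\d+) on download of (?P<file>\S+)$"
)


def summarize_downloads(log_text):
    """Return a dict keyed by file name with failure count, reasons and last backoff."""
    summary = {}
    for line in log_text.splitlines():
        # Assumed layout: "timestamp | project | message"; the project may be empty.
        parts = [part.strip() for part in line.split("|", 2)]
        if len(parts) != 3:
            continue
        message = parts[2]

        failed = FAILED_RE.match(message)
        backoff = BACKOFF_RE.match(message)
        match = failed or backoff
        if match is None:
            continue

        entry = summary.setdefault(
            match["file"], {"failures": 0, "reasons": set(), "last_backoff": None}
        )
        if failed:
            entry["failures"] += 1
            entry["reasons"].add(failed["reason"])
        else:
            entry["last_backoff"] = timedelta(
                hours=int(backoff["h"]),
                minutes=int(backoff["m"]),
                seconds=int(backoff["s"]),
            )
    return summary
```

Pasting the log text into summarize_downloads() would give one entry per file, which makes it easier to see whether every download fails for the same reason (here, connect() failed) or only some of them.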

Jeff Bakle
Joined: 24 Nov 05
Posts: 1
Credit: 1,220,885
RAC: 9
Message 56122 - Posted: 3 May 2017, 0:27:56 UTC

I was able to add the project back to my system. No work is currently available for my system at this time, but it is good to be back in the collective.
____________

Randi
Joined: 28 Jun 07
Posts: 30
Credit: 3,678,775
RAC: 2,062
Message 56123 - Posted: 3 May 2017, 3:00:33 UTC

The "do not detach and reattach" advice came too late for me.

Just now I reset and then removed CPDN and then I added it back.
It appears to be working correctly.
____________
Zooniverse Old Weather transcriber
and
Old Weather BOINC team member.

Kevin
Joined: 5 Jul 09
Posts: 63
Credit: 5,500,617
RAC: 0
Message 56124 - Posted: 3 May 2017, 4:20:02 UTC

My last task that reported after the server came back on line is showing as completed.

https://www.cpdn.org/cpdnboinc/result.php?resultid=20340872

3 tasks that finished while the backup server was running and were showing as completed have lost their trickles and are now showing as in progress.

https://www.cpdn.org/cpdnboinc/result.php?resultid=20350265
https://www.cpdn.org/cpdnboinc/result.php?resultid=20350827
https://www.cpdn.org/cpdnboinc/result.php?resultid=20345787

The trickles reported in the last task were reported before the server went down.
____________
Kevin

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2286
Credit: 2,949,847
RAC: 1,471
Message 56125 - Posted: 3 May 2017, 5:53:53 UTC - in response to Message 56124.

Two of my tasks have sent trickles since things went back to normal but trickles sent before the alternative upload server went/was taken off line don't appear on the task pages. Won't know how this affects credit until the credit script is run.

Not overly worried about this as the information has always been retrieved and sorted eventually in the past. I know this is frustrating for those who keep a close tally on credits however.

Kevin
Joined: 5 Jul 09
Posts: 63
Credit: 5,500,617
RAC: 0
Message 56126 - Posted: 3 May 2017, 7:25:20 UTC - in response to Message 56125.


Not overly worried about this as the information has always been retrieved and sorted eventually in the past. I know this is frustrating for those who keep a close tally on credits however.


Not worried about credit; it should turn up eventually. It was just a gentle hint that something may need a quick look :-)

Apart from that, 3 of them are from batch 561, which some were having problems with.
____________
Kevin

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 959
Credit: 1,964,466
RAC: 9,794
Message 56134 - Posted: 4 May 2017, 13:24:33 UTC

There's a new batch of 186 WAH2 PNW25/21 but none of my machines can download any because:

04/05/2017 14:06:51 | climateprediction.net | Not requesting tasks: some download is stalled

I'm tempted to abort the stalled downloads if there is no prospect of the stalled models being unstalled.

PS The WAH2 batch number 565 is duplicated with a small HADCM3S test batch on the backup site, but that's only a cosmetic problem.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2286
Credit: 2,949,847
RAC: 1,471
Message 56135 - Posted: 4 May 2017, 14:07:22 UTC - in response to Message 56134.

04/05/2017 14:06:51 | climateprediction.net | Not requesting tasks: some download is stalled


Was also wondering about aborting the stalled download task I have, though this machine doesn't have any stalled downloads and is now telling me no work is available so perhaps I should give it a bit more of a chance.

I had wondered whether the reason mine wasn't downloading was also why it had been abandoned by the previous cruncher, but on checking
https://www.cpdn.org/cpdnboinc//workunit.php?wuid=10996540

I see it got as far as producing three trickles. So I still don't know how global an issue the stuck downloads are.

Kevin
Joined: 5 Jul 09
Posts: 63
Credit: 5,500,617
RAC: 0
Message 56136 - Posted: 4 May 2017, 14:39:21 UTC

I've had one stuck downloading for a couple of days, and it's a _1

A couple of the servers have gone from the server status page so maybe they are still sorting things out.
____________
Kevin

Byron Leigh Hatch @ team Carl Sagan
Joined: 17 Aug 04
Posts: 279
Credit: 43,022,075
RAC: 0
Message 56137 - Posted: 4 May 2017, 15:12:18 UTC

I'm getting the same: I've had 2 downloads stalled for a couple of days now, and they're both _2

2017-05-04 7:21:28 AM | climateprediction.net | update requested by user
2017-05-04 7:21:32 AM | climateprediction.net | Sending scheduler request: Requested by user.
2017-05-04 7:21:32 AM | climateprediction.net | Not requesting tasks: some download is stalled
2017-05-04 7:21:34 AM | climateprediction.net | Scheduler request completed

hadcm3s_831b_201412_120_564_011006242_2
hadcm3s_8142_201412_120_564_011003749_2

I was wondering whether I should abort the stalled download tasks.
I think I will just wait, the weekend is not far away.

JIM
Joined: 31 Dec 07
Posts: 1071
Credit: 18,636,429
RAC: 5,368
Message 56138 - Posted: 4 May 2017, 16:35:37 UTC

Have the same problem. I have 4 wah2_pnw25 downloads stalled in my transfer tab since last night.
____________

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2286
Credit: 2,949,847
RAC: 1,471
Message 56139 - Posted: 4 May 2017, 16:48:23 UTC
Last modified: 4 May 2017, 16:52:48 UTC

Collating information from previous posts, this is affecting at least batches 563, 564 and 565. Will let project know.
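
For anyone wanting to do the same collation from their own task list, here is a small sketch. It assumes, without confirmation from the project, that task names end in _<batch>_<workunit>_<attempt>, as the two hadcm3s names posted earlier in this thread appear to (…_564_011006242_2). The function name group_by_batch is made up for illustration, and names that do not fit the assumed pattern are skipped rather than guessed at.

```python
import re
from collections import defaultdict

# Assumed (not confirmed by the project): a task name ends with
# "_<batch>_<workunit>_<attempt>", all three fields being digits.
TASK_RE = re.compile(r"_(?P<batch>\d+)_(?P<workunit>\d+)_(?P<attempt>\d+)$")


def group_by_batch(task_names):
    """Group task names by the batch number found in the assumed name layout."""
    batches = defaultdict(list)
    for raw_name in task_names:
        name = raw_name.strip()
        match = TASK_RE.search(name)
        if match is None:
            # Name does not fit the assumed layout; leave it out.
            continue
        batches[int(match["batch"])].append(name)
    return dict(batches)


if __name__ == "__main__":
    stalled = [
        "hadcm3s_831b_201412_120_564_011006242_2",
        "hadcm3s_8142_201412_120_564_011003749_2",
    ]
    for batch, names in sorted(group_by_batch(stalled).items()):
        print(batch, len(names))
```

Sorting the resulting keys gives the list of affected batches in one go, which is handy when several people report stalled tasks in the same thread.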

keputnam
Joined: 31 Aug 04
Posts: 23
Credit: 2,857,049
RAC: 3,922
Message 56141 - Posted: 4 May 2017, 20:39:12 UTC - in response to Message 56139.

Add batch 486

I've had a download stalled for almost three days now

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2286
Credit: 2,949,847
RAC: 1,471
Message 56142 - Posted: 4 May 2017, 20:53:50 UTC - in response to Message 56141.

Am becoming increasingly certain that all work for download is stalling. That means I won't be aborting any tasks, especially as most tasks are retreads at the moment, meaning they may well be on their last chance.

Alan K
Joined: 22 Feb 06
Posts: 263
Credit: 13,330,471
RAC: 10,279
Message 56143 - Posted: 4 May 2017, 22:15:31 UTC - in response to Message 56139.

Add batch 406 as well.

Dave Roberts
Joined: 15 Jan 11
Posts: 153
Credit: 4,678,734
RAC: 4,092
Message 56144 - Posted: 4 May 2017, 23:21:33 UTC

I had this problem three weeks ago. I posted under "New Work".

I eventually aborted all the tasks on three machines, given that the maintenance problems had become acute. They'd been hanging there for days.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2286
Credit: 2,949,847
RAC: 1,471
Message 56147 - Posted: 5 May 2017, 5:45:02 UTC

For me the question is whether this is a configuration problem with individual batches, where the wrong location is being pointed to for the files to be downloaded (as has happened in the past), or a global issue with the servers. As some of the tasks in question have at least got as far as downloading on to other computers previously, my money is on the latter, so I am not aborting anything unless I hear from the project people, either direct or via moderators, that this should be done.

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 959
Credit: 1,964,466
RAC: 9,794
Message 56148 - Posted: 5 May 2017, 10:12:07 UTC - in response to Message 56147.

I agree, Dave: I've got 406, 499, 506, 561. It looks to me like an infrastructure problem somewhere. I would quite like to run some of the models, even though they're reissues, as they would help fill in some gaps in my cross-machine performance array. However, if they're never going to download then clearly they have to be aborted.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2286
Credit: 2,949,847
RAC: 1,471
Message 56149 - Posted: 5 May 2017, 12:10:29 UTC

On Wednesday at 9.30am the project is being taken offline for IT to upgrade the GPFS (General Parallel File System, and not something to do with the Green Party as I first thought, that having dominated my other half's life over the past weeks!). Uploads will be diverted to another server but the subsetting server will be offline. It is anticipated this will take a day. Not sure if that means 24 hrs or a working day.
