Model uploaded/finished/reported, but still in progress on the web

Kevin

Joined: 5 Jul 09
Posts: 63
Credit: 5,500,617
RAC: 0
Message 55579 - Posted: 25 Jan 2017, 2:59:05 UTC - in response to Message 55574.  

Got another one:-(

https://www.cpdn.org/cpdnboinc/result.php?resultid=20117981

I have suspended the WU at 99.700%, so still have files on computer.


I have not looked at the files in the CPDN folder; I do not know what is worth looking at anyway.

But looking at BOINC Manager 'statistics', there has been no communication between my machine and CPDN since the 23rd, the date of the last trickle. I have another WU suspended and not yet started, so this machine would not be asking for work.

It looks as if, for some reason, the WU is either not generating the trickles, or BOINC is not sending them and not showing them as unsent.

I am going to leave everything as is until early evening, then I shall allow that WU to complete, abort the other WU and detach and re-attach to try to clear this problem.

Les, this could be the reason that nothing is showing up on the server logs.
Kevin
ID: 55579
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 6961
Credit: 20,843,205
RAC: 0
Message 55580 - Posted: 25 Jan 2017, 3:09:51 UTC - in response to Message 55579.  
Last modified: 25 Jan 2017, 3:11:31 UTC

The checking of the server logs was last year, after I and a couple of others found the original problem.

We're currently in the thinking-about-it phase.

And it doesn't affect everything. All of mine since are being shown normally.
ID: 55580
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 6961
Credit: 20,843,205
RAC: 0
Message 55581 - Posted: 25 Jan 2017, 7:26:36 UTC - in response to Message 55579.  

abort the other WU

The problem is suspected to be something to do with the server, so if you abort that task, we'll never know what would have happened to it.
ID: 55581
Kevin
Joined: 5 Jul 09
Posts: 63
Credit: 5,500,617
RAC: 0
Message 55583 - Posted: 25 Jan 2017, 8:58:23 UTC - in response to Message 55581.  

abort the other WU

The problem is suspected to be something to do with the server, so if you abort that task, we'll never know what would have happened to it.


That would be

https://www.cpdn.org/cpdnboinc/result.php?resultid=20118325

which has not been started.

If I run the above WU it will cost about a week, with no guarantee that it will be completed properly.

I will allow the almost completed WU to finish later tonight. I suspended it so that if you want copies of any of the files that are deleted after completion, they are available now and can be copied and e-mailed to wherever required.
Kevin
ID: 55583
bernard_ivo
Joined: 18 Jul 13
Posts: 344
Credit: 9,943,922
RAC: 25
Message 55860 - Posted: 6 Mar 2017, 10:23:23 UTC
Last modified: 6 Mar 2017, 10:23:41 UTC

Here is a wah2_afr50_t0uu_201412_13_501 WU that finished and reported 11h ago, but is still shown as in progress.
ID: 55860
boinc127
Joined: 5 Apr 14
Posts: 10
Credit: 263,917
RAC: 0
Message 55864 - Posted: 6 Mar 2017, 13:41:12 UTC

I have a WU that has finished successfully but still shows up as in progress (and some of the trickles are missing), but as long as it's just a server-side issue and not lost work I don't mind...

wah2_eas50_g4xm_201412_10_506_010872315_0
ID: 55864
boinc127
Joined: 5 Apr 14
Posts: 10
Credit: 263,917
RAC: 0
Message 55923 - Posted: 17 Mar 2017, 18:40:31 UTC

I have a workunit that completed 10 days ago that still shows as new (in progress) although it did upload its final trickle on the 6th of March...

wah2_eas50_g4xm_201412_10_506_010872315_0
ID: 55923
boinc127
Joined: 5 Apr 14
Posts: 10
Credit: 263,917
RAC: 0
Message 55924 - Posted: 17 Mar 2017, 18:52:24 UTC - in response to Message 55923.  

To add to it, here is the log from BOINC:

06-Mar-2017 04:50:16 [climateprediction.net] Started upload of wah2_eas50_g4xm_201412_10_506_010872315_0_r448371534_10.zip
06-Mar-2017 04:51:42 [climateprediction.net] Finished upload of wah2_eas50_g4xm_201412_10_506_010872315_0_r448371534_10.zip
06-Mar-2017 04:52:20 [climateprediction.net] Started upload of wah2_eas50_g4xm_201412_10_506_010872315_0_r448371534_out.zip
06-Mar-2017 04:52:21 [climateprediction.net] Computation for task wah2_eas50_g4xm_201412_10_506_010872315_0 finished

06-Mar-2017 04:52:25 [climateprediction.net] Finished upload of wah2_eas50_g4xm_201412_10_506_010872315_0_r448371534_out.zip
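
For anyone who wants to check their own log the same way, here is a minimal Python sketch that pairs the "Started upload" and "Finished upload" lines and reports how long each file took. It assumes the bracketed log format quoted above; the function name `upload_durations` and the regular expression are illustrative only and are not part of BOINC.

```python
import re
from datetime import datetime

# Month names are mapped by hand so parsing does not depend on the system locale.
MONTHS = {name: i for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Assumed line shape: "06-Mar-2017 04:50:16 [project] Started upload of <file>"
LINE_RE = re.compile(
    r"^(\d{2})-([A-Za-z]{3})-(\d{4}) (\d{2}):(\d{2}):(\d{2}) "
    r"\[[^\]]+\] (Started|Finished) upload of (\S+)$"
)

def upload_durations(log_text):
    """Return {file name: seconds from 'Started' to 'Finished'} (None if unfinished)."""
    started = {}
    durations = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # e.g. "Computation for task ... finished" lines are skipped
        day, mon, year, hh, mm, ss, event, name = m.groups()
        ts = datetime(int(year), MONTHS[mon], int(day), int(hh), int(mm), int(ss))
        if event == "Started":
            started[name] = ts
            durations.setdefault(name, None)
        elif name in started:
            durations[name] = (ts - started[name]).total_seconds()
    return durations

SAMPLE_LOG = """\
06-Mar-2017 04:50:16 [climateprediction.net] Started upload of wah2_eas50_g4xm_201412_10_506_010872315_0_r448371534_10.zip
06-Mar-2017 04:51:42 [climateprediction.net] Finished upload of wah2_eas50_g4xm_201412_10_506_010872315_0_r448371534_10.zip
06-Mar-2017 04:52:20 [climateprediction.net] Started upload of wah2_eas50_g4xm_201412_10_506_010872315_0_r448371534_out.zip
06-Mar-2017 04:52:21 [climateprediction.net] Computation for task wah2_eas50_g4xm_201412_10_506_010872315_0 finished
06-Mar-2017 04:52:25 [climateprediction.net] Finished upload of wah2_eas50_g4xm_201412_10_506_010872315_0_r448371534_out.zip
"""

for name, seconds in upload_durations(SAMPLE_LOG).items():
    print(name, seconds)
```

A file that shows `None` here never logged a "Finished upload" line. In the log above both uploads did finish, which fits the view that the work was not lost on the client side.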
ID: 55924
Kevin
Joined: 5 Jul 09
Posts: 63
Credit: 5,500,617
RAC: 0
Message 55932 - Posted: 19 Mar 2017, 10:53:00 UTC

Another one

https://www.cpdn.org/cpdnboinc/result.php?resultid=20331106

19/03/2017 02:02:55 | climateprediction.net | Started upload of wah2_eas50_g99u_201512_10_506_010877939_1_r1991005778_10.zip
19/03/2017 02:04:40 | climateprediction.net | Computation for task wah2_eas50_g99u_201512_10_506_010877939_1 finished
19/03/2017 02:04:42 | climateprediction.net | Started upload of wah2_eas50_g99u_201512_10_506_010877939_1_r1991005778_out.zip
19/03/2017 02:04:49 | climateprediction.net | Finished upload of wah2_eas50_g99u_201512_10_506_010877939_1_r1991005778_out.zip
19/03/2017 02:05:01 | climateprediction.net | Finished upload of wah2_eas50_g99u_201512_10_506_010877939_1_r1991005778_10.zip


Don't know if it has got anything to do with it, but the out trickle finished uploading before the final trickle?
Kevin
ID: 55932
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 979
Credit: 3,103,609
RAC: 47
Message 55935 - Posted: 20 Mar 2017, 0:31:58 UTC - in response to Message 55932.  
Last modified: 20 Mar 2017, 0:32:29 UTC

...Don't know if it has got anything to do with it but the out trickle finished uploading before the final trickle?

The "out" Zip is a small file containing run logging data. That data used to be added to each Zip file but is now uploaded separately, at the end of the run. If the maximum number of simultaneous file transfers is set at the default value (i.e. > 1) then uploads will take place in parallel, which explains why the small "_out.zip" file finishes uploading before the larger "_10.zip" file in your log.
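
To make the ordering concrete, here is a rough Python sketch of a simple "transfer slot" model. It is an illustration only, not how the BOINC client is implemented. The durations (126 s and 7 s) and the 107 s gap are read off the timestamps in the quoted log, the function name `finish_times` is made up, and the model assumes that a transfer's duration does not change when another one runs alongside it.

```python
import heapq

def finish_times(transfers, max_parallel):
    """Simple slot model of queued uploads.

    transfers: list of (name, ready_time_s, duration_s) in queue order.
    Returns {name: finish_time_s}, with times relative to the first start.
    """
    slots = [0.0] * max_parallel      # time at which each slot next becomes free
    heapq.heapify(slots)
    result = {}
    for name, ready, duration in transfers:
        free_at = heapq.heappop(slots)    # earliest available slot
        start = max(ready, free_at)       # wait for the file and for a slot
        finish = start + duration
        heapq.heappush(slots, finish)
        result[name] = finish
    return result

# Seconds relative to 02:02:55, taken from the log quoted in this thread.
queue = [("_10.zip", 0.0, 126.0), ("_out.zip", 107.0, 7.0)]

print(finish_times(queue, max_parallel=2))  # two slots: uploads overlap
print(finish_times(queue, max_parallel=1))  # one slot: strictly one after another
```

With two slots the small "_out.zip" finishes while "_10.zip" is still going, as in the log. With a single slot it has to wait its turn and finishes last.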
ID: 55935
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2472
Credit: 3,125,721
RAC: 304
Message 55937 - Posted: 20 Mar 2017, 8:36:07 UTC - in response to Message 55935.  

If the maximum number of simultaneous file transfers is set at the default value (i.e. > 1) then uploads will take place in parallel,


So that I can see better what is going on, I have set the maximum number of file transfers per project in cc_config.xml to 1. If network activity has been suspended, for instance when there has been a problem at the project servers, the order in which files upload once it is resumed might still not be the expected one.
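
For reference, a small Python sketch that prints the kind of cc_config.xml fragment this setting corresponds to. The tag name `max_file_xfers_per_project` is an assumption here (it is not given in this thread) and should be checked against the BOINC client configuration documentation. The helper name `build_cc_config` is made up for the example.

```python
import xml.etree.ElementTree as ET

def build_cc_config(per_project_transfers):
    """Build a minimal cc_config.xml document as a string.

    Assumption: the per-project transfer limit lives under <options> as
    <max_file_xfers_per_project>; verify against the BOINC documentation.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    limit = ET.SubElement(options, "max_file_xfers_per_project")
    limit.text = str(per_project_transfers)
    return ET.tostring(root, encoding="unicode")

print(build_cc_config(1))
```

If a cc_config.xml already exists, it is safer to add the option to the existing `<options>` section by hand than to overwrite the file, and the client has to re-read its config files (or be restarted) before the change takes effect.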
ID: 55937
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 6961
Credit: 20,843,205
RAC: 0
Message 55938 - Posted: 20 Mar 2017, 9:13:27 UTC

If you turn off network access before the model has finished, you can see the order in which the files get created.
I think that some of the data in the out file may get deleted when the data is zipped, so it needs to be gathered first, which means that the out file gets returned first.
ID: 55938
bernard_ivo
Joined: 18 Jul 13
Posts: 344
Credit: 9,943,922
RAC: 25
Message 56140 - Posted: 4 May 2017, 19:15:49 UTC
Last modified: 4 May 2017, 19:16:08 UTC

It seems many WUs that have finished during the blackout are still in progress. In my case around 20 WUs are gone from my machines but are up on the web and with partial trickles listed. I also see that the WUs in progress on the server status page go down rather slowly so I might not be the only one with finished WUs that look unfinished.
ID: 56140
bernard_ivo
Joined: 18 Jul 13
Posts: 344
Credit: 9,943,922
RAC: 25
Message 56208 - Posted: 12 May 2017, 13:19:04 UTC

Any news from staff about these WUs?
ID: 56208
Kevin
Joined: 5 Jul 09
Posts: 63
Credit: 5,500,617
RAC: 0
Message 56210 - Posted: 12 May 2017, 14:45:35 UTC - in response to Message 56140.  

It seems many WUs that have finished during the blackout are still in progress.


I have 3.
Kevin
ID: 56210
Jesse Viviano
Joined: 20 Dec 14
Posts: 23
Credit: 2,348,964
RAC: 0
Message 56211 - Posted: 12 May 2017, 15:02:15 UTC

Here is one more task whose completion was reported but is now lost: https://www.cpdn.org/cpdnboinc/result.php?resultid=20378119
ID: 56211
WB8ILI
Joined: 1 Sep 04
Posts: 130
Credit: 48,738,424
RAC: 117
Message 56212 - Posted: 12 May 2017, 17:38:41 UTC

I had several computers with no work units on them but the web showed that I did (after the web was restored).

I removed the project from my computer(s) and then added the project back in.

All of the work units then showed as "abandoned" and were immediately sent to other users.

If the website thinks you are working on them but you aren't, nothing will ever happen until they expire a year from now.
ID: 56212
bernard_ivo
Joined: 18 Jul 13
Posts: 344
Credit: 9,943,922
RAC: 25
Message 56213 - Posted: 12 May 2017, 18:10:36 UTC - in response to Message 56212.  

I had several computers with no work units on them but the web showed that I did (after the web was restored).

I removed the project from my computer(s) and then added the project back in.

All of the work units then showed as "abandoned" and were immediately sent to other users.

If the website thinks you are working on them but you aren't, nothing will ever happen until they expire a year from now.


Yeah, the problem with these WUs is twofold:
1. They have been successfully computed and uploaded, hence there is no need to run them again.
2. There is potentially a huge pile of them waiting for users' micromanagement, like detach-reattach. So these WUs are stuck in a void unless staff fix the issue.
ID: 56213
WB8ILI
Joined: 1 Sep 04
Posts: 130
Credit: 48,738,424
RAC: 117
Message 56215 - Posted: 12 May 2017, 21:00:36 UTC

bernard_ivo -

I am pretty sure that many, if not all, of the tasks that I "abandoned" using my "detach project/attach project" procedure I never did any processing on. I haven't received any tasks in maybe 4 or 6 weeks, but the website shows I did.

So, in the interest of science, I "abandoned" them and let someone else complete them for the scientists.
ID: 56215
bernard_ivo
Joined: 18 Jul 13
Posts: 344
Credit: 9,943,922
RAC: 25
Message 56216 - Posted: 12 May 2017, 21:31:30 UTC - in response to Message 56215.  

bernard_ivo -

I am pretty sure that many, if not all, of the tasks that I "abandoned" using my "detach project/attach project" procedure I never did any processing on. I haven't received any tasks in maybe 4 or 6 weeks, but the website shows I did.


Then this is another matter. Here we report only WUs that have been uploaded/finished/reported, but are still shown as in progress on the web.
ID: 56216

©2019 climateprediction.net