Problems after SAP merger into CPDN

ChinookFoehn

Joined: 7 Aug 04
Posts: 83
Credit: 410,895
RAC: 0
Message 33817 - Posted: 16 May 2008, 20:40:52 UTC - in response to Message 33816.  

Chinook, that CPDN task is a HADCM and it successfully sent a trickle to a CPDN server less than 24 hours ago...

Yes, my problem is that the SAP task was transmogrified from reporting to SAP to reporting to CPDN, which now gives me 2 CPDN tasks that seem to be partially interlinked.

The CPDN (HADCM) seems to update properly. The ex-SAP (HADAM) seems not to.
As I previously posted... updating either one results in only the ex-SAP (HADAM) issuing the update - to the CPDN site, not the SAP site from where it was issued. The ex-SAP (HADAM) task is not shown on my CPDN account, only on my SAP account.

Bizarrely, today, both tasks only go to the SAP site whereas, yesterday, both were going to the CPDN site - even though only climateprediction.net shows on the web-site button for both tasks.

If no-one has anything against it, I'll keep the ex-SAP (HADAM) suspended until someone figures out how to remove the cross-links or informs me I should abort it, and resume the CPDN (HADCM) task. At least the trickles are working, and the issued credit is being reported to statistics sites such as BOINCstats, even though no updated credit is showing up in this BOINC Manager any more (115871 here, 116182 BOINCstats). I have not checked on the other computer.

Hopefully, before the CPDN task is completed, the cross-linkage will be resolved or the SAP unit can be aborted so I can issue a No New Tasks command (only applies to the ex-SAP at the moment) and then reset the project once the task is finished. I also will not upgrade the BOINC Manager until that point, as I can well imagine that would screw things up royally - presently using v5.10.45.

If I read nothing to the contrary, I'll resume the CPDN (HADCM) task tomorrow (Saturday) or earlier if you concur.

-ChinookFöhn

ID: 33817
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 33818 - Posted: 16 May 2008, 20:53:41 UTC

Milo has done all that he can for this week.
The department housing the SAP server has shut down for the weekend and there's no access until Monday.

You're not the only one that received a model in the last couple of months, and the "top" 4 crunchers have dozens of models on their machines, which possibly don't get looked at often, so they might not be aware that there's a problem.

Best that I can say is: Good Luck.

ID: 33818
ChinookFoehn

Joined: 7 Aug 04
Posts: 83
Credit: 410,895
RAC: 0
Message 33819 - Posted: 16 May 2008, 21:29:31 UTC - in response to Message 33818.  

Milo has done all that he can for this week... Good Luck.

Thanks for the thought but luck, I believe, won't have much to do with it, just analytical reasoning.

The ex-SAP unit isn't due until November, so there are a few months in which to try to solve it. I don't know if it is worth it other than as an intellectual exercise and for the knowledge it would bring, for if no solution becomes available next week, then aborting and re-issuing the units is likely a faster method of obtaining the results.

As for me, I am content to hold the ex-SAP task until the end of July, at which time, if a solution is found, I will crunch it alongside something like Milky Way tasks 'til it completes, then the CPDN task with Milky Way until it completes, and then reset both CPDN and SAP on this computer.

Have a nice weekend.

-ChinookFöhn

ID: 33819
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 33820 - Posted: 16 May 2008, 22:35:07 UTC

The November "due date" is artificial; something HAS to be put into that field for the other projects, and this project uses one a long time into the future.
The REAL due date was last year, before the project person's thesis was written.
The models now slowly being completed will go into the collection with the others, for future researchers to use.

The only thing that will happen if you go past November will be a message saying that the model is overdue, and to consider aborting it. Which isn't necessary.

This applies to ALL climate models.

ID: 33820
[B^S] sTrey
Joined: 9 Jan 05
Posts: 30
Credit: 434,469
RAC: 0
Message 33823 - Posted: 17 May 2008, 1:51:25 UTC

My cpdn model has been suspended since before SAP broke, due to cpdn server problems and non-cpdn circumstances. SAP is now suspended on all my hosts, but it's still named climateprediction.net, so if I force an update I still get the you're-already-attached-to-cpdn-please-detach messages.

So I wasn't clear if the redirect has truly been fixed... Does it appear safe to let normal cpdn models continue to run and contact the server, or do I need to wait for SAP to recover and the duplicate identity to go away first?
ID: 33823
ChinookFoehn

Joined: 7 Aug 04
Posts: 83
Credit: 410,895
RAC: 0
Message 33826 - Posted: 17 May 2008, 2:50:57 UTC - in response to Message 33823.  

...So I wasn't clear if the redirect has truly been fixed...
Nope.
Does it appear safe to let normal cpdn models continue to run and contact the server, or do I need to wait for SAP to recover and the duplicate identity to go away first?

All I can recommend is that you do as I did and look at your CPDN task under your computer and see if it did a trickle since the attempt to merge SAP into CPDN. If it did/does, I would say yes.

If it hasn't, and it is a HADAM unit... I'd recommend you wait to see what is accomplished next week. Of course my advice could be totally in error.

As I've read nothing to the contrary, I shall re-start my CPDN (HADCM) in the morning.

-ChinookFöhn

ID: 33826
KAMasud

Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 33828 - Posted: 17 May 2008, 9:19:55 UTC


1) I was issued with a WU from the SAP server this month.
2) Both my projects' WUs have been affected.
What I could do, though I don't have a spare machine, is to split both the projects from the combined folder into two different folders (can be done) and then check up, but, alas, that will have to wait a month. Stuck, I suppose :'( 5 WUs of SAP and 3 160-year models? What a shame, I watch them grow, like I watch my kids grow.
Regards
Masud.
ID: 33828
Profile MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 33829 - Posted: 17 May 2008, 12:55:54 UTC


Thyme Lawn is experimenting with small edits to the project account XML file and client_state.xml to put things back where they should be. It seems to be fixing the problem on my affected PC, in any case...

I'm a volunteer and my views are my own.
ID: 33829
Profile Thyme Lawn
Volunteer moderator

Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 33830 - Posted: 17 May 2008, 13:41:55 UTC
Last modified: 18 May 2008, 14:40:08 UTC

The redirect caused the project name and scheduler URL for SAP to be changed to those for the main CPDN project. After testing some ideas out with MikeMarsUK the following sequence will sort out the problem.

Edit: the instructions have been changed because they relied on forcing BOINC to do a master file fetch before sending another scheduler request. This didn't always happen, causing the request to go to the CPDN server instead of the SAP server and undo all of the changes.

The new instructions are here.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 33830
KAMasud

Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 33831 - Posted: 17 May 2008, 14:11:22 UTC


Thyme, I did as you suggested; it started off OK but reverted back.

5/16/2008 7:03:22 PM|CPDN Seasonal Attribution Project|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
5/16/2008 7:03:27 PM|climateprediction.net|Scheduler request succeeded: got 0 new tasks
5/16/2008 7:03:27 PM|climateprediction.net|You used the wrong URL for this project
5/16/2008 7:03:27 PM|climateprediction.net|The correct URL is http://climateprediction.net/
5/16/2008 7:03:27 PM|climateprediction.net|You seem to be attached to this project twice
5/16/2008 7:03:27 PM|climateprediction.net|We suggest that you detach projects named climateprediction.net,
5/16/2008 7:03:27 PM|climateprediction.net|then reattach to http://climateprediction.net/
5/16/2008 7:03:27 PM|climateprediction.net|Already attached to a project named climateprediction.net (possibly with wrong URL)
5/16/2008 7:03:27 PM|climateprediction.net|Consider detaching this project, then trying again
5/16/2008 7:03:27 PM|climateprediction.net|Message from server: Invalid or missing account key. Visit this project's web site to get an account key.


Just to keep life simple, what if we consider these as crashed WUs and re-run from backup?
Regards
Masud.
ID: 33831
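
The symptom in the log above can be spotted mechanically: the scheduler request goes out under one project name and the reply comes back under another. Below is a minimal sketch, assuming the message log uses the `timestamp|project|message` layout shown in the post. The function names are hypothetical, and pairing each request with the next reply is a simplification that may not hold when several projects contact their servers at once.

```python
from collections import namedtuple

LogEntry = namedtuple("LogEntry", "timestamp project message")


def parse_line(line):
    """Split one 'timestamp|project|message' line into its three fields."""
    timestamp, project, message = line.strip().split("|", 2)
    return LogEntry(timestamp, project, message)


def find_redirected_requests(lines):
    """Return (request, reply) pairs whose project names differ.

    Assumption: a reply is the first 'Scheduler request ...' line that
    follows a 'Sending scheduler request' line.
    """
    entries = [parse_line(l) for l in lines if l.count("|") >= 2]
    mismatches = []
    pending = None
    for entry in entries:
        if entry.message.startswith("Sending scheduler request"):
            pending = entry
        elif entry.message.startswith("Scheduler request") and pending is not None:
            if entry.project != pending.project:
                mismatches.append((pending, entry))
            pending = None
    return mismatches


# Three lines taken from the log quoted in the post above.
SAMPLE = [
    "5/16/2008 7:03:22 PM|CPDN Seasonal Attribution Project|Sending scheduler request: "
    "Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks",
    "5/16/2008 7:03:27 PM|climateprediction.net|Scheduler request succeeded: got 0 new tasks",
    "5/16/2008 7:03:27 PM|climateprediction.net|You used the wrong URL for this project",
]

for request, reply in find_redirected_requests(SAMPLE):
    print(f"{request.project!r} request was answered as {reply.project!r}")
```

A non-empty result suggests the SAP entry is still being sent to the CPDN scheduler, which matches what Thyme Lawn describes as the edit being undone.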
ChinookFoehn

Joined: 7 Aug 04
Posts: 83
Credit: 410,895
RAC: 0
Message 33833 - Posted: 17 May 2008, 16:06:46 UTC

The same occurred with me other than...

WARNING! Do not restart any tasks if you have SAP tasks merged into CPDN.

as my original CPDN (HADCM) started up, started looking for HADAM data and errored out.

Of course I cannot update the task to rid BOINC Manager of it, as only the ex-SAP [HADAM] task issues updates.

I do not think it is worth trying to find a fix as the data must be corrupted.

I doubt there is any value in obtaining the knowledge of how to correct this error when it seems obvious that the data in the intermingled work units is highly suspect.

That my CPDN task issued a correct trickle, once, after the fiasco must have been an anomaly.

I too vote for aborting all affected units and issuing a re-set project both to CPDN and SAP.

A shame but if the data is important, then it seems to me that it should be re-issued.

-Chinookföhn
ID: 33833
Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 33834 - Posted: 17 May 2008, 17:02:36 UTC

Hi sTrey

It would be safer also to keep your CPDN model(s) suspended for the time being and crunch something else instead. If I were in your position I'd back up the contents of the BOINC folder now, or certainly before restarting any climate models. And I wouldn't restart any of them until Milo tells us what the situation is on Monday.


ID: 33834
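
The backup suggested above can be scripted. This is a minimal sketch, assuming BOINC has been exited first so that no files change during the copy. The function name and both paths are hypothetical and need to be replaced with the real BOINC folder and a backup location on your machine.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_folder(data_dir, backup_root):
    """Copy the whole BOINC folder to a new, timestamped directory.

    Returns the path of the copy. Raises FileNotFoundError if the
    source folder does not exist, so a mistyped path is not silently
    "backed up" as nothing.
    """
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"BOINC folder not found: {data_dir}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{data_dir.name}-backup-{stamp}"
    # copytree creates the target (and missing parents) and copies everything.
    shutil.copytree(data_dir, target)
    return target


# Example call (hypothetical paths, adjust before use):
# backup_boinc_folder(r"C:\Program Files\BOINC", r"D:\boinc-backups")
```

Each run writes to a fresh directory rather than overwriting an earlier copy, so a backup taken before the merge problem is never replaced by one taken after it.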
old_user170894
Joined: 3 Mar 06
Posts: 96
Credit: 353,185
RAC: 0
Message 33835 - Posted: 17 May 2008, 18:52:32 UTC - in response to Message 33831.  


Thyme, I did as you suggested; it started off OK but reverted back.


I think any manual change to client_state.xml reverts back unless you delete client_state_previous.xml before restarting BOINC. Could that be the reason Thyme Lawn's procedure doesn't seem to work?

I would wait for Thyme Lawn to verify my theory. I may be wrong.



ID: 33835
Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 33836 - Posted: 17 May 2008, 19:11:43 UTC
Last modified: 17 May 2008, 19:12:37 UTC

Last year when my BBC model crashed with a 'Max CPU time exceeded' message and I increased the fpops_bound figure in the xml file to give it more time to complete, I watched to see whether the figure reverted later. It didn't revert, so the client_state_previous file didn't need to be edited.

The procedure I used for editing the xml file is described here for Windows:

http://www.climateprediction.net/board/viewtopic.php?t=7215

The edit procedure has now been tested by several members (for that xml file of course) and it definitely works. I wonder whether members are omitting some step of the procedure and this is causing the edit to revert later? Or perhaps some edits revert and some don't.
ID: 33836
Profile Thyme Lawn
Volunteer moderator

Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 33837 - Posted: 17 May 2008, 20:26:57 UTC - in response to Message 33835.  
Last modified: 17 May 2008, 20:27:44 UTC

I think any manual change to client_state.xml reverts back unless you delete client_state_previous.xml before restarting BOINC. Could that be the reason Thyme Lawn's procedure doesn't seem to work?

client_state_prev.xml only comes into play if client_state.xml is corrupt. The fix failed for KAMasud (and probably ChinookFoehn) because BOINC didn't issue the expected master file fetch before doing the scheduler update.

I've asked them to try a modification to the fix and have posted a modified set of instructions here.

When I have confirmation that the modified fix works I'll modify the instructions here and on the SAP forum.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 33837
ChinookFoehn

Joined: 7 Aug 04
Posts: 83
Credit: 410,895
RAC: 0
Message 33839 - Posted: 18 May 2008, 1:47:58 UTC - in response to Message 33837.  

...
I've asked them to try a modification to the fix and have posted a modified set of instructions here.
...

The instructions did work, CPDN updated and SAP is back where it is supposed to be. The only difference I had, was that there was no change in my account_attribution_.cpdn.org.xml file.

Until I am informed otherwise, I am leaving the SAP task suspended.
Alas, the CPDN task errored out and was lost.

A rather novel experience but had I my druthers...

Thank you Thyme Lawn for the correction and knowledge.

-ChinookFöhn

ID: 33839
KAMasud

Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 33840 - Posted: 18 May 2008, 7:16:25 UTC


Hi Thyme, I again did as you suggested, but the project name in BOINC Manager is still pointing towards CPDN Main, i.e. climateprediction.net, even though I have edited the name in the account folder as per your advice.
I back up twice a day; what if I replace the BOINC folder with a clean backup folder? That should do the trick. I have not tried it as yet due to WUs from LHC and RS.

Hello Dagorath/ Seinfeld, at last you found peace at some project. LoL. Wonder, what magic the Mods did on you :).
To Mods. You all, are the real driving force behind these climate projects.
Regards
Masud.

ID: 33840
Profile MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 33842 - Posted: 18 May 2008, 7:56:24 UTC - in response to Message 33840.  


Hi Thyme, I again did as you suggested, but the project name in BOINC Manager is still pointing towards CPDN Main, i.e. climateprediction.net, even though I have edited the name in the account folder as per your advice.
...


I think you need to edit the account name in both the project account file and also the client_state.xml file (in the attribution section). The same XML tag appears in both.


I'm a volunteer and my views are my own.
ID: 33842
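
One way to confirm that both files carry the same name after an edit is to read the tag from each and compare. The sketch below is illustrative only. It assumes the tag is `project_name`, that the account file has an `<account>` root, that client_state.xml holds one `<project>` block per attached project identified by `<master_url>`, and that the SAP master URL is `http://attribution.cpdn.org/`. None of these is confirmed in this thread, and the XML strings are made-up samples, not real BOINC files.

```python
import xml.etree.ElementTree as ET

# Assumed SAP master URL (inferred from the account file name, not confirmed).
SAP_URL = "http://attribution.cpdn.org/"


def project_name_in_account(account_xml):
    """Return the <project_name> text from an account file's contents."""
    root = ET.fromstring(account_xml)
    return (root.findtext("project_name") or "").strip()


def project_name_in_state(state_xml, master_url):
    """Return the <project_name> of the <project> block with this master URL."""
    root = ET.fromstring(state_xml)
    for project in root.iter("project"):
        if (project.findtext("master_url") or "").strip() == master_url:
            return (project.findtext("project_name") or "").strip()
    return None  # no project block with that master URL


# Hypothetical sample contents showing the mismatch described in the thread.
ACCOUNT = (
    "<account><master_url>http://attribution.cpdn.org/</master_url>"
    "<project_name>CPDN Seasonal Attribution Project</project_name></account>"
)
STATE = (
    "<client_state>"
    "<project><master_url>http://climateprediction.net/</master_url>"
    "<project_name>climateprediction.net</project_name></project>"
    "<project><master_url>http://attribution.cpdn.org/</master_url>"
    "<project_name>climateprediction.net</project_name></project>"
    "</client_state>"
)

account_name = project_name_in_account(ACCOUNT)
state_name = project_name_in_state(STATE, SAP_URL)
if account_name != state_name:
    print(f"Mismatch: account file says {account_name!r}, "
          f"client_state.xml says {state_name!r}")
```

Looking the project up by master URL rather than by name matters here, because the genuine CPDN entry legitimately carries the name climateprediction.net.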
KAMasud

Send message
Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 33843 - Posted: 18 May 2008, 9:11:07 UTC


Mike, I did make the changes in client_state.xml. The change in name occurs and everything is OK until BOINC contacts the server. Then the name changes back; when I open client_state.xml, it has reverted.
It seems that somehow the genuine climateprediction.net folder is controlling the ex-SAP folder. I suspend climateprediction.net and it suspends the ex-SAP project, while it has forgotten all about its own WUs. I have to suspend them individually.
Maybe BOINC is recording it somewhere? Point me towards it and I can copy those contents, if that helps. Still, I have a week of WUs from other projects, so no hurry.
Regards
Masud.
ID: 33843
Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 33849 - Posted: 18 May 2008, 11:22:25 UTC

Thyme Lawn has modified his original instructions here:

http://www.climateprediction.net/board/viewtopic.php?p=76424#76424
ID: 33849