Some kind of loop - no progress.

adrianxw
Joined: 31 Aug 04
Posts: 145
Credit: 2,021,020
RAC: 816
Message 22038 - Posted: 14 Apr 2006, 17:07:17 UTC
Last modified: 14 Apr 2006, 17:11:31 UTC

Something has gone wrong with one of my CPDN wu's. It has not crashed, or rather, I don't think it has, but at the same time, it seems to be crashing once a second. Let me explain.

Earlier this year, I goofed whilst suspending and resuming projects. At one point, only CPDN was enabled, and of course, being a HT processor, BOINC trotted off and downloaded a second sulphur wu. Net result is I have 2. This is not a serious issue as both will complete well before their due date.

In order to keep the trickles flowing at a reasonable rate, I'd taken to suspending one wu and crunching the other for a week or so, then reversing the setup. There are numerous other projects running on this system, I should add, but CPDN has the highest CPU quota, 30% on a 3.2GHz P-IV HT Prescott, running Win XP and with the BOINC 5.2.13 core.

I'd not touched CPDN today, but suddenly, this afternoon, my system went weird. The pointer kept flashing between the normal arrow, and the arrow with an hourglass, (is hourglass all one word?).

I assumed this was due to the craziness going on with my testing of IE7 Beta 2, but no, killing IE7 did not help.

The task manager showed that sulphur_4.22_windows_intelx86.exe was running as normal, but "the other program" - you know what I mean - was popping up, running for less than a second, then disappearing again. This was going on and on and on.

At the same time, perhaps not surprisingly, BOINC Manager was showing the task as "Running" but nothing was happening to the times.

I have rebooted of course, and it started doing exactly the same thing when it got a quantum.

The wu has been posting trickles for the last few days, (ignore the gaps, I was crunching another project almost 100% because my team manager wanted it done), and everything seemed to be great.

The wu has crunched 342:35:00 and was at 20.00% completion, although since the reboot, the times are right, but the % shows zero.

Obviously, I don't want to crash it unless I have to. I have a backup from roughly a week ago, but it is a load of hassle to restore that when you have a lot of other projects running.

Is there something here that is a familiar problem, (I did a quick search but could not find anything)? Is there something I can do?

I should add, I am quite happy reading/editing XML or whatever. Looking for a way forward.

Current status is I have suspended CPDN, and not attempted to try the other wu I have, (which has 1202:36:19/75.92% accrued), in case I damage it.

Sensible suggestions welcome.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
Pooh Bear 27
Joined: 5 Feb 05
Posts: 465
Credit: 1,914,189
RAC: 0
Message 22039 - Posted: 14 Apr 2006, 18:05:50 UTC

As for the percentage showing 0% after a reboot while the times look correct, this is something that does happen. I am unsure how to explain the issue, except to say that once it gets to a certain point, it will go back to normal. So if you were at ~30% and 340+ hours, but now it's showing 0% and 340-some hours, you can actually tell approximately where you are if you view the graphic. You should be near the middle of the 2nd phase (Sulphur is 5 phases, so each is approximately 20%).
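The phase arithmetic above can be sketched in a few lines of Python. This is only an illustration, assuming all 5 phases are exactly equal in length (the post says "approximately 20%"); the function name `phase_position` is made up for this example and is not part of BOINC or the CPDN application.

```python
PHASES = 5                    # from the post: a sulphur model has 5 phases
PHASE_WIDTH = 100.0 / PHASES  # assumption: equal-length phases, 20% each


def phase_position(percent_done):
    """Return (phase number starting at 1, fraction of that phase completed)."""
    if not 0.0 <= percent_done <= 100.0:
        raise ValueError("percent_done must be between 0 and 100")
    if percent_done == 100.0:
        # A finished model is reported as the end of the last phase.
        return PHASES, 1.0
    completed, remainder = divmod(percent_done, PHASE_WIDTH)
    return int(completed) + 1, remainder / PHASE_WIDTH


if __name__ == "__main__":
    print(phase_position(30.0))  # halfway through phase 2
    print(phase_position(20.0))  # exactly on the phase 1 / phase 2 boundary
```

So a model at ~30% would be about halfway through phase 2, while one at exactly 20.00% is sitting right on the boundary between phase 1 and phase 2.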

In fact, by the time you read this all should be good again, I would suspect.

If not, let us know what it's looking like.

I have read in other forums that IE 7 is causing issues with some BOINC projects. I hope that this is not the case.


adrianxw
Joined: 31 Aug 04
Posts: 145
Credit: 2,021,020
RAC: 816
Message 22040 - Posted: 14 Apr 2006, 18:37:12 UTC
Last modified: 14 Apr 2006, 18:37:38 UTC

I am aware of the 0% for a while after a reboot, I have seen this before, and it has always rectified itself shortly after the quantum starts.

The real issue is this loop though. If the application is starting and exiting again after less than a second, it will never sort itself out. I have not seen this behaviour before.

The unit was exactly 20.00% complete before this happened. I have CPDN suspended at the moment pending advice.
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 22041 - Posted: 14 Apr 2006, 21:27:31 UTC

Oh dear! I do hope that you haven't meddled too much, or you'll lose it.

The constant 'flickering' at 20% is the model zipping up the hundreds of files at the end of phase 1. (5 phases x 20% = 100%)
Leave it alone until it finishes.
On my P4 3.2GHz, this takes about 15-20 minutes, with NOTHING else running!

And if the model crashes on completing this, then it's one of the faulty batch from last December, which crashes at the very start of phase 2.
But phase one of a sulphur model contains extra info that's needed for the TCMs.

And upgrade BOINC to version 5.2.13 before going any further. The Transient Coupled Models require a 5.* version, as they are a 'different kettle of fish' to slabs and sulphurs.

And finally, some bad news. The project is out of models until after the Easter break.

adrianxw
Joined: 31 Aug 04
Posts: 145
Credit: 2,021,020
RAC: 816
Message 22060 - Posted: 15 Apr 2006, 14:11:05 UTC
Last modified: 15 Apr 2006, 14:13:47 UTC

I resumed CPDN and wiggled the quotas to get it running straight away. It flickered for a while longer, then sat doing nothing visible for a while (status was running but elapsed time was static), and then seemed to resume normally.

I am running 5.2.13 at this time on that machine.

Cheers Les.
