climateprediction.net home page
Computation error in Climate Prediction

old_user528049
Joined: 17 Jul 08
Posts: 2
Credit: 32,199
RAC: 0
Message 41931 - Posted: 8 Apr 2011, 4:56:58 UTC

All my ClimatePrediction tasks end within a few seconds of starting, with the message "computation error". This began after I changed to a computer with an AMD Phenom II running Windows 7 Ultimate. The problem occurs only with ClimatePrediction; four other projects run on this same machine without trouble.
Please excuse my English; this text was translated from Brazilian Portuguese with Google Translate.
ID: 41931
Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 41937 - Posted: 8 Apr 2011, 21:53:36 UTC - in response to Message 41931.  
Last modified: 8 Apr 2011, 21:54:02 UTC

It looks like a problem with access to the BOINC data folder, C:\ProgramData\BOINC.

Check the permissions on this folder and its sub-folders, Projects and Slots. The user "boinc_projects" should have "Full Control" permission.

Also, some virus checking software can cause this trouble. Check that the virus checker's settings allow full access to the C:\ProgramData\BOINC folder.
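
One rough way to test access, sketched in Python: try to create and delete a small file in the data folder and in its projects and slots sub-folders. This only checks the account that runs the script, not the "boinc_projects" user, and the helper name `probe_writable` is made up for illustration.

```python
import os
import tempfile

def probe_writable(folder):
    """Return True if a temporary file can be created and removed in folder."""
    try:
        fd, path = tempfile.mkstemp(prefix="probe_", dir=folder)
        os.close(fd)
        os.remove(path)
        return True
    except OSError:
        # Covers a missing folder as well as a permission refusal.
        return False

if __name__ == "__main__":
    data_dir = r"C:\ProgramData\BOINC"  # default location named in this thread
    for sub in ("", "projects", "slots"):
        folder = os.path.join(data_dir, sub) if sub else data_dir
        print(folder, "writable" if probe_writable(folder) else "NOT writable")
```

A "NOT writable" result points to permissions or to another program blocking the folder; it does not say which of the two.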

( From your task http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=12783932:
<core_client_version>6.10.60</core_client_version>
<![CDATA[
<message>
 - exit code -2 (0xfffffffe)
</message>
<stderr_txt>
Could not launch model process. Last Error=87
Regional yearly means requires 12 input files got 0
Called boinc_finish

</stderr_txt>
]]>
)
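
The two forms of the exit code in that message are the same value: the signed number -2, and in parentheses the same bits read as an unsigned 32-bit number. A small Python check of that arithmetic (the function name is only illustrative):

```python
def as_unsigned32_hex(code):
    """Format a signed exit code as a 32-bit unsigned hex string."""
    # Masking with 0xFFFFFFFF gives the two's-complement 32-bit pattern.
    return format(code & 0xFFFFFFFF, "#010x")

print(as_unsigned32_hex(-2))  # 0xfffffffe
```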

I hope this helps.
ID: 41937
old_user528049
Joined: 17 Jul 08
Posts: 2
Credit: 32,199
RAC: 0
Message 43184 - Posted: 10 Oct 2011, 13:44:40 UTC

Sorry for the delay in responding; only today did I return to BOINC and try again to allow it to fetch new tasks. However, it cannot even get any tasks to run. The following messages are shown:

10/10/2011 09:34:42 | climateprediction.net | Sending scheduler request: Requested by user.
10/10/2011 09:34:42 | climateprediction.net | Requesting new tasks for CPU
10/10/2011 09:34:49 | climateprediction.net | Scheduler request completed: got 0 new tasks
10/10/2011 09:34:49 | climateprediction.net | Project has no tasks available

The folder C:\ProgramData\BOINC did not grant access to some of the BOINC users, which is what Greg had asked me to check (thanks for the attention). I have now given full access to all of them. I look forward to guidance on this new situation.
ID: 43184
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 43185 - Posted: 10 Oct 2011, 13:56:33 UTC - in response to Message 43184.  
Last modified: 10 Oct 2011, 13:57:23 UTC

The answer is in the messages that you posted: Project has no tasks available

You can check the Server Status page in the blue menu to the left for this.

This project doesn't always have work.
There was some about a week ago, but it's all gone.
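
For anyone who wants to scan a saved copy of the event log for this case, here is a minimal Python sketch. It assumes the "timestamp | project | message" layout seen in the lines posted above; the function names are invented for the example.

```python
def parse_event_log_line(line):
    """Split 'timestamp | project | message' into a 3-tuple, or return None."""
    parts = [p.strip() for p in line.split("|", 2)]
    return tuple(parts) if len(parts) == 3 else None

def no_work_available(lines, project):
    """True if any line for the project reports that it has no tasks."""
    for line in lines:
        parsed = parse_event_log_line(line)
        if parsed and parsed[1] == project and parsed[2] == "Project has no tasks available":
            return True
    return False

log = [
    "10/10/2011 09:34:49 | climateprediction.net | Scheduler request completed: got 0 new tasks",
    "10/10/2011 09:34:49 | climateprediction.net | Project has no tasks available",
]
print(no_work_available(log, "climateprediction.net"))  # True
```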
ID: 43185
mdc_on_ca
Joined: 30 Aug 11
Posts: 3
Credit: 1,226,132
RAC: 0
Message 43286 - Posted: 26 Oct 2011, 5:34:11 UTC

Hi... I have been running an 1100-hour climate prediction model since August 30, as well as a few other BOINC projects. It had been running great until the other day, when my son accidentally hit the power switch on my computer; the next morning the task was listed as "computation error"... I was so disappointed. The task had less than 100 hours remaining. I was so looking forward to the completion. The workunit is: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=7629634

Is there any way to resume computation on it? It seems that 3.5 million seconds of my CPU time were wasted.

Also I have been unable to login to this website since Oct 20. The task would have completed by now.
mdc_on_ca
ID: 43286
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4314
Credit: 16,380,160
RAC: 3,563
Message 43287 - Posted: 26 Oct 2011, 6:08:21 UTC - in response to Message 43286.  

If you have a backup, it should be possible to restore the BOINC directory. See http://www.climateprediction.net/board/viewtopic.php?t=5895 for how to back up and restore. However, the data is useful even if the model does not complete, so your computer's time has not been wasted.
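
As a generic illustration only (this is not the procedure from the linked thread), a backup can be as simple as copying the whole BOINC data directory to a timestamped folder while BOINC is fully shut down. The Python sketch below assumes that approach; the function name and folder naming are hypothetical.

```python
import shutil
import time
from pathlib import Path

def backup_boinc_data(data_dir, backup_root):
    """Copy the whole data directory into a new timestamped folder and return its path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"BOINC-backup-{stamp}"
    # copytree raises an error if the target folder already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(data_dir, target)
    return target

# Example call (exit BOINC completely first; both paths are placeholders):
# backup_boinc_data(r"C:\ProgramData\BOINC", r"D:\Backups")
```

Restoring would be the reverse copy, again with BOINC stopped.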

Dave
ID: 43287
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,053,321
RAC: 4,417
Message 43288 - Posted: 26 Oct 2011, 6:09:27 UTC - in response to Message 43286.  

I'm afraid that the short answer is no. Unless you have a recent backup made before the model crashed, there is no way to resurrect it.

The project has been offline since Oct. 18 due to an attack on the servers. It now is back up and running.


ID: 43288
Randi
Joined: 28 Jun 07
Posts: 31
Credit: 4,219,881
RAC: 1,427
Message 43316 - Posted: 29 Oct 2011, 20:33:07 UTC

I'm writing on behalf of Bunts (http://climateapps2.oerc.ox.ac.uk/cpdnboinc/hosts_user.php?userid=663630) an Old Weather team member. He had 1 wu complete successfully, but 2 failed with "Error while computing". Would someone please take a look and see if they can tell what the problem is?

I've run CPDN for about 4 years, with very few problems, but until now I never really looked at the details. Now that I am co-founder of a team I have been trying to learn more (reading my way through the forums), but I still have a LONG way to go.

Thanks
ID: 43316
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4314
Credit: 16,380,160
RAC: 3,563
Message 43317 - Posted: 29 Oct 2011, 20:52:08 UTC - in response to Message 43316.  

Not obvious to me, but looking at the work unit from one of the crashed tasks, both the other tasks in the work unit also failed to complete. That may not be too relevant, as they were both on Darwin OS machines as opposed to Windows. If you click on the + next to Stderr you will get a list of the error messages. It doesn't seem to be any of the ones indicating that the model has produced a climate outside of the allowed parameters, e.g. a world with a negative atmospheric pressure.

Dave
ID: 43317
Randi
Joined: 28 Jun 07
Posts: 31
Credit: 4,219,881
RAC: 1,427
Message 43319 - Posted: 29 Oct 2011, 21:42:03 UTC - in response to Message 43317.  

As far as I know, Bunts is running Windows XP.
ID: 43319
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2167
Credit: 64,403,322
RAC: 5,085
Message 43320 - Posted: 30 Oct 2011, 0:42:37 UTC

The pertinent messages from stderr on both failed tasks appear to be:

forrtl: The requested operation cannot be performed on a file with a user-mapped section open.

forrtl: severe (38): error during write, unit 6, file C:\Documents and Settings\All Users\Application Data\BOINC\projects\climateprediction.net\hadam3p_eu_670t_2003_1_007471114\dataout\xaakg.out



Searching the net for that error, I found that many of the diagnoses involve another program accessing the file while the main program (CPDN in this case) is trying to write to it. They suggest a possible antivirus conflict. You could suggest to your friend that he/she exclude the BOINC directories and subdirectories from virus scanning and see if that clears up the problem.
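
To compare several failed tasks, the file named in the write error can be pulled out of each stderr text. A minimal Python sketch, assuming the message always has the shape of the forrtl line quoted above (the pattern and function name are illustrative, not part of any BOINC or CPDN tool):

```python
import re

# Matches e.g. "forrtl: severe (38): error during write, unit 6, file <path>"
WRITE_ERROR = re.compile(
    r"forrtl: severe \((\d+)\): error during write, unit (\d+), file (.+)"
)

def failed_write_file(stderr_text):
    """Return the file path from the first forrtl write error, or None."""
    match = WRITE_ERROR.search(stderr_text)
    return match.group(3).strip() if match else None

sample = (
    "forrtl: severe (38): error during write, unit 6, file "
    r"C:\Documents and Settings\All Users\Application Data\BOINC\projects"
    r"\climateprediction.net\hadam3p_eu_670t_2003_1_007471114\dataout\xaakg.out"
)
print(failed_write_file(sample))
```

Running this over the stderr of each failed task shows quickly whether the same file is involved every time.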
ID: 43320
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1079
Credit: 6,907,363
RAC: 6,402
Message 43321 - Posted: 30 Oct 2011, 0:55:05 UTC

... odd that it should be the same file each time.
ID: 43321
Randi
Joined: 28 Jun 07
Posts: 31
Credit: 4,219,881
RAC: 1,427
Message 43323 - Posted: 30 Oct 2011, 7:30:46 UTC

Thanks a lot.
Would it also work if he shut down BOINC before scanning that directory?
ID: 43323
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4314
Credit: 16,380,160
RAC: 3,563
Message 43328 - Posted: 30 Oct 2011, 10:05:53 UTC - in response to Message 43323.  

If BOINC is shut down first, it should work. As I understand it, the problem is caused by BOINC trying to write to the file while the virus scanner is accessing it.

Dave
ID: 43328
Randi
Joined: 28 Jun 07
Posts: 31
Credit: 4,219,881
RAC: 1,427
Message 43330 - Posted: 30 Oct 2011, 10:42:36 UTC

OK - Thanks Dave
ID: 43330
mdc_on_ca
Joined: 30 Aug 11
Posts: 3
Credit: 1,226,132
RAC: 0
Message 43356 - Posted: 2 Nov 2011, 1:48:46 UTC - in response to Message 43286.  

Well, now my computer is running a 1900 to 1940 model, with over 1100 hours to go, at high priority. I hope this one does not crash.
ID: 43356
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,053,321
RAC: 4,417
Message 43357 - Posted: 2 Nov 2011, 17:48:19 UTC - in response to Message 43356.  

The To Completion time of 1100 hours is too high. On my 1.5 GHz machine the CM3ns only take about 870 hours. Running on your 3.2 GHz machine, these long WUs should take much less.
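
As a very rough back-of-envelope check, one can scale the run time by clock speed alone. This is only a naive assumption (different processors do different amounts of work per clock cycle), so treat the result as an order-of-magnitude guide, not a prediction:

```python
def scale_hours_by_clock(hours, from_ghz, to_ghz):
    """Naively scale a run time in proportion to clock speed only."""
    return hours * from_ghz / to_ghz

# 870 hours at 1.5 GHz, scaled to a 3.2 GHz machine
print(round(scale_hours_by_clock(870, 1.5, 3.2)))  # 408
```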

ID: 43357
mdc_on_ca
Joined: 30 Aug 11
Posts: 3
Credit: 1,226,132
RAC: 0
Message 43565 - Posted: 16 Dec 2011, 16:02:41 UTC - in response to Message 43357.  

I had 2 projects over 1000 hours. I made sure that every day they had at least 50% of my CPU time. They were running great.

The First one is 1900_1940 hadcm3n_y8gw
It ran for about 500 hours, and for about a week now it has been stuck at 49.882%, showing 07/04/1921 00:00 on the graphic and 484:27:26 hours of computing.
Usually, I can see the graphic updating every few minutes.
The Task is showing 589:02:xx and is still counting up.

Since the graphic and the task % are stuck, has this project become corrupted?

I had another 1000-hour project running, which was also at around 50% about a week ago. My computer ran a Windows update; I suspended the BOINC tasks, shut BOINC down and rebooted. When it came back up, the task showed corruption. When it updated, there were 2 more, smaller hadam3p tasks.

I am worried that I am wasting my CPU time running the stuck y8gw.

What should I do? Is there a way to check the dates inside the data files?

Thanks,
Mike
ID: 43565
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 43566 - Posted: 16 Dec 2011, 17:00:11 UTC - in response to Message 43565.  

Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1108, iMonCtr=1
Model crash detected, will try to restart...
Suspended CPDN Monitor - Suspend request from BOINC...
forrtl: Access is denied.

Probably done in by a virus scan locking a file the CPDN program wants. (It is a supercomputer program that doesn't expect competition for its resources, and it crashes if there is a conflict.) Our time-honored advice: it is best to set the antivirus program to skip the BOINC Data folder.

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 43566
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 43567 - Posted: 16 Dec 2011, 19:08:24 UTC

And, yes, abort that stuck model.

These models are rather sensitive to being interrupted around the time that they're collecting the created data to make a zip file to return to the project. This occurs every 25% of the way, although BOINC doesn't always show it as exactly this number.
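
A small Python sketch of that rule of thumb, for deciding whether it is a bad moment to shut down or reboot. The 1-point margin is an arbitrary assumption, chosen only because BOINC's displayed percentage does not land exactly on these values; the function name is hypothetical.

```python
def near_zip_point(progress_percent, margin=1.0):
    """True if progress is within `margin` percentage points of 25, 50, 75 or 100."""
    return any(abs(progress_percent - p) <= margin for p in (25.0, 50.0, 75.0, 100.0))

print(near_zip_point(49.882))  # True: the stuck model above sat right by the 50% mark
print(near_zip_point(60.0))    # False
```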


ID: 43567

©2024 climateprediction.net