hadcm3n failed at 1%

old_user683883

Joined: 18 Aug 12
Posts: 3
Credit: 0
RAC: 0
Message 47114 - Posted: 19 Sep 2013, 2:39:59 UTC
Last modified: 19 Sep 2013, 2:40:47 UTC

Is hadcm3n compatible with a Mac mini running OS X 10.5.8 on an Intel Core 2 Duo? I can't find any error message, but I think the task I was running had a computation error at ~1%. The only trace I can find is in /Library/Application Support/BOINC Data/stdoutdae.txt:

18-Sep-2013 08:23:31 [climateprediction.net] Restarting task hadcm3n_83eu_1980_40_008462361_1 using hadcm3n version 607 in slot 0
18-Sep-2013 08:23:35 [climateprediction.net] Computation for task hadcm3n_83eu_1980_40_008462361_1 finished
18-Sep-2013 08:23:35 [climateprediction.net] Output file hadcm3n_83eu_1980_40_008462361_1_1.zip for task hadcm3n_83eu_1980_40_008462361_1 absent
18-Sep-2013 08:23:35 [climateprediction.net] Output file hadcm3n_83eu_1980_40_008462361_1_2.zip for task hadcm3n_83eu_1980_40_008462361_1 absent
18-Sep-2013 08:23:35 [climateprediction.net] Output file hadcm3n_83eu_1980_40_008462361_1_3.zip for task hadcm3n_83eu_1980_40_008462361_1 absent
18-Sep-2013 08:23:35 [climateprediction.net] Output file hadcm3n_83eu_1980_40_008462361_1_4.zip for task hadcm3n_83eu_1980_40_008462361_1 absent

Is there a way that I can find out what happened to the process? This is not the first time a task from climateprediction.net has failed on this computer, and they have always failed before the first trickle. PrimeGrid and Constellation work fine for me. Asteroids fails immediately. Any educated guesses as to whether the problem is with the task, or with my computer? Should I give up running Climateprediction.net tasks on this computer?
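
One way to see everything the client logged for a single task is to filter stdoutdae.txt by task name. The following is a minimal Python sketch, not an official BOINC tool: it assumes every event-log line has the "DD-Mon-YYYY HH:MM:SS [project] message" layout seen in the excerpt above, and the function names task_events and missing_outputs are made up for illustration.

```python
import re

# Assumed layout, taken from the quoted log lines:
# "DD-Mon-YYYY HH:MM:SS [project] message"
LINE_RE = re.compile(r"^(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) \[([^\]]*)\] (.*)$")

# Wording assumed from the "Output file ... absent" lines in the excerpt.
ABSENT_RE = re.compile(r"^Output file (\S+) for task \S+ absent$")


def task_events(lines, task_name):
    """Return (timestamp, message) pairs whose message mentions task_name."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and task_name in match.group(3):
            events.append((match.group(1), match.group(3)))
    return events


def missing_outputs(events):
    """Return the output file names the client reported as absent."""
    absent = []
    for _timestamp, message in events:
        match = ABSENT_RE.match(message)
        if match:
            absent.append(match.group(1))
    return absent
```

Passing the open stdoutdae.txt file (or a list of its lines) and the task name to task_events gives the task's full history in the event log. That log only records what the BOINC client saw; the model's own error output is in the stderr listing on the project's task page.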

Thanks for your help.
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2167
Credit: 64,530,229
RAC: 6,501
Message 47115 - Posted: 19 Sep 2013, 3:21:18 UTC

Jane,

Your computers are hidden, so we can't look at the tasks page or the stderr listing on it. If you could link to that task/result or the computer, we could look at it in more depth.
old_user683883

Joined: 18 Aug 12
Posts: 3
Credit: 0
RAC: 0
Message 47116 - Posted: 19 Sep 2013, 3:59:08 UTC - in response to Message 47115.  

Sorry about that. This is a link to the tasks on this computer:

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1288963

Jane
Greg van Paassen

Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 47117 - Posted: 19 Sep 2013, 4:38:58 UTC - in response to Message 47116.  

Hi Jane.

Code 193 is a bit of a catch-all error.

If you have upgraded BOINC from an older version, the upgrade may have caused a permissions problem. Probably not that, though.

The problem may be caused by not having selected the processing option "Leave tasks in memory while suspended". (BOINC's default is unselected, but that doesn't work well for CPDN.)

It's also best to set the limit on CPU use by other programs quite high -- i.e. to not suspend BOINC too frequently. (The work runs at low priority and Macs are good at prioritising work, so leaving BOINC running mostly has no effect on other work. The exceptions are recording sound, editing movies, and some games.)

Both of those options are in the "Computing preferences..." menu option, available from BOINC Manager's "advanced" view.
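
For anyone who would rather set these outside the Manager: the BOINC client can also read local preference overrides from a global_prefs_override.xml file in its data directory. Below is a rough Python sketch that builds such a file. The tag names (leave_apps_in_memory, suspend_cpu_usage, cpu_usage_limit) are the ones I understand BOINC to use for these settings, so check them against the documentation for your client version. The build_override helper and the default values are purely illustrative.

```python
import xml.etree.ElementTree as ET


def build_override(leave_in_memory=True, suspend_cpu_usage=75.0, cpu_usage_limit=100.0):
    """Build the text of a global_prefs_override.xml file (illustrative helper).

    leave_in_memory   -> "Leave tasks in memory while suspended"
    suspend_cpu_usage -> suspend BOINC when other programs' CPU use exceeds this %
    cpu_usage_limit   -> "use at most N% of CPU time"
    Tag names are assumed from BOINC's preference files; verify before use.
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "leave_apps_in_memory").text = "1" if leave_in_memory else "0"
    ET.SubElement(root, "suspend_cpu_usage").text = f"{suspend_cpu_usage:g}"
    ET.SubElement(root, "cpu_usage_limit").text = f"{cpu_usage_limit:g}"
    return ET.tostring(root, encoding="unicode")
```

The returned string would be saved as global_prefs_override.xml in the BOINC data directory, after which the client needs to re-read its preferences or be restarted. Changing the settings in the Manager does the same job and is the safer route if you are unsure.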

It's also best to exclude the BOINC folder from backups, as the CPDN programs are "touchy" about other programs trying to access their files.

The Mac section of this Board may be a source of other things to try if those don't fix the problem, and it has a post detailing how to fix the permission problem.
old_user683883

Joined: 18 Aug 12
Posts: 3
Credit: 0
RAC: 0
Message 47128 - Posted: 20 Sep 2013, 3:05:40 UTC - in response to Message 47117.  

Greg,

Thank you for the advice. I did have "Leave tasks in memory while suspended?" checked, but I will raise "Suspend work if CPU usage is above" to 75%.

My computer tends to get hot, so I have set BOINC to run all the time but to use at most 10% of the CPU. Do you think that may have contributed to this failure?

Jane
mo.v
Volunteer moderator

Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 47145 - Posted: 21 Sep 2013, 11:30:09 UTC

I don't think that limiting CPU use to 10% could affect the success of your climate models.

If people join CPDN through the Weather@home website or through Progress Thru Processors, their CPU usage will be set by default to 60%. That's in case people are running the project on laptops and don't realise that they need to take action to avoid overheating.

So limiting CPU usage is common, and BOINC is designed for it.

When you shut down your computer do you first suspend your tasks in BOINC Manager and then exit completely from BOINC? You can exit by right-clicking on the BOINC icon in the system tray, then selecting Exit. Not exiting from BOINC before computer shutdown will sooner or later cause the occasional model to crash.
Greg van Paassen

Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 47153 - Posted: 23 Sep 2013, 9:50:44 UTC - in response to Message 47128.  

As Mo says, BOINC is designed to allow frequent suspension of work. I'm not so sure about all of CPDN's models, though. In my experience (this is of course anecdotal), Weather At Home models and the now-retired FAMOUS models are quite robust to frequent suspension, but the older HadCM3Ps, and now HadCM3Ns, are not so much. It does seem to vary a lot between machines, though.

But HadCM3N seems to have the most trouble with disk contention, that is, its files being in use by other software when HadCM3N wants to write to them. That other software is usually antivirus or backup software. In the case of Macs, I guess that means Time Machine.

And I second Mo's advice about exiting from BOINC before shutting down the computer.
