Comments for 'Generic solutions to models' sticky

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 21068 - Posted: 5 Mar 2006, 12:21:51 UTC
Last modified: 2 Oct 2007, 8:19:12 UTC


This is the comments thread for the 'generic solutions' sticky post. Please post any queries, suggestions, and so forth here.

I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 21068
old_user128774
Joined: 4 Dec 05
Posts: 1
Credit: 49,802
RAC: 0
Message 21178 - Posted: 9 Mar 2006, 20:44:37 UTC

* Before playing games or other heavy duty applications (high CPU or memory usage), set 'no more work' against the project and 'suspend' the model - that way they won't tread on each other's toes. Sometimes simultaneous use of graphics drivers from two different programs seems to cause problems.
...
If any of these suggestions succeeded in helping, please add a note to this thread so that people know which are the best suggestions.


This one worked for me.

I found that my CP project started playing up after I ran GoogleEarth, which is pretty hungry for CPU, memory and graphics resources. It resulted in the sulphur_um_4.22_windows_intelx86.exe process constantly spawning a new process which closed almost immediately. I had to set the project to Run Always to fix it, but even then it reset from 40% to 0% and took a while to sort itself out properly.

The next time I ran GoogleEarth, I suspended the CP project first, and it resumed quite happily afterwards.
ID: 21178
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 21598 - Posted: 25 Mar 2006, 21:11:17 UTC

From the BBC Boards:

Martin Smith
UPDATE: SUCCESS (so far). I followed all the advice (am even taking backups!) and the latest experiment is chugging along (up to 1938 now). Just wanted to provide feedback in case other users are having issues -- The advice worked for me -- thanks to all those that provided help.

ID: 21598
old_user87359
Joined: 12 Jul 05
Posts: 1
Credit: 0
RAC: 0
Message 21622 - Posted: 26 Mar 2006, 23:43:58 UTC - in response to Message 21068.  

If you have multiple processors and therefore might have several Climate Prediction work processes, do not run the graphics for more than one of them at once. Doing so causes at least one work unit to crash, and you have to start all over again.
ID: 21622
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 22081 - Posted: 16 Apr 2006, 12:20:35 UTC

Error code 99, AKA the 'killer trickle'

This is a remote kill command sent to shut down models which contain errors. Many of these have been sent out today (April 16th); for the reason, please see the announcement:

http://www.climateprediction.net/board/viewtopic.php?t=4697

Please do not restore models shut down in this way.


ID: 22081
John McLeod VII
Joined: 5 Aug 04
Posts: 172
Credit: 4,023,611
RAC: 0
Message 22188 - Posted: 19 Apr 2006, 2:13:38 UTC

I have noticed one thing about the crash during heavy CPU usage. The CPDN application is split into two pieces: one that uses almost no CPU (M, if I recall correctly) and one that uses all of the available CPU (UM, if I recall). On a single-CPU system with one CPDN WU running, multiple UM processes were started. I believe that if this could be prevented, this crash would stop happening.

The only time that CPDN crashed on that machine, multiple CPDN processes were started for the same result. I believe that if the first thing UM did on startup were to create a mutex based on the name of the result, then this crash would stop.
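To illustrate the idea of a lock keyed on the result name, here is a minimal sketch in Python. It is not CPDN or BOINC code: it uses an exclusive lock file as a portable stand-in for a named Windows mutex, and the function names, the `.lock` suffix and the example result name are all hypothetical.

```python
import os
import tempfile


def acquire_result_lock(result_name, lock_dir=None):
    """Try to take an exclusive lock named after the result.

    Returns the lock file path on success, or None if another
    process already holds the lock for this result.
    """
    lock_dir = lock_dir or tempfile.gettempdir()
    lock_path = os.path.join(lock_dir, result_name + ".lock")
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only
        # one process can create the lock for a given result name.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    with os.fdopen(fd, "w") as handle:
        # Record the owner's process ID to help diagnose stale locks.
        handle.write(str(os.getpid()))
    return lock_path


def release_result_lock(lock_path):
    """Release the lock by deleting the lock file."""
    os.remove(lock_path)
```

A second process that gets `None` back would exit straight away instead of starting a duplicate model. One known weakness of this sketch is that a lock file survives a crash, so a real implementation would need some way to detect and clear stale locks; a true named mutex is released by the operating system when its owner dies.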

Let us know when it has been fixed.


BOINC WIKI
ID: 22188
gandhi
Joined: 5 Aug 04
Posts: 22
Credit: 7,271,105
RAC: 0
Message 22289 - Posted: 22 Apr 2006, 10:41:12 UTC

Hi there,

As suggested in a linked thread, I tried the solution of updating the graphics driver.

My machine produced some -1073741819 (0xc0000005) errors, but Prime95 was stable for over 24 hours.
A new driver (Catalyst 5.9, the last without CCC, for the Radeon 7000) solved the problem (as it seems).
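The two numbers quoted for this error are the same 32-bit value written two ways: the negative decimal is the signed reading and the hex form is the unsigned one. The short sketch below shows the conversion; the helper name is made up for illustration. On Windows, 0xC0000005 is the status code for an access violation.

```python
def exit_code_to_hex(code):
    """Show a (possibly negative) 32-bit exit code as unsigned hex."""
    # Masking with 0xFFFFFFFF reinterprets the signed value as unsigned.
    return "0x{:08x}".format(code & 0xFFFFFFFF)


print(exit_code_to_hex(-1073741819))  # prints 0xc0000005
```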

But: what the heck does the graphics driver have to do with CPDN?
And why does this error appear only now?


greetings, Micha
ID: 22289
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 23071 - Posted: 9 Jun 2006, 9:40:03 UTC

There are no child processes to wait for. (0x80) - exit code 128 (0x80)

There is an announcement on the BOINC news site:

Microsoft Windows has a component called DirectX that manages graphics and sound. If you are running BOINC on Windows and you don't have a recent version (9.0c or later) of DirectX, applications may crash with error messages like
There are no child processes to wait for. (0x80) - exit code 128 (0x80)

If this is happening, or you are having other problems involving graphics, you may want to check your DirectX version.
If your version of DirectX is older than 9.0c (or if you are unable to check your version) then you may want to download the current version of DirectX.


ID: 23071
old_user132880
Joined: 8 Dec 05
Posts: 21
Credit: 215,749
RAC: 0
Message 23584 - Posted: 11 Jul 2006, 15:04:15 UTC

Hmmm, after 45% or so, my hadcm crashed... The PC wasn't being used any differently, and I do have BOINC excluded from Norton...
ID: 23584
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 23585 - Posted: 11 Jul 2006, 16:54:13 UTC - in response to Message 23584.  

Hmmm, after 45% or so, my hadcm crashed... The PC wasn't being used any differently, and I do have BOINC excluded from Norton...


Yep, it was a -161 error. Did you work through Mike's recommendations (first post in this thread)?

Hope you had a recent backup; that's a large investment to lose...

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 23585
Dr_Mabuse
Joined: 21 Feb 05
Posts: 24
Credit: 991,032
RAC: 0
Message 23643 - Posted: 17 Jul 2006, 14:16:49 UTC - in response to Message 21627.  

About the 161 error:
It is a very common error and hits a lot of us. The real problem actually occurred before this error pops up; it means that the model crashed for whatever reason but didn't return a proper status code to the BOINC program.

The real reason might be found in a file named "yabsd.out" (possibly "yabsd.out.gz"), which sits in the dataout folder of your model.

If you can uncompress it (WinZip should be able to do .gz files) and open it with Notepad, you might find a problem report at the end of the file (often something with negative pressure) or the file just ends with an incomplete line.
...
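As an alternative to WinZip and Notepad, the end of the log can be read with a few lines of Python, which handles both the plain and the gzipped file. This is only a sketch: the function name is hypothetical, and the path you pass in depends on where your model's dataout folder actually is.

```python
import gzip
from collections import deque


def tail_of_log(path, line_count=20):
    """Return the last few lines of yabsd.out or yabsd.out.gz."""
    # Pick the gzip reader for .gz files, the plain reader otherwise.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as handle:
        # A bounded deque keeps only the final lines while reading.
        last_lines = deque(handle, maxlen=line_count)
    return [line.rstrip("\n") for line in last_lines]


# Example use (replace the path with your own model's dataout folder):
# for line in tail_of_log(r"dataout\yabsd.out.gz"):
#     print(line)
```

If the model died mid-write, the final entry returned may be an incomplete line, which matches the symptom described above.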


I had 10 of 12 runs terminated by a 161 error and had a look at the yabsd.out file. But there is a flood of messages that I could not make any sense of.

Do you know of a description or explanation of the contents of this file?

My last job, hadcm3lbm_4eh3_05161524, terminated today without leaving any out folder or file. Do you know what happened there?

Thanks for your help
Jochen from Old Germany
*** Since I'm a fool I proved that the system is not foolproof ;-) ***
ID: 23643
old_user101783
Joined: 10 Oct 05
Posts: 3
Credit: 26,902
RAC: 0
Message 24104 - Posted: 26 Aug 2006, 16:33:05 UTC - in response to Message 21066.  

Well,

The units just keep returning Client Error -161 over and over on this machine. I'm getting a bit tired of this, since I understand most unfinished models (<10%) are useless for research, I don't enjoy credit which wasn't actually beneficial, and it's swallowing time on this dual-processor XP system which could benefit other BOINC projects.

There might be a breakthrough: recently the computer started crashing again and again with a blue screen. I followed the clues on the crash screen and ended up downloading a display/graphics driver (I don't remember the details). Since then the project has been running for more than 20 days. The fact that I set 'no more work' might also be related, in that there's no way for two CPs to run now (whereas previously they did many times, and only now do I see I should've turned the graphics off for one of them). I hope it'll get better, because only about a third of my 22k credit is justified: a big unit on a single-CPU machine, and a small one.

I hope it'll be possible to solve this at the BOINC/CP end; this is one of the projects I most want to contribute my infinitesimal help to.

Thanks,

gady b
thaye.net
ID: 24104
Alex Plantema
Joined: 3 Sep 04
Posts: 126
Credit: 26,363,193
RAC: 0
Message 24263 - Posted: 10 Sep 2006, 10:01:01 UTC

Leaving the graphics window open while switching to another user (Windows XP) crashed a workunit. It happened with workunits from other projects as well.

ID: 24263
old_user197776
Joined: 1 Sep 06
Posts: 11
Credit: 4,627
RAC: 0
Message 24335 - Posted: 18 Sep 2006, 1:44:29 UTC

I have run climateprediction twice before and started again today. On both previous attempts the models hung my computer (mouse, keyboard and graphics all frozen solid), and I had to reset it or switch it off.

I have not seen this mentioned here and wondered if anybody else has had this problem or if anybody has any advice for me.

Thanks
ID: 24335
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 24338 - Posted: 18 Sep 2006, 7:23:12 UTC

The two most common reasons are:

* Memory (but you have well over the 512MB recommended amount)
* CPU overheating. Try the monitoring tools mentioned in an earlier post to get your CPU temp.
ID: 24338
old_user197776
Joined: 1 Sep 06
Posts: 11
Credit: 4,627
RAC: 0
Message 24370 - Posted: 21 Sep 2006, 12:50:28 UTC

I checked and it does not seem to be a temperature problem. I am running several projects, and running both cores at 100% for hours keeps the processor at 50°C; only once did I see it at 56°C. The processor fan speed also rarely exceeds 1400 rpm. The MB temp is usually below 40°C, so I don't see a problem there.

My problem is that everything hangs and I have to reset or switch off with the power button; nothing else responds. This means I get no error messages.

I am running 4 SATA drives, and my pagefile and temp directories are on separate drives, separate from program files ... not sure if this can be a problem though, as it is set in the control panel.

It happens while using no graphics display, so I doubt that could be the problem. The problem also occurs whether only CPDN is running or other projects are running too. If it is because I use BAM, is it possible to detach only CPDN from BAM and BOINCstats?

Thank you for the assistance so far. I really want to run this project.
ID: 24370
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 24375 - Posted: 21 Sep 2006, 19:29:22 UTC

Hi,

The temperatures sound OK. I have a similar setup to yours in terms of the paging file and so forth, and it works fine.

It may be worth going through the stability checks; Prime95's Torture Test is very good...

Does the same thing happen with the Seasonal Attribution project? (attribution.cpdn.org)
ID: 24375
Rick D
Joined: 12 Dec 05
Posts: 4
Credit: 93,834
RAC: 0
Message 24495 - Posted: 2 Oct 2006, 3:08:12 UTC

Hmmmmf.

I just crashed a BBC sim AGAIN. I've read the posts about backups and temperatures and so on, but I think that's all missing the point--

this software is very fragile.

That is not a compliment.

There are many applications that make my machine (AMD 3700+, 3GB RAM) work much harder. It has never failed any app due to heat.

I've had four or five sims get up to around 10% and die of something or other. I don't have time to babysit my screensaver. It's frustrating that this app, alone of all the BOINC apps I've tried, is so prone to crashing.

Further, I really don't subscribe to the "some programs crash, you know" proposition. If I had paid for software that behaved like this, I'd be furious. Honestly, wouldn't you?

I would like to help CPDN and BBC et al. save the world from climate change. Seriously. However, if my sims all die, my machine might just as well be folding proteins or tracking mosquitoes or eavesdropping on ET or something.

My preferred solution is robust code from CPDN. My interim solution will be to drop this project after the next crash.

Harrumph.
-Rick

ID: 24495
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 24496 - Posted: 2 Oct 2006, 4:13:06 UTC

Perhaps you should send a copy of your complaints to the UK Met office.
After all, it IS their code, and they run it daily on their supercomputers for weather and climate forecasting. So the sooner they know how terrible it is the better.

ID: 24496
Rick D
Joined: 12 Dec 05
Posts: 4
Credit: 93,834
RAC: 0
Message 24502 - Posted: 2 Oct 2006, 17:24:07 UTC - in response to Message 24496.  

Perhaps you should send a copy of your complaints to the UK Met office.
After all, it IS their code, and they run it daily on their supercomputers for weather and climate forecasting. So the sooner they know how terrible it is the better.



Hi Les,

Fine. Let's assume then that the core prediction software runs perfectly on their supercomputers. That doesn't change a word of my points; it only indicates that the problem is elsewhere, for example in the Windows wrapper or the screensaver functionality. I'm assuming that the UK Met Office does not run this daily on their supercomputers as a Windows screensaver.

-Rick
ID: 24502

©2024 climateprediction.net