climateprediction.net home page
Posts by [B^S] Paul@home

1) Message boards : Number crunching : Sulpher model stopped running? (Message 19364)
Posted 16 Jan 2006 by Profile [B^S] Paul@home
Post:
Don't worry, 5.x is going nowhere near that host until the proxy issue is resolved!

Just an update: Chris Sutton's fixed version of 4.45 seems to have done the trick. Benchmarks ran this morning; the sulphur model was paused, removed from memory and restarted successfully once the benchmarks finished.

Cheers!

Paul.
2) Message boards : Number crunching : Sulpher model stopped running? (Message 19317)
Posted 14 Jan 2006 by Profile [B^S] Paul@home
Post:
When I had BOINC 4.45, and got the same problem, the patched version solved it.



Cool. Hopefully it will for me too... I have another machine running 4.45 at work, so if it works on this one host, I will put it on the other too...

Thanks again for the help, guys! :)

Paul.
3) Message boards : Number crunching : Sulpher model stopped running? (Message 19267)
Posted 13 Jan 2006 by Profile [B^S] Paul@home
Post:
Yes, there are a few people reporting problems with proxy authentication and 5.x.

I have a bug raised in relation to it and it is being looked at (hopefully!)

The patched 4.45 seems to be working nicely for me now. I will wait and see what happens next time BOINC does a benchmark to see if it craps out again!
4) Message boards : Number crunching : Sulpher model stopped running? (Message 19258)
Posted 13 Jan 2006 by Profile [B^S] Paul@home
Post:
Yes, I am running 4.45. I cannot switch to 5.x on this host because of problems with proxy authentication on all 5.x versions.

I have downloaded Chris Sutton's version of 4.45 and it seems to be running fine. The sulphur model started right back up again as soon as I stopped and restarted the BOINC service...

Thanks for the help!

Paul.
5) Message boards : Number crunching : Sulpher model stopped running? (Message 19251)
Posted 13 Jan 2006 by Profile [B^S] Paul@home
Post:
Hi folks,

I am currently crunching WorkUnit 844214. This host usually trickles about once a day, but I have noticed that it has not trickled in 2 days. I have taken a look at the WU on the host and have noticed something odd...

BOINC shows the model status as running, yet the process is not in Task Manager.

In the projects\climateprediction.net folder, the last modified file was from Jan 11th at about 10.44am. This is a few hours after my last trickle, and about the time BOINC attempted a benchmark.
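For anyone wanting to repeat this check without clicking through folders, here is a minimal Python sketch that walks a directory and reports the most recently modified file. The function name `newest_file` and the example path are only illustrative assumptions; nothing here is part of BOINC or the CPDN application.

```python
import os
from datetime import datetime
from pathlib import Path
from typing import Optional, Tuple


def newest_file(folder: str) -> Optional[Tuple[Path, datetime]]:
    """Return (path, modification time) of the most recently modified
    file under `folder`, searching subfolders too, or None if empty."""
    newest = None  # (path, mtime as a POSIX timestamp)
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = Path(root) / name
            mtime = path.stat().st_mtime
            if newest is None or mtime > newest[1]:
                newest = (path, mtime)
    if newest is None:
        return None
    # Convert the timestamp to local time for easier reading.
    return newest[0], datetime.fromtimestamp(newest[1])


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever BOINC keeps its projects.
    result = newest_file(r"projects\climateprediction.net")
    if result is not None:
        print(f"{result[0]} last modified {result[1]}")
```

Comparing the reported time against the last trickle and the last benchmark attempt gives a rough idea of when the model stopped writing output.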

In climateprediction.net\sulphur_dfit_000626645, the last modified file is stderr_um.txt, last modified at 04.25 on the 11th. This roughly corresponds to the last trickle time (allowing for a +1 hour offset from UTC). In this file, there are a couple of warnings and then the last line of text is cut off:
OPEN:  File dataout/dfitba.da39810 Created on Unit 22
CLOSE: WARNING: Unit 66 Not Opened
OPEN:  File dataout/dfitba.pg39aug Created on Unit 66
CLOSE: WARNING: Unit 67 Not Opened
OPEN:  File dataout/dfitba.ph39aug Created on Unit 67
CLOSE: WARNING: Unit 68 Not Opened
OPEN:  File dataout/dfitba.pi39aug Created on Unit 68
OPEN:  File dataout/dfitba.da39840 Created on Unit 22
OPEN:  File dataout/dfitba.da39870 Created on Unit 22
OPEN:  File dataout/dfitba.da398a0 Created on Unit 22
OPEN:  File dataout/dfitba.da398d0 Created on Unit 22
OPEN:  File dataout/dfitba.da398g0 Created on Unit 22
OPEN:  File dataout/dfitba.da398j0 Created on Unit 22
OPEN:  File dataout/dfitba.da398m0 Created on Unit 22
OPEN:  File da


This is as-is from the file. I have not messed up the copy/paste!!
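A cut-off last line like this can be spotted automatically. The sketch below is only a heuristic: it assumes every complete log line ends with a newline, so a non-empty file whose final byte is not a newline was probably cut off mid-write. The function name `last_line_is_truncated` is made up for illustration. A truncated line may simply mean buffered output was never flushed when the process went away, so it does not by itself pinpoint when the model failed.

```python
import os


def last_line_is_truncated(path: str) -> bool:
    """Heuristic: True if the file is non-empty and its final byte is
    not a line terminator, suggesting the last line was cut off."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False  # An empty file has no partial line.
        f.seek(-1, os.SEEK_END)
        return f.read(1) not in (b"\n", b"\r")


if __name__ == "__main__":
    # File name taken from the post; run this from the model's folder.
    print(last_line_is_truncated("stderr_um.txt"))
```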

Does this indicate that the model had some sort of issue at around the time of the last trickle?

Has anyone seen this before?


As an extra bit of info, the benchmarks (10ish am) failed with the message 'Aborting CPU benchmarks, one or more active tasks are still running.' I believe this was because the CP model had already hung at this point and BOINC seemed to think it could not stop it.

Now the current state:
BOINC thinks CPDN is running.
The sulphur app is not in Task Manager.
The load on the machine is 'missing' BOINC science processes (the host should always run 6 out of 8 processors; currently only 5 are busy).

I will stop/start BOINC in a while to see what happens, but I am just wondering if there is any information that would be of use to the project while the model is still in this state...

Cheers,


Paul



