Sulphur units constantly failing

old_user70540
Joined: 15 Apr 05
Posts: 10
Credit: 129,186
RAC: 0
Message 19939 - Posted: 3 Feb 2006, 13:16:54 UTC

Ever since getting sulphur 4.22 dl'd to my STABLE machine, this has happened. Any explanations?

old_user19523
Joined: 20 Sep 04
Posts: 14
Credit: 30,765
RAC: 0
Message 19940 - Posted: 3 Feb 2006, 13:44:25 UTC - in response to Message 19939.  
Last modified: 3 Feb 2006, 13:44:46 UTC

Ever since getting sulphur 4.22 dl'd to my STABLE machine, this has happened. Any explanations?


You should post the workunits; we can't see your results page.

Well, I think that Sulphur 4.22 has some problems. I hope that the next experiment or the next Sulphur version will be more stable.

old_user70540
Joined: 15 Apr 05
Posts: 10
Credit: 129,186
RAC: 0
Message 19941 - Posted: 3 Feb 2006, 13:54:33 UTC - in response to Message 19940.  
Last modified: 3 Feb 2006, 13:58:02 UTC

You should post the workunits; we can't see your results page.

Well, I think that Sulphur 4.22 has some problems. I hope that the next experiment or the next Sulphur version will be more stable.


There are quite a few:
1780404
1775160
1709086
1626999
1619956
1617821
1617728 (I reset the project after this WU failed, but before it reported, so it still shows as active)
1618826

I just dl'd and have started 1782971

I have used CPDN on this machine since April alongside SETI, Einstein, LHC, and PrimeGrid without any troubles until now.

old_user70540
Joined: 15 Apr 05
Posts: 10
Credit: 129,186
RAC: 0
Message 19943 - Posted: 3 Feb 2006, 14:11:44 UTC

I just noticed this thread, so I'll be watching that one.

KWSN Sir Clark
Joined: 8 Jul 05
Posts: 33
Credit: 1,274,211
RAC: 0
Message 19964 - Posted: 4 Feb 2006, 1:43:31 UTC

Phew, thought it was just me.....

Not had much luck at all with CPDN... haven't finished a WU yet for various reasons.

old_user5994
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 19967 - Posted: 4 Feb 2006, 4:22:07 UTC

I just had another 4.22 die on me at 60% (?). I forgot to write it down; it was at either 40 or 60%. It was this one. I know it says aborted by GUI, but the app was not running, and the directory had only files that were 1K in size (I did save a copy; send me a note if you want the copy of the slots dir).

Very strange indeed. The only good news, I guess, is that I still have three 4.19 models and they seem to be running well. The tension is rising ... 1 hour something on one of them ... :)

I hate to be brusque, but are any of the 4.22 models completing?

And, even if not, is it worth my time to run partials? I know you probably told me somewhere ...

Anyhow, p.d.buck@comcast.net if you want the slots dir ... not sure what good it will do, all the files are small and zips ... a runaway delete-all-files gremlin? No matter, I suppose ... :)

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 19968 - Posted: 4 Feb 2006, 4:49:44 UTC

If people can finish phase one, there is extra info in it (compared to slab) that is very useful. After that, the team need all of the rest.

Frequent backups are a must!


geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2169
Credit: 64,555,907
RAC: 5,858
Message 19969 - Posted: 4 Feb 2006, 5:20:53 UTC - in response to Message 19967.  

I hate to be brusque, but are any of the 4.22 models completing?

Just finished two in WinXP in the last two days.

1575491
1556477

but failure after failure in Linux 4.23.

old_user19523
Joined: 20 Sep 04
Posts: 14
Credit: 30,765
RAC: 0
Message 19970 - Posted: 4 Feb 2006, 11:29:02 UTC - in response to Message 19968.  
Last modified: 4 Feb 2006, 11:29:54 UTC

If people can finish phase one, there is extra info in it (compared to slab) that is very useful. After that, the team need all of the rest.

Frequent backups are a must!



Daily backups and internet disabled are a must for 4.22 :)

I keep 3 days of backups to be 100% sure :)

It's a challenge for me to finish this workunit :)
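
A three-day rotation like the one described above could be scripted rather than done by hand. The following is only a minimal sketch, not a method given in this thread: the function name `rotate_backups`, the `boinc-` folder prefix, and the example paths are all hypothetical, and it assumes the BOINC client has been shut down before the copy is taken (an assumption, since nothing here says how the copies were made).

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def rotate_backups(data_dir: Path, backup_root: Path, keep: int = 3,
                   stamp: Optional[str] = None) -> Path:
    """Copy data_dir into a new dated folder, then prune all but the newest `keep`."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_root.mkdir(parents=True, exist_ok=True)
    # Default name is date plus time, so folder names sort chronologically.
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"boinc-{stamp}"
    shutil.copytree(data_dir, target)  # raises if a copy with this stamp exists
    copies = sorted(p for p in backup_root.iterdir()
                    if p.is_dir() and p.name.startswith("boinc-"))
    # Everything except the last `keep` entries is an old copy to delete.
    for old in copies[:-keep]:
        shutil.rmtree(old)
    return target
```

Called once a day as, say, `rotate_backups(Path("C:/BOINC"), Path("D:/boinc-backups"))` (both paths are placeholders), it would leave at most three dated copies on disk.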


KWSN Sir Clark
Joined: 8 Jul 05
Posts: 33
Credit: 1,274,211
RAC: 0
Message 19971 - Posted: 4 Feb 2006, 11:52:47 UTC

I thought the idea of BOINC was that you can crunch multiple projects by just attaching and then leaving it be.

I can't be bothered doing backups etc. Knowing my luck I'd forget to back up.

If the next WU fails, I'm gonna stop crunching CPDN until a new app is released.

old_user5994
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 19972 - Posted: 4 Feb 2006, 12:42:07 UTC - in response to Message 19968.  
Last modified: 4 Feb 2006, 12:52:26 UTC

Frequent backups are a must!

Well, if this is true, then it needs to be part of the application.

Like Clark said, it is difficult for me to remember to perform a daily back-up of the CPDN project. A daily back-up for ~80 days, times 8 systems, is a lot of additional work. To this point I have had models die on occasion ... but on systems that had regularly completed 4.19 models without trouble ...

My only concern is that I am wasting my time and yours even trying to run these models. I cannot be sure, but I do not think I have completed a 4.22 model yet. Many die immediately; others seem to wait a bit ...

OK, I saw on the other board that some new work has been created, so I will try a couple more.

==== edit

I don't mind the space, and would like to have "rotations" (for me I would pick 2), but there is literally no way that I am going to remember to do this consistently enough to be of value. In my situation it is just too hard for *ME* to do on a manual basis.

Perhaps this should become a fairly high priority item for the Devs on the next major update when the new models are released. In all honesty, it probably should be made part of the BOINC application so that any project could avail itself of the utility. The control would be on the Preferences page for the project: allow back-up, or perhaps simply the number of daily back-ups allowed, from 0, 1, 2 up to probably a max of 4 ...

old_user5994
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 19975 - Posted: 4 Feb 2006, 14:08:19 UTC

It just occurred to me that this would HAVE to be a BOINC Client Software change, as along with the CPDN slot folder back-up, a client state file extract would also have to be made. Requiring the participant to back up the entire BOINC folder is obviously impractical for those of us that run multiple projects, as the remaining project contents would be very dynamic.

I am not sure how the system as a whole would react to a client error that has already been reported if such a back-up system were in place, in that the "rewind" would "resurrect" the dead! :)
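
To make the idea of a selective back-up (a slot folder plus the client state file, rather than the whole BOINC folder) concrete, here is a rough sketch. It is not something the BOINC client does: the function name `snapshot` and the layout in the usage line (`client_state.xml` and `slots/0` under the BOINC data folder) are assumptions for illustration, and whether the client would accept a restore from such a partial copy is exactly the open question raised above.

```python
import zipfile
from pathlib import Path


def snapshot(boinc_dir: Path, include: list[str], archive: Path) -> list[str]:
    """Zip only the listed files/folders (relative to boinc_dir); return stored names."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for rel in include:
            root = boinc_dir / rel
            # A single file is stored as-is; a folder is walked recursively.
            if root.is_file():
                files = [root]
            else:
                files = sorted(p for p in root.rglob("*") if p.is_file())
            for path in files:
                # Keep paths relative to the BOINC folder inside the archive.
                zf.write(path, path.relative_to(boinc_dir).as_posix())
        return zf.namelist()
```

A call such as `snapshot(Path("C:/BOINC"), ["client_state.xml", "slots/0"], Path("cpdn-snapshot.zip"))` would leave the other projects' files out of the archive. The include list is left to the caller because the model's data may well live outside the slot folder too.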

old_user5994
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 19985 - Posted: 5 Feb 2006, 7:04:13 UTC

Guys, I am not sure if this is completely relevant, but, I had two idle computers and downloaded work on both. One has 12 hours in on the work, the other died immediately.

Looking at the ones that are dying right away, all of them have been recently created. The one that started running was generated back in August. I mean, it may die soon too ... but are we sure that the work being created now is valid?

This is really odd to me, computers that just recently were successful in doing models now cannot even start one up?

KeeperC
Joined: 5 Aug 04
Posts: 66
Credit: 2,146,056
RAC: 0
Message 19987 - Posted: 5 Feb 2006, 16:48:51 UTC - in response to Message 19967.  

I hate to be brusque, but are any of the 4.22 models completing?


I have had two machines having problems with 4.22, but two others are not showing any signs of stress. So far, one 4.22 model has completed, and the next most complete is nearing the end of phase 3.

I don't take any precautions - no back-ups, never stop before defrag nor before shutdown. Internet on always.


old_user19523
Joined: 20 Sep 04
Posts: 14
Credit: 30,765
RAC: 0
Message 20662 - Posted: 23 Feb 2006, 13:32:06 UTC - in response to Message 19970.  

If people can finish phase one, there is extra info in it (compared to slab) that is very useful. After that, the team need all of the rest.

Frequent backups are a must!



Daily backups and internet disabled are a must for 4.22 :)

I keep 3 days of backups to be 100% sure :)

It's a challenge for me to finish this workunit :)


I managed to finish the first phase :) I don't know who has done more work, the CPU or me with the backups ;)

m.mitch
Joined: 10 Jan 06
Posts: 55
Credit: 1,440,333
RAC: 6,379
Message 20907 - Posted: 1 Mar 2006, 15:02:01 UTC - in response to Message 19972.  


Well, if this is true, then it needs to be part of the application.
.......[big snip].........


I would have to agree with Paul on that. I've only remembered to back up once so far, and that was only one of the two CPDNs I have running.


Curtis
Joined: 16 Dec 05
Posts: 27
Credit: 227,145
RAC: 6,532
Message 20958 - Posted: 2 Mar 2006, 5:51:51 UTC

Wow... this many problems? Hmmm, sounds like both climateprediction.net and BOINC have a lot of work ahead of them. Is it important that people keep having these errors and sending in results, or would it be better for climateprediction.net to halt the 4.22s and have people work on a more stable work unit? I would like these results to help out as much as possible, as I'm sure everyone else doing this is hoping too. So what should we do? CPDN, any comments on this?
Thanks
Oh, there is no way that I'm going to make back-ups on a daily basis or even a weekly basis, so something's going to have to change.


belgix
Joined: 5 Aug 04
Posts: 85
Credit: 2,924,043
RAC: 0
Message 20960 - Posted: 2 Mar 2006, 6:54:06 UTC - in response to Message 20662.  

Frequent backups are a must!

Daily backups and internet disabled are a must for 4.22 :)

I keep 3 days of backups to be 100% sure :)

It's a challenge for me to finish this workunit :)

I managed to finish the first phase :) I don't know who has done more work, the CPU or me with the backups ;)


It seems that some lucky bastards don't need to do extra work ... I do backups only once a month - just before stepping up to the next phase! Until now, no problem at all with Sulphur 4.22 & BOINC 5.3.x under Linux (& yes, connected to the Internet 24/7).

Curtis
Joined: 16 Dec 05
Posts: 27
Credit: 227,145
RAC: 6,532
Message 20961 - Posted: 2 Mar 2006, 6:58:19 UTC

Actually, now that belgix has put up his post, I see that I could try to back up the model when it gets close to phase 2. What would be a recommended way to back up CPDN?
Thanks everyone!

old_user19523
Joined: 20 Sep 04
Posts: 14
Credit: 30,765
RAC: 0
Message 21133 - Posted: 7 Mar 2006, 16:13:06 UTC - in response to Message 20960.  

Frequent backups are a must!

Daily backups and internet disabled are a must for 4.22 :)

I keep 3 days of backups to be 100% sure :)

It's a challenge for me to finish this workunit :)

I managed to finish the first phase :) I don't know who has done more work, the CPU or me with the backups ;)


It seems that some lucky bastards don't need to do extra work ... I do backups only once a month - just before stepping up to the next phase! Until now, no problem at all with Sulphur 4.22 & BOINC 5.3.x under Linux (& yes, connected to the Internet 24/7).


Every time an application uses 100% CPU for a while, Sulphur 4.22 crashes.

It can be a game or another kind of application; the result is the same: a crash.

I hope that after the BBC project release the devs will correct this problem.