Sulphur units constantly failing

Curtis

Joined: 16 Dec 05
Posts: 27
Credit: 227,145
RAC: 6,532
Message 21150 - Posted: 8 Mar 2006, 4:05:51 UTC

The crashing definitely has to do with your computer. I think my last model crashed once, but it must have been a fluke. I have run so many different applications at the same time as the model without a crash that I don't know when to expect one.

Therefore, if I am going to do a backup, it will probably be once a month. My question is: what needs to be backed up if I am to back up the data? And how should I back it up? Should I just copy the files to a different location on my C drive?

If your system is unstable under high CPU usage, then good luck. I wouldn't play graphics-intensive games while modelling. That, and make sure your computer can breathe.

What does a crash look like for those of you who are having crashes?


ID: 21150
old_user19523
Joined: 20 Sep 04
Posts: 14
Credit: 30,765
RAC: 0
Message 21157 - Posted: 8 Mar 2006, 12:43:03 UTC - in response to Message 21150.  

The crashing definitely has to do with your computer. I think my last model crashed once, but it must have been a fluke. I have run so many different applications at the same time as the model without a crash that I don't know when to expect one.

Therefore, if I am going to do a backup, it will probably be once a month. My question is: what needs to be backed up if I am to back up the data? And how should I back it up? Should I just copy the files to a different location on my C drive?

If your system is unstable under high CPU usage, then good luck. I wouldn't play graphics-intensive games while modelling. That, and make sure your computer can breathe.

What does a crash look like for those of you who are having crashes?



I'm sorry, but my computer is perfectly stable.

I can run climateprediction without problems for many hours. I ran prime95 alongside climateprediction, splitting CPU time 50/50, for 24 hours without a problem.

The problem is that when a higher-priority program requires 100% of the CPU, the climateprediction application gets out of sync (remember that climateprediction runs at a very low priority).

Knowing this, I now stop BOINC whenever I know that another application will need 100% of the CPU time.

There is also a post about this issue on the BOINC developers' mailing list.
ID: 21157
old_user3434
Joined: 30 Aug 04
Posts: 77
Credit: 1,785,934
RAC: 0
Message 21198 - Posted: 11 Mar 2006, 11:12:30 UTC - in response to Message 21157.  

Yep, the Sulphur clients are still highly unstable in certain standard situations.

(I know because I paused after the last disaster and only recently fired it up again. Results appear unchanged: hardly any of my Sulphur models ever complete.)
Scientific Network : 44800 MHz - 77824 MB - 1970 GB
ID: 21198
Curtis
Joined: 16 Dec 05
Posts: 27
Credit: 227,145
RAC: 6,532
Message 21220 - Posted: 12 Mar 2006, 23:24:28 UTC

I am wondering if there is any way to verify the data before the phase is up. I don't know what happens internally, but maybe every 5% it should try to verify the data and make sure that the program didn't run into some sort of error. This is my input on the matter.

ID: 21220
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 21223 - Posted: 13 Mar 2006, 1:58:05 UTC

It's constantly verifying the data.
If it has a problem, it rewinds a day and tries again. If that fails, it rewinds a month, and then a year. If it still has a problem, it quits.

You can see the files in the dataout folder of the model: restart, restart.month, restart.year
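
The rewind order described above can be sketched in a few lines of Python. This is only an illustration of the fallback idea, not the model's actual code: the three file names come from the dataout folder mentioned above, while `run_with_rewinds` and the `resume_from` callback are hypothetical names invented for the example.

```python
from typing import Callable, Optional

# Checkpoint names as listed in the post, ordered from newest to oldest.
CHECKPOINTS = ["restart", "restart.month", "restart.year"]


def run_with_rewinds(resume_from: Callable[[str], bool]) -> Optional[str]:
    """Try each checkpoint in turn and return the first one that works.

    `resume_from` is a hypothetical callback: it restarts the model from the
    named checkpoint and returns True if the model then runs cleanly.
    """
    for checkpoint in CHECKPOINTS:
        if resume_from(checkpoint):
            return checkpoint
    return None  # every rewind failed, so the model gives up


# Pretend the daily checkpoint is bad but the monthly and yearly ones are fine.
usable = {"restart.month", "restart.year"}
print(run_with_rewinds(lambda name: name in usable))
```

In this toy run the daily checkpoint is rejected and the monthly one is used, which matches the small jump backwards in percent complete that people see after a problem.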

ID: 21223
Curtis
Joined: 16 Dec 05
Posts: 27
Credit: 227,145
RAC: 6,532
Message 21225 - Posted: 13 Mar 2006, 2:45:11 UTC

Ahh, OK, because it did just that to me today.
It was at 9.02%, then it jumped back to 9.00% when it had a problem late yesterday. The BOINC client said 0% progress after the benchmarks failed.
Cool, thanks.
ID: 21225
enginerd
Joined: 31 Aug 04
Posts: 13
Credit: 134,268
RAC: 0
Message 21237 - Posted: 13 Mar 2006, 19:01:48 UTC

I just had one die halfway through phase 4. Any help? This is the second of my Sulphur units to fail. It happened after a restart, but I suspended CPDN first. No backup. :(
Result # 1754289


<core_client_version>5.2.13</core_client_version>
<message><file_xfer_error>
<file_name>sulphur_j55s_100893152_0_4.zip</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>sulphur_j55s_100893152_0_5.zip</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>

</message>
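
For anyone who wants to pull the failing file names and error codes out of a report like the one above, here is a minimal Python sketch. It assumes the report is pasted in as text; because the fragment has no single root element, the code wraps it in a made-up `<report>` tag before parsing. The function name `failed_transfers` is hypothetical, and the sketch says nothing about what error code -161 means.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# The report text exactly as posted above.
REPORT = """<core_client_version>5.2.13</core_client_version>
<message><file_xfer_error>
<file_name>sulphur_j55s_100893152_0_4.zip</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>sulphur_j55s_100893152_0_5.zip</file_name>
<error_code>-161</error_code>
<error_message></error_message>
</file_xfer_error>

</message>
"""


def failed_transfers(text: str) -> List[Tuple[str, int]]:
    """Return (file name, error code) for each file_xfer_error element."""
    # Wrap the fragment in an artificial root so it parses as one document.
    root = ET.fromstring(f"<report>{text}</report>")
    results = []
    for error in root.iter("file_xfer_error"):
        name = error.findtext("file_name", default="")
        code = int(error.findtext("error_code", default="0"))
        results.append((name, code))
    return results


for name, code in failed_transfers(REPORT):
    print(name, code)
```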

ID: 21237
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 21238 - Posted: 13 Mar 2006, 20:01:19 UTC

christo
Sorry, no help.

This is why big businesses make daily backups of their computer data.
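
As a rough illustration of a dated backup, here is a Python sketch that copies a folder into a timestamped directory. It is an assumption-laden example, not official project advice: the function name `backup_folder` and the paths in the usage comment are hypothetical, which folder needs copying is not settled in this thread, and the sketch assumes BOINC has been shut down first so the files are not changing during the copy.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_folder(source: Path, backup_root: Path) -> Path:
    """Copy `source` into a new timestamped folder under `backup_root`.

    Assumes nothing is writing to `source` while the copy runs.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"{source.name}-{stamp}"
    shutil.copytree(source, target)  # also creates backup_root if missing
    return target


# Hypothetical usage; both paths are placeholders, not real defaults:
# backup_folder(Path(r"C:\path\to\BOINC"), Path(r"D:\boinc_backups"))
```

Keeping each copy in its own timestamped folder means an older good copy is not overwritten by a newer one taken after the model has already gone wrong.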

ID: 21238
enginerd
Joined: 31 Aug 04
Posts: 13
Credit: 134,268
RAC: 0
Message 21241 - Posted: 13 Mar 2006, 21:19:10 UTC - in response to Message 21238.  

This is why big businesses make daily backups of their computer data.

So does my small business...

Unfortunately, after the last run failed, I didn't want to touch this WU at all. However, I had to restart the computer, and it failed within about 6 hours.
ID: 21241
Curtis
Joined: 16 Dec 05
Posts: 27
Credit: 227,145
RAC: 6,532
Message 21280 - Posted: 15 Mar 2006, 7:29:40 UTC

I have a question. I haven't been keeping close track of how much time I am crunching, but I have a feeling that it's crunching faster than it did in the beginning. Is it possible that the crunching rate changes as the percent complete increases?
If so, can anyone tell me why, or whether this could be related to failing work units?


I think this might be happening, but I don't have actual data to calculate the rate difference.
Just wanted input.
Thanks
ID: 21280
Andrew Hingston
Volunteer moderator
Joined: 17 Aug 04
Posts: 753
Credit: 9,804,700
RAC: 0
Message 21283 - Posted: 15 Mar 2006, 9:06:27 UTC

Speed does change over the lifetime of a model, but not normally by much. Bear in mind that s/TS (seconds per timestep) is averaged over the entire progress of the model, so the most usual reason for a fall is that something earlier caused a slowdown, such as a rewind to an earlier point (which may well be what happened in your case).
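
To see why a whole-run average behaves this way, here is a small worked example in Python. The numbers are invented, and the formula (total seconds divided by total timesteps completed) is an assumption about how the average is formed rather than the client's actual code; `cumulative_s_per_ts` is a hypothetical name.

```python
from typing import List, Tuple


def cumulative_s_per_ts(segments: List[Tuple[float, int]]) -> List[float]:
    """Running average of seconds per timestep after each segment.

    Each segment is (seconds spent, net timesteps of progress gained).
    """
    total_seconds = 0.0
    total_steps = 0
    averages = []
    for seconds, steps in segments:
        total_seconds += seconds
        total_steps += steps
        if total_steps > 0:
            averages.append(total_seconds / total_steps)
    return averages


# Made-up run: steady 2.0 s/TS, then a rewind that costs 500 s for no net
# progress, then steady 2.0 s/TS again for a much longer stretch.
run = [(2000.0, 1000), (500.0, 0), (18000.0, 9000)]
print(cumulative_s_per_ts(run))
```

With these figures the average goes from 2.0 to 2.5 right after the rewind, then drifts back down to 2.05 as more timesteps accumulate, even though the machine crunched at the same underlying rate throughout.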
ID: 21283