climateprediction.net home page
Automatic Backup any good?

ChrisD

Message 20044 - Posted: 8 Feb 2006, 12:36:12 UTC

After 13 days of crunching, my laptop lost power and did not have the time to save work. :(
When BOINC was restarted, my experiment was at once reported as failed, rendering that WU unrecoverable.
The CPDN server says the result is completed, so I do not think it is a good idea to restore a backup from yesterday and waste another 48 hours before the trickle goes in and gets rejected. So, to avoid trashing further WUs, I have detached that computer.
Now, with WU runtimes from 3 weeks to maybe 6 or 8 months, the likelihood of a crash is huge. I haven't checked the statistics, but the ratio of successfully completed runs to crashed ones must be depressing.

Having completed several THC experiments, I sometimes had to restore a backup after a crash. There, the old client told me about the crash but asked me whether it should be reported to CPDN.
By shutting down and restoring the backup from previous day, I was able to continue crunching.

This option is not available under BOINC, so even more runs get trashed.

How about letting the client do a backup every 24 hours automatically? If the client discovers that the data files have been corrupted, it should try restoring from the backup and, if this fails, ask the user for permission to report the crash to CPDN.
This way the user could restore an older backup, if he had made one, and keep the experiment alive.
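
A minimal sketch of the behaviour proposed above, written in Python purely for illustration. It is not how the CPDN application or BOINC works; the function names, the directory layout and the "file exists and is non-empty" integrity check are all assumptions.

```python
import shutil
from pathlib import Path


def is_intact(directory, required):
    """Hypothetical integrity check: every required file exists and is non-empty."""
    directory = Path(directory)
    return all(
        (directory / name).is_file() and (directory / name).stat().st_size > 0
        for name in required
    )


def make_backup(work_dir, backup_dir, required):
    """Copy work_dir to backup_dir, but only if work_dir currently looks intact.

    Returns True if a backup was taken. A corrupt work area never
    overwrites the last known good backup.
    """
    work_dir, backup_dir = Path(work_dir), Path(backup_dir)
    if not is_intact(work_dir, required):
        return False
    if backup_dir.exists():
        shutil.rmtree(backup_dir)
    shutil.copytree(work_dir, backup_dir)
    return True


def recover(work_dir, backup_dir, required):
    """Return "ok", "restored", or "ask_user" (nothing usable to restore from)."""
    work_dir, backup_dir = Path(work_dir), Path(backup_dir)
    if is_intact(work_dir, required):
        return "ok"
    if backup_dir.exists() and is_intact(backup_dir, required):
        shutil.rmtree(work_dir, ignore_errors=True)
        shutil.copytree(backup_dir, work_dir)
        return "restored"
    # Only at this point would the user be asked whether to report the crash.
    return "ask_user"
```

A scheduler would call make_backup once every 24 hours and recover at start-up; reporting the crash happens only when recover returns "ask_user".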

I know that there is one solution available now: suspend network activity, thus preventing the client from reporting anything until you are sure it is OK. But it is error-prone (Error 40, the one behind the keyboard).

So how about implementing the suggested auto-backup in the next client?

ChrisD

Les Bayliss
Volunteer moderator

Message 20046 - Posted: 8 Feb 2006, 14:06:40 UTC

Oxford Uni has nothing to do with developing BOINC.
You'll have to discuss this on the BOINC forums: http://boinc.berkeley.edu/dev/

ChrisD

Message 20047 - Posted: 8 Feb 2006, 16:50:35 UTC - in response to Message 20046.  

Oxford Uni has nothing to do with developing BOINC.
You'll have to discuss this on the BOINC forums: http://boinc.berkeley.edu/dev/



Please pardon me, but I may not have explained my problem correctly.

As I understand this, BOINC manages the flow of data between my machine and the servers running the various projects.
Each project makes an application to do the actual math involved in that project.

As I see it, the CPDN application, when restarted after the crash, could not find its Data files and therefore aborted the WU.

This has nothing to do with BOINC. When the CPDN application gives up, BOINC faithfully and without delay sends the message issued by the CPDN app reporting the crash, and asks for replacement work.

What I was asking for is a CPDN client that is a little more robust: one that does not give up because the work area is corrupt, but tries to revert to a known good state.
I know the basis is already there: the files restart.day, restart.month and restart.year hold restart points. But they are altered repeatedly by the client, and if the crash happens while one of these files is being accessed, no salvage is possible unless the files are backed up somewhere safe.
Judging by the sound of my computer's disk, data are updated several times per hour. Each time, a disk error can crash the WU. Backing the files up once every 24 hours would reduce the risk by a large factor.
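
One common way to guard against the failure mode described above (a crash while a restart file is half written) is to write the new contents to a temporary file and then swap it into place in a single step. The following Python sketch shows that general technique under stated assumptions. It is not taken from the CPDN application, and the function name write_atomically is hypothetical.

```python
import os
import tempfile


def write_atomically(path, data):
    """Replace the file at path with data (bytes) without exposing a partial file.

    The data goes to a temporary file in the same directory first, so a
    crash during the write leaves the old file at path untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_path, path)  # swap the complete file into place
    except BaseException:
        # Clean up the temporary file if anything went wrong before the swap.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this pattern a reader sees either the complete old file or the complete new one. It does not protect against a failing disk, so a separate backup is still worthwhile.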

The CPDN server does not state the number of crashed experiments, only the successfully completed ones. However, looking at the total model years computed, there are enough model years for more than double that number. Of course there will still be crashes, but looking at the posts, a lot of users mourn the loss of good computing time and the science that goes down the drain with it.

If a more fail-safe CPDN client were made, more experiments just might make it to the finish line, thus contributing to science.

Thank you for your time.

ChrisD

Les Bayliss
Volunteer moderator

Message 20054 - Posted: 8 Feb 2006, 20:11:14 UTC

One of the problems with this project is that it uses the code developed by the Met for their forecasting.
This is 64-bit Fortran, is over 50 megs in size, and contains over a million lines of code. It took 2 or 3 years for the Oxford programmers to get it to run, in the pre-BOINC days.
It's somewhat amazing that it DOES run on desktops, let alone laptops, in spite of the wide variety of hardware/software combinations on which it is tried, and the PC programs against which it has to compete.

Backups are really a computer owner's responsibility, and not just for this project's programs and data.
And suspending and then exiting BOINC is a must BEFORE making a backup, so that the many files involved don't get out of sync.
Someday automatic backups may get built in, but it's not likely to happen real soon.
The next phase, experiment 2, is due to be released in a few weeks, and there are hopes that the final phase may be in 64-bit code. But it isn't even funded yet.
And remember, this project has a limited lifetime, possibly another 3 years, so waiting for things to get better may not be an option. :)

old_user17289

Message 20056 - Posted: 9 Feb 2006, 1:48:51 UTC

I had lots of failed WUs in the past, so I finally started making weekly manual backups. An amazing thing happened after this: no more crashes! It appears as if backups prevent WUs from crashing...
staffann

Message 20455 - Posted: 18 Feb 2006, 21:43:37 UTC - in response to Message 20056.  

I use this script to make automatic backups. It is run on a Swedish WinXP, so you'll have to modify it for your language (the path to the BOINC folder, and the name of the ntbackup window, which is "Säkerhetskopiering" in Swedish).

Save it as a *.vbs file and just double-click to run. I also schedule it to run once a day. Don't forget to remove old backups, since this script doesn't overwrite them.


set WshShell = WScript.CreateObject("WScript.Shell")
WshShell.logevent 4, "Starting backup of BOINC folder"

REM Exit BOINC
REM AppActivate returns false if no window with that title is found
ret = WshShell.AppActivate ("BOINC Manager" )
if ret = false then
	WshShell.logevent 1, "Could not find BOINC to close it!"
	WScript.quit -1
end if
WScript.Sleep 1000
REM The key sequences are menu accelerators of the Swedish BOINC Manager;
REM adjust them for your language (presumably suspend first, then exit)
WshShell.SendKeys "{F10}{k}{p}~"
WScript.Sleep 1000
WshShell.SendKeys "{F10}{a}{a}~"
WScript.Sleep 10000
ret = WshShell.AppActivate ("BOINC Manager" )
if ret = true then
	WshShell.logevent 1, "BOINC is still running after attempt to close it!"
	WScript.quit -1
end if

REM Backup BOINC
REM Note: Date is formatted per locale; a format containing "/" is not valid in a file name
MyTime = Time
MyTime = Replace(MyTime, ":", "_")
BackupName = "C:\BOINC_Backups\BOINC_Backup_" & Date & "_" & MyTime
BackupCommand = "ntbackup backup c:\program\BOINC /J ""BOINC Backup"" /F """ + BackupName + """"
rem MsgBox BackupCommand
WshShell.Run BackupCommand,1,false
WScript.Sleep 180000
REM Poll every 30 seconds until the ntbackup window is gone; give up after 30 polls
ret = WshShell.AppActivate ("Säkerhetskopiering" )
i=0
while ret = true
	i = i+1
	if i = 30 then
		WshShell.logevent 1, "Ntbackup still running. Not restarting BOINC!"
		WScript.quit -1
	end if
	WScript.Sleep 30000
	ret = WshShell.AppActivate ("Säkerhetskopiering" )
wend


REM Restart BOINC
REM set WshShell = WScript.CreateObject("WScript.Shell")
WScript.Sleep 2000
WshShell.Run "boincmgr.exe",1,false
WScript.Sleep 10000
ret = WshShell.AppActivate ("BOINC Manager" )
if ret = false then
	WshShell.logevent 1, "Could not restart BOINC!"
	WScript.quit -1
end if
WScript.Sleep 5000
REM run always
WshShell.SendKeys "{F10}{k}{k}~"
WScript.Sleep 5000
REM Turn off network access
REM WshShell.SendKeys "{F10}{k}{f}~"
REM WScript.Sleep 1000
WshShell.logevent 0, "Successfully completed backup of BOINC folder"
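
Since the script above never removes old backups, the clean-up could be automated as well. A minimal sketch in Python, assuming the backups are plain files whose names start with BOINC_Backup_ in a single folder; the function name prune_backups and the default of keeping 7 files are hypothetical choices, not part of the script above.

```python
from pathlib import Path


def prune_backups(backup_dir, keep=7, pattern="BOINC_Backup_*"):
    """Delete all but the `keep` most recently modified backup files.

    Only files matching `pattern` are considered; anything else in the
    folder is left alone. Returns the names of the files removed.
    """
    candidates = [p for p in Path(backup_dir).glob(pattern) if p.is_file()]
    # Newest first, judged by modification time.
    candidates.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    removed = []
    for old in candidates[keep:]:
        old.unlink()
        removed.append(old.name)
    return removed
```

Scheduled right after the daily backup, a call such as prune_backups(r"C:\BOINC_Backups", keep=7) would leave roughly a week of restore points.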



geophi
Volunteer moderator

Message 20471 - Posted: 19 Feb 2006, 5:22:10 UTC

Thanks staffann. I might have to give that a try.


©2024 climateprediction.net