Posts by copycat

21) Questions and Answers : Unix/Linux : What is the meaning of this? (Message 15911) Posted 11 Sep 2005 by copycat Post: And you said you have both Windows and Linux on those PCs. Testing hardware to see if it's reliable in Windows should suffice for determining hardware stability in Linux for CPDN. And, while the Prime95 Windows executable might be linked from that thread, there is a Linux binary for Prime95 as well. I believe I also said Windows XP is only on here to play certain games, and since I've still got lots of DVDs I need to watch on this PC (I've only got ONE PC, despite what it may say in my profile, and no stand-alone DVD player) I rarely start it up anymore. I've got something monitoring my hardware in Windows (MBM version 5, in fact), from when I was trying to determine why I was unable to (re-)install one of my two Linux OS's, but it seems some bad sectors on the XP partition were the cause. However, although I tried, I was unable to set up logging, so I can't determine what went wrong when that OS (XP) crashes. As long as it didn't crash, all parameters were within a safe range. Also, yesterday the link to the Prime95 tests was unavailable, and I searched but could not find the Prime95 Linux version whilst googling. Also, if something goes wrong: a) shouldn't S@H and E@H suffer too, and b) can't it (CPDN) tell me WHAT's wrong in the logfile (or at least nudge me in the right direction) BEFORE it dumps the restart/rewind message?
22) Questions and Answers : Unix/Linux : What is the meaning of this? (Message 15866) Posted 11 Sep 2005 by copycat Post: <a href="http://www.climateprediction.net/board/viewtopic.php?t=2124">Maintenance</a> <a href="http://www.climateprediction.net/board/viewtopic.php?t=2126">Tests</a> edit: Sorry about the long strings. Carl has updated the server software. You DO know this is a Linux forum and those links are to exe files, right? At least one of them seems rather un-Wine-able.
23) Questions and Answers : Unix/Linux : What is the meaning of this? (Message 15858) Posted 10 Sep 2005 by copycat Post: I see that you don't have any trickles recorded, even after a month. Are they all still in your climateprediction.net folder, or have you done a computer merge and had them allocated to another ID? Have you read / tried the maintenance / stress testing written by UK_Nick? A small situation report: My machine has 2 OS's, Window$ XP Home and Linux (actually there are two Linux OS's, but we'll treat them as one). BOINC is installed on both OS's. Window$ has become rather unstable (as all Window$ do), but I only need it to play certain games and it can still manage that, as long as I save often enough. When I try to run that BOINC, the OS crashes. Since I managed to make one of my Linux OS's a DVD player, I don't start Window$ up that often anyway. On the Window$ BOINC there are some S@H WU's (behind deadline) and CPDN WU's (before deadline) present, but I can't finish them for the reasons I outlined above. On the Linux BOINC I first had S@H, then CPDN, abandoned CPDN (full detach), then E@H, and now re-attached CPDN. The reason why I don't have any trickles recorded is rather obvious: none of the WU's I (try to) process can run long enough to produce any trickles! In Linux the WU crashes (as you've seen) and in Window$ the OS crashes. I have done no computer merge on my CPDN account, only on the S@H and the E@H accounts a short while ago. Strange, the number of computers seems to match the number of CPDN WU's on each OS, but I guess that's a coincidence. I've got a great many client errors on my result page. Some of the (non-errored) WU's are from before I detached (and thus I don't have them anymore), some are on my Window$ partition, and two are currently in my Linux BOINC CPDN folder.
Currently BOINC is in EDF mode, and since there are two E@H WU's there, which obviously have shorter deadlines than the CPDN WU, they are being crunched first. maintenance / stress test?
24) Questions and Answers : Unix/Linux : What is the meaning of this? (Message 15788) Posted 7 Sep 2005 by copycat Post:
> In certain types of model instability, the model will go back to a hopefully known good point, in this case at the start of the last month, and start from there again. This gives it a chance to continue on after an error. If the error was just some odd hardware glitch that doesn't reoccur, then it will continue on OK. If the model is unstable, or the computer is unstable again, it will give up and download a new model. Usually it rewinds a day, then a month, then a year. You may not have noticed the rewind a day messages.
Oh yes, I noticed the rewind-a-day message too; that happened some time before. And yes, here we go again: 'Preparing for restart... Rewinding a model-year... Error: Restart files for dataout/restart.year not found Giving up, this result exceeded crash count for available restart files.' The EXACT same thing happened the first time I tried CPDN, many months ago (March 2005). Apparently, it's STILL not fixed. :-\ First it rewinds a day, then it rewinds a month, then it tries to rewind a year, but can't because it hasn't gotten that far yet, and then gives up. It has even happened so fast that it couldn't even rewind a MONTH! I can understand S@H is not comparable to this, because an S@H WU can be finished in one day, but an E@H WU takes several days too, so why can those WU's make it through that time without crashing when a CPDN WU can't? I am very careful to suspend BOINC before shutting down, so everything can safely restart the next time, but it would seem that still isn't enough. I can't track the evolution of the SC application, but the hadsm application seems to have gone through 2 versions since that time. I'll abort SC, since it's either that or restart and probably face it crashing again. I'll see if the new hadsm3 application fares better than its predecessor, although it looks doubtful.
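The escalation described in the post above (rewind a day, then a month, then a year, and give up when the needed restart file does not exist) can be illustrated with a toy model. This is only a sketch of the behaviour as the posts describe it, not the CPDN client's code; `REWIND_ORDER` and `choose_restart` are hypothetical names, and the assumption that each crash moves exactly one level up is taken from the quoted reply.

```python
# Hypothetical toy model of the rewind escalation described in the thread.
REWIND_ORDER = ["day", "month", "year"]  # order given in the quoted reply


def choose_restart(crash_count, available):
    """Pick the restart point for the Nth crash (1-based), or None to give up.

    `available` is the set of restart levels the model has written so far.
    """
    if crash_count > len(REWIND_ORDER):
        return None  # more crashes than there are restart levels
    level = REWIND_ORDER[crash_count - 1]
    # A model that has not yet run a full year has no yearly restart file.
    return level if level in available else None


# A young model has only written daily and monthly restart files.
young_model = {"day", "month"}
for crash in (1, 2, 3):
    print(crash, choose_restart(crash, young_model))
```

Under these assumptions a model less than a year into its run gives up on the third crash, which matches the 'Restart files for dataout/restart.year not found' message quoted in the post.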
25) Questions and Answers : Unix/Linux : What is the meaning of this? (Message 15754) Posted 6 Sep 2005 by copycat Post: It would seem my CPDN just lost a month's worth of results. Why is it doing this? You can see the lines before and after the rewinding message; nothing extraordinary happened.
4843_200297411 - PH 1 TS 0004027 A - 24/02/1811 21:30 - H:M:S=0004:41:59 AVG= 4.20 DLT= 2.85
Preparing for restart...
Rewinding a model-month...
Copying restart files for model retry...
Starting model ID 4843_200297411 Phase 1
Waiting for model startup, this may take a minute...
4843_200297411 - PH 1 TS 0002881 A - 01/02/1811 00:30 - H:M:S=0004:42:12 AVG= 5.88 DLT= 0.00
26) Questions and Answers : Unix/Linux : sulphur_cycle deadline unreachable (Message 15752) Posted 6 Sep 2005 by copycat Post: I know everything is put on hold while BOINC is trying to meet the SC deadline. However, that does not bother me. Those are the only two WU's left; S@H has none left, E@H has none left. I am not planning to abort the model if the results will still be accepted (and credited) after the deadline. I was just wondering, since in E@H I've had WU's which were sent in some time after the deadline and thus did not get any credit. Anyway, if I abort this one, CPDN will probably deposit another one in my BOINC, so I'll be right back where I started. I am not doing this for the credit, but I don't like my computer working so long and hard on something and then not getting anything in return for it. So, if I get confirmation that the SC WU will still get validated/credited when it's sent back weeks/months after the deadline (and thus misses the start of the Coupled Model), I'll keep it running. If I don't get anything in return, I'll abort it. I know S@H and E@H will come back once these (CPDN WU's) are finished. Might be in a few months, but so what?
27) Questions and Answers : Unix/Linux : sulphur_cycle deadline unreachable (Message 15718) Posted 5 Sep 2005 by copycat Post: Running BOINC 4.43, with 3 projects: Seti@home, Einstein@home and, recently (re-)attached, CPDN. I tried CPDN before (some time ago) but it kept crashing so I abandoned it. The sulphur_cycle computations seem to be very CPU-intensive, because the progress % is going up very slowly and the time to completion is enormous: 1463:06:30 as I write this, with a deadline of 2006/01/24. Given the fact that my computer does not run 24 hours a day, 7 days a week, far from it (15-20 hours/week is my best guess), I calculated that I can never reach the deadline. Although there was a 'no work fetch allowed' policy, CPDN downloaded another WU, albeit one for the hadsm3 application. That requires 'only' 691:11:30 to completion, and has a deadline of 2006/08/16. So, what do I do? Abort the sulphur_cycle now, and thus relieve my PC of this 'overcommitted' burden? Wait for the deadline, and abort the sulphur WU then? Wait until it has finished the sulphur WU, although that will finish several months AFTER its deadline and will possibly shorten the time I can work on the other CPDN WU? Please advise. Thanks in advance.
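The arithmetic behind the post above can be checked roughly as follows. This is a minimal sketch, assuming BOINC's time-to-completion estimate is accurate, that the clock starts on the posting date, and that each WU is considered on its own; `parse_hms` and `hours_available` are hypothetical helper names, not part of BOINC.

```python
from datetime import date


def parse_hms(text):
    """Convert a BOINC-style 'HHHH:MM:SS' string to hours as a float."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours + minutes / 60 + seconds / 3600


def hours_available(start, deadline, hours_per_week):
    """Crunching hours between two dates at a given weekly uptime."""
    return (deadline - start).days / 7 * hours_per_week


posted = date(2005, 9, 5)  # date of the post, used as the starting point

# Figures quoted in the post: remaining time and deadline per work unit.
work_units = {
    "sulphur_cycle": ("1463:06:30", date(2006, 1, 24)),
    "hadsm3": ("691:11:30", date(2006, 8, 16)),
}

for name, (remaining, deadline) in work_units.items():
    needed = parse_hms(remaining)
    for rate in (15, 20):  # the poster's estimated uptime, hours per week
        budget = hours_available(posted, deadline, rate)
        verdict = "reachable" if budget >= needed else "unreachable"
        print(f"{name}: need {needed:.0f} h, have {budget:.0f} h "
              f"at {rate} h/week -> {verdict}")
```

On these assumptions the sulphur_cycle deadline is out of reach at either uptime, while the hadsm3 WU would fit only if it had the machine to itself.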
28) Questions and Answers : Unix/Linux : CPDN crashing regularly (Message 11935) Posted 19 Apr 2005 by copycat Post: I'm an energy-efficient user, meaning I shut down my computer when I stop at night, and turn it on again when I am ready to start up again. I am dual-booting, meaning BOINC (with CPDN) is installed both in Windows XP Home and in Linux (SuSE 9.2 64-bit). Whenever BOINC is running in XP, something causes the OS to give me the dreaded BSOD, meaning BOINC is shut down uncleanly, so naturally that causes error messages. Therefore, whenever I'm running XP, that BOINC client is now suspended most of the time. In Linux I am using a special script that automatically adds BOINC as a service when I start up and provides what is, I suspect, a clean shutdown of the service when I stop: '2005-04-18 22:01:35 [---] Received signal 18 2005-04-18 22:01:35 [---] Received signal 15 2005-04-18 22:01:35 [---] Exit requested by user'. The script also provides a log. Even though Linux crashes are extremely rare, I got the following message from CPDN on April 2nd: 'Error: Restart files for dataout/restart.year not found Giving up, this result exceeded crash count for available restart files.' On April 18th, with another model, I got this one: 'Error: Restart files for dataout/restart.year not found Giving up, this result exceeded crash count for available restart files.' This way I am NEVER going to get any credit, so I might as well detach myself from the project, which I am seriously considering at this time. I have NEVER had this kind of problem with Seti@home, but then again, the fact that their data packets are a lot smaller probably has something to do with that.
