Sulphur model repeatedly respawns child process
Author | Message |
---|---|
Joined: 9 Feb 06 Posts: 4 Credit: 15,859 RAC: 0 |
Hi, some time ago my sulphur model started respawning the child process (the one that sulphur_4.22_windows_intelx86.exe spawns). That is, it would start the process, the process would flash up in Task Manager and disappear, and then it would start it again. I tried stopping and restarting the BOINC service, among other things. I finally "resolved" this by simply deleting the entire BOINC folder and replacing it with a copy I'd taken a week earlier. This seemed to work fine. Until now. Unfortunately, I have been so busy lately that I forgot to take any backups of the BOINC folder. From what I can remember, the "original" cp.net client had some restart files; is there any chance I can use these? Also, I'm curious as to why the child process is quitting like this without leaving an error in any of the logs. Unless it's using some "hidden" log files somewhere? Any help appreciated. |
Joined: 5 Aug 04 Posts: 390 Credit: 2,475,242 RAC: 0 |
Hi there, the log files are stdoutdae.txt and stderrdae.txt in the BOINC folder. There is no need to keep running the older Sulphur cycle experiment, so you may automatically get an updated Coupled model (once the SC is finished or aborted). You may try re-checking your preferences and setting "Leave applications in memory while preempted? (suspended applications will consume swap space if 'yes')" to yes at http://climateapps2.oucs.ox.ac.uk/cpdnboinc/prefs.php?subset=global Also, you can merge hosts: I guess your 3 hosts are actually a single AMD X2. Just a cosmetic thing. phpBB forum for CPDN, all are invited: http://www.climateprediction.net/board |
Joined: 17 Aug 04 Posts: 753 Credit: 9,804,700 RAC: 0 |
See this thread for what may be a similar problem. |
Joined: 9 Feb 06 Posts: 4 Credit: 15,859 RAC: 0 |
Sounds like it might be the same issue. The log files don't contain any indication of why the worker is exiting. I've also read some other threads that indicate there is no way to recover, since I do not have a full backup. If the system really relies on "are you alive" messages, as indicated in the other thread, then it's rather poorly designed imho. The process polling the worker could easily tell whether the worker is alive but not responding due to CPU starvation by checking the process handle and GetProcessTimes. I use my computer at all times of the day (I often leave it working on things overnight), so setting "working hours" would be useless. And unless there's a way to perform some automatic backup (one which of course wouldn't perform the backup if the worker is acting like it does now), I don't see any way for me to continue. I guess I'll have to find another project then. :( In any case, cheers for the help! |
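For anyone curious what the check described above might look like, here is a minimal sketch of the decision logic only. It is not the actual BOINC or CPDN code; every name in it is hypothetical. It assumes the supervising process can sample two things on each poll: whether the worker's process handle is still unsignalled (on Windows, for example via WaitForSingleObject or GetExitCodeProcess) and the worker's accumulated kernel plus user CPU time (for example via GetProcessTimes). Note that CPU time alone cannot separate a starved worker from one that is blocked on I/O, so the sketch lumps those together.

```python
from dataclasses import dataclass
from enum import Enum


class WorkerState(Enum):
    EXITED = "exited"    # process handle is signalled: the worker is gone
    HEALTHY = "healthy"  # a heartbeat reply arrived since the last poll
    STARVED = "starved"  # alive and silent, with almost no CPU time used
    HUNG = "hung"        # alive and silent, yet still burning CPU


@dataclass
class Sample:
    """One poll of the worker (all fields are hypothetical inputs)."""
    running: bool         # False once the process handle is signalled
    cpu_seconds: float    # kernel + user time, e.g. from GetProcessTimes
    heartbeat_seen: bool  # did an "are you alive" reply arrive?


def classify(prev: Sample, curr: Sample,
             min_cpu_delta: float = 0.05) -> WorkerState:
    """Compare two consecutive polls and guess the worker's state."""
    if not curr.running:
        return WorkerState.EXITED
    if curr.heartbeat_seen:
        return WorkerState.HEALTHY
    # No heartbeat: look at how much CPU the worker actually received.
    cpu_used = curr.cpu_seconds - prev.cpu_seconds
    if cpu_used < min_cpu_delta:
        # Starved (or blocked on I/O): waiting longer is safer than killing.
        return WorkerState.STARVED
    return WorkerState.HUNG


if __name__ == "__main__":
    before = Sample(running=True, cpu_seconds=10.0, heartbeat_seen=True)
    after = Sample(running=True, cpu_seconds=10.0, heartbeat_seen=False)
    print(classify(before, after).value)
```

The point of the design is that a missing heartbeat on its own would not trigger a restart; only a worker that has exited, or one that keeps consuming CPU without ever answering, would be a candidate for respawning. The threshold `min_cpu_delta` is an arbitrary illustrative value.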
Joined: 9 Feb 06 Posts: 4 Credit: 15,859 RAC: 0 |
Btw, sorry for being totally blind :) Somehow I managed to convince myself that the "questions" message board was for the classical client... |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
... I should point out that my theory in the other thread is (as usual) based on guesswork and observation, rather than knowledge and code review. I may be right, I may be wrong... I'm a volunteer and my views are my own. |
Joined: 9 Feb 06 Posts: 4 Credit: 15,859 RAC: 0 |
Yes, I'm aware of that, hence my "if" :) It's not an unreasonable observation, however, and the condition is difficult to catch during debugging unless you specifically test for it. |