Thread 'Sulphur model repeatedly respawns child process'

Lord Crc

Joined: 9 Feb 06
Posts: 4
Credit: 15,859
RAC: 0
Message 22234 - Posted: 20 Apr 2006, 23:01:57 UTC

Hi,

Some time ago, my Sulphur model started respawning its child process (the one that sulphur_4.22_windows_intelx86.exe spawns). That is, it would start the process, which would flash up in Task Manager and disappear, and then it would start it again.

I tried stopping and restarting the BOINC service, among other things. I finally "resolved" this by simply deleting the entire BOINC folder and replacing it with a copy I'd taken a week before. This seemed to work fine. Until now.

Unfortunately, I have been so busy lately that I forgot to take any backups of the BOINC folder. From what I can remember, the "original" cp.net client had some restart files; is there any chance I can use these?

Also, I'm curious as to why the child process is quitting like this without leaving an error in any of the logs. Unless it's using some "hidden" log files somewhere?

Any help appreciated.
ID: 22234
Honza
Volunteer moderator
Joined: 5 Aug 04
Posts: 390
Credit: 2,475,242
RAC: 0
Message 22245 - Posted: 21 Apr 2006, 7:49:46 UTC

Hi there,

The log files, stdoutdae.txt and stderrdae.txt, can be found in the BOINC folder.

There is no need to run the older Sulphur Cycle experiment, so you may automatically get an updated Coupled model (once the SC is finished or aborted).

You could also re-check your profile and set the "preempted" option to yes:
Leave applications in memory while preempted?
(suspended applications will consume swap space if 'yes')
at http://climateapps2.oucs.ox.ac.uk/cpdnboinc/prefs.php?subset=global

Also, you can merge your hosts; I guess your 3 hosts are actually a single AMD X2. Just a cosmetic thing.
phpBB forum for CPDN, all are invited: http://www.climateprediction.net/board
ID: 22245
Andrew Hingston
Volunteer moderator
Joined: 17 Aug 04
Posts: 753
Credit: 9,804,700
RAC: 0
Message 22253 - Posted: 21 Apr 2006, 12:31:23 UTC

See this thread for what may be a similar problem.
ID: 22253
Lord Crc
Joined: 9 Feb 06
Posts: 4
Credit: 15,859
RAC: 0
Message 22282 - Posted: 22 Apr 2006, 3:18:46 UTC

Sounds like it might be the same issue. The log files don't contain any indication of why the worker is exiting. I've also read some other threads that indicate there is no way to recover, since I do not have a full backup.

If the system really relies on "are you alive" messages, as indicated in the other thread, then it's rather poorly designed imho. The process polling the worker could easily tell whether the worker is alive but not responding due to CPU starvation by checking the process handle and GetProcessTimes.
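A minimal sketch of the check described above, written in Python for brevity. It only models the decision logic: on Windows, `running` would come from testing the process handle (for example with WaitForSingleObject and a zero timeout, or GetExitCodeProcess), and `cpu_seconds` from the kernel and user times reported by GetProcessTimes. The names `Sample`, `WorkerState`, `classify` and `should_restart` are hypothetical, and nothing here reflects how the CPDN or BOINC code actually works.

```python
from dataclasses import dataclass
from enum import Enum


class WorkerState(Enum):
    EXITED = "exited"    # process handle reports the worker has terminated
    WORKING = "working"  # alive and has used CPU time since the last poll
    STARVED = "starved"  # alive, but no CPU time used since the last poll


@dataclass
class Sample:
    """One poll of the worker (both fields are assumed inputs)."""
    running: bool        # assumption: derived from the process handle
    cpu_seconds: float   # assumption: kernel + user time of the worker


def classify(previous: Sample, current: Sample) -> WorkerState:
    """Classify the worker from two consecutive polls."""
    if not current.running:
        return WorkerState.EXITED
    if current.cpu_seconds > previous.cpu_seconds:
        return WorkerState.WORKING
    return WorkerState.STARVED


def should_restart(previous: Sample, current: Sample, heartbeat_seen: bool) -> bool:
    """Restart only if the heartbeat is missing AND the process is really gone."""
    if heartbeat_seen:
        return False
    return classify(previous, current) is WorkerState.EXITED
```

With this rule, a missing "are you alive" reply from a worker that is still running (starved or busy) would not trigger a restart. A real supervisor would probably also want a timeout for a worker that stays starved indefinitely; that is left out of this sketch.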

I use my computer at all times of the day (I often leave it running overnight to work on things, etc.), so setting "working hours" would be useless. And unless there's a way to perform some automatic backup (which of course wouldn't run while the worker is acting like it does now), I don't see any way for me to continue. I guess I'll have to find another project then. :(

In any case, cheers for the help!
ID: 22282
Lord Crc
Joined: 9 Feb 06
Posts: 4
Credit: 15,859
RAC: 0
Message 22292 - Posted: 22 Apr 2006, 12:05:36 UTC

Btw, sorry for being totally blind :) Somehow I managed to convince myself that the "questions" message board was for the classical client...
ID: 22292
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 22317 - Posted: 23 Apr 2006, 8:11:48 UTC - in response to Message 22282.  
Last modified: 23 Apr 2006, 8:12:29 UTC

...
If the system really relies on "are you alive" messages, as indicated in the other thread, then it's rather poorly designed imho. The process polling the worker could easily tell whether the worker is alive but not responding due to CPU starvation by checking the process handle and GetProcessTimes.
...


I should point out that my theory in the other thread is (as usual) based on guesswork and observation, rather than knowledge and code review. I may be right, I may be wrong...

I'm a volunteer and my views are my own.
ID: 22317
Lord Crc
Joined: 9 Feb 06
Posts: 4
Credit: 15,859
RAC: 0
Message 22333 - Posted: 23 Apr 2006, 18:25:02 UTC

Yes, I'm aware of that, hence my "if" :) It's not an unreasonable observation, however, and the condition is difficult to catch during debugging unless you specifically test for it.
ID: 22333