Project keeps resetting - any explanations?
Author | Message |
---|---|
Joined: 3 Feb 05 Posts: 1 Credit: 28,109 RAC: 0 |
A new project started running on my iMac a few days ago: UK Met Office HADAM3P Australia NZ. It seems to reset itself every few minutes, with elapsed time and % completed returning to 0. Is it meant to do this? Running on Mac OS X 10.9.2. |
Joined: 15 May 09 Posts: 4342 Credit: 16,497,933 RAC: 6,477 |
No, it isn't meant to do this. Some models get stuck in a loop. Unless one of the moderators or someone more knowledgeable than me can think of something, you will have to abort it. If it keeps happening on other tasks, then we go back to the drawing board. |
Joined: 16 Jan 10 Posts: 1081 Credit: 6,972,865 RAC: 3,926 |
One possibility is suggested by the history recorded for a model run earlier on that machine, hadcm3n_7vsn_1980_40_008452490_2. If you click the '+' icon next to Stderr, a model log will appear. It shows a large number of entries of the form 'Suspended CPDN Monitor - Suspend request from BOINC...'. These entries occur because the default BOINC settings try to minimise the impact of BOINC on the computer, which is presumably used for something else most of the time. Those defaults do not work very well with the climate models, which are larger than most BOINC applications. What may be happening is that two BOINC settings are interacting badly, the 'suspend when PC busy' setting and the 'leave application in memory' setting, so that each time the application is suspended it is removed from memory and has to restart from the last save point. For the ANZ models, save points may be a long time apart (more than 10 minutes), so if the model is suspended more often than the save interval, it will never make any progress. If this is indeed the cause, the solution is twofold. In BOINC Manager, (1) make sure that 'leave applications in memory while suspended' is selected, and (2) set 'while processor usage is less than' to zero, which will stop the suspensions. Both options are in Tools | Computing preferences: the suspension setting is on the 'processor usage' tab and the memory setting is on the 'disk and memory usage' tab. If that doesn't work, please post back here: someone else might have a better idea... |
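To make the save-point argument concrete, here is a toy Python simulation. It is a sketch under simplifying assumptions, not CPDN's actual checkpoint logic: the model is assumed to save at fixed multiples of `checkpoint_minutes` of compute time, to run for `burst_minutes` between suspend requests, and, when it is not kept in memory, to lose everything since the last save point on each suspension. The function name `progress_after_bursts` and the numbers used are hypothetical.

```python
import math


def progress_after_bursts(bursts: int, burst_minutes: float,
                          checkpoint_minutes: float, keep_in_memory: bool) -> float:
    """Return compute-minutes of model progress kept after `bursts` run/suspend cycles.

    Toy model (assumption): save points fall at fixed multiples of
    checkpoint_minutes; a task removed from memory while suspended
    resumes from the most recent save point it reached.
    """
    progress = 0.0
    for _ in range(bursts):
        progress += burst_minutes  # work done before the next suspend request
        if not keep_in_memory:
            # Unsaved work is lost: roll back to the last save point reached.
            progress = float(math.floor(progress / checkpoint_minutes) * checkpoint_minutes)
    return progress


if __name__ == "__main__":
    # Suspended every 5 minutes, saving every 15 minutes (illustrative numbers only).
    print(progress_after_bursts(100, 5, 15, keep_in_memory=False))  # prints 0.0: no progress
    print(progress_after_bursts(100, 5, 15, keep_in_memory=True))   # prints 500.0
```

If the model instead runs longer than the save interval between suspensions (say 20 minutes against 15), the same function shows it creeping forward by only one save interval per cycle when it is not kept in memory, which is why keeping applications in memory, or stopping the suspensions, matters more than the exact numbers.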
Joined: 8 Aug 05 Posts: 12 Credit: 24,424,627 RAC: 0 |
I just experienced a significant issue on a Windows 7 64-bit machine, with my CPDN project seemingly resetting, or more accurately, attempting to restart repeatedly. This thread seems closest to the issue I encountered. The problem appears to have been caused by my clearing the Local Preferences settings and restarting BOINC Manager. That was yesterday. This morning, I found that Windows was warning of: 1) low memory; 2) BOINC Manager trying multiple times to reconnect to a client (I assume CPDN, as SETI was still running); and 3) low virtual memory (paging). The machine was also clearly unstable. When I investigated via Windows Task Manager, physical memory was full, showing only 1 MB free out of the 12 GB installed. Also, under Advanced Settings, virtual memory was fixed at 512 MB, but Windows was recommending 18 GB (maximum; 16 MB minimum). I believe I have fixed all these issues. Now, though, BOINC Manager does not list any CPDN tasks. SETI tasks are still present and appear to be running okay. BOINC Manager shows that the CPDN disk size is over 74 GB, which makes sense, because I had the Local Preferences set to store at least 10 days of work. First, any ideas why changing the Local Preferences might have caused memory to fill up and, apparently, crash CPDN? Do these events appear related? Second, any suggestions on how to get CPDN tasks restarted? Or is this a hopeless cause requiring a "project reset"? Again, the Local Preferences have been restored to their previous values, which are in line with this thread's recommendations. Thanks in advance for any help! |
Joined: 7 Aug 04 Posts: 2167 Credit: 64,477,979 RAC: 4,009 |
"BOINC Manager shows that CPDN Disk size is over 74 GB, which makes sense, because I had the Local Preference setting set for at least 10 days of work to be stored." This sounds complex. Which of the 4 PCs you have listed is the one with the problem? There is no way there should be 74 GB in the CPDN project directory. Looking at all your PCs, none of them has anywhere near enough tasks "in progress" to account for that. It's possible that some crashed tasks have left behind directories full of files. Look in the Tasks tab of BOINC Manager and make sure it is set to show all tasks: if the button says "Show all tasks", click it; if it says "Show active tasks", leave it as is. Any model directory under the climateprediction.net directory that doesn't correspond to a listed task can be deleted. That should free many GB of space. As for preferences, cpdn seems to work best with the following: Computing allowed: while computer in use; only after computer has been idle for 0 minutes; while processor usage is less than 0 percent; use at most 100% of CPU time; leave applications in memory when suspended. I'm not sure why you would have a memory problem. Generally, cpdn executables are not memory hogs. If you have "Leave applications in memory when suspended" set to yes, and numerous projects with numerous tasks are being suspended, then I could potentially see an issue. |
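As a cautious sketch of the clean-up described above, the Python below only lists candidate directories and deletes nothing. It assumes (hypothetically) that a model directory belongs to a task when the task name begins with the directory name, and that you copy the task names from the Tasks tab yourself; check that matching rule against your own directory and task names before removing anything. The function names `orphan_names` and `find_orphan_dirs` are illustrative, not part of BOINC.

```python
from pathlib import Path


def orphan_names(dir_names, task_names):
    """Return directory names that match no listed task, sorted.

    Matching rule (assumption): a directory belongs to a task if the
    task name starts with the directory name.
    """
    return sorted(
        d for d in dir_names
        if not any(task.startswith(d) for task in task_names)
    )


def find_orphan_dirs(project_dir, task_names):
    """List (but do not delete) model directories with no matching task."""
    project = Path(project_dir)
    dir_names = [p.name for p in project.iterdir() if p.is_dir()]
    return [project / name for name in orphan_names(dir_names, task_names)]
```

For example, `find_orphan_dirs(project_dir, task_names)` with `project_dir` set to your climateprediction.net project directory (its location depends on where your BOINC data directory is) returns the paths to review by hand. Keeping the matching logic in a separate pure function makes it easy to adjust if your directory names follow a different pattern.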
Joined: 8 Aug 05 Posts: 12 Credit: 24,424,627 RAC: 0 |
Hi geophi, the computer with the issue is ID 926174 (named mjo003). I verified that there are no CPDN tasks listed, active or pending. (I also left the Tasks button set to "Show active tasks.") So, to summarize what you are telling me: since there are no active or pending CPDN tasks, all 74 GB of CPDN data in the directory can be cleared out. Another question comes to mind given what you just had me do: could it be that my change in preference settings simply allowed the remaining tasks on this machine to end quickly? I tried to make heads or tails of the recent trickle info, but I am not sure I am interpreting it correctly. Anyway, the disk value has been typical for this machine for quite some time, which is why I thought it was "normal." So I have one last question: is it easiest to simply do a project "reset" to clean out the directory? I would hate to damage the directory structure with the way I go about wiping folders and disks... Thanks! |
Joined: 5 Aug 04 Posts: 108 Credit: 19,032,850 RAC: 36,605 |
"Anyway, the disk value has been typical for this machine for quite sometime, which is why I thought it was "normal." So -- I have one last question: Is it easiest to simply try a project "reset" to clean out the directory? I would dislike damaging the directory structure the way I go about wiping folders and disks... Thanks!" Reset should work. |
Joined: 8 Aug 05 Posts: 12 Credit: 24,424,627 RAC: 0 |
Things seem to be working right now. Unless I am way off base in interpreting what I am observing on my machine, the Disk tab's Disk Usage pie chart did not drop. Rather, it increased to ~77 GB, an increase of 2+ GB! I'll wait a bit longer to see what is going on and then try deleting files that do not appear in the Tasks listing. Thanks for the help! |
Joined: 28 Mar 09 Posts: 126 Credit: 9,825,980 RAC: 0 |
An easier way to clean up the project directory would be to delete the CPDN project and then add it back (in BOINC v6, this is called removing and re-adding the project). That is, go to the Projects tab in BOINC Manager, select CPDN, and click the Delete (or Remove) button. To add it back, click Tools -> Add project and select CPDN from the list. That should clean out your project folder, and it is much easier than trying to work out what's needed and what is not. |
Joined: 31 Aug 04 Posts: 391 Credit: 219,888,554 RAC: 1,481,373 |
If cpdn is working for you and you have lots of disk space, it's prudent not to mess with it. But 77 gibibytes is way, way too much. Either remove the aged cruft manually, or wait until nothing is running and reset the project, whenever convenient for you. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Michael, a project Reset is less of a problem than a project Disconnect. Computer 926174 is the one with all of the crashed models. Each crash leaves behind a small amount of files, which can quickly add up to a lot of gigabytes if you don't clean them out regularly. That is EASY to do manually. Also, a manual cleanup will show you just how many models you're crashing, and how much is getting left behind. As for the size INCREASING by a couple of GB, I'm not surprised if you left the computer to download more work and then crashed some of that too. 8 processors can quickly destroy a lot of data sets. Perhaps the reason for the crashing is that you've left the option 'Suspend work if CPU usage is above' at the default of 25%. That may be OK for other projects, but here it can be fatal for climate models, which DON'T like being constantly interrupted. |
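The manual cleanup Les describes starts with seeing how much each leftover directory holds. This Python sketch walks a project directory (you supply the path) and reports the bytes in each subdirectory, largest first; it only reads and deletes nothing. The function names are illustrative, not part of BOINC, and symlinked files and directories are skipped as a simplifying assumption.

```python
import os


def dir_size_bytes(path):
    """Total size in bytes of regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):  # os.walk does not follow symlinked dirs
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def sizes_by_subdir(project_dir):
    """Return (subdirectory name, bytes) pairs, largest first."""
    sizes = [
        (entry.name, dir_size_bytes(entry.path))
        for entry in os.scandir(project_dir)
        if entry.is_dir(follow_symlinks=False)
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

Large entries that match no task in BOINC Manager's Tasks tab (with all tasks shown) are the likely leftovers from crashed models; the count of such entries also gives a rough idea of how many models have crashed.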
Joined: 8 Aug 05 Posts: 12 Credit: 24,424,627 RAC: 0 |
Thanks all for the assistance! Les, you are correct about 'Suspend work...': I thought I had followed geophi's list of recommended settings, but obviously I didn't double-check. As for your advice, Eirik, I will plan a time to clean things up, because I am up to 80+ GB as of today, 19 Jun (PST). Sorry to have been slow to ask questions about the disk usage before I wasted all those millions of CPU seconds around 14 Jun 2014! But now I understand what I have been doing wrong. Still, it makes me sad to realize this large loss of work was mostly preventable... Best regards to all! |
©2024 climateprediction.net