Posts by SAK

1) Message boards : Number crunching : HadCM3s post-completion artifacts (Message 52472)
Posted 24 Aug 2015 by SAK
Post:
First, thanks all for the suggestions and comments!

Lots of people do indeed run multicore computers, with several models. (It's generally recommended that these models work best when there's 1 Gig of ram per model, plus 1 more for the OS. A bit overkill perhaps, but as a rough rule of thumb ...)

At 24GB, that gives me 2GB per model plus about 8GB for the OS and other processes, so I would think I would be OK. After a reboot, when I start BOINC and everything kicks into high gear, I am using 4-5GB total. As expected, that slowly creeps up the longer the models run, but when they complete, instead of seeing a drop in memory usage, I see a big jump as the next set of models starts up.
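
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the rule of thumb quoted above (1GB per model plus 1GB for the OS) next to the 2GB-per-model split on a 24GB machine with eight models. The function name and its parameters are made up for illustration and are not part of BOINC or CPDN.

```python
def memory_budget_gb(total_ram_gb, models, per_model_gb=1.0, os_reserve_gb=1.0):
    """Return (needed_gb, headroom_gb) for a simple per-model memory budget.

    Hypothetical helper: the defaults encode the forum rule of thumb of
    1GB per model plus 1GB for the OS.
    """
    needed = models * per_model_gb + os_reserve_gb
    return needed, total_ram_gb - needed


# Rule of thumb for eight models on a 24GB machine.
print(memory_budget_gb(24, 8))            # (9.0, 15.0)

# The split described in the post: 2GB per model, 8GB for everything else.
print(memory_budget_gb(24, 8, 2.0, 8.0))  # (24.0, 0.0)
```

By the rule of thumb, eight models should leave plenty of headroom on this machine, which is why steadily filling all 24GB looks abnormal rather than merely tight.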

My two machines have 4 real processors, plus 4 hyperthreading. I only run 4 cores, as the others don't give that much extra, and I'm hoping that it leaves something for the OS when it needs it. (With 16 Gig of ram.)
This worked OK for around 8-9 years with Windows, and now with Linux.

That is similar to my setup, so maybe I will limit things to 4 concurrent models and see what happens. But I have a hard time believing this is some sort of hardware issue...

Have you made a simplified post, similar to your original post here, on BOINC/dev? Perhaps some of the people who frequent that forum may have a clue.

I have not, thanks for the suggestion. If I am the only person experiencing this, perhaps I just need to suck it up and reboot periodically. However, I will be leaving for about six weeks in Sept/Oct and if I can't figure this out, I won't be running CPDN while I am gone.

I have had bad experiences with AVs, and generally avoid them... Can you try disabling it to see if it makes a difference?

Will do, thanks for the suggestion.

Also, if BOINC is installed as a service, there is a User listed as "BOINC Master" that I would exclude also.

Not running as a service.

Also, I don't see your version of BOINC or OS listed. It should not matter, but maybe it does.

Sorry, Windows 8.1 and BOINC v7.4.42(x64)

To answer your other question, I run 6 Ivy Bridge cores on CPDN, and two more supporting GPUs (GTX 750 Ti). I don't think that matters, but I don't recall running all 8 cores on CPDN.

You are the second person to suggest that running all cores may not be a good idea. I have not experienced any drop in day-to-day usability, so I figured an unused CPU cycle is a wasted CPU cycle and would run "all out" on all my cores unless I experienced performance issues. I will try scaling back to only 4 cores (I have four "real" cores and 4 hyper-threading virtual cores on this CPU) and see what happens.

Please note that the Resource Monitor software keeps showing information about the software for quite some time even if you have closed the program. You can check this with a program that is easy to recognize on the list, like Excel.

Thanks for the observation, Harri. In this case, my screen grab was from approximately 9 hours after the last batch of models had completed. Plus, I find the memory usage information in the Resource Monitor to be more informative than in the Task Manager. However, I will be sure to take your observation into consideration in the future.

I have a similar experience on usage compared to Les in that I am running 4 cores and using about 4GB of memory. Is it a quirk of Win8.1, as I am running Win7 (64bit pro)?

I am starting to wonder that as well. However, I did a test a week or two ago and have found I can run eight SETI@Home units concurrently as well as two on my GPUs and my memory usage does not increase over time like it does running CPDN units on my CPU. That leads me to think that in my case there may be a CPDN issue, not a BOINC issue behind this.

Thanks again for all the thoughts and comments, I appreciate it. I will report back about any effects disabling the AV has, if any.
2) Message boards : Number crunching : HadCM3s post-completion artifacts (Message 52462)
Posted 23 Aug 2015 by SAK
Post:
Thanks Ian. The thing that doesn't make sense to me is that if I am using 23GB of memory and suspend processing, reboot and then restart the eight work units, why does my memory usage fall back to about 4GB overall on the system instead of picking back up to the 23GB previously being used? If it only needs a fraction of the 23GB when restarted, why is it using so much memory over time without releasing unused memory?

Is the fix for me to somehow limit the number of CPUs that CPDN can use so that it throttles back the number of concurrent work units? I still find it strange that I seem to be the only person experiencing this. Don't other folks processing on multi-core processors run all cores simultaneously?
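
One way to throttle concurrent work units is the BOINC Manager computing preference that caps the percentage of processors BOINC may use. The sketch below only works out the percentage to enter for a target number of cores. The helper name is hypothetical, and how the client rounds the result into a core count is an assumption here, so the effect should be confirmed in the Tasks tab.

```python
def cpu_percent_for(target_cores, logical_cores):
    """Percentage of logical processors that corresponds to target_cores.

    Hypothetical helper for filling in a "use at most N% of the processors"
    style preference; it does not talk to BOINC in any way.
    """
    if not 0 < target_cores <= logical_cores:
        raise ValueError("target_cores must be between 1 and logical_cores")
    return 100.0 * target_cores / logical_cores


# Four real cores plus four hyper-threaded ones: run on four only.
print(cpu_percent_for(4, 8))  # 50.0
```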

Thanks again for the input, appreciate it.
3) Message boards : Number crunching : HadCM3s post-completion artifacts (Message 52458)
Posted 23 Aug 2015 by SAK
Post:
I guess I should feel special...

Hopefully the graphics above prove I am not making this stuff up! If there is anything else I can do to help troubleshoot please let me know.

Thanks again for the help and advice.
4) Message boards : Number crunching : HadCM3s post-completion artifacts (Message 52456)
Posted 23 Aug 2015 by SAK
Post:
Thanks for the info on the graphics Jim, I won't worry about units that don't have the Show Graphics option.

Going back to my original problem, the last round of work units finished processing about nine hours ago and I am about to hit the memory wall again. First the BOINC Manager listing the currently processing units:



And my current memory status:



And the processes still in memory, including completed CPDN work units:



Any thoughts appreciated... am I really the only person experiencing this?
5) Message boards : Number crunching : HadCM3s post-completion artifacts (Message 52453)
Posted 22 Aug 2015 by SAK
Post:
Thanks again astroWX, your help is much appreciated. I have enabled the setting to leave the application in memory when suspended. I forgot I had disabled that when I was trying to troubleshoot my memory issues. Regarding the antivirus, I had already disabled scanning of the \ProgramData\BOINC folder but just now excluded the \Program Files\BOINC folder as well.

Regarding the graphics, I had not noticed that functionality because whenever I select one of the seven short units currently running, the Show Graphics option in BOINC is grayed out. I am able to use it on the one Australia New Zealand 6.10 unit that is also running, very cool! Does the fact that the Show Graphics tab is not available for the short units indicate a problem and if so, should I abort them?

Thanks again...
6) Message boards : Number crunching : HadCM3s post-completion artifacts (Message 52450)
Posted 21 Aug 2015 by SAK
Post:
Forgot to answer one of your questions...

astroWX said:
Checking graphics, are all timesteps completed? Does CPU time continue to increase when tasks are "completed"? How many days? (It takes a lot of days to complete enough tasks to use-up 24 Gig of RAM.)


Afraid I am not sure what you mean by "checking graphics". Completely new to this, sorry. I find the memory use grows to consume my 24GB within 4-5 days typically. After a reboot and restarting BOINC, my memory use is at a baseline 4GB or so and then increases in jumps that seem to correspond to new work units starting until I have to eventually reboot again.
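
As a rough check on those numbers, here is a small Python sketch that turns "4GB baseline, 24GB full after 4-5 days" into an average growth rate and a time-to-full estimate. It assumes linear growth, which is a simplification since the increase actually arrives in jumps, and the function names are invented for illustration.

```python
def growth_rate_gb_per_day(baseline_gb, total_gb, days_to_full):
    """Average memory growth per day, assuming linear growth."""
    return (total_gb - baseline_gb) / days_to_full


def days_until_full(current_gb, total_gb, rate_gb_per_day):
    """Days left before memory is exhausted at the given average rate."""
    return (total_gb - current_gb) / rate_gb_per_day


# Baseline of 4GB growing to 24GB in 4 or 5 days.
for days in (4, 5):
    rate = growth_rate_gb_per_day(4, 24, days)
    print(days, rate)  # prints "4 5.0" then "5 4.0"

# Starting from 14GB in use at 5GB/day, about two days remain.
print(days_until_full(14, 24, 5.0))  # 2.0
```

That works out to roughly 4-5GB per day on average, which is a handy figure for scheduling a reboot before the machine forces one.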

Thanks again for your help. I notice in your signature you say "Greetings from Coastal Washington State"... I am in South Puget Sound.
7) Message boards : Number crunching : HadCM3s post-completion artifacts (Message 52449)
Posted 21 Aug 2015 by SAK
Post:
Thank you astroWX, I will try to give you a bit more information. I am pretty confident this is a BOINC / ClimatePrediction.net issue because if I run just SETI@Home I don't have the memory creep issue. As far as my computing goes, I am only running SETI@Home and ClimatePrediction.net. I believe I have my settings configured to give ClimatePrediction.net priority, and that appears to be the case when I look at the Tasks tab in the BOINC Manager. The only thing using the GPUs is SETI@Home, so basically my system is running SETI@Home on the GPUs and ClimatePrediction.net on all the CPU cores, unless there is no work to do, in which case it falls back to SETI@Home on the CPU. Below are my settings:









And my BOINC related processes currently running in memory:


And lastly a grab of my current processing tasks in BOINC Manager:


Not sure what else I can give you at this point to help. It looks like I will have the currently processing round of ClimatePrediction.net units complete tonight so I will report back tomorrow morning with where my memory stands. Right now my memory snapshot looks like this:


Thanks in advance for any help you can provide, and if you have any setting suggestions please let me know.
8) Message boards : Number crunching : HadCM3s post-completion artifacts (Message 52442)
Posted 20 Aug 2015 by SAK
Post:
Sorry if this is a noob question, but I started computing on this project about two months ago and have been noticing an issue with memory usage. If I don't reboot after climateprediction models have completed, the processes appear not to close out gracefully and instead stay in memory. Over a period of days I end up with low-memory problems that force my system to reboot. I have 24GB of memory on this system and have found that it is all used up, with multiple hadXXXX processes still in place even after they report a successful completion.

Is the answer to simply reboot every few days? I tried that once by suspending the climateprediction project, waiting for the threads to stop, and then rebooting, but upon restarting BOINC I received processing errors for the climateprediction projects that were underway and suspended prior to the reboot. I don't want to lose work unnecessarily. Is there a way to gracefully pause computation for a reboot? Or even better, is there something I can do differently to prevent memory issues to begin with?

Thanks in advance...



