App not removing files for completed tasks on opensuse linux

tckosvic

Joined: 22 Oct 20
Posts: 5
Credit: 709,266
RAC: 3,188
Message 63966 - Posted: 16 May 2021, 16:56:18 UTC

I have been running climateprediction.net on my openSUSE Linux system for several months using BOINC. After getting disk usage warnings for my root directory, I traced the problem to the /boinc/projects/climateprediction directory, which was taking up 62 gig of my root partition. It appears that none of the completed task files had been removed. These files date back several months.

I think the files for a task should be deleted after the task is completed and the results have been sent. If that is not done automatically, instructions should be provided for doing it by hand. If my system is not deleting files as it should, I need to diagnose that and fix it. I could use some help with this.
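A first diagnostic step for a case like this is to see which entries under the project directory hold the space. The following is only a minimal sketch: the function names are made up for illustration, and the path in the comment is the one quoted in this thread, which will differ between installations.

```python
import os
from pathlib import Path


def dir_size_bytes(path):
    """Total size of the regular files under path, without following symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def largest_entries(project_dir, top=10):
    """Return (size_bytes, name) pairs for the biggest entries directly under project_dir."""
    sizes = []
    for entry in Path(project_dir).iterdir():
        if entry.is_symlink():
            continue
        size = dir_size_bytes(entry) if entry.is_dir() else entry.stat().st_size
        sizes.append((size, entry.name))
    return sorted(sizes, reverse=True)[:top]


# Example use (path taken from the post above; adjust to your system):
#   for size, name in largest_entries("/boinc/projects/climateprediction"):
#       print(f"{size / 1e9:8.2f} GB  {name}")
```

Reading the project directory may need root or membership of the group that owns it, so the script might have to be run with the same privileges as the BOINC client.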

I deleted all the files in this directory and removed myself from the project until I hear an answer to my query.

As an old CFD guy, I am interested in this project, but I can't jeopardize other operations by filling up my disk.

thanks, tom kosvic

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2003
Credit: 52,771,797
RAC: 44,005
Message 63967 - Posted: 16 May 2021, 17:42:53 UTC - in response to Message 63966.  

Since your computers are hidden, we can't see anything in the stderr from the tasks (completed successfully or errored).

The behavior you describe, of tasks not being cleaned up after completion (successful or not), is not normal. Occasionally, certain types of errors may result in a task directory not being deleted, but I've never seen anything on the scale you are describing, and certainly not from the current and recent batches of tasks.

In some ways it sounds like a permissions problem on the BOINC directory or its sub-directories, where the BOINC service (if you are running it as a service) does not have permission to remove directories. It's hard to imagine how that would occur, however.
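One way to test the permissions theory is to check, as the account the BOINC client runs under, whether entries can be removed from each directory. On POSIX systems that requires write and search (execute) permission on the containing directory. The sketch below assumes it is run as that account; the helper names are illustrative only.

```python
import os


def can_remove_entries(directory):
    """True if the current user may create or delete entries in `directory`.

    Deleting an entry needs write and search (execute) permission
    on the directory that contains it.
    """
    return os.access(directory, os.W_OK | os.X_OK)


def undeletable_dirs(top):
    """List every directory at or below `top` whose entries the current user cannot remove."""
    blocked = []
    for root, _dirs, _files in os.walk(top):
        if not can_remove_entries(root):
            blocked.append(root)
    return blocked
```

An empty list from `undeletable_dirs` for the BOINC data directory would make a plain permissions problem less likely. The check says nothing about other mechanisms such as sticky bits or security modules.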

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3067
Credit: 7,156,299
RAC: 11,435
Message 63968 - Posted: 16 May 2021, 18:00:45 UTC - in response to Message 63966.  

On Ubuntu and other distributions I have used over the years, the only time files haven't been cleaned up is sometimes after tasks have crashed. In Ubuntu, to delete them I would go to /var/lib/boinc-client/projects/climateprediction/ and then delete the individual task directories. This needs to be done either as root or using su.

If it is not there, the directory structure from boinc-client downwards will be the same wherever it is installed.
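Before deleting anything by hand, it can help to list which entries in the project directory have not been touched for a long time. This is a minimal sketch that only lists candidates and deletes nothing; the function name and the age threshold are assumptions for illustration. Modification time is only a rough guide, so anything belonging to a task still in progress should be left alone.

```python
import os
import time


def stale_entries(project_dir, min_age_days, now=None):
    """Names directly under project_dir not modified for at least min_age_days."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # seconds per day
    stale = []
    for entry in os.scandir(project_dir):
        if entry.stat(follow_symlinks=False).st_mtime <= cutoff:
            stale.append(entry.name)
    return sorted(stale)


# Example use (path from the post above; needs read access to the directory):
#   print(stale_entries("/var/lib/boinc-client/projects/climateprediction/", 30))
```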

tckosvic
Joined: 22 Oct 20
Posts: 5
Credit: 709,266
RAC: 3,188
Message 63969 - Posted: 16 May 2021, 18:02:51 UTC - in response to Message 63967.  
Last modified: 16 May 2021, 18:03:30 UTC

Note: I have been running 9 tasks simultaneously on 9 cpus. Don't know if that could be a factor.

Checked permissions on /boinc/climateprediction directory. Directory permissions are:

drwxrwx--x this is the same as the other boinc project directories.

For the boinc client, permissions are:

-rwxr-xr-x the client is used on other projects which do not demonstrate this problem.
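For readers less familiar with these mode strings, the sketch below decodes them into octal bits and answers the question that matters here: which class of user may remove entries from the directory. It is a simplified parser written for illustration (any character other than "-" is treated as a set bit, so special flags such as setuid or sticky are not handled precisely).

```python
def parse_mode(text):
    """Split an ls-style mode string into (file type character, octal permission bits)."""
    if len(text) != 10:
        raise ValueError("expected 10 characters, e.g. 'drwxrwx--x'")
    kind, perms = text[0], text[1:]
    bits = 0
    for ch in perms:
        bits = (bits << 1) | (1 if ch != "-" else 0)
    return kind, bits


def may_delete_entries(bits, who):
    """who is 'owner', 'group' or 'other'; deleting needs both w and x on the directory."""
    shift = {"owner": 6, "group": 3, "other": 0}[who]
    triplet = (bits >> shift) & 0o7
    return (triplet & 0o3) == 0o3  # w (2) and x (1) both set
```

With the directory mode quoted above, `parse_mode("drwxrwx--x")` gives octal 771: the owner and the group may remove entries, everyone else may only traverse. Whether the client can clean up therefore depends on which user and group it runs as, which the listing alone does not show.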

I will restart climateprediction.net and run only one task and observe what goes on.

If anyone has any ideas, let me know.

thanks, tom kosvic

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2003
Credit: 52,771,797
RAC: 44,005
Message 63970 - Posted: 16 May 2021, 18:07:41 UTC - in response to Message 63969.  

Go to the task page of a successfully completed task and of an errored task, and copy the contents of stderr from those pages into a reply here. I'm not sure if it will reveal anything, but it might.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3067
Credit: 7,156,299
RAC: 11,435
Message 63971 - Posted: 16 May 2021, 18:08:34 UTC - in response to Message 63969.  

Following on from George's post, did the tasks complete?

Jean-David Beyer
Joined: 5 Aug 04
Posts: 429
Credit: 6,363,949
RAC: 16,450
Message 63972 - Posted: 16 May 2021, 19:34:19 UTC - in response to Message 63966.  

tckosvic wrote:
"I have been running climateprediction.net on my opensuse linux system for several months using boinc. After getting disk usage warnings for my root directory, I traced that /boinc/projects/climateprediction directory was 62 gig of my root partition. It appears that none of the completed task files had been removed. These files date back several months."

I am running Red Hat Enterprise Linux release 8.3 (Ootpa) on my machine, which has 16 cores: 8 real and 8 hyperthreaded. I allow boinc to use at most 8 cores, and at most 4 cores at a time for ClimatePrediction tasks. It usually runs 4 ClimatePrediction tasks at a time. My machine is normally up 24/7.

$ locate hadam4h | grep ".zip"
/var/lib/boinc/projects/climateprediction.net/hadam4h_10uf_209605_5_902_012078454.zip
/var/lib/boinc/projects/climateprediction.net/hadam4h_20iv_209405_5_903_012080138.zip
/var/lib/boinc/projects/climateprediction.net/hadam4h_21e1_209905_5_903_012081260.zip
/var/lib/boinc/projects/climateprediction.net/hadam4h_a08h_200611_4_852_011937190.zip
/var/lib/boinc/slots/0/hadam4h_10uf_209605_5_902_012078454.zip
/var/lib/boinc/slots/1/hadam4h_21e1_209905_5_903_012081260.zip
/var/lib/boinc/slots/10/hadam4h_20iv_209405_5_903_012080138.zip
/var/lib/boinc/slots/8/hadam4h_a08h_200611_4_852_011937190.zip
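In the listing above, each of the four zips in the project directory also appears in a slot directory. A listing like this can be checked mechanically for zips that have no counterpart in any slot. The sketch below assumes, without confirmation from the project, that a project-directory zip with no copy under slots/ is a candidate leftover; the function name is illustrative.

```python
import os

# The locate output quoted above.
LISTING = [
    "/var/lib/boinc/projects/climateprediction.net/hadam4h_10uf_209605_5_902_012078454.zip",
    "/var/lib/boinc/projects/climateprediction.net/hadam4h_20iv_209405_5_903_012080138.zip",
    "/var/lib/boinc/projects/climateprediction.net/hadam4h_21e1_209905_5_903_012081260.zip",
    "/var/lib/boinc/projects/climateprediction.net/hadam4h_a08h_200611_4_852_011937190.zip",
    "/var/lib/boinc/slots/0/hadam4h_10uf_209605_5_902_012078454.zip",
    "/var/lib/boinc/slots/1/hadam4h_21e1_209905_5_903_012081260.zip",
    "/var/lib/boinc/slots/10/hadam4h_20iv_209405_5_903_012080138.zip",
    "/var/lib/boinc/slots/8/hadam4h_a08h_200611_4_852_011937190.zip",
]


def orphaned_zips(paths):
    """Project-directory zip names that do not also appear under a slots/ directory."""
    in_project, in_slots = set(), set()
    for p in paths:
        name = os.path.basename(p)
        if "/slots/" in p:
            in_slots.add(name)
        elif "/projects/" in p:
            in_project.add(name)
    return sorted(in_project - in_slots)
```

For this listing `orphaned_zips(LISTING)` is empty, which fits a machine where nothing is being left behind.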

Computer 1511241
Computer information
Domain name: localhost.localdomain
Local Standard Time: UTC -4 hours
Created: 14 Nov 2020, 15:37:02 UTC
Total credit: 2,222,187
Average credit: 14,774.43
CPU type: GenuineIntel
    Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors: 16
Coprocessors: ---
Virtualization: None
Operating System: Linux Red Hat Enterprise Linux
    Red Hat Enterprise Linux 8.3 (Ootpa) [4.18.0-240.22.1.el8_3.x86_64|libc 2.28 (GNU libc)]
BOINC version: 7.16.11
Memory: 62.41 GB
Cache: 16896 KB
Swap space: 15.62 GB
Total disk space: 117.21 GB
Free Disk Space: 86.4 GB

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7412
Credit: 23,446,854
RAC: 4
Message 63973 - Posted: 16 May 2021, 21:52:01 UTC - in response to Message 63966.  
Last modified: 16 May 2021, 22:46:43 UTC

"62 gig"

Sounds as though you're crashing lots of tasks.

And the program DOES clean up after each one is finished. But that part is at the end of the program, and if it crashes before it gets to that, then the remnants get left there.

Also, the N216 models like lots of L3 cache. We found early last year that they run best with 4 Megs of L3.

tckosvic
Joined: 22 Oct 20
Posts: 5
Credit: 709,266
RAC: 3,188
Message 63974 - Posted: 18 May 2021, 0:28:20 UTC - in response to Message 63973.  
Last modified: 18 May 2021, 0:42:32 UTC

I am unable to diagnose the 62 gig in the /boinc/project/climateprediction directory as I deleted the contents to free up disk space. I have restarted and am only allowing 1 task to run. But 9 tasks downloaded: 8 are suspended and 1 is running. It says 15d to complete the running task.

I am confused about how to know whether a task was successful, or whether the 62 gig was from failed runs. The only measure of success I see is an increase in points. I have .7M points on climateprediction.net. I have only been a member since about March. Some runs must have been successful, or else successful runs were not properly deleted.
If these are not completing, why am I getting points?

What can I do to increase the completion percentage? It looks like increasing L3 cache is the only thing?

I am running 1 task now and "disk" graphs in boinc show 21.49 gig of disk space; although 9 tasks did download ( 2+ gig per task?). Is that normal for these projects?

Also, can one task run on more than 1 processor to finish quicker? If so, how do I implement that?

thanks for info, tom kosvic

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7412
Credit: 23,446,854
RAC: 4
Message 63975 - Posted: 18 May 2021, 5:39:06 UTC - in response to Message 63974.  

OK, forget the massive data part; it's history.

*********************

On your Account page:
3rd blue section down, labelled: Computing - 4th line down, the section labelled Tasks

This is a list of all the tasks/models that your computer has run. You can see the Successes & Fails here.
And for the fails, there is a list in each one about what happened.

*********************

Credits/points are awarded all through the processing.
They're based on the trickle_up files, that get returned at regular intervals.

But only a fully completed task is of any real use to the researchers.

*********************

To increase completion success, look at why the fails did fail, and fix the problem.

To increase the L3 cache size, get a processor with more cache. As a rough rule, AMD processors have more L3 than Intel processors.

*********************

Running one task across several processors was tested by the project years ago, and the science results were garbage.
It didn't even get to beta testing.

tckosvic
Joined: 22 Oct 20
Posts: 5
Credit: 709,266
RAC: 3,188
Message 63976 - Posted: 18 May 2021, 15:12:28 UTC

Les, my login page seems somewhat different from what you describe, but I found "tasks". There are 4 pages of my tasks starting in March. Breakdown is approximately:

Error while downloading - 16
Error while computing - 17
Completed - 23.

The completed ones are skewed toward March/April when, as I recall, I was using fewer processors. There seem to be more "Error while computing" results recently, while running 9 tasks on 9 processors. The Error while downloading problems are recent and I have no speculation as to the cause. My computer is always on, as is my internet.
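To put the breakdown above into proportions, a few lines of arithmetic are enough. The counts are the approximate figures from this post; the function name is just for illustration.

```python
def outcome_shares(counts):
    """Return each outcome's share of all tasks, as a percentage rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Approximate figures quoted in the post above (56 tasks in all).
TASKS = {
    "Error while downloading": 16,
    "Error while computing": 17,
    "Completed": 23,
}
```

On these numbers about 41% of the tasks completed. If the download errors are set aside as a server-side matter, 23 of the 40 tasks that actually started computing finished, which is 57.5%.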

I am not changing processors to increase L3. I have a medium to high end Intel i7 CPU. I will see if L3 can be adjusted, but I know nothing about this.

Currently I am running 3 tasks on 3 processors. Tasks are 17 days or so. I'll see if they complete.

Let me know if there are any adjustments I should make.

thanks for your insights, tom kosvic

Jean-David Beyer
Joined: 5 Aug 04
Posts: 429
Credit: 6,363,949
RAC: 16,450
Message 63977 - Posted: 18 May 2021, 19:59:59 UTC - in response to Message 63976.  

tckosvic wrote:
"I am not changing processors to increase L3. I have a medium to high end i7 intell cpu. I will see if L3 can be adjusted but I know nothing about this.

"Currently I am running 3 tasks on 3 processors. Tasks are 17 days or so. I'll see if they complete."

I am running the following processor, which is a little different from an i7. In particular, it has a rather large cache.
But in any case, you cannot adjust the L3 cache either in the BIOS or in a configuration file. You do that when you place the order for the processor itself.
I run boinc on 8 of the 16 processors I have. At most four processors run ClimatePrediction tasks, and currently four processors are running WorldCommunityGrid tasks. Chances are they will all complete correctly. These N216 CPDN tasks take about 8 days to complete on my machine.

Here is the distribution of CPDN task results:

State: All (84) · In progress (5) · Validation pending (0) · Validation inconclusive (0) · Valid (71) · Invalid (0) · Error (8)
Application: All (84) · OpenIFS (0) · UK Met Office Coupled Model Full Resolution Ocean (0) · UK Met Office HadAM4 at N144 resolution (0) · UK Met Office HadAM4 at N216 resolution (73) · UK Met Office HadCM3 short (10) · UK Met Office HadSM4 at N144 resolution (1) · Weather At Home 2 (wah2) (0) · Weather At Home 2 (wah2) (region independent) (0)

And here is what my machine is:

CPU type: GenuineIntel
    Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors: 16
Coprocessors: ---
Virtualization: None
Operating System: Linux Red Hat Enterprise Linux
    Red Hat Enterprise Linux 8.3 (Ootpa) [4.18.0-240.22.1.el8_3.x86_64|libc 2.28 (GNU libc)]
BOINC version: 7.16.11
Memory: 62.41 GB
Cache: 16896 KB
Swap space: 15.62 GB
Total disk space: 117.21 GB

Alan K
Joined: 22 Feb 06
Posts: 387
Credit: 17,711,932
RAC: 4,248
Message 63978 - Posted: 19 May 2021, 22:35:11 UTC - in response to Message 63976.  


tckosvic wrote:
"I am not changing processors to increase L3. I have a medium to high end i7 intell cpu. I will see if L3 can be adjusted but I know nothing about this.

"Let me know if there are any adjustment I should make.

"thanks for your insights, tom kosvic"


L3 cache is built into the CPU and cannot be changed. On an i7 you probably have 256 KB of L1 cache, 2 MB of L2 cache and 8 MB of L3 cache, depending on your chip. If you have Win10, open the Task Manager and then the Performance tab. The CPU item there shows the sizes of the different cache levels.
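Since the original poster is on Linux rather than Windows, the same information can usually be read from sysfs (the `lscpu` command reports it too). The sketch below assumes the kernel exposes /sys/devices/system/cpu/cpu0/cache/index*/, which is common but not guaranteed, for example inside some virtual machines; the function names are illustrative.

```python
import glob
import os


def parse_size(text):
    """Convert sysfs size strings such as '16896K' or '8M' to bytes."""
    text = text.strip()
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def read_caches(cache_dir="/sys/devices/system/cpu/cpu0/cache"):
    """Return (level, type, size_bytes) for each cache of one CPU; empty if not exposed."""
    caches = []
    for index in sorted(glob.glob(os.path.join(cache_dir, "index*"))):
        fields = {}
        for name in ("level", "type", "size"):
            with open(os.path.join(index, name)) as f:
                fields[name] = f.read().strip()
        caches.append((int(fields["level"]), fields["type"], parse_size(fields["size"])))
    return caches


# Example use:
#   for level, kind, size in read_caches():
#       print(f"L{level} {kind}: {size // 1024} KB")
```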

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3067
Credit: 7,156,299
RAC: 11,435
Message 63979 - Posted: 20 May 2021, 8:32:04 UTC

tckosvic wrote:
"The Error while downloading problems are recent and I have no speculation as to the cause. My computer is always on as is my internet."

Most often, download issues are a problem with the project servers. Enabling http debug before requesting new work helps in diagnosing this, but it is important to disable it again afterwards, as leaving it enabled quickly fills the event log with largely useless messages.
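One way to switch the flag on and off is through the `http_debug` entry under `log_flags` in the client's cc_config.xml. The following sketch edits the XML text and leaves any other settings in place; it assumes the usual `<cc_config><log_flags>...</log_flags></cc_config>` layout, and the function name is made up for illustration. The client has to re-read its config files (or be restarted) before a change takes effect.

```python
import xml.etree.ElementTree as ET


def set_http_debug(xml_text, enabled):
    """Return cc_config.xml text with log_flags/http_debug set to 1 or 0."""
    root = ET.fromstring(xml_text)
    flags = root.find("log_flags")
    if flags is None:
        flags = ET.SubElement(root, "log_flags")
    flag = flags.find("http_debug")
    if flag is None:
        flag = ET.SubElement(flags, "http_debug")
    flag.text = "1" if enabled else "0"
    return ET.tostring(root, encoding="unicode")


# Example use: turn the flag on before requesting work, then off again afterwards.
#   text = set_http_debug("<cc_config><log_flags/></cc_config>", True)
```

Keeping the switch-off step in the same small tool makes it harder to forget, which is the risk the post above warns about.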

KAMasud
Joined: 6 Oct 06
Posts: 108
Credit: 4,335,651
RAC: 11,230
Message 63980 - Posted: 23 May 2021, 12:15:13 UTC

The L3 cache may be a problem, but how much RAM does he have? He is running nine tasks, and somewhere along the line we decided each task needs 3 GB of RAM, plus some for the operating system's requirements. I could barely run three tasks in winter and now I am down to one on my twelve-thread machine. Our outside temperature has reached 40 °C and I also have to manage the heat.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3067
Credit: 7,156,299
RAC: 11,435
Message 63981 - Posted: 23 May 2021, 13:44:00 UTC - in response to Message 63980.  

KAMasud wrote:
"The L3 cache may be a problem but how much RAM does he have? He is running nine tasks and somewhere along the line, we decided each task needs 3Gig's of RAM, plus some of the operating systems requirements. I could barely run three tasks in winter and now I am down to one on my twelve thread machine. Our outside temperature has reached 40c and I have to also manage the heat."

Though before I lost video on my laptop, it had managed quite well, if slowly, running tasks on all four cores with only 8 GB of RAM. When I get a replacement for it I will get at least 4 GB per core, as going out to swap really slows things down. (I don't know how true that is with the latest NVMe SSDs.)

tckosvic
Joined: 22 Oct 20
Posts: 5
Credit: 709,266
RAC: 3,188
Message 64026 - Posted: 3 Jun 2021, 2:12:16 UTC - in response to Message 63980.  

I have 32 gig of memory. That should be sufficient.

tom kosvic
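The memory question comes down to simple arithmetic on the figures given in this thread: 3 GB per task, nine tasks, 32 GB installed. The sketch below spells it out; the 2 GB set aside for the operating system is purely an assumed figure for illustration, and the function names are made up.

```python
def ram_needed_gb(tasks, per_task_gb, os_reserve_gb):
    """Rough RAM requirement: per-task memory for every task plus a reserve for the OS."""
    return tasks * per_task_gb + os_reserve_gb


def max_tasks(total_ram_gb, per_task_gb, os_reserve_gb):
    """Largest number of tasks that fits once the OS reserve is set aside."""
    usable = total_ram_gb - os_reserve_gb
    return max(0, int(usable // per_task_gb))
```

With these numbers `ram_needed_gb(9, 3, 2)` comes to 29 GB, which fits in 32 GB, and `max_tasks(32, 3, 2)` allows ten tasks. So on the thread's own rule of thumb, RAM alone would not explain the failures.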

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 3067
Credit: 7,156,299
RAC: 11,435
Message 64027 - Posted: 3 Jun 2021, 5:09:14 UTC - in response to Message 64026.  

tckosvic wrote:
"I have 32 gig of memory. That should be sufficient

"tom kosvic"


It is enough for now but certainly won't be enough when/if OpenIFS tasks make it to the main site from testing. The last ones used over 6GB/task and peaked at over 9GB of disk space. On crashing they left behind a total of over a GB each of zips to be deleted.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7412
Credit: 23,446,854
RAC: 4
Message 64028 - Posted: 3 Jun 2021, 6:47:09 UTC - in response to Message 64026.  

tckosvic wrote:
"I have 32 gig of memory. That should be sufficient

"tom kosvic"



Only if your computer is not constantly crashing tasks, and you're not cleaning up afterwards.
With your computers hidden, those of us here can't see what's happening, and so can't help the way we can when computers aren't hidden.

©2021 climateprediction.net