Message boards : Number crunching : New work Discussion

KAMasud

Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 65126 - Posted: 9 Feb 2022, 4:12:55 UTC - in response to Message 65121.  

This new model type is still at the Alpha testing stage.
It could be months before they go live.


I wondered because one project, WCG, has a sub-project known as Beta Testing that you can volunteer for. If you do, you sometimes get tasks that are not yet suitable for general use, which helps the project by uncovering problems that normal development and system tests have not found. Or they may find everything is fine and release the work to the general users. I even get credit for running those tasks.

Now it may not make sense for volunteers to even run tasks that are only at the Alpha testing stage.

____________________

We have the Africa Rainfall Project, and MCM (Mapping Cancer Markers). I think they do not clean up RAM after finishing. Please check; I am suspicious of them.
ID: 65126
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1045
Credit: 16,506,818
RAC: 5,001
Message 65127 - Posted: 9 Feb 2022, 5:00:54 UTC - in response to Message 65126.  
Last modified: 9 Feb 2022, 5:04:30 UTC

We have the Africa Rainfall Project, and MCM (Mapping Cancer Markers). I think they do not clean up RAM after finishing. Please check; I am suspicious of them.


I run WCG ARP1 and MCM1 and they do clean up after finishing. While I allow Beta Testing work units, I have not received any in a long time.

Beta Testing Intermittent 0:060:21:02:33 147 83,416
ID: 65127
Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 65129 - Posted: 9 Feb 2022, 18:25:44 UTC - in response to Message 65127.  

I run WCG ARP1 and MCM1 and they do clean up after finishing.

That is my experience too. I have been running them from the beginning with no problems.
Currently, I have one machine on each, but sometimes mix them with other projects.
ID: 65129
KAMasud

Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 65134 - Posted: 10 Feb 2022, 13:27:24 UTC - in response to Message 65127.  

We have the Africa Rainfall Project, and MCM (Mapping Cancer Markers). I think they do not clean up RAM after finishing. Please check; I am suspicious of them.


I run WCG ARP1 and MCM1 and they do clean up after finishing. While I allow Beta Testing work units, I have not received any in a long time.

Beta Testing Intermittent 0:060:21:02:33 147 83,416

____________

You might be on Windows, while mine is on Linux. What happens is this: if I run ARP after a reboot, the WU runs happily. If I run ARP after I have done a few MCM WUs (because of the queue), after a while it says "waiting for memory". Where has the memory gone? Anyway, I was running it on Linux because of CPDN. I will revert to Windows.
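
A side note on that status: BOINC normally shows "waiting for memory" when the running tasks would exceed the memory limits set in the computing preferences, so it is worth checking those limits before assuming a leak. A minimal check on Linux, assuming the standard boinc-client data directory and preference tag names:

# Memory limits the client is honouring (the data directory may be
# /var/lib/boinc-client or /var/lib/boinc depending on the distribution)
grep -E 'ram_max_used_(busy|idle)_pct' /var/lib/boinc-client/global_prefs.xml
free -hw    # compare with what is actually free when the message appears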
ID: 65134
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1045
Credit: 16,506,818
RAC: 5,001
Message 65135 - Posted: 10 Feb 2022, 14:16:48 UTC - in response to Message 65134.  

You might be on Windows, while mine is on Linux. What happens is this: if I run ARP after a reboot, the WU runs happily. If I run ARP after I have done a few MCM WUs (because of the queue), after a while it says "waiting for memory". Where has the memory gone? Anyway, I was running it on Linux because of CPDN. I will revert to Windows.


Actually, I do most of my Boinc work on my Linux machine.

Computer 1511241
Computer information

Created 	14 Nov 2020, 15:37:02 UTC
Total credit 	5,443,321
Average credit 	8,696.18

CPU type 	GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors 	16

Operating System 	Red Hat Enterprise Linux 8.5 (Ootpa) [4.18.0-348.12.2.el8_5.x86_64|libc 2.28 (GNU libc)]
BOINC version 	7.16.11
Memory 	62.4 GB
Cache 	16896 KB


Something must be strange in how you run your system. I have never run out of memory running Linux, and I have been running it since 1998 in various versions and on a variety of machines. All versions have been Red Hat.

If you wish to run Windows, go ahead. But that does not solve the problem.
ID: 65135
KAMasud

Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 65136 - Posted: 10 Feb 2022, 16:37:19 UTC - in response to Message 65135.  

You might be on Windows, while mine is on Linux. What happens is this: if I run ARP after a reboot, the WU runs happily. If I run ARP after I have done a few MCM WUs (because of the queue), after a while it says "waiting for memory". Where has the memory gone? Anyway, I was running it on Linux because of CPDN. I will revert to Windows.


Actually, I do most of my Boinc work on my Linux machine.

Computer 1511241
Computer information

Created 	14 Nov 2020, 15:37:02 UTC
Total credit 	5,443,321
Average credit 	8,696.18

CPU type 	GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors 	16

Operating System 	Red Hat Enterprise Linux 8.5 (Ootpa) [4.18.0-348.12.2.el8_5.x86_64|libc 2.28 (GNU libc)]
BOINC version 	7.16.11
Memory 	62.4 GB
Cache 	16896 KB


Something must be strange in how you run your system. I have never run out of memory running Linux, and I have been running it since 1998 in various versions and on a variety of machines. All versions have been Red Hat.

If you wish to run Windows, go ahead. But that does not solve the problem.


It is a VM with 10 GB of RAM, running Mint. My Windows installation is also running BOINC. I will just shut down the VM until there is more CPDN Linux work; then the Windows BOINC can have the full 16 GB.
ID: 65136
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4314
Credit: 16,378,503
RAC: 3,632
Message 65138 - Posted: 10 Feb 2022, 19:02:17 UTC - in response to Message 65136.  

Rather than reboot, if you get this memory problem in Linux run

sync; echo 1 | sudo tee /proc/sys/vm/drop_caches 


This will free up memory not released when tasks end. (I use it mostly when Firefox is being tardy about releasing memory after I have had a lot of tabs open.)
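
For reference, the value written selects what is dropped; the kernel documents 1 as page cache only, 2 as reclaimable slab objects (dentries and inodes), and 3 as both. A quick sketch of the variants:

sync; echo 1 | sudo tee /proc/sys/vm/drop_caches    # page cache only
sync; echo 2 | sudo tee /proc/sys/vm/drop_caches    # dentries and inodes
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches    # both
free -hw                                            # check the effect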
ID: 65138
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1045
Credit: 16,506,818
RAC: 5,001
Message 65139 - Posted: 10 Feb 2022, 20:04:03 UTC - in response to Message 65138.  

Rather than reboot, if you get this memory problem in Linux run

sync; echo 1 | sudo tee /proc/sys/vm/drop_caches


This will free up memory not released when tasks end. (I use it mostly when Firefox is being tardy about releasing memory after I have had a lot of tabs open.)


I suppose that would work under some circumstances. Since I have never run out of RAM, even in the days when I had only 64 megabytes of it, or even less, I am certainly not an authority on dealing with this problem. I have run Netscape, Mozilla, and Firefox mainly as browsers. I can run Chromium, but I do not like it. So if I were to go too far and run too many memory-hog processes at once, performance would go to hell because of memory thrashing to disk, but nothing would break.

But consider that as you start needing more memory to start a process, the first thing the kernel will do is reclaim the disk read cache. If that is used up, it will flush and reclaim the write buffers. If yet more RAM is needed, it will start paging out some of the RAM still in use (on a least-recently-used basis). Only if the swap space on disk is also exhausted will the kernel invoke the out-of-memory killer so that vital processes can run. As I said earlier, this has never happened to me in the 24 years or so I have been running Linux.
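
For anyone who wants to see where the memory has actually gone, and whether the out-of-memory killer has ever fired, something along these lines (standard util-linux and systemd tools) will show it:

free -hw                                  # RAM and swap, split into used, buffers and cache
swapon --show                             # configured swap devices or files
sudo dmesg | grep -i 'out of memory'      # kernel OOM-killer messages, if any
journalctl -k | grep -i 'killed process'  # the same, via the systemd journal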
ID: 65139
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1045
Credit: 16,506,818
RAC: 5,001
Message 65140 - Posted: 10 Feb 2022, 22:03:56 UTC - in response to Message 65138.  
Last modified: 10 Feb 2022, 22:07:30 UTC

Rather than reboot, if you get this memory problem in Linux run

sync; echo 1 | sudo tee /proc/sys/vm/drop_caches

This will free up memory not released when tasks end. (I use it mostly when Firefox is being tardy about releasing memory after I have had a lot of tabs open.)

It sure does work. Could the O.P. simply not have enough RAM?

My machine before and after running the command; no CPDN work at the moment.
# free -hw
              total        used        free      shared     buffers       cache   available
Mem:           62Gi       5.3Gi       1.8Gi       130Mi       327Mi        54Gi        56Gi
Swap:          15Gi        70Mi        15Gi
# sync; echo 1 | sudo tee /proc/sys/vm/drop_caches
1
# free -hw
              total        used        free      shared     buffers       cache   available
Mem:           62Gi       5.3Gi        55Gi       118Mi       0.0Ki       1.3Gi        56Gi
Swap:          15Gi        70Mi        15Gi

ps -fu boinc
UID          PID    C TIME CMD
boinc       2072     0 00:11:52 /usr/bin/boinc  [boinc-client]
boinc     519234    93 06:35:21 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu -beta -
boinc     520042    91 06:08:27 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu -beta -
boinc     522179    98 05:51:09 ../../projects/boinc.bakerlab.org_rosetta/rosetta_4.20_x86_64-pc-linux-gnu -beta -
boinc     528257    98 03:48:01 ../../projects/www.worldcommunitygrid.org/wcgrid_arp1_wrf_7.32_x86_64-pc-linux-gnu
boinc     536973    98 01:26:41 ../../projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu
boinc     539412    98 00:50:44 ../../projects/www.worldcommunitygrid.org/wcgrid_opn1_autodock_7.21_x86_64-pc-linu
boinc     540515    99 00:32:54 ../../projects/universeathome.pl_universe/BHspin2_19_x86_64-pc-linux-gnu
boinc     540526    99 00:32:36 ../../projects/www.worldcommunitygrid.org/wcgrid_opn1_autodock_7.21_x86_64-pc-linu
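
To see how much resident memory each of those BOINC tasks is actually holding, a variant of the same ps command sorted by resident set size works; the column names below are standard ps format specifiers:

# PID, resident size (KiB), virtual size, elapsed time and command, largest first
ps -o pid,rss,vsz,etime,comm -u boinc --sort=-rss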

ID: 65140
Iain Inglis
Volunteer moderator

Joined: 16 Jan 10
Posts: 1079
Credit: 6,904,878
RAC: 6,593
Message 65175 - Posted: 17 Feb 2022, 15:56:11 UTC

Three batch #926 models have now completed successfully on my very slow Mac, so anecdotally the project's decision to switch that batch to Mac-only looks like a good one.

The batch #927 Mac-only models are running fine so far (presumably they are the same set as batch #926), though a twice-failed model has just downloaded, so there is clearly the usual background rate of failures. No pattern that I can see yet.
ID: 65175
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1045
Credit: 16,506,818
RAC: 5,001
Message 65223 - Posted: 3 Mar 2022, 1:14:08 UTC - in response to Message 65193.  

I just upgraded yesterday to

boinc-manager-7.16.11-9.el8.x86_64
boinc-client-7.16.11-9.el8.x86_64

They had been

boinc-manager-7.16.11-8.el8.x86_64
boinc-client-7.16.11-8.el8.x86_64

for several years before that. This is the latest version available for my release of Linux, Red Hat Enterprise Linux 8.5 (Ootpa).
ID: 65223
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 65224 - Posted: 3 Mar 2022, 2:01:27 UTC - in response to Message 65222.  

Don

There hasn't been any work for Windows machines for a long time now. It's all Linux.
And that has run out too, leaving only a few Mac tasks that were originally part of a Linux/Mac batch.
ID: 65224
KAMasud

Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 65228 - Posted: 4 Mar 2022, 11:05:18 UTC - in response to Message 65138.  

Rather than reboot, if you get this memory problem in Linux run

sync; echo 1 | sudo tee /proc/sys/vm/drop_caches 


This will free up memory not released when tasks end. (I use it mostly when Firefox is being tardy about releasing memory after I have had a lot of tabs open.)



Thank you, Dave. As it is, I am out of Linux WUs. Let us see if we get a new lot.
ID: 65228
KAMasud

Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 65229 - Posted: 4 Mar 2022, 11:08:09 UTC - in response to Message 65139.  

Rather than reboot, if you get this memory problem in Linux run

sync; echo 1 | sudo tee /proc/sys/vm/drop_caches


This will free up memory not released when tasks end. (I use it mostly when Firefox is being tardy about releasing memory after I have had a lot of tabs open.)


I suppose that would work under some circumstances. Since I have never run out of RAM, even in the days when I had only 64 megabytes of it, or even less, I am certainly not an authority on dealing with this problem. I have run Netscape, Mozilla, and Firefox mainly as browsers. I can run Chromium, but I do not like it. So if I were to go too far and run too many memory-hog processes at once, performance would go to hell because of memory thrashing to disk, but nothing would break.

But consider that as you start needing more memory to start a process, the first thing the kernel will do is reclaim the disk read cache. If that is used up, it will flush and reclaim the write buffers. If yet more RAM is needed, it will start paging out some of the RAM still in use (on a least-recently-used basis). Only if the swap space on disk is also exhausted will the kernel invoke the out-of-memory killer so that vital processes can run. As I said earlier, this has never happened to me in the 24 years or so I have been running Linux.


My next machine will be dedicated to Linux.
ID: 65229
SolarSyonyk

Joined: 7 Sep 16
Posts: 233
Credit: 31,062,873
RAC: 30,045
Message 65232 - Posted: 4 Mar 2022, 17:15:23 UTC

Don't drop caches; all you are doing there is telling Linux to free memory holding pages it has already read off disk, so it has to touch the disk again.

Linux "using all the memory" is a feature: it uses any surplus memory to keep disk pages in RAM so that disk access is faster, and it drops those pages when more memory is needed, before evicting anything else to swap. About the only time dropping caches is useful is when you are doing benchmarking runs that involve a lot of disk I/O and you want to test with the full normal system caching behaviour but clear the cache between runs.
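
As a rough illustration of that benchmarking case (the file path below is just a placeholder), dropping caches between runs gives a cold-cache read, while repeating the read without dropping shows the speed of the page cache:

sync; echo 3 | sudo tee /proc/sys/vm/drop_caches   # start with a cold cache
time cat /path/to/large_testfile > /dev/null       # first read comes from disk
time cat /path/to/large_testfile > /dev/null       # second read comes from the page cache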
ID: 65232
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4314
Credit: 16,378,503
RAC: 3,632
Message 65318 - Posted: 2 Apr 2022, 7:34:13 UTC - in response to Message 65232.  

Don't drop caches; all you are doing there is telling Linux to free memory holding pages it has already read off disk, so it has to touch the disk again.


My experience is that flushing memory with this command makes suspend, whether to disk or to RAM, work much more reliably.
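
If that helps on a particular machine, one way to automate it is a systemd sleep hook: executables in /usr/lib/systemd/system-sleep/ are run with "pre" before suspend or hibernate and "post" afterwards. A minimal sketch (the filename is hypothetical and the script must be executable):

# /usr/lib/systemd/system-sleep/10-drop-caches  (hypothetical name)
#!/bin/sh
if [ "$1" = "pre" ]; then
    sync
    echo 1 > /proc/sys/vm/drop_caches
fi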
ID: 65318
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4314
Credit: 16,378,503
RAC: 3,632
Message 65319 - Posted: 2 Apr 2022, 7:38:07 UTC

After sorting out some problems with some of the input files over on the testing site, there should be some Windows tasks for the NZ region in the next week or so. There are also some N144 Linux tasks in the pipeline, which had some issues stopping them from getting onto the main site, but I am not sure whether those have been resolved yet.
ID: 65319
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4314
Credit: 16,378,503
RAC: 3,632
Message 65320 - Posted: 4 Apr 2022, 15:11:28 UTC - in response to Message 65319.  

There are also some N144 Linux tasks in the pipeline, which had some issues stopping them from getting onto the main site, but I am not sure whether those have been resolved yet.


Unless something unexpected is broken, this submission has started, but some of the files generated by the submission script are taking a while, so seven hours after it started nothing has appeared yet. My guess is that they will appear by tomorrow morning at the latest, if nothing unexpected pops up. (Speaking ex cathedra from my belly button.)
ID: 65320
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4314
Credit: 16,378,503
RAC: 3,632
Message 65321 - Posted: 5 Apr 2022, 8:25:24 UTC - in response to Message 65320.  
Last modified: 5 Apr 2022, 10:50:05 UTC

A second batch is also being prepared, but there is no sign of the first appearing on the server yet. Waiting to see whether this is a problem or just the file perturbations taking a long time.

Tasks have started appearing on the server. Linux only for this lot, unless you use WSL or a VM of some description. In three minutes' time, when my box updates after the backoff from the last attempt to get work, I will be able to confirm that they are downloading OK.

Edit: four tasks downloaded and running.
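
Incidentally, rather than waiting out the backoff, an update can be forced from the command line; the URL is the project's master URL as listed by boinccmd --get_project_status:

boinccmd --project <project_master_URL> update    # contact the scheduler immediately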
ID: 65321
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1045
Credit: 16,506,818
RAC: 5,001
Message 65323 - Posted: 5 Apr 2022, 13:19:51 UTC - in response to Message 65321.  

Tasks have started appearing on the server. Linux only for this lot, unless you use WSL or a VM of some description. In three minutes' time, when my box updates after the backoff from the last attempt to get work, I will be able to confirm that they are downloading OK.


I got a task recently and it has over an hour of execution time on it, i.e., far more than the 30 seconds that the fast-crashers of recent memory used to manage.

Task 22198935
Name 	hadam4_a11l_200010_13_928_012134123_0
Workunit 	12134123

ID: 65323