Dr Lisa Su says up to 192MB L3 on newer Ryzen -- hope it's true

Eirik Redd

Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 64021 - Posted: 2 Jun 2021, 6:54:43 UTC

There's been some discussion here about the high demand for L3 cache with many recent climate models.
If this leaked unverified "news" turns out true --
No links here, totally unconfirmed.
But good news, if/when it happens.

e
ID: 64021

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4341
Credit: 16,497,933
RAC: 6,477
Message 64022 - Posted: 2 Jun 2021, 7:11:19 UTC - in response to Message 64021.  

There's been some discussion here about the high demand for L3 cache with many recent climate models.
If this leaked unverified "news" turns out true --
No links here, totally unconfirmed.
But good news, if/when it happens.

e


Saw this on the Tom's Hardware site. However, having just upgraded to a Ryzen 7, I suspect one of these will be beyond my price range when available.
ID: 64022

Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 64023 - Posted: 2 Jun 2021, 14:51:48 UTC - in response to Message 64021.  

Well, if Dr. Lisa Su says so, I would say that it is confirmed enough.
https://www.tomshardware.com/amp/news/amd-shows-new-3d-v-cache-ryzen-chiplets-up-to-192mb-of-l3-cache-per-chip-15-gaming-improvement

I was planning on a Ryzen 5900X towards the end of this year anyway. I wonder how much this will add to the cost?
There are a number of projects that could use more cache these days.
ID: 64023

SolarSyonyk
Joined: 7 Sep 16
Posts: 254
Credit: 31,632,053
RAC: 32,536
Message 64024 - Posted: 2 Jun 2021, 16:50:50 UTC

They've demonstrated it.

https://www.anandtech.com/show/16725/amd-demonstrates-stacked-vcache-technology-2-tbsec-for-15-gaming

And according to their presentation, they intend to put it into production.


AMD says that it has made great strides with the technology, and is set to put it into production with its highest-end processors by the end of the year. It wasn’t stated on what products it would be coming to, whether that was consumer or enterprise. Apropos of this, AMD has said that Zen 4 is set for launch in 2022.


Now, what that will be in, or what it will cost, is up in the air. But it certainly looks like something that's coming, and I agree, it's very exciting.

I've been messing around with some older Intel eDRAM chips for CPDN, and it seems to help, but I definitely can't run 8 threads of CPDN with very good performance, even with 128MB L4. My turnaround time on the N216s is up towards 2 months wall clock, which I've been told is fine, just... it takes me a while, since I don't run most of the workloads overnight (solar powered off grid office, using the surplus for BOINC).
ID: 64024

mmonnin
Joined: 28 May 17
Posts: 49
Credit: 15,134,650
RAC: 23,861
Message 64025 - Posted: 3 Jun 2021, 0:53:14 UTC

Several dies of near-bleeding-edge SRAM, each 6 mm square. It's not going to be cheap.
ID: 64025

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1055
Credit: 16,516,801
RAC: 955
Message 64153 - Posted: 9 Jul 2021, 1:57:22 UTC - in response to Message 64024.  

I definitely can't run 8 threads of CPDN with very good performance, even with 128MB L4. My turnaround time on the N216s is up towards 2 months wall clock, which I've been told is fine,


My machine has about 64 GBytes of RAM and it runs N216 models, usually four at a time, at about 8 1/2 days each. I normally run 8 threads of BOINC, but I limit CPDN to 4.

Machine Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors 16 [8 real, 8 hyperthreaded]
Red Hat Enterprise Linux 8.4 (Ootpa) [4.18.0-305.7.1.el8_4.x86_64|libc 2.28 (GNU libc)]
BOINC version 7.16.11
Memory 62.4 GB
Cache 16896 KB
ID: 64153

Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 64999 - Posted: 24 Jan 2022, 21:11:31 UTC

Now, for the real kicker. The 5800X3D has a 96MB L3 cache (32+64) as compared to the 32MB cache found on the standard 5800X. That’s a whole 64MB more, putting it above even the Ryzen 9 5950X in terms of cache.
https://appuals.com/ryzen-7-5800x3d-is-the-first-ryzen-chip-to-use-the-3d-v-cache-tech-and-its-faster-than-the-core-i9-12900k/

Looks good to me. I would go for it if OpenIFS ever comes along, and needs a lot of cache.
But the glaciers may have receded by then anyway.
ID: 64999

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4341
Credit: 16,497,933
RAC: 6,477
Message 65001 - Posted: 25 Jan 2022, 7:10:45 UTC - in response to Message 64999.  

Looks good to me. I would go for it if OpenIFS ever comes along, and needs a lot of cache.
But the glaciers may have receded by then anyway.
In testing, I haven't noticed OpenIFS needing lots of cache, just lots of RAM, though I suppose increasing the cache might reduce how often stuff (technical term) gets swapped out to RAM. Increasing the speed of RAM could, I guess, also make a significant difference.
ID: 65001

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1055
Credit: 16,516,801
RAC: 955
Message 65002 - Posted: 25 Jan 2022, 12:29:00 UTC - in response to Message 65001.  

In testing, I haven't noticed OpenIFS needing lots of cache, just lots of RAM, though I suppose increasing the cache might reduce how often stuff (technical term) gets swapped out to RAM. Increasing the speed of RAM could, I guess, also make a significant difference.


How do you measure your cache consumption? Here is what my machine looks like:
Computer 1511241

CPU type 	GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors 	16

Operating System 	Linux Red Hat Enterprise Linux
Red Hat Enterprise Linux 8.5 (Ootpa) [4.18.0-348.12.2.el8_5.x86_64|libc 2.28 (GNU libc)]
BOINC version 	7.16.11
Memory 	62.4 GB
Cache 	16896 KB


At the moment, I am running the following BOINC processes (and not much else):
# ps -fu boinc
UID          PID    PPID  C STIME TTY          TIME CMD
boinc      19484       1  0 Jan23 ?        00:02:19 /usr/bin/boinc
boinc      45446   19484  0 Jan23 ?        00:01:29 ../../projects/climateprediction.net/hadam4_8.52_i686-pc-linux-gnu hadam4h_20l6_209402_4
boinc      45448   19484  0 Jan23 ?        00:01:33 ../../projects/climateprediction.net/hadam4_8.52_i686-pc-linux-gnu hadam4h_h1av_200602_4
boinc      45453   19484  0 Jan23 ?        00:01:29 ../../projects/climateprediction.net/hadam4_8.52_i686-pc-linux-gnu hadam4h_h06e_201108_4
boinc      45457   45446 97 Jan23 ?        1-07:57:44 /var/lib/boinc/projects/climateprediction.net/hadam4_um_8.52_i686-pc-linux-gnu 175955 
boinc      45473   45448 96 Jan23 ?        1-07:49:05 /var/lib/boinc/projects/climateprediction.net/hadam4_um_8.52_i686-pc-linux-gnu 181965 
boinc      45477   45453 95 Jan23 ?        1-07:21:02 /var/lib/boinc/projects/climateprediction.net/hadam4_um_8.52_i686-pc-linux-gnu 178635 
boinc     138843   19484 99 Jan24 ?        09:02:38 ../../projects/www.worldcommunitygrid.org/wcgrid_arp1_wrf_7.32_i686-pc-linux-gnu
boinc     168519   19484 99 06:20 ?        00:47:13 ../../projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu -Sett
boinc     168580   19484 99 06:22 ?        00:45:23 ../../projects/www.worldcommunitygrid.org/wcgrid_opn1_autodock_7.21_x86_64-pc-linux-gnu 
boinc     170495   19484 99 06:51 ?        00:16:38 ../../projects/www.worldcommunitygrid.org/wcgrid_opn1_autodock_7.21_x86_64-pc-linux-gnu 
boinc     171173   19484 99 07:01 ?        00:06:28 ../../projects/universeathome.pl_universe/BHspin2_19_x86_64-pc-linux-gnu

And my cache is supplying about half the requested memory references:
# perf stat -aB -e cache-references,cache-misses
^C
 Performance counter stats for 'system wide':

    33,364,888,278      cache-references                                            
    17,805,920,648      cache-misses              #   53.367 % of all cache refs    

      64.185688537 seconds time elapsed


I suppose the instructions are mostly in the cache, and very little of the data are in there.
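
For anyone wanting to check the arithmetic: the percentage perf prints is simply cache-misses divided by cache-references. Here is a minimal Python sketch that recomputes it from pasted perf stat text. The function name miss_percentage and the parsing regex are illustrative only and are not part of perf.

```python
import re

# Matches counter lines such as "33,364,888,278      cache-references".
COUNTER_LINE = re.compile(
    r"^\s*([\d,]+)\s+(cache-references|cache-misses)\b", re.MULTILINE
)


def miss_percentage(perf_text: str) -> float:
    """Return cache-misses as a percentage of cache-references."""
    counts = {
        name: int(value.replace(",", ""))
        for value, name in COUNTER_LINE.findall(perf_text)
    }
    return 100.0 * counts["cache-misses"] / counts["cache-references"]


# Counter lines copied from the perf output above.
sample = """
    33,364,888,278      cache-references
    17,805,920,648      cache-misses              #   53.367 % of all cache refs
"""
print(f"{miss_percentage(sample):.3f} % of all cache refs")
```

Note that what these two generic events count depends on the CPU. They usually refer to the last-level cache, but that is worth confirming for a given processor before reading too much into the ratio.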

Increasing the speed of the RAM can help only to the extent that the processor(s) (including the associated chipset) can use it; there could be some improvement if you had fitted slower RAM than your machine could use. But who does that? The only other way to speed up the RAM is to replace the whole computer. Are you confusing the speed between the cache and the RAM with the speed between RAM and the swap space? If you are using a lot of swap space, you certainly do need more RAM for your task load.
ID: 65002

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4341
Credit: 16,497,933
RAC: 6,477
Message 65003 - Posted: 25 Jan 2022, 13:05:40 UTC
Last modified: 25 Jan 2022, 13:06:16 UTC

How do you measure your cache consumption?

Normally just use the free command. I have to look it up any time I want to look at what applications are using cache because I don't use it often enough to remember it.

I think I had to actually install something to be able to see that.
ID: 65003

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1055
Credit: 16,516,801
RAC: 955
Message 65004 - Posted: 25 Jan 2022, 16:48:00 UTC - in response to Message 65003.  

How do you measure your cache consumption?

Normally just use the free command. I have to look it up any time I want to look at what applications are using cache because I don't use it often enough to remember it.


The free command tells you nothing about your processor cache; it gives you the use of the RAM in the first line and the amount of swap space used in the second line, in KiB by default. So below I have about 55.9 million KiB (roughly 53 GiB) of RAM available, although most of it is currently used as an input data cache (input from hard drives). As far as swap space is concerned, I seem to be using about 46 MiB of disk for that: negligible.
$ free
              total        used        free      shared  buff/cache   available
Mem:       65435804     8673136     1682200       99192    55080468    55932264
Swap:      16375804       47104    16328700


If you want to see your usage of the processor cache, you need the perf command as shown in my previous post.
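
As a rough illustration of reading those numbers: free without options reports kibibytes, so the columns can be converted to GiB with a few lines of Python. This is only a sketch. parse_free is a hypothetical helper, and it assumes the six-column layout shown above.

```python
KIB_PER_GIB = 1024 * 1024


def parse_free(text: str) -> dict:
    """Map each row label ("Mem", "Swap") to a dict of column name -> KiB."""
    lines = [line for line in text.splitlines() if line.strip()]
    columns = lines[0].split()
    table = {}
    for line in lines[1:]:
        label, *values = line.split()
        # The Swap row has fewer columns than Mem; zip stops at the shorter list.
        table[label.rstrip(":")] = dict(zip(columns, map(int, values)))
    return table


# Output of free copied from the post above.
sample = """
              total        used        free      shared  buff/cache   available
Mem:       65435804     8673136     1682200       99192    55080468    55932264
Swap:      16375804       47104    16328700
"""
mem = parse_free(sample)["Mem"]
print(f"page cache and buffers: {mem['buff/cache'] / KIB_PER_GIB:.1f} GiB")
print(f"available to programs:  {mem['available'] / KIB_PER_GIB:.1f} GiB")
```

None of these figures say anything about L1, L2 or L3; they describe main memory only.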
ID: 65004

Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 65006 - Posted: 25 Jan 2022, 20:53:44 UTC - in response to Message 65002.  
Last modified: 25 Jan 2022, 21:01:36 UTC

And my cache is supplying about half the requested memory references:
# perf stat -aB -e cache-references,cache-misses
^C
 Performance counter stats for 'system wide':

    33,364,888,278      cache-references                                            
    17,805,920,648      cache-misses              #   53.367 % of all cache refs    

      64.185688537 seconds time elapsed


I suppose the instructions are mostly in the cache, and very little of the data are in there.

Thanks, I have been trying to measure my cache in Ubuntu.
I was not able to get that command to fly on my Ryzen 3600 with Ubuntu 20.04.3, but that is not my concern. I probably could with some work.

I have been using the "cachestat" command, but am not quite sure how to interpret the results.
When running the HadAM4 (N216) on 8 cores, plus two Rosetta pythons on 2 cores (85% of the cores), I see:

$ sudo ./cachestat

Counting cache functions... Output every 1 seconds.
    HITS   MISSES  DIRTIES    RATIO   BUFFERS_MB   CACHE_MB
   18658        0       45   100.0%          139      36237
   57334        0       43   100.0%          139      36237
   30930        0       26   100.0%          139      36237
   21124        0       31   100.0%          139      36237
   92343        0      108   100.0%          139      36237
   26557        0       75   100.0%          139      36237
   25485        0       26   100.0%          139      36237
   97719        2       75   100.0%          139      36237
   21042        0       25   100.0%          139      36237
   38118        0       60   100.0%          139      36237
   46525        0       29   100.0%          139      36237
   25127        0       44   100.0%          139      36237
   98529        0       64   100.0%          139      36237
   25745        1       15   100.0%          139      36237
   23106        0       66   100.0%          139      36237
   92583        0       72   100.0%          139      36237
    8580        0       50   100.0%          139      36237
   38967        0       55   100.0%          139      36237
   64163        0       43   100.0%          139      36237
   25698        0       29   100.0%          139      36237
   86728        0       61   100.0%          139      36237
   24077        0       44   100.0%          139      36237
   21742        0       17   100.0%          139      36237
   77441        0       63   100.0%          139      36237
   26411        0       38   100.0%          139      36237
   18575        0       24   100.0%          139      36237
   85779        0       60   100.0%          139      36237
   29630        0       34   100.0%          139      36237
   41840        0       45   100.0%          139      36237
   30779        0       81   100.0%          139      36238
   35510        0       44   100.0%          139      36238
   98186        0      109   100.0%          139      36238
   20706        0       17   100.0%          139      36238
   16524        0        9   100.0%          139      36238
   72171        0       54   100.0%          139      36238
    1469        0       13   100.0%          139      36238
   43854        0       66   100.0%          140      36238
   52263        0       39   100.0%          140      36238

My guess is that my cache hits are not really 100%, but probably more in line with what you see.

But if you want to try it, you can install it as follows:
To install perf-tools, open terminal and run:
sudo apt-get install linux-tools-common linux-tools-generic

Then, to install cachestat, run:
wget https://raw.githubusercontent.com/brendangregg/perf-tools/master/fs/cachestat

To make it executable, run:
chmod +x cachestat

Finally run it:
sudo ./cachestat

It probably is not measuring the CPU cache. I have a large write-cache (12.5 GB) in main memory (DDR4), and that may be what it is seeing.
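
As far as I can tell, the BUFFERS_MB and CACHE_MB columns track the Buffers and Cached lines of /proc/meminfo, that is, the kernel's page cache in main memory rather than anything on the CPU. Below is a small Python sketch that pulls out the same two figures. It assumes a Linux /proc/meminfo with values in kB, page_cache_mib is a made-up name, and the sample values are illustrative ones chosen to match the cachestat output above.

```python
def page_cache_mib(meminfo_text: str) -> dict:
    """Return the Buffers and Cached figures from /proc/meminfo text, in MiB."""
    wanted = {"Buffers", "Cached"}
    result = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # Lines look like "Cached:  37106688 kB"; the kernel's kB is 1024 bytes.
            result[key] = int(rest.split()[0]) // 1024
    return result


# Illustrative excerpt, not a real capture from the machine in this thread.
sample_meminfo = """\
MemTotal:       65435804 kB
Buffers:          142336 kB
Cached:         37106688 kB
SwapCached:         1024 kB
"""
print(page_cache_mib(sample_meminfo))
```

On a live system the same function can be fed the contents of /proc/meminfo directly.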
ID: 65006

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1055
Credit: 16,516,801
RAC: 955
Message 65007 - Posted: 26 Jan 2022, 1:25:43 UTC - in response to Message 65006.  

It probably is not measuring the CPU cache. I have a large write-cache (12.5 GB) in main memory (DDR4), and that may be what it is seeing.


I do not know what you call a CPU cache. I infer you refer to the part of your RAM that is currently devoted to that purpose. In a normally running modern Linux, (almost) all RAM not used for something else is given over to the disk input cache. Anytime the kernel wants more RAM for a process, it can grab it from the disk input cache. If that is not enough, it can get it from the output buffer, but it would have to write it out first. And I suppose cachestat can tell you about that, but it is deprecated and not available for my distro. It seems to me that by the time you need a tool like that, you have long since passed the point where you seriously should increase the size of your RAM.

So it seems that you still need to find a version of perf that will run on your system.

# perf stat -aB -e cache-references,cache-misses
ID: 65007

Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 65008 - Posted: 26 Jan 2022, 2:52:24 UTC - in response to Message 65007.  
Last modified: 26 Jan 2022, 2:52:47 UTC

I do not know what you call a CPU cache. I infer you refer to the part of your RAM that is currently devoted to that purpose.
No, it is the cache on the CPU itself. A Ryzen 3600 has
Total L1 Cache: 384KB
Total L2 Cache: 3MB
Total L3 Cache: 32MB
It is the L3 cache that distinguishes one CPU from another, and largely determines how many work units you should run at a time so that they fit mainly in the cache.
I usually run six of the N216 for that purpose, though running eight may give slightly more output. But beyond a certain point, the total output actually decreases.


In a normally running modern Linux, (almost) all RAM not used for something else is given over to the disk input cache. Anytime the kernel wants more RAM for a process, it can grab it from the disk input cache. If that is not enough, it can get it from the output buffer, but it would have to write it out first. And I suppose cachestat can tell you about that, but it is deprecated and not available for my distro. It seems to me that by the time you need a tool like that, you have long since passed the point where you seriously should increase the size of your RAM.
I have 64 GB on the Ryzen 3600, so however Linux handles it, that is more than enough.
It is the on-chip CPU cache that I need to monitor. Maybe perf can do it. I will look some more.
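
perf counts hits and misses; for the sizes themselves, Linux exposes each cache level under sysfs. Here is a minimal sketch, assuming the usual /sys/devices/system/cpu/cpu0/cache layout is present (it can be missing in some virtual machines). cpu_caches is an illustrative name.

```python
from pathlib import Path


def cpu_caches(cache_dir="/sys/devices/system/cpu/cpu0/cache"):
    """List (level, type, size, shared_cpu_list) for each cache seen by CPU 0."""
    caches = []
    for index in sorted(Path(cache_dir).glob("index[0-9]*")):
        # Each indexN directory describes one cache through small text files.
        fields = [
            (index / name).read_text().strip()
            for name in ("level", "type", "size", "shared_cpu_list")
        ]
        caches.append((int(fields[0]), fields[1], fields[2], fields[3]))
    return caches


if __name__ == "__main__":
    try:
        for level, kind, size, shared in cpu_caches():
            print(f"L{level} {kind:<12} {size:>8}  shared by CPUs {shared}")
    except OSError as err:
        print(f"could not read cache info from sysfs: {err}")
```

The shared_cpu_list column matters when deciding how many work units to run: if the L3 is divided between groups of cores, each task can only draw on the part attached to its own group, not the headline total.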
ID: 65008

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1055
Credit: 16,516,801
RAC: 955
Message 65009 - Posted: 26 Jan 2022, 4:54:25 UTC - in response to Message 65008.  

No, it is the cache on the CPU itself. A Ryzen 3600 has
Total L1 Cache: 384KB
Total L2 Cache: 3MB
Total L3 Cache: 32MB
It is the L3 cache that distinguishes one CPU from another, and largely determines how many work units you should run at a time so that they fit mainly in the cache.
I usually run six of the N216 for that purpose, though running eight may give slightly more output. But beyond a certain point, the total output actually decreases.


I do not doubt your processor is as you say. My (by comparison) little Intel Xeon W-2245 is like this:

Level 1 cache size  	8 x 32 KB 8-way set associative instruction caches
	  	 	8 x 32 KB 8-way set associative data caches
Level 2 cache size  	 8 x 1 MB 16-way set associative caches
Level 3 cache size	  16.5 MB

I suppose those L1 and L2 caches are one per real core, each shared by that core's two hyperthreads. So ideally, I would like the working set of instructions (the "inner loop") to fit into the L1 instruction cache or, lacking that, into the L2 cache. I wonder about my L3 cache size. Why is it not 16 MB (16384 KB)? Why is it 16384 + 512 KB?

If this web page correctly describes cachestat, it is concerned with paging disk pages into RAM, not paging regular RAM into the L1, L2, or L3 caches.

https://www.brendangregg.com/blog/2021-08-30/high-rate-of-paging.html
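
On the 16.5 MB question, the arithmetic at least is easy to check. The sketch below only restates the reported 16896 KB in other units. The final step assumes L3 slices of 1.375 MiB each, which is my recollection of how Intel sizes L3 on this processor family and is not something stated in this thread.

```python
reported_kib = 16896  # "Cache 16896 KB" from the BOINC host summary

size_mib = reported_kib / 1024
extra_kib = reported_kib - 16 * 1024
print(size_mib)   # 16.5
print(extra_kib)  # 512

# Assumption (not from the thread): the L3 is built from 1.375 MiB slices.
slice_kib = int(1.375 * 1024)
slices, leftover = divmod(reported_kib, slice_kib)
print(slices, leftover)  # 12 0
```

If that assumption holds, the odd size would be twelve equal slices rather than 16 MiB plus a stray half megabyte, which is more slices than the eight active cores.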
ID: 65009

bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 24,510,982
RAC: 1,493
Message 65010 - Posted: 26 Jan 2022, 16:27:44 UTC - in response to Message 65006.  
Last modified: 26 Jan 2022, 16:28:28 UTC

In my case, simply installing
linux-tools-common
linux-tools-generic
(which should pull in the tools for the latest kernel) did not work.
Running perf pointed to possibly missing tool libraries, so after checking my current kernel number and the available packages I added
linux-tools-generic-hwe-20.04
which points to the latest kernel.

Then ran perf as superuser and it showed this for my Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz

Performance counter stats for 'system wide':

    12,511,363,439      cache-references                                            
     6,135,922,943      cache-misses              #   49.043 % of all cache refs    

      73.725181985 seconds time elapsed


I run 1/2 of the cores = 4 CPDN WUs; RAM is 16 GB.
ID: 65010

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1055
Credit: 16,516,801
RAC: 955
Message 65011 - Posted: 26 Jan 2022, 16:55:36 UTC - in response to Message 65010.  

Then ran perf as superuser and it showed this for my Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz

Performance counter stats for 'system wide':

    12,511,363,439      cache-references
     6,135,922,943      cache-misses              #   49.043 % of all cache refs

      73.725181985 seconds time elapsed



I run 1/2 of the cores = 4 CPDN WUs; RAM is 16 GB.


Very much like what I get with four times the amount of RAM.
I, too, use half my cores for BOINC. Right now, it is set up to run at most 4 CPDN work units, at most 5 WCG work units, and a few Rosetta and Universe work units.
$ ps -fu boinc
UID          PID    PPID  C STIME            TIME  CMD
boinc      19484       1  0 Jan23         00:04:09 /usr/bin/boinc
boinc      45446   19484  0 Jan23         00:01:59 ../../projects/climateprediction.net/hadam4_8.52_i686-pc-linux-gnu hadam4h_20l6_209402_4
boinc      45448   19484  0 Jan23         00:02:56 ../../projects/climateprediction.net/hadam4_8.52_i686-pc-linux-gnu hadam4h_h1av_200602_4
boinc      45453   19484  0 Jan23         00:02:01 ../../projects/climateprediction.net/hadam4_8.52_i686-pc-linux-gnu hadam4h_h06e_201108_4
boinc      45457   45446 96 Jan23         2-11:00:37 /var/lib/boinc/projects/climateprediction.net/hadam4_um_8.52_i686-pc-linux-gnu 175955 
boinc      45473   45448 96 Jan23         2-11:25:40 /var/lib/boinc/projects/climateprediction.net/hadam4_um_8.52_i686-pc-linux-gnu 181965 
boinc      45477   45453 95 Jan23         2-10:18:10 /var/lib/boinc/projects/climateprediction.net/hadam4_um_8.52_i686-pc-linux-gnu 178635 
boinc     235585   19484 98 02:40         08:53:06 ../../projects/www.worldcommunitygrid.org/wcgrid_arp1_wrf_7.32_i686-pc-linux-gnu
boinc     260823   19484 98 09:54         01:43:48 ../../projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu -Sett
boinc     263368   19484 98 10:33         01:04:29 ../../projects/www.worldcommunitygrid.org/wcgrid_opn1_autodock_7.21_x86_64-pc-linux-gnu 
boinc     264875   19484 98 10:55         00:43:37 ../../projects/www.worldcommunitygrid.org/wcgrid_opn1_autodock_7.21_x86_64-pc-linux-gnu 
boinc     267032   19484 98 11:29         00:09:32 ../../projects/universeathome.pl_universe/BHspin2_19_x86_64-pc-linux-gnu


CPU type 	GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors 	16
Operating System 	Red Hat Enterprise Linux 8.5 (Ootpa) [4.18.0-348.12.2.el8_5.x86_64|libc 2.28 (GNU libc)]
BOINC version 	7.16.11
Memory 	62.4 GB
Cache 	16896 KB


# perf stat -aB -e cache-references,cache-misses

 Performance counter stats for 'system wide':

    33,368,527,491      cache-references                                            
    18,222,615,823      cache-misses              #   54.610 % of all cache refs    

      59.656775576 seconds time elapsed
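
One way to relate these counters to the earlier question about RAM speed is to turn misses into a rough memory-traffic figure. This is a back-of-envelope sketch only. It assumes every counted miss pulls one 64-byte cache line from RAM, and it ignores prefetching and write-backs, so treat the result as an order-of-magnitude estimate. miss_traffic_gb_per_s is a hypothetical helper name.

```python
CACHE_LINE_BYTES = 64  # assumption: typical x86 cache line size


def miss_traffic_gb_per_s(cache_misses: int, elapsed_s: float) -> float:
    """Estimate RAM traffic in GB/s if each miss fetches one cache line."""
    return cache_misses * CACHE_LINE_BYTES / elapsed_s / 1e9


# Counters from the Xeon W-2245 run above.
xeon = miss_traffic_gb_per_s(18_222_615_823, 59.656775576)
# Counters from the i7-4790 run quoted earlier in this post.
i7 = miss_traffic_gb_per_s(6_135_922_943, 73.725181985)
print(f"Xeon W-2245: {xeon:.1f} GB/s")
print(f"i7-4790:     {i7:.1f} GB/s")
```

Similar miss percentages can therefore hide quite different absolute loads on the memory system, which may matter more than the ratio when judging whether faster RAM would help.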

ID: 65011

©2024 climateprediction.net