Effect of L2 cache size and FSB on models?

Message boards : Number crunching : Effect of L2 cache size and FSB on models?

Steve Bergman

Joined: 5 Aug 08
Posts: 22
Credit: 501,217
RAC: 0
Message 34570 - Posted: 8 Aug 2008, 23:57:18 UTC

First of all, hello. This is my first post here, and I've just been running my first model for a couple of days. It's a HADCM. I did reel a bit when I realized just how big a work unit actually is! But I understand why that must be. Because each element in the simulation is so tightly bound to all the other elements, it's not as if anything can be broken up into smaller pieces. The simulation must be able to quickly interrelate vast quantities of data for each time step. Which brings me to my actual question. To make it a little more concrete, let me say that I am running a Core 2 Duo E2140, and have just ordered a Core 2 Duo E7200.

The E7200:

1. Runs a 58% higher clock, internally
2. Has 3 times the L2 cache. (3M vs 1M)
3. Runs a 33% faster FSB. (1066 vs 800)
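The three percentages can be checked against the stock specifications. This is a minimal sketch, assuming stock clocks of 1.60 GHz for the E2140 and 2.53 GHz for the E7200 (the post itself does not state the clock speeds):

```python
# Stock figures; the clock speeds are assumed from Intel's specs, not taken from the post.
E2140 = {"clock_ghz": 1.60, "l2_mb": 1, "fsb_mts": 800}
E7200 = {"clock_ghz": 2.53, "l2_mb": 3, "fsb_mts": 1066}


def pct_increase(old: float, new: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


for key in ("clock_ghz", "l2_mb", "fsb_mts"):
    print(f"{key}: {pct_increase(E2140[key], E7200[key]):.0f}% higher")
```

Note that "3 times the L2 cache" corresponds to a 200% increase.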

It seems to me that these large models probably access huge amounts of data for each time step. So the additional cache, memory bandwidth, and reduced memory latency should be pretty significant. Is that true?
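For the bandwidth part of the question, the theoretical peak is easy to estimate. This is a sketch, assuming the standard 64-bit (8-byte) Core 2 front side bus. It gives peak figures only and says nothing about latency or the sustained bandwidth a model actually sees:

```python
BUS_WIDTH_BYTES = 8  # Core 2 front side bus: 64 data bits per transfer


def peak_fsb_gb_per_s(fsb_mts: float) -> float:
    """Theoretical peak FSB bandwidth in GB/s (10**9 bytes) for a rating in MT/s."""
    return fsb_mts * 1e6 * BUS_WIDTH_BYTES / 1e9


for name, fsb in (("E2140", 800), ("E7200", 1066)):
    print(f"{name}: {peak_fsb_gb_per_s(fsb):.1f} GB/s peak")
```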

I have compared some of the values for those processors from the CPDN user machine pages, and it looks like those synthetic benchmarks show about 55% higher integer performance and 70% higher floating-point performance for the E7200. (I'm guessing that it is floating-point performance that dominates for CPDN.) However, I suspect that the benchmarks don't really account for the effects of memory bandwidth and latency.

Hopefully, I will receive the new processor Monday and will be able to tell for sure. I\'m hoping that it will cut the 4.5 month ETA for this model down to about half that.

Any thoughts would be welcome.

Thanks,
Steve Bergman
ID: 34570
Steve Bergman

Joined: 5 Aug 08
Posts: 22
Credit: 501,217
RAC: 0
Message 34571 - Posted: 9 Aug 2008, 2:02:45 UTC - in response to Message 34570.  

Apologies for posting this thread, which is nearly identical to another recent thread in the forum. I searched for L2 and FSB before I posted and found nothing. But when I search for L2 now, I see my own thread... but not this one (which clearly has L2 in both the subject and the body):

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=6220
ID: 34571
Iain Inglis

Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 34576 - Posted: 9 Aug 2008, 12:56:43 UTC

Steve,

It might be worthwhile letting the model run for a few trickles (5-10) before doing the upgrade. You'll then be able to see whether the old numbers are stable, and whether the new numbers are significantly different. Some HadCM3 models slow down initially.

Tell us what happens.

Iain
ID: 34576
Steve Bergman

Joined: 5 Aug 08
Posts: 22
Credit: 501,217
RAC: 0
Message 34580 - Posted: 9 Aug 2008, 22:02:51 UTC - in response to Message 34576.  
Last modified: 9 Aug 2008, 22:18:34 UTC

Iain Inglis wrote:

Steve,

It might be worthwhile letting the model run for a few trickles (5-10) before doing the upgrade.


Thanks for the tip. Actually, since I posted my previous message, I noticed a boxed Q6600 quad core for only $74 more than the E7200 ($194 for boxed, $184 for OEM), so I ordered a boxed unit and will be sending back the E7200 when it arrives. (I can't wait!)

The Q6600 has 2x4MB of L2 cache, a 1066MHz FSB, and a 2.4GHz internal clock. Even though it uses a 65nm process as opposed to the E7200's 45nm, it is remarkably efficient. TDP for the quad core is only 95W, vs. 65W for the E7200. My power supply and the rest of my machine are high efficiency, so I will be able to crunch 4 BOINC projects on only 125W of total system power, or about 31W per core vs. about 44W per core for the E7200. Each core of the quad is clocked about 5% slower, but it has 33% more L2 cache per core, so each core should be comparable to one of the E7200's cores.
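The per-core comparison above can be reproduced as simple arithmetic. This is a sketch, assuming the E7200's stock 2.53 GHz clock (not given in the thread) and treating each 4MB L2 on the Q6600 as shared by one pair of cores. The 44W-per-core E7200 figure is Steve's own estimate and is not recomputed here:

```python
def per_core(total: float, cores: int) -> float:
    """Split a shared resource (cache in MB, power in W) evenly across cores."""
    return total / cores


q6600_l2 = per_core(2 * 4.0, 4)  # two 4MB L2 caches, each shared by two cores
e7200_l2 = per_core(3.0, 2)      # one 3MB L2 shared by both cores
l2_gain = (q6600_l2 / e7200_l2 - 1) * 100
clock_deficit = (1 - 2.40 / 2.53) * 100  # the 2.53 GHz E7200 clock is an assumption
watts_per_core = per_core(125, 4)        # 125W total system power over 4 cores

print(f"L2 per core: {q6600_l2:.1f}MB vs {e7200_l2:.1f}MB ({l2_gain:.0f}% more)")
print(f"Q6600 clock: {clock_deficit:.0f}% lower per core")
print(f"System power: {watts_per_core:.2f}W per core")
```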

-Steve
ID: 34580
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2169
Credit: 64,550,937
RAC: 6,170
Message 34581 - Posted: 10 Aug 2008, 0:08:37 UTC
Last modified: 10 Aug 2008, 0:14:15 UTC

On the subject of front side bus, I was able to run slab models on the equally clocked E6700 and E6750 (1066 vs. 1333 FSB). The E6750 was perhaps 1 to 2% faster (this was an experiment on the same task, so there shouldn't be any speed differences due to model parameters). With a quad processor and/or different types of models, the FSB may be more important.
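One way to read that 1 to 2% result is a rough Amdahl's-law estimate. Suppose a fraction f of the runtime scales perfectly with bus speed (a simplifying assumption, since memory latency does not shrink in proportion to the FSB clock), and the 1333 bus is s = 1333/1066 times faster. The overall speedup is then S = 1 / ((1 - f) + f/s), which can be solved for f:

```python
def bus_bound_fraction(observed_speedup: float, bus_speedup: float) -> float:
    """Invert Amdahl's law S = 1 / ((1 - f) + f / s) for the fraction f."""
    return (1 - 1 / observed_speedup) / (1 - 1 / bus_speedup)


def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the runtime gets s times faster."""
    return 1 / ((1 - f) + f / s)


bus = 1333 / 1066  # E6750 vs. E6700 FSB ratio, about 1.25
for observed in (1.01, 1.02):
    f = bus_bound_fraction(observed, bus)
    print(f"{observed:.2f}x overall -> about {f:.0%} of runtime bus-bound")
```

Under those assumptions, only roughly 5 to 10% of the slab model's runtime would be bus-limited, which is consistent with computation rather than memory traffic dominating for that model.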

On the search feature, the default only goes back 30 days. You have to use the Advanced Search to go back farther. Thankfully, the BOINC server software now allows you to search for short terms like L2. In the past, it wouldn't give you any results unless the term was over 3 or 4 characters. I'm sure someone may come along and refresh my foggy memory if I got that wrong. ;-)
ID: 34581
Steve Bergman

Joined: 5 Aug 08
Posts: 22
Credit: 501,217
RAC: 0
Message 34582 - Posted: 10 Aug 2008, 7:46:48 UTC - in response to Message 34581.  

OK. That certainly explains the search mystery.

I expected that L2 cache size might be more likely to affect model performance. And yet, in the other thread, someone reports that doubling the L2 cache from 512k to 1MB made a scant 5% difference in their testing.

This has reminded me of this excellent Ars article from 2002 which explains processor caching and what kinds of things affect cache efficacy:

http://arstechnica.com/articles/paedia/cpu/caching.ars/1

If the model cycles through a large amount of data that does not get reused for a long time (that is, it shows poor temporal locality), then shuffling the bits through the cache hierarchy is ineffective, and a bit of a waste. Then again, the results of one time step are referenced in the next. So it may be that if the L2 were large enough, a substantial speedup might be observed once it reached some critical size, where the data had not yet been evicted from the cache by the time the processor needs it again. I suspect, though, that the model cycles through quite *a lot* of data before going back and referencing the results of the previous time step. More than 8MB, I'd guess.
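The "critical size" idea can be illustrated with a toy simulation: a fully associative cache with least-recently-used (LRU) replacement, and a time step that sweeps the same working set in order on every iteration. Real Core 2 caches are set-associative and have hardware prefetchers, and the sizes below are arbitrary counts of cache lines, so this is a sketch of the effect rather than a model of the CPDN code:

```python
from collections import OrderedDict


def lru_hit_rate(cache_lines: int, working_set_lines: int, sweeps: int = 10) -> float:
    """Hit rate of a fully associative LRU cache while each 'time step'
    sweeps cyclically over the same working set of cache lines."""
    cache: OrderedDict[int, bool] = OrderedDict()
    hits = accesses = 0
    for _ in range(sweeps):
        for line in range(working_set_lines):
            accesses += 1
            if line in cache:
                hits += 1
                cache.move_to_end(line)        # mark as most recently used
            else:
                cache[line] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict the least recently used line
    return hits / accesses


for size in (512, 1000, 1024, 2048):
    print(size, round(lru_hit_rate(size, working_set_lines=1024), 3))
```

With a 1024-line working set, a 1000-line cache gets no hits at all, because LRU always evicts exactly the line that will be needed soonest. Once the cache holds the whole working set, every sweep after the first one hits. Growing a cache that is much smaller than the working set (say, from 512KB to 1MB) therefore buys little under this access pattern.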

Or maybe it is something as simple as the time the processor needs to perform the floating-point calculations dominating the time required to retrieve the data to be processed, even though most of that data has to come all the way from main memory. These models are not exactly the typical server bean-counting app. ;-)
ID: 34582
old_user81594

Joined: 11 Jun 05
Posts: 67
Credit: 1,222,916
RAC: 0
Message 34583 - Posted: 10 Aug 2008, 11:22:52 UTC - in response to Message 34570.  

Steve Bergman wrote:

To make it a little more concrete, let me say that I am running a Core 2 Duo E2140, and have just ordered a Core 2 Duo E7200.

The E7200:

1. Runs a 58% higher clock, internally
2. Has 3 times the L2 cache. (3M vs 1M)
3. Runs a 33% faster FSB. (1066 vs 800)



Can your current mobo support 1066 FSB? If you have a "pre-packaged" Dell system (or similar, which I'd guess you have with an E2140 chip), then the extra capability of the E7200 CPU might not be fully exploited by the board: it would run at 800 FSB, and its clock speed would drop proportionally.
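The proportional clock drop follows from how Core 2 clocks are derived. The FSB rating is quad-pumped, so the base clock is the rating divided by 4, and the core clock is that base clock times a fixed multiplier. This is a sketch, assuming the E7200's stock multiplier is 9.5 (not stated in the thread):

```python
def core_clock_mhz(fsb_rating_mts: float, multiplier: float) -> float:
    """Core clock from a quad-pumped FSB rating: multiplier x (rating / 4)."""
    base_clock_mhz = fsb_rating_mts / 4
    return multiplier * base_clock_mhz


E7200_MULTIPLIER = 9.5  # assumed stock multiplier

# "1066" is nominally 4 x 266.67 MHz
for label, fsb in (("1066", 1066.67), ("800", 800.0)):
    print(f"FSB {label}: {core_clock_mhz(fsb, E7200_MULTIPLIER):.0f} MHz")
```

The same relation appears in the CPU-Z readout of an E2160 on an 800 FSB posted in this thread (9.0 x 199.5 MHz).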

The worst-case scenario is that your PC won't boot, as dropping a new CPU into an existing setup can cause problems, requiring a full reinstall.

1. Back up everything valuable (photos and software)
2. Make a note of what programs you have installed
3. Download CPU-Z and find out your mobo's exact make/model
4. Check that it will support your proposed new CPU.

Good luck

Neil.

Just last week, I upgraded from an E6600 to a Q6600 and had no such issues.
ID: 34583
Steve Bergman

Joined: 5 Aug 08
Posts: 22
Credit: 501,217
RAC: 0
Message 34587 - Posted: 10 Aug 2008, 18:05:44 UTC - in response to Message 34583.  
Last modified: 10 Aug 2008, 18:11:44 UTC

Neil wrote:

Can your current mobo support 1066 FSB?


Hi Neil,

Yes, it can handle 1066 but not 1333. That's why I was targeting the E7200, and then the Q6600. (The Q6700 is also 1066 but is only a little faster for significantly more money.) The machine in question was an ASUS bare-bones system, which I originally spec'd low because it was just acting as an X thin client into one of my Red Hat servers. The E2140 was way overkill. Then the on-board NIC went out on my Athlon64 4000+ desktop, and I decided it was time to just make the move to a nice Core 2, and this box was conveniently available.

I'm hoping my Q6600 shows up tomorrow (along with the new memory; I spec'd that low, too), as I am anxious to see how it does. It looks like you are getting about 1.7 s/ts for CM3s on yours, which is comparable to what my single-core 4000+ has done in some informal spot testing. So the overall package should have about 4x the crunching power of what I had before. (Truly, multi-core is the industry's response to the fact that they can't make individual cores faster like they used to... so they dumped the burden on the software guys.)
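The "about 4x" estimate is a throughput calculation. This is a sketch, assuming each of the four cores runs its own model and sustains the same 1.7 s/ts with all four busy. In practice, cores that share an L2 cache and the FSB may each run somewhat slower than a lone core would:

```python
SECONDS_PER_DAY = 86_400


def timesteps_per_day(seconds_per_timestep: float, cores: int = 1) -> float:
    """Aggregate model timesteps per day if each core runs one model."""
    return cores * SECONDS_PER_DAY / seconds_per_timestep


single = timesteps_per_day(1.7)
quad = timesteps_per_day(1.7, cores=4)
print(f"{single:,.0f} ts/day on one core, {quad:,.0f} ts/day on four ({quad / single:.0f}x)")
```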

I'm just getting back into BOINC after a long hiatus. With 4 cores, I've selected CPDN, SETI, FightAIDS@Home and ConquerCancer@Home as satisfying projects in which to participate. And all for just (an expected) 125W for the full system (monitor off), which appeals to the green in me. :-)
ID: 34587
old_user92639

Joined: 13 Aug 05
Posts: 54
Credit: 117,227
RAC: 0
Message 34597 - Posted: 10 Aug 2008, 23:10:08 UTC

:) Windows Vista [(32-bit) BIOS]

FSB 1333 MHz ==> 1 GB RAM DDR2 667 MHz

Intel Pentium E2160 hardware monitor

Mainboard Vendor:	MSI
Mainboard Model:		MS-7267


Processor: 1 (ID = 0)
Number of cores:		2 (max 2)
Number of threads:	2 (max 2)
Name:			Intel Pentium E2160
Codename:		Conroe
Specification:		Intel(R) Pentium(R) Dual  CPU  E2160  @ 1.80GHz
Package:			Socket 775 LGA (platform ID = 0h)
CPUID:			6.F.D
Extended CPUID:		6.F
Core Stepping:		M0
Technology:		65 nm
Core Speed:		1795.7 MHz (9.0 x 199.5 MHz)
Rated Bus speed:		798.1 MHz
Stock frequency:		1800 MHz
Instructions sets:	MMX, SSE, SSE2, SSE3, SSSE3, EM64T
L1 Data cache:		2 x 32 KBytes, 8-way set associative, 64-byte line size
L1 Instruction cache:	2 x 32 KBytes, 8-way set associative, 64-byte line size
L2 cache:		1024 KBytes, 4-way set associative, 64-byte line size
FID/VID Control:		yes
FID range:		6.0x - 9.0x
max VID:			1.325 v
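The cache figures in that readout determine how addresses map onto each cache: sets = size / (ways x line size), and an address lands in set (address / line size) mod sets. This is a sketch, assuming power-of-two sizes and ignoring virtual-versus-physical addressing:

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> tuple[int, int, int]:
    """Return (number of sets, offset bits, index bits) for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2 of a power-of-two line size
    index_bits = sets.bit_length() - 1         # log2 of a power-of-two set count
    return sets, offset_bits, index_bits


def set_index(address: int, line_bytes: int, sets: int) -> int:
    """Which set a byte address maps to."""
    return (address // line_bytes) % sets


l2_sets, l2_offset, l2_index = cache_geometry(1024 * 1024, ways=4, line_bytes=64)
print(l2_sets, l2_offset, l2_index)  # 4096 6 12
l1_sets, _, _ = cache_geometry(32 * 1024, ways=8, line_bytes=64)
print(l1_sets)  # 64
print(set_index(0, 64, l2_sets), set_index(256 * 1024, 64, l2_sets))
```

For this 4-way L2, the sets repeat every 256KB (4096 sets x 64 bytes), so any five lines whose addresses differ by multiples of 256KB map to the same set, and one of them must be evicted even if the rest of the cache is idle.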


Nice.
ID: 34597

©2024 climateprediction.net