Effect of L2 cache size and FSB on models?

Author	Message
Steve Bergman Joined: 5 Aug 08 Posts: 22 Credit: 501,217 RAC: 0	Message 34570 - Posted: 8 Aug 2008, 23:57:18 UTC
First of all, hello. This is my first post here, and I've just been running my first model for a couple of days. It's a HADCM. And I did reel a bit upon the realization of just how big a work unit actually is! But I understand why that must be. As tightly bound as each element in the simulation is to all the other elements, it's not as if anything can be broken up into smaller pieces. The simulation must be able to quickly interrelate vast quantities of data for each time step.
Which brings me to my actual question. To make it a little more concrete, let me say that I am running a Core 2 Duo E2140, and have just ordered a Core 2 Duo E7200. The E7200:
1. Runs a 58% higher clock, internally
2. Has 3 times the L2 cache. (3M vs 1M)
3. Runs a 33% faster FSB. (1066 vs 800)
It seems to me that these large models probably access huge amounts of data for each time step. So the additional cache, memory bandwidth, and reduced memory latency should be pretty significant. Is that true?
I have compared some of the values for those processors from the CPDN user machine pages, and it looks like those synthetic benchmarks show about a 55% higher integer performance and a 70% higher floating point performance for the E7200. (I'm guessing that it is the floating point performance which dominates for CPDN.) However, I suspect that the benchmarks don't really concern themselves with the effects of memory bandwidth and latency.
Hopefully, I will receive the new processor Monday and will be able to tell for sure. I'm hoping that it will cut the 4.5 month ETA for this model down to about half that.
Any thoughts would be welcome.
Thanks, Steve Bergman
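For anyone who wants to check the arithmetic in the opening post, here is a rough sketch. The cache sizes, FSB speeds, the 70% benchmark figure and the 4.5 month ETA come from the post; the 1.60 GHz and 2.53 GHz clocks are the stock speeds of the two chips and are an assumption added here, as is the idea that run time scales inversely with a single speedup factor.

```python
# Figures from the post, except clock_ghz, which is the stock clock of each
# chip (an assumption: the post only gives the 58% ratio).
E2140 = {"clock_ghz": 1.60, "l2_mb": 1.0, "fsb_mhz": 800}
E7200 = {"clock_ghz": 2.53, "l2_mb": 3.0, "fsb_mhz": 1066}


def ratio(key):
    """How many times larger the E7200 value is than the E2140 value."""
    return E7200[key] / E2140[key]


def eta_months(current_eta, speedup):
    """Projected ETA if run time scaled inversely with `speedup` (a simplification)."""
    return current_eta / speedup


if __name__ == "__main__":
    for key in ("clock_ghz", "l2_mb", "fsb_mhz"):
        print(f"{key}: x{ratio(key):.2f}")
    # Halving the ETA would need a 2.0x speedup; the 70% floating point
    # benchmark gain alone would give a smaller improvement.
    print(f"ETA at 1.70x: {eta_months(4.5, 1.70):.2f} months")
    print(f"ETA at 2.00x: {eta_months(4.5, 2.00):.2f} months")
```

On these numbers, halving the ETA needs more than the benchmark gain by itself, so the hoped-for result would have to come partly from the extra cache and bus speed.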

Steve Bergman Joined: 5 Aug 08 Posts: 22 Credit: 501,217 RAC: 0	Message 34571 - Posted: 9 Aug 2008, 2:02:45 UTC - in response to Message 34570.
Apologies for posting this thread, which is nearly identical to another recent thread in the forum. I searched for L2 and FSB before I posted and found nothing. But when I search for L2 now, I see the present thread but not this earlier one (which clearly has L2 in the subject and also in the body): http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=6220

Iain Inglis Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317	Message 34576 - Posted: 9 Aug 2008, 12:56:43 UTC
Steve,
It might be worthwhile letting the model run for a few trickles (5-10) before doing the upgrade. You'll then be able to see whether the old numbers are stable, and whether the new numbers are significantly different. Some HADCM3 models slow down initially. Tell us what happens.
Iain

Steve Bergman Joined: 5 Aug 08 Posts: 22 Credit: 501,217 RAC: 0	Message 34580 - Posted: 9 Aug 2008, 22:02:51 UTC - in response to Message 34576. Last modified: 9 Aug 2008, 22:18:34 UTC
"Steve, It might be worthwhile letting the model run for a few trickles (5-10) before doing the upgrade."
Thanks for the tip. Actually, since I posted my previous message, I noticed a boxed Q6600 quad core for only $74 more ($194 for boxed, $184 for OEM), so I ordered a boxed unit and will be sending back the E7200 when it arrives. (I can't wait!)
The Q6600 has 2x4MB cache, a 1066MHz FSB and a 2.4GHz internal clock. Even though it uses a 65nm process as opposed to the E7200's 45nm, it is remarkably efficient. TDP for the 4 core is only 95W, vs the E7200's 65W. My power supply and the rest of my machine are high efficiency, so I will be able to crunch 4 BOINC projects on only 125W of total system power, or about 31W per core vs the E7200's 44W per core. Each core of the quad is clocked about 5% slower, but it has 33% more L2 cache per core, so each core should be comparable to one of the E7200's cores.
-Steve
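The per-core comparison in the post above can be reproduced with a few lines. This is only a sketch: the Q6600 figures and the 125W system draw are from the post, the E7200's 2.53 GHz clock is its stock speed (not stated in the thread), and the 88W system figure for the E7200 is simply back-calculated from the quoted 44W per core.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    name: str
    cores: int
    clock_ghz: float
    l2_mb_total: float
    system_watts: float  # whole-system draw under load, as estimated in the post

    @property
    def l2_per_core(self):
        return self.l2_mb_total / self.cores

    @property
    def watts_per_core(self):
        return self.system_watts / self.cores


# 2.53 GHz is the E7200 stock clock (assumption); 88 W is inferred from the
# "44W per core" figure in the post, not measured.
E7200 = Cpu("E7200", cores=2, clock_ghz=2.53, l2_mb_total=3.0, system_watts=88.0)
Q6600 = Cpu("Q6600", cores=4, clock_ghz=2.40, l2_mb_total=8.0, system_watts=125.0)

if __name__ == "__main__":
    clock_deficit = 1 - Q6600.clock_ghz / E7200.clock_ghz
    cache_gain = Q6600.l2_per_core / E7200.l2_per_core - 1
    print(f"Q6600 clock deficit per core: {clock_deficit:.1%}")
    print(f"Q6600 extra L2 per core:      {cache_gain:.1%}")
    for cpu in (E7200, Q6600):
        print(f"{cpu.name}: {cpu.watts_per_core:.1f} W per core")
```

Note that the Q6600's L2 is shared between pairs of cores (2x4MB), so "2MB per core" is an average rather than a fixed partition.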

geophi Volunteer moderator Joined: 7 Aug 04 Posts: 2169 Credit: 64,550,937 RAC: 6,170	Message 34581 - Posted: 10 Aug 2008, 0:08:37 UTC Last modified: 10 Aug 2008, 0:14:15 UTC
On the subject of front side bus, I was able to run slab models on the equally clocked E6700 and E6750 (1066 vs. 1333 FSB). There was maybe a 1 to 2% speed enhancement for the E6750 (this was an experiment on the same task, so there shouldn't be any parameter speed differences). With a quad processor and/or different types of models, the FSB may be more important.
On the search feature, the default only goes back 30 days. You have to use the Advanced Search to go back farther. Thankfully, the BOINC server software now allows you to search for short terms like L2. In the past, it wouldn't give you any results unless the term was over 3 or 4 characters. I'm sure someone may come along and refresh my foggy memory if I got that wrong. ;-)

Steve Bergman Joined: 5 Aug 08 Posts: 22 Credit: 501,217 RAC: 0	Message 34582 - Posted: 10 Aug 2008, 7:46:48 UTC - in response to Message 34581.
OK. That certainly explains the search mystery.
I expected that L2 cache size might be more likely to affect model performance. And yet, in the other thread, someone reports that doubling the L2 cache from 512k to 1MB made a scant 5% difference in their testing. This has reminded me of this excellent Ars article from 2002 which explains processor caching and what kinds of things affect cache efficacy: http://arstechnica.com/articles/paedia/cpu/caching.ars/1
If the model is cycling through a large amount of data which does not get reused for a long time (that is, it shows poor temporal locality), then shuffling the bits through the cache hierarchy is ineffective, and a bit of a waste. Then again, the results of one time step are referenced in the next. So it may be that a substantial speedup would appear once the L2 reached some critical size, large enough that the data had not yet been evicted by the time the model needs it again. I suspect that the model cycles through quite a lot of data, though, before going back and referencing the results of the previous time step. More than 8MB, I'd guess.
Or maybe it is something as simple as the time required by the processor to perform the floating point calculations dominating over the time required to retrieve the data to be processed, even though most of it has to come all the way from main memory. These models are not exactly the typical server bean counting app. ;-)
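The "critical size" idea in the post above can be illustrated with a toy simulation. This is a sketch under strong assumptions, not a model of the CPDN code or of a real Core 2 cache: it assumes a fully associative cache with strict LRU replacement (real L2 caches are set-associative) and a loop that sweeps the same block of data in the same order on every time step. The function name `lru_hit_rate` is made up for this example.

```python
from collections import OrderedDict


def lru_hit_rate(cache_lines, working_set_lines, sweeps=5):
    """Hit rate of an idealised LRU cache under repeated sequential sweeps.

    cache_lines:       capacity of the cache, in cache lines
    working_set_lines: number of distinct lines touched per sweep (time step)
    """
    cache = OrderedDict()  # keys are line addresses, ordered oldest to newest
    hits = accesses = 0
    for _ in range(sweeps):
        for addr in range(working_set_lines):
            accesses += 1
            if addr in cache:
                hits += 1
                cache.move_to_end(addr)  # mark as most recently used
            else:
                cache[addr] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict least recently used
    return hits / accesses


if __name__ == "__main__":
    for working_set in (50, 100, 101, 200):
        rate = lru_hit_rate(cache_lines=100, working_set_lines=working_set)
        print(f"working set {working_set:>3} lines: hit rate {rate:.0%}")
```

In this idealised case the hit rate falls off a cliff as soon as the working set exceeds the cache by a single line, because each line is evicted just before it is needed again. That would be consistent with a cache doubling that still leaves the working set too large making little difference, though real hardware and real access patterns are messier than this.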

old_user81594 Joined: 11 Jun 05 Posts: 67 Credit: 1,222,916 RAC: 0	Message 34583 - Posted: 10 Aug 2008, 11:22:52 UTC - in response to Message 34570.
"To make it a little more concrete, let me say that I am running a Core 2 Duo E2140, and have just ordered a Core 2 Duo E7200. The E7200: 1. Runs a 58% higher clock, internally 2. Has 3 times the L2 cache. (3M vs 1M) 3. Runs a 33% faster FSB. (1066 vs 800)"
Can your current mobo support 1066 FSB? If you have a "pre-packaged" Dell system (or similar, and I guess you have with an E2140 chip) then the extra capability of the E7200 CPU might not be fully exploited by the board, meaning it will run at 800 FSB, dropping its clock speed by a proportional amount.
The worst case scenario is that your PC won't boot, as dropping a new CPU into an existing set-up can cause problems, requiring a full install.
1. Back up everything valuable (photos and software)
2. Make a note of what programs you have installed
3. Download CPU-Z and find out exactly your mobo's make/model
4. Check it will support your proposed new CPU.
Good luck
Neil.
Just last week, I upgraded from an E6600 to Q6600 and had no such issues
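To put a number on the "proportional" clock drop Neil describes: on these Intel chips the core clock is the base bus clock times a fixed multiplier, and the rated FSB is four times the base bus clock. The sketch below assumes the E7200's stock 9.5x multiplier, which is not stated in the thread, and the helper name `core_clock_mhz` is made up for illustration. The last check uses the 9.0x multiplier and 798.1 MHz rated bus speed from the CPU-Z listing posted later in this thread.

```python
FSB_TRANSFERS_PER_CLOCK = 4  # quad-pumped front side bus


def core_clock_mhz(rated_fsb_mhz, multiplier):
    """Core clock = (rated FSB / 4) * CPU multiplier."""
    bus_clock = rated_fsb_mhz / FSB_TRANSFERS_PER_CLOCK
    return bus_clock * multiplier


if __name__ == "__main__":
    E7200_MULTIPLIER = 9.5  # stock value (assumption, not given in the thread)
    stock = core_clock_mhz(1066.67, E7200_MULTIPLIER)
    on_800_board = core_clock_mhz(800, E7200_MULTIPLIER)
    print(f"E7200 on a 1066 FSB board: {stock:.0f} MHz")
    print(f"E7200 held to 800 FSB:     {on_800_board:.0f} MHz")
    print(f"clock lost: {1 - on_800_board / stock:.0%}")
    # Cross-check against the E2160 CPU-Z figures later in the thread.
    print(f"E2160 from CPU-Z figures:  {core_clock_mhz(798.1, 9.0):.1f} MHz")
```

If the board did fall back to 800 FSB, the E7200 would run at roughly three quarters of its rated clock, which would wipe out a good part of the expected gain over the E2140.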

Steve Bergman Joined: 5 Aug 08 Posts: 22 Credit: 501,217 RAC: 0	Message 34587 - Posted: 10 Aug 2008, 18:05:44 UTC - in response to Message 34583. Last modified: 10 Aug 2008, 18:11:44 UTC
"Can your current mobo support 1066 FSB?"
Hi Neil,
Yes, it can handle 1066 but not 1333. That's why I was targeting the E7200, and then the Q6600. (The Q6700 is also 1066 but is only a little faster for significantly more money.) The machine in question was an ASUS bare-bones which I originally spec'd low because it was just acting as an X thin client into one of my Red Hat servers. The E2140 was way overkill. Then the on-board NIC went out on my Athlon64 4000+ desktop and I decided it was time to just make the move to a nice Core 2, and this box was conveniently available.
I'm hoping my Q6600 shows up tomorrow (and the new memory; I spec'd that low, too), as I am anxious to see how it does. Looks like you are getting about 1.7 s/ts for CM3s on yours, which is comparable to what my single core 4000+ has done in some informal spot testing. So the overall package should have about 4x the crunching power of what I had before. (Truly, multi-core is the industry's response to the fact that they can't make individual cores faster like they used to... so they dumped the burden on the software guys.)
I'm just getting back into BOINC, after a long hiatus. With 4 cores, I've selected CPDN, SETI, FightAIDS@Home and ConquerCancer@Home as satisfying projects in which to participate. And all for just (an expected) 125W for the full system (monitor off), which appeals to the green in me. :-)

old_user92639 Joined: 13 Aug 05 Posts: 54 Credit: 117,227 RAC: 0	Message 34597 - Posted: 10 Aug 2008, 23:10:08 UTC
:)
Windows Vista [(32 bits) BIOS] FSB 1333 MHz ==> 1 GB RAM DDR2 667 MHz
Intel Pentium E2160
hardware monitor
Mainboard Vendor: MSI
Mainboard Model: MS-7267
Processor: 1 (ID = 0)
Number of cores: 2 (max 2)
Number of threads: 2 (max 2)
Name: Intel Pentium E2160
Codename: Conroe
Specification: Intel(R) Pentium(R) Dual CPU E2160 @ 1.80GHz
Package: Socket 775 LGA (platform ID = 0h)
CPUID: 6.F.D
Extended CPUID: 6.F
Core Stepping: M0
Technology: 65 nm
Core Speed: 1795.7 MHz (9.0 x 199.5 MHz)
Rated Bus speed: 798.1 MHz
Stock frequency: 1800 MHz
Instructions sets: MMX, SSE, SSE2, SSE3, SSSE3, EM64T
L1 Data cache: 2 x 32 KBytes, 8-way set associative, 64-byte line size
L1 Instruction cache: 2 x 32 KBytes, 8-way set associative, 64-byte line size
L2 cache: 1024 KBytes, 4-way set associative, 64-byte line size
FID/VID Control: yes
FID range: 6.0x - 9.0x
max VID: 1.325 v
Nice.