Advanced CPU support

Jesse Viviano Joined: 20 Dec 14 Posts: 23 Credit: 2,450,095 RAC: 296	Message 51266 - Posted: 18 Jan 2015, 19:05:45 UTC Last modified: 18 Jan 2015, 20:03:07 UTC

I think that many crunchers would want support for more advanced CPUs. Since you already require SSE2 according to your system requirements page, support for more advanced CPU instruction sets could speed up the applications even more.

64-bit versions of applications: This can help in three areas. The first and most important area is that x86 in 32-bit mode is register starved, and AMD fixed this issue when designing AMD64. The 32-bit mode of x86 was designed when memory ran at the same speed as the processor, so memory operations were cheap back then. They are quite expensive today because DRAM and most caches are slower than the CPU core. One study of some Pentium Pro processors, cited in one of my old college textbooks, found that they spent over half of their time waiting for the memory subsystem when executing code. The additional registers added by the AMD64 architecture allow the core to stay busy doing more real work and spend less time waiting for the memory system. They can also sometimes keep 64-bit capable NetBurst CPUs from entering the pathologically energy-wasting replay mode, because more data stays in the registers rather than only in the memory system, where a failure to keep data in the level 1 cache guarantees entry into replay mode. The second area is that programs can directly address more than 4 gibibytes of memory. The third area is that 64-bit integers are supported natively, which is probably worthless for this application.

SSE3: This adds some flexibility to the 128-bit wide vector unit that might help sustain a higher operation rate in some situations, so whether it helps depends on your code.
AVX: This doubles the width of the floating point vector unit to 256 bits as compared to the 128-bit SSE/SSE2/SSE3 instruction sets. The integer vector unit is not affected by this instruction set.

FMA4: This instruction multiplies two numbers and keeps all of the bits of the product without rounding, adds a third number to the product, and only then rounds the result, with the whole thing done as a single instruction with a single rounding step. This instruction therefore doubles the peak floating point operations per second figure if an FMA operation is counted as two floating point operations. AMD's Bulldozer processors need this instruction to perform decently in floating point, because otherwise their floating point units are pathologically slow.

FMA3: This instruction does the same thing as FMA4, but requires that one of the source operands is overwritten with the result. AMD's Piledriver processors and above support this instruction as well as FMA4, and need one of the two to be used to perform floating point at an acceptable speed, because all of the processors of the Bulldozer family have floating point units that are otherwise garbage. Intel's Haswell processors and above support this instruction. Poor coordination between AMD and Intel, and the discovery by Intel that FMA4 would require an extensive rework of its vector unit, created the confusion over which FMA instruction should be supported.

AVX2: This doubles the width of the integer vector unit to 256 bits as compared to the 128-bit SSE/SSE2/SSE3 instruction sets. On Intel it arrived together with FMA3 in Haswell, although the CPU reports FMA3 as a separate feature. AVX2 is found in Intel's Haswell processors and above, and in AMD's Excavator and above.

EDIT: Explain that Piledriver and above members of the Bulldozer family require either FMA4 or FMA3.
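The single rounding step that the FMA descriptions above rely on can be illustrated without any special hardware. The following is only a sketch in Python: it emulates a fused multiply-add with exact rational arithmetic instead of calling a real FMA instruction, and the function names are made up for this illustration. It assumes CPython, where converting a Fraction to a float rounds correctly.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate an FMA: form a*b + c exactly, then round once to a double.

    Hypothetical helper for illustration only; it does not use the
    hardware FMA3/FMA4 instructions.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # no rounding so far
    return float(exact)  # the only rounding step


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Ordinary multiply then add: the product is rounded before the add."""
    product = a * b      # first rounding
    return product + c   # second rounding


if __name__ == "__main__":
    # Both factors are exactly representable as doubles.
    a = 1.0 + 2.0 ** -30
    b = 1.0 - 2.0 ** -30
    c = -1.0
    # The exact product is 1 - 2**-60, which rounds to 1.0 as a double,
    # so the separate version loses the small term entirely.
    print("separate:", separate_multiply_add(a, b, c))
    print("fused:   ", fused_multiply_add(a, b, c))
```

With these inputs the separate version returns zero, while the fused version keeps the small term of -2^-60, which is the accuracy benefit that comes on top of the throughput benefit described above.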

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 51267 - Posted: 18 Jan 2015, 19:36:21 UTC

The UK Met Office only supplies 32-bit applications. The other items that you mention may be constrained by the available compilers. And the OS of choice for professional climatologists is Unix/Linux. But it's all definitely constrained by the funding allocated to the OeRC for this work, to say nothing of the need for backwards compatibility with the vast number of older computers being used.

Jesse Viviano Joined: 20 Dec 14 Posts: 23 Credit: 2,450,095 RAC: 296	Message 51269 - Posted: 18 Jan 2015, 20:12:20 UTC - in response to Message 51267. Last modified: 18 Jan 2015, 20:13:05 UTC

BOINC does provide a mechanism for sending different application versions to different computers, as described at http://boinc.berkeley.edu/trac/wiki/AppPlanSpec, so supplying the best binary to each computer is just a configuration job for the server. Each version of the application therefore does not need to worry about backwards compatibility, because the scheduler will assign the best version of each application to each computer if it is configured correctly.

However, I acknowledge the difficulty and expense of developing different versions of each application for different computers. This is why I posted this in the wish list and do not expect it to be granted unless some major donor shows up or the OeRC budget is increased. Even if one of the two happens, my proposal would have to compete against every other item in the wish list.
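The idea of the scheduler handing each computer the best version it can run can be sketched roughly as follows. This is not BOINC's actual scheduler code or its plan class format; the version names, feature strings, and the pick_version function are all assumptions made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppVersion:
    """One build of the application (hypothetical structure)."""
    name: str
    required_features: frozenset  # CPU features the binary needs
    preference: int               # higher means a faster, preferred build


# Hypothetical catalogue of builds; not a list of real project binaries.
VERSIONS = [
    AppVersion("sse2-32bit", frozenset({"sse2"}), 0),
    AppVersion("sse3-64bit", frozenset({"sse2", "sse3", "x86_64"}), 1),
    AppVersion("avx-64bit", frozenset({"sse2", "sse3", "x86_64", "avx"}), 2),
    AppVersion(
        "avx2-fma3-64bit",
        frozenset({"sse2", "sse3", "x86_64", "avx", "avx2", "fma3"}),
        3,
    ),
]


def pick_version(host_features, versions=VERSIONS):
    """Return the most preferred version the host can run, or None."""
    host = set(host_features)
    # A build is usable only if every feature it needs is present on the host.
    usable = [v for v in versions if v.required_features <= host]
    if not usable:
        return None
    return max(usable, key=lambda v: v.preference)


if __name__ == "__main__":
    old_host = {"sse2"}
    piledriver_like = {"sse2", "sse3", "x86_64", "avx", "fma3", "fma4"}
    haswell_like = {"sse2", "sse3", "x86_64", "avx", "avx2", "fma3"}
    for host in (old_host, piledriver_like, haswell_like):
        print(sorted(host), "->", pick_version(host).name)
```

Because the oldest build stays in the catalogue, older computers keep receiving work, while newer ones are matched to faster builds. The cost pointed out in the thread remains: every entry in such a catalogue is a separate binary that someone has to build and test.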

Dave Jackson Volunteer moderator Joined: 15 May 09 Posts: 4345 Credit: 16,523,697 RAC: 5,963	Message 51379 - Posted: 9 Feb 2015, 16:33:40 UTC - in response to Message 51269.

A more likely route than supplying binaries for each model type is what is already happening to some extent: some model types being offered for only one OS. While this gives us crunchers less choice, it does simplify things for the developers. Just about any recent CPU can cope with all the model types on offer at the moment. However, there have been a number of batches of work that ran much better on one platform than another. That said, I would love to have tasks for 64-bit Linux, as that would obviate the many serial killers around that do not have the required 32-bit libs installed.