climateprediction.net home page
Bad Performance

Questions and Answers : Windows : Bad Performance

Profile geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 1834
Credit: 36,646,763
RAC: 12,228
Message 758 - Posted: 11 Aug 2004, 23:33:26 UTC - in response to Message 750.  

> although with "classic CPDN" that is "wall clock time" of 3.3 seconds per
> timestep, whereas BOINC is actual CPU time. So if your CPU was used a lot for
> things other than cpdn the old cpdn times are underestimating.
>
> I tried the -tpp7 switch today and didn't notice any big improvement. When I
> get time I will investigate other Fortran compilers. I haven't found any
> AMD64 compiler yet that works with the model; it's a very touchy million
> lines!
>
>
>
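The difference between the two timings quoted above can be seen in a minimal sketch. It assumes nothing about the CPDN model itself: `measure` and `mostly_waiting` are hypothetical names, and the sleep only stands in for time a process spends not running because the machine is busy with something else.

```python
import time

def measure(task):
    """Return (wall_seconds, cpu_seconds) for one call of task()."""
    wall_start = time.perf_counter()   # elapsed real time, includes waiting
    cpu_start = time.process_time()    # CPU time charged to this process only
    task()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

def mostly_waiting():
    # Hypothetical stand-in for a model that is not getting the CPU:
    # sleeping uses wall-clock time but almost no CPU time.
    time.sleep(0.2)

wall, cpu = measure(mostly_waiting)
print(f"wall clock: {wall:.3f} s, CPU time: {cpu:.3f} s")
```

A per-timestep figure based on wall-clock time therefore includes any time lost to other programs, while a CPU-time figure does not, so the two are only comparable on a machine that was doing little else.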
Carl,

I don't know how much it applies to this situation, or whether the Fortran compiler has such switches, but there is an interesting read, with a link, on the Intel compiler and AMD64 here:

http://www.theinquirer.net/?article=14077

and here

http://www.theinquirer.net/?article=13821

Like I said, it may not apply, and even so there is a lot to read from the first article's link.
ID: 758
Profile old_user83

Joined: 5 Aug 04
Posts: 30
Credit: 39,745
RAC: 0
Message 777 - Posted: 12 Aug 2004, 4:45:41 UTC - in response to Message 758.  

> Carl,
>
> I don't know how much it might apply to this situation...and whether the
> Fortran compiler has such switches, but something of an interesting read, with
> a link, on the Intel compiler and AMD64 here...
>
> http://www.theinquirer.net/?article=14077
>
> and here
>
> http://www.theinquirer.net/?article=13821
>
> Like I said, it may not apply, and even so there is a lot to read off the
> first articles link
There looks to be some great information behind the Google link in http://www.theinquirer.net/?article=14077; a snippet:

**** Snippet Start ****
I started mucking around with a disassembly of the Intel-specific binary and found one particular call (proc_init_N) that appeared to be performing this check. As far as I can tell, this call is supposed to verify that the CPU supports SSE and SSE2 and it checks the CPUID to ensure that it's an Intel processor. I wrote a quick utility which I call iccOut, to go through a binary that has been compiled with this Intel-only flag and remove that check.

Once I ran the binary that was compiled with the Intel-specific flag (-QxN) through iccOut, it was able to run on the FX51. Much to my surprise, it ran fine and did not miscompare. On top of that, it got the same 22% performance boost that I saw on the Pentium4 with an actual Intel processor. This is very interesting to me, since it appears that in fact no Intel-specific optimization has been done if the AMD processor is also capable of taking advantage of these same optimizations. If I'm missing something, I'd love for someone to point it out for me. From the way it looks right now, it appears that Intel is simply "cheating" to make their processors look better against competitors' processors.
**** Snippet End ****
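To make the snippet concrete, here is a minimal sketch of the two kinds of start-up check it describes. This is not the Intel compiler's code or the iccOut utility: the function names and the feature sets are hypothetical, and only the CPUID vendor strings ("GenuineIntel", "AuthenticAMD") are real.

```python
GENUINE_INTEL = "GenuineIntel"   # vendor string Intel CPUs report via CPUID
AUTHENTIC_AMD = "AuthenticAMD"   # vendor string AMD CPUs report via CPUID
REQUIRED = frozenset({"sse", "sse2"})

def vendor_gated_check(vendor, features):
    """Hypothetical check as the snippet describes it:
    needs SSE/SSE2 support AND an Intel vendor string."""
    return vendor == GENUINE_INTEL and REQUIRED <= set(features)

def feature_only_check(features):
    """Hypothetical check with the vendor test removed:
    only asks whether the instructions are supported."""
    return REQUIRED <= set(features)

# Illustrative feature sets; both CPUs support SSE and SSE2.
cpus = {
    "Pentium 4": (GENUINE_INTEL, {"mmx", "sse", "sse2"}),
    "Athlon 64 FX-51": (AUTHENTIC_AMD, {"mmx", "sse", "sse2"}),
}

for name, (vendor, features) in cpus.items():
    gated = vendor_gated_check(vendor, features)
    ungated = feature_only_check(features)
    print(f"{name}: vendor-gated={gated}, feature-only={ungated}")
```

Under these assumptions the AMD chip fails only the vendor-gated check, which matches the snippet's observation that the binary ran fine once that check was removed.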

This sounds quite promising, as ultimately Intel might be considered hardware as opposed to software gurus. If we could recreate what this guy did (detailed in the Google link), which doesn't seem to be that difficult, then this might significantly reduce the amount of work required on FORTRAN compilers. It should be easy to test, I would imagine, with a few of the AMD CPDN BOINC gurus.

I'm also perplexed by the benchmark figures: is there a ~22% performance improvement for Intel Pentium 4 processors due to FORTRAN compilation efficiencies, as stated above, or not? Or is it a little more complicated than that, and we're not comparing eggs with eggs? Presumably something like CPFarmView would compare eggs with eggs?
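Part of the difficulty is that "22% performance boost" can be read two ways. The sketch below works through both, using the 3.3 seconds per timestep mentioned earlier in the thread purely as an illustrative baseline; it is an assumption that this figure and the 22% figure can be combined at all, since they come from different programs and machines.

```python
baseline_s_per_step = 3.3   # illustrative baseline, taken from earlier in the thread
boost = 0.22                # the "22%" figure from the snippet

# Reading 1 (assumption): 22% more timesteps completed per unit of time.
time_if_throughput_gain = baseline_s_per_step / (1 + boost)

# Reading 2 (assumption): 22% less time taken per timestep.
time_if_time_reduction = baseline_s_per_step * (1 - boost)

print(f"throughput reading: {time_if_throughput_gain:.2f} s per timestep")
print(f"time-reduction reading: {time_if_time_reduction:.2f} s per timestep")
```

The two readings give different per-timestep times, so any comparison needs to state which one is meant and to use the same kind of timing on both machines.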
UK4CP @ www.uk4cp.co.uk (United Kingdom Group) Celeron 2.6GHz XP Pro SP1 768MB RAM
ID: 777
old_user73375

Joined: 4 May 05
Posts: 1
Credit: 41,980
RAC: 0
Message 12309 - Posted: 5 May 2005, 4:10:27 UTC - in response to Message 777.  

> > Carl,
> >
> > I don't know how much it might apply to this situation...and whether the
> > Fortran compiler has such switches, but something of an interesting read,
> > with a link, on the Intel compiler and AMD64 here...
> >
> > http://www.theinquirer.net/?article=14077
> >
> > and here
> >
> > http://www.theinquirer.net/?article=13821
> >
> > Like I said, it may not apply, and even so there is a lot to read off
> > the first articles link
> There looks like some great information from the google link off
> http://www.theinquirer.net/?article=14077; a snippet:
>
> **** Snippet Start ****
> I started mucking around with a dissassembly of the Intel-specific binary and
> found one particular call (proc_init_N) that appeared to be performing this
> check. As far as I can tell, this call is supposed to verify that the CPU
> supports SSE and SSE2 and it checks the CPUID to ensure that its an Intel
> processor. I wrote a quick utility which I call iccOut, to go through a binary
> that has been compiled with this Intel-only flag and remove that check.
>
> Once I ran the binary that was compiled with the Intel-specific flag (-QxN)
> through iccOut, it was able to run on the FX51. Much to my surprise, it ran
> fine and did not miscompare. On top of that, it got the same 22% performance
> boost that I saw on the Pentium4 with an actual Intel processor. This is very
> interesting to me, since it appears that in fact no Intel-specific
> optimization has been done if the AMD processor is also capable to taking
> advantage of these same optimizations. If I'm missing something, I'd love for
> someone to point it out for me. From the way it looks right now, it appears
> that Intel is simply "cheating" to make their processors look better against
> competitor's processors.
> **** Snippet End ****
>

Greetings,

I think the foregoing snippet quite clearly identified the problem with CPDN code compiled with the Intel Fortran Compiler and running on the AMD64 architecture: the programming SHOULD HAVE TREATED THE AMD64 IDENTICALLY to the Pentium processors -- all support the SSE2 instructions. The current BOINC CPDN programming erroneously TURNS OFF SSE2 optimizations for the AMD processors.

Yet, in reading through this thread, I see a lot of unnecessary speculation about the reasons for the BOINC performance problems on the AMD (e.g. maybe the AMD procs don't have P4 optimizations, etc.) -- when at the very beginning of the thread, the programming error is very clearly identified! It has nothing to do with differences between the AMD and Intel processors -- the code turns off optimizations for the AMD when the optimizations would work fine!

Could I ask when the bug identified in the snippet above will be fixed so that the massive waste of processing capacity (maybe 25% of all AMD horsepower available to CPDN) can be stopped? Or has this already been addressed?

Contrablue


ID: 12309
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7008
Credit: 20,926,388
RAC: 5,087
Message 12314 - Posted: 5 May 2005, 8:03:30 UTC

This is a very old thread. Carl did a fair bit of work on the models after the last message was posted, including getting the old legacy OS Win98 ME to work, which has recently been broken again.
The subject of 64-bit processors has been discussed many times since then on nearly all the boards.
A search may give you something more recent.
Also check the community forum. There is a thread there hinting that a 64-bit version is currently being tested on the Sulphur Cycle model, which is still in alpha testing.
And keep in mind that the compiler used back when this thread was created was one for 32bits, not 64. And the project had not even gone BOINC.

So, sometime this year, perhaps?
And I suppose that you DO know that there is only one programmer on this project?

Les


ID: 12314
Profile geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 1834
Credit: 36,646,763
RAC: 12,228
Message 12316 - Posted: 5 May 2005, 11:43:07 UTC - in response to Message 12314.  
Last modified: 5 May 2005, 12:18:35 UTC

> And keep in mind that the compiler used back when this thread was created was
> one for 32bits, not 64. And the project had not even gone BOINC.
>
The discussion in this thread was about the BOINC version of CPDN, and the reason I brought up The Inquirer articles in the first place was the slowdown for AMD64 processors in BOINC CPDN vs. Classic. The posts were made during, or just after, beta testing of the BOINC version. The interesting thing was that with the above compiler and the Intel P3 optimizations switch, the AMD64 procs were much faster than they were with the version compiled with the default switches, and faster than "Classic", but there were a couple of problems in testing, and Carl didn't have time to sort them out.
>
> So, sometime this year, perhaps?
> And I suppose that you DO know that there is only one programmer on this
> project?
>

Yes, the apparent big hope for a major speedup is the 64-bit compilation. This sped up the Sulphur alpha experiment in 64-bit Linux a huge amount, but so far it is unstable (along with all other Linux compilations of that alpha experiment). The problem, as Les correctly points out, is the unbelievable overworking of the lone IT person (Tolu). 'Tis really a shame, as the much longer and more computationally intensive Sulphur and Coupled model experiments would benefit greatly from any further platform optimizations they could get. As long as there is only one real IT person working on compiling the code (as well as everything else computer-related), I don't see how all the things that should be done in this area can be done.
ID: 12316