climateprediction.net
Questions and Answers : Unix/Linux : Benchmarks and other problems

Jean-David Beyer
Joined: 5 Aug 04
Posts: 382
Credit: 3,690,501
RAC: 0
Message 60379 - Posted: 20 Jun 2019, 12:19:49 UTC - in response to Message 60376.  

It will be nice to see some work done by the Linux boxes that are currently trashing everything they get because of missing 32-bit libs. :)


Are they sending work units to Linux boxes that are trashing everything?

I have a Linux box that has the 32-bit compatibility libraries, but I have received almost no work units in about a year, except a couple of retreads. Then, the day before yesterday, I got the current four work units that are now crunching away. One has generated two trickles and the other three have generated one trickle each.

It seems to me that if they were sending out 32-bit work units to boxen missing the necessary libraries, I would have been getting some of them too. But I have not.

Wed 19 Jun 2019 03:27:26 PM EDT | Finished upload of hadam4_a027_200610_12_825_011882434_0_r411654165_1.zip
Wed 19 Jun 2019 04:11:36 PM EDT | Sending scheduler request: To send trickle-up message.
ID: 60379
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2790
Credit: 3,659,580
RAC: 11,331
Message 60380 - Posted: 20 Jun 2019, 12:33:36 UTC

It seems to me that if they were sending out 32-bit work units to boxen missing the necessary libraries, I would have been getting some of them too. But I have not.


Batch 825's 550 (HADAM4) tasks for Linux have the statistics shown below. Of the hard fails, i.e. tasks that failed on all three attempts, each one has either two or three failures because of missing 32-bit libs.


Success: 0 (0%)
Fails: 208 (38%)
Hard Fail: 37 (7%)
Running: 513 (93%)
Unsent: 0 (0%)
ID: 60380
Jean-David Beyer
Joined: 5 Aug 04
Posts: 382
Credit: 3,690,501
RAC: 0
Message 60381 - Posted: 20 Jun 2019, 14:38:25 UTC - in response to Message 60380.  

I failed one of these on work unit 21490395, not because of missing libraries, but because my machine crashed, and it was the version that could not tolerate machine restarts.

UK Met Office HadAM4 at N144 resolution v8.08
i686-pc-linux-gnu
stderr out

<core_client_version>7.2.33</core_client_version>
<![CDATA[
<message>
process exited with code 22 (0x16, -234)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...

Model crashed: READDUMP: BAD BUFFIN OF DATA tmp/xnnuj.pipe_dummy

Model crashed: READDUMP: BAD BUFFIN OF DATA tmp/xnnuj.pipe_dummy

Model crashed: READDUMP: BAD BUFFIN OF DATA tmp/xnnuj.pipe_dummy

Model crashed: READDUMP: BAD BUFFIN OF DATA tmp/xnnuj.pipe_dummy

Model crashed: READDUMP: BAD BUFFIN OF DATA tmp/xnnuj.pipe_dummy

Model crashed: READDUMP: BAD BUFFIN OF DATA tmp/xnnuj.pipe_dummy
Sorry, too many model crashes! :-(
13:45:02 (3083): called boinc_finish(22)

</stderr_txt>
]]>

This problem seems to have been fixed in the UK Met Office HadAM4 at N144 resolution v8.09 version of the software. I have four of those running, and they are uploading and trickling OK.

They seem to be running well over twice as fast as the expected completion time suggested: they were expected to take about 1,050 hours to complete, but one is at 18.455% complete after running 72.5 hours.
ID: 60381
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2790
Credit: 3,659,580
RAC: 11,331
Message 60382 - Posted: 20 Jun 2019, 14:53:19 UTC

[quote]I failed one of these on work unit 21490395.[/quote]

And for that one, which failed three times, one of the three failures was due to a lack of 32-bit libs.
ID: 60382
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 1942
Credit: 41,861,826
RAC: 19,854
Message 60383 - Posted: 20 Jun 2019, 15:25:05 UTC - in response to Message 60381.  

They seem to be running well over twice as fast as the expected completion time suggested: they were expected to take about 1,050 hours to complete, but one is at 18.455% complete after running 72.5 hours.

The initial estimate of time to completion is partially based on the BOINC floating-point benchmark. For some reason, the 7.2.33 version that comes with Red Hat 6-type installations produces unrealistically low benchmarks. Same thing for my Phenom II 945 on CentOS 6: 7.2.33 gives an FP benchmark of about 1600, whereas later versions of BOINC on Ubuntu give about 3000 for the same CPU.
ID: 60383
Jean-David Beyer
Joined: 5 Aug 04
Posts: 382
Credit: 3,690,501
RAC: 0
Message 60384 - Posted: 20 Jun 2019, 16:21:20 UTC - in response to Message 60383.  

The initial estimate of time to completion is partially based on the BOINC floating-point benchmark. For some reason the 7.2.33 version that comes with Red Hat 6-type installations produces unrealistically low benchmarks.


If that benchmark is still the widely used "Whetstone" benchmark, it is possibly the worst benchmark there could be for measuring floating-point performance.

It is based on statistics obtained by running tests on "typical" programs written in Algol 60 in an interpreter (not a compiler). The interpreter was used because it made it easy to insert the necessary timing instrumentation into the code automatically.

There are about a dozen loops in the program, each executed 10,000 times, if I remember correctly. Each one does either some simple calculations or calls subroutines. The one whose subroutine does floating-point operations is in the benchmark to evaluate the cost of function calls, not loop overhead and not floating-point operations; it just happens to do some floating-point operations.

I was involved in writing an optimizer for the C compiler at Bell Labs in the early 1980s. One of the optimizations we used was to expand called functions inline when it made sense, which defeated the measurement of the call and return operations that was the original purpose of that module. Then my loop-invariant code motion optimization moved all those operations out of the loop, since they did not change from one iteration to another. Then a live-dead analysis eliminated all the remaining floating-point operations, because the results were not used. These optimizations produced an enormous speed-up in our execution of that benchmark. Since they were common by 1990, they are probably in almost all compilers by now. So whatever that benchmark may have measured in 1965, it does not measure it today.
ID: 60384
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7261
Credit: 23,235,750
RAC: 5,211
Message 60387 - Posted: 20 Jun 2019, 22:21:08 UTC
Last modified: 20 Jun 2019, 22:31:00 UTC

I've moved all of the preceding posts out of the thread intended to discuss the new OpenIFS models.
ID: 60387
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7261
Credit: 23,235,750
RAC: 5,211
Message 60388 - Posted: 20 Jun 2019, 22:47:04 UTC

Are they sending work units to Linux boxes that are trashing everything?

Projects don't send work; it is requested by the computers connected to them.


*******************

It seems to me that if they were sending out 32-bit work units to boxen missing the necessary libraries, I would have been getting some of them too. But I have not.

Your computer got 4 of the latest version of that model on 17 Jun 2019, 12:21:33 UTC. This was 3 days before you posted.

******************

One of the reasons for your low benchmarks is that your computer is horribly slow compared to the latest computers.

And if you ever expect to get anywhere with the new type of model when it's finally released (will we EVER get to that point?), then people should be looking at processor speeds in the 3 GHz area, with a lot more memory than you have.

All of which will be posted about when the time comes.
ID: 60388
Jean-David Beyer
Joined: 5 Aug 04
Posts: 382
Credit: 3,690,501
RAC: 0
Message 60389 - Posted: 21 Jun 2019, 0:24:25 UTC - in response to Message 60387.  

A very good idea.
ID: 60389
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2790
Credit: 3,659,580
RAC: 11,331
Message 60390 - Posted: 21 Jun 2019, 7:03:42 UTC

I've moved all of the preceding posts out of the thread intended to discuss the new OpenIFS models.


Thanks Les, I was thinking about moving some of it myself.
ID: 60390
Jean-David Beyer
Joined: 5 Aug 04
Posts: 382
Credit: 3,690,501
RAC: 0
Message 60394 - Posted: 21 Jun 2019, 17:10:08 UTC - in response to Message 60388.  

And if you ever expect to get anywhere with the new type of model when it's finally released (will we EVER get to that point?), then people should be looking at processor speeds in the 3 GHz area, with a lot more memory than you have.


When I got the machine, 1.8 GHz was not all that slow, but 3 GHz is less than double the speed of mine. It is a 64-bit Intel Xeon.

Is 16 GB of RAM all that small? (I just doubled it from the 8 GB that came with the machine.) To add more RAM with this motherboard, I would need to add a second processor. I could easily put a second processor in that motherboard, but it would be the same speed, though I could then handle eight processes at once instead of only four, and I could double the RAM at the same time. I do not think any one process could get over 16 GB of RAM in my setup. At some point (when the money tree blooms), getting a new machine is probably the way to go.

Time Sent (UTC): 20 Jun 2019 08:17:32
Host ID: 1256552
Result ID: 21718833
Result Name: hadam4_a0c5_201310_12_825_011882792_0
Phase: 1
Timestep: 8,741
CPU Time (sec): 224,334
Average (sec/TS): 25.6646
ID: 60394
WB8ILI
Joined: 1 Sep 04
Posts: 140
Credit: 55,519,048
RAC: 14,801
Message 60395 - Posted: 21 Jun 2019, 19:26:00 UTC

Jean-David -

I think you are doing OK with your setup.

I am currently running the N144 tasks on four machines:

1) AMD Phenom II X4 945 3.6 GHz 23.5 sec/TS
2) AMD Phenom II X4 945 3.6 GHz 24.5 sec/TS
3) AMD FX 8370 Eight Core 4.0 GHz 18.5-20.5 sec/TS
4) AMD FX 8370 Eight Core 4.0 GHz 18.5-20.5 sec/TS

I don't believe you have a "tortoise" for a machine, since you are reporting 25 sec/TS.

As far as memory goes, these tasks seem to be using about 650 MB each. Depending on what else you are doing on your machine, you may or may not have enough memory.

Look at your memory usage (% of total). If it is over 85% when running these tasks, I would strongly consider adding more memory, especially if you are using the computer for anything else.

There are many other factors that determine throughput (e.g. memory speed).
ID: 60395
Alex Plantema
Joined: 3 Sep 04
Posts: 122
Credit: 26,128,640
RAC: 0
Message 60396 - Posted: 21 Jun 2019, 21:06:41 UTC

Xeons are faster than other processors with the same clock frequency, and 16 GB should be more than enough. The project is supposed to run on home computers.
ID: 60396


©2020 climateprediction.net