climateprediction.net home page
Running Two Models Simultaneously on an HT Machine

Message boards : Number crunching : Running Two Models Simultaneously on an HT Machine

Purple Rabbit
Joined: 1 Sep 04
Posts: 23
Credit: 5,079,591
RAC: 2,560
Message 11194 - Posted: 20 Mar 2005, 23:24:50 UTC
Last modified: 20 Mar 2005, 23:52:28 UTC

My early experience tells me not to do this, but I may have other problems.

I just built a Linux computer from parts (Host 129514). It's a 3.0 GHz Prescott P4 with 512 MB of Kingston memory and a Syntax motherboard (whatever that is; the company's home pages are in Chinese). I have downloaded 3 models so far.

When I ran the first model, I got a rate of 2.2 s/TS. It was running alongside several other projects, but not alongside another CPDN model. It bombed at timestep 129624 with error 251.

The next two models I received were sequential. When they were running together, I saw a rate of 2.73 s/TS; alone, they ran at 2.2 s/TS. Model 1 bombed at timestep 129514 (error 251 again). When the second model was running alone, it ran at 2.16 s/TS. This was when no other projects had work, but with hyperthreading still enabled.

I understand the interactions of memory and the board specifications. I haven't run mprime to test the system (yet).

My point is that 2 models running simultaneously yielded 2.73 s/TS, while one alone on the same HT computer ran at 2.0 s/TS or better. This kind of tells me not to run 2 CPDN models together on the same HT machine.

OK, my examples bombed, but I think (?) the timings are valid up to that point.
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2169
Credit: 64,555,907
RAC: 5,858
Message 11195 - Posted: 20 Mar 2005, 23:48:48 UTC

Well, if you are running two models, each at 2.7 sec/TS, then you are doing 2 x 0.37 TS/sec = 0.74 TS/sec.

At 2 sec/TS with one model, your PC is doing 0.5 TS/sec. Thus, in any given second, minute, hour, or day, you are obviously doing more work when running two models at once. However, it will take you longer to finish an individual run. But you will finish two runs sooner by running them concurrently under HT than by running them end to end, one model at a time.

That is, of course, if they run stably to completion. It might be worth running memtest86+ for several hours to make sure your memory is stable:

http://www.memtest.org/

and Prime95 for several hours to test CPU stability:

http://www.mersenne.org/freesoft.htm
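The throughput arithmetic in this reply can be checked with a short Python sketch. It assumes the per-model rates shown by BOINC (2.7 s/TS for each of two concurrent models, 2.0 s/TS for one model alone) stay constant for the whole run. The helper names `throughput` and `hours_to_finish` are hypothetical, and `TOTAL_TS` is an illustrative run length, not an actual CPDN model length.

```python
# Hypothetical helpers for comparing CPDN throughput on a hyperthreaded (HT) CPU.
# Rates are in seconds per timestep (s/TS), as displayed by the BOINC client.


def throughput(sec_per_ts: float, concurrent_models: int = 1) -> float:
    """Total timesteps completed per second across all concurrently running models."""
    return concurrent_models / sec_per_ts


def hours_to_finish(total_ts: int, sec_per_ts: float, runs: int, concurrent: bool) -> float:
    """Wall-clock hours to finish `runs` models of `total_ts` timesteps each.

    concurrent=True:  all runs progress side by side, each at `sec_per_ts`
                      (assumes they all fit on the CPU at once, e.g. two on one HT core).
    concurrent=False: runs go end to end, one at a time, each at `sec_per_ts`.
    Assumes the rate stays constant for the whole run.
    """
    if concurrent:
        seconds = total_ts * sec_per_ts
    else:
        seconds = runs * total_ts * sec_per_ts
    return seconds / 3600


print(f"two models under HT: {throughput(2.7, concurrent_models=2):.2f} TS/s")  # 0.74 TS/s
print(f"one model alone:     {throughput(2.0):.2f} TS/s")                       # 0.50 TS/s

TOTAL_TS = 100_000  # hypothetical run length, for illustration only
print(f"{hours_to_finish(TOTAL_TS, 2.7, runs=2, concurrent=True):.1f} h")   # 75.0 h
print(f"{hours_to_finish(TOTAL_TS, 2.0, runs=2, concurrent=False):.1f} h")  # 111.1 h
```

Under these assumptions the pair finishes in about 75 hours instead of about 111. Each individual run takes longer, but the two together finish sooner.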
Purple Rabbit
Joined: 1 Sep 04
Posts: 23
Credit: 5,079,591
RAC: 2,560
Message 11197 - Posted: 21 Mar 2005, 0:06:42 UTC - in response to Message 11195.  
Last modified: 21 Mar 2005, 1:33:33 UTC

Thanks geophi.

There just seemed to be a lot of discrepancy in the numbers. More testing is on my schedule.

I'll plead guilty to being ignorant. I thought I had a case. I've read all the throughput vs. rate threads. This seemed to be different, but I guess not.
Andrew Hingston
Volunteer moderator
Joined: 17 Aug 04
Posts: 753
Credit: 9,804,700
RAC: 0
Message 11226 - Posted: 21 Mar 2005, 9:45:54 UTC

A 3 GHz P4 running hyperthreaded under Linux is a good combination for CPDN, so I wish you luck. You should be able to return under 2 s/ts running singly, and the typical gain from running two models hyperthreaded rather than sequentially is of the order of 15% if I remember correctly. The ratio of 2.2 to 2.72 does not sound quite right, but it is difficult to make a direct comparison based on the figures calculated by the BOINC model itself.
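The ~15% figure above is from memory, and the s/TS values reported by BOINC are hard to compare directly, so the following Python sketch is rough arithmetic only. It computes the throughput gain implied by a single-model rate and a paired-model rate, and the per-model rate one would expect under an assumed gain. The function names `ht_gain` and `expected_paired_rate` are hypothetical.

```python
# Hypothetical helpers: estimate the HT throughput gain from two reported rates.
# Rates are in seconds per timestep (s/TS).


def ht_gain(single_sec_per_ts: float, paired_sec_per_ts: float) -> float:
    """Fractional throughput gain of running two models together under HT
    versus running them one after another.

    sequential throughput: 1 / single_sec_per_ts   (TS per second)
    concurrent throughput: 2 / paired_sec_per_ts   (two models, each at the paired rate)
    """
    return (2 / paired_sec_per_ts) / (1 / single_sec_per_ts) - 1


def expected_paired_rate(single_sec_per_ts: float, gain: float) -> float:
    """Per-model s/TS to expect when two models share an HT CPU, given a gain."""
    return 2 * single_sec_per_ts / (1 + gain)


# Figures quoted in this thread
print(f"implied gain: {ht_gain(2.2, 2.73):.0%}")                               # 61%
print(f"paired rate at 15% gain: {expected_paired_rate(2.0, 0.15):.2f} s/TS")  # 3.48 s/TS
```

Taken at face value, 2.2 s/TS alone versus 2.73 s/TS paired implies a gain of about 61%, far above the ~15% quoted, which is one way to see why the ratio "does not sound quite right". At a 15% gain, a model doing 2.0 s/TS alone would be expected to slow to roughly 3.5 s/TS when paired.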

I agree with Geophi that it is worth finding out what is going wrong. The number of possibilities is large, but then you would not have a sense of achievement if building your own machine were necessarily that easy ;)
Purple Rabbit
Joined: 1 Sep 04
Posts: 23
Credit: 5,079,591
RAC: 2,560
Message 11290 - Posted: 22 Mar 2005, 21:46:57 UTC - in response to Message 11226.  
Last modified: 22 Mar 2005, 22:41:28 UTC

> I agree with Geophi that it is worth finding out what is going wrong. The
> number of possibilities is large, but then you would not have a sense of
> achievement if building your own machine were necessarily that easy ;)

Thanks, Andrew. I built my first computer in 1979 (a Heathkit H89), soldering iron and all. This one was a piece of cake compared to the first one. I'm more of a hardware guy; Linux is a new challenge for me.

One point I was making was that I was getting 2.2 s/TS while running OTHER projects as well. A double CPDN run caused the big hit. Clearly the 2 CPDN models were competing for the same resources. Mixing CPDN with other projects seems to give better times. This was the first time I had 2 simultaneous CPDN models on the same HT computer. This doesn't negate anything geophi said. It's just an observation.

I haven't run the other tests yet. I'm waiting for the third model to do something. Since the other 2 bombed exactly halfway through Phase 1, I expected a similar result. As this model approached the midpoint of Phase 1, it rewound 1 day. The others didn't rewind.

There's definitely something wrong somewhere, but I'll take the rewinding as a minor victory :-)
old_user7444
Joined: 1 Sep 04
Posts: 2
Credit: 297,724
RAC: 0
Message 11292 - Posted: 22 Mar 2005, 22:58:22 UTC - in response to Message 11290.  

Hi,

I have a P4 Northwood processor running at 3.0 GHz. I run CPDN under BOINC on this computer and run 2 models at the same time in hyperthreading mode.

So far I have run 10 models to completion, with 9 successes. The one failure completed the entire model and uploaded it, but came up with a computing error.

So hardware-wise it is possible to run two models in HT mode without too many problems. I have heard it is not a good idea to time-slice CPDN with other BOINC projects.

The only difference in our setups is that I am using Windows XP Home SP2. My timesteps are 3.72 seconds and have been consistently this fast since September 1, when this machine started models 1 and 2.

Carl, the former Oxford DC guy, remarked that a P4 running a Linux core 8 or above in HT mode is the ultimate combo for CPDN.

Also, some ASUS boards don't react well with certain memory types. I ran into this and had a computer that was not calculating anything right. All memory tests passed, but out of desperation I swapped the memory for another brand, and the computer has been rock solid ever since.
Purple Rabbit
Joined: 1 Sep 04
Posts: 23
Credit: 5,079,591
RAC: 2,560
Message 11295 - Posted: 23 Mar 2005, 1:53:22 UTC - in response to Message 11292.  
Last modified: 23 Mar 2005, 3:29:50 UTC

> The only difference in our setups is I am using windows XP home sp2. My
> timesteps are 3.72 seconds and have been consistently this fast since
> september 1 when this machine started model 1 and 2.

Thanks Bruce,

Good info.

I'm time slicing, but I leave the results in memory. It works just fine on the WinXP computers. It just seemed horrendously slow when 2 CPDN models ran together on my new Linux computer.

One of my potential problems is the BOINC version. I haven't ruled out BOINC version 4.27 as the cause. The CPDN server is known to be behind the current version. Maybe I ought to revert to 4.19 (for Linux)?

On my other computers I'm running CPDN models (all WinXP, on version 2.25) on a 1.3 GHz Celeron, an 866 MHz Mobile P3 laptop, and a 3.4 GHz Dell. They are all doing OK. It's just my new Linux computer, built from parts, that seems to be doing strange things.

I either have a hardware problem (but 3 other projects don't care), my hardware is CPDN-averse, or I have an operator-head-space problem. The latter is my choice. I'm new to Linux.

©2024 climateprediction.net