MPICH version?

Questions and Answers : Unix/Linux : MPICH version?

Profile old_user43408

Send message
Joined: 28 Jan 05
Posts: 7
Credit: 14,244
RAC: 0
Message 7855 - Posted: 28 Jan 2005, 0:03:26 UTC

G'day all

Has anyone out there in LinuxLand got BOINC running on a Beowulf with MPICH (or similar)?

I've got a little cluster I could use for this gig when I'm not running Nbody sims :)

Cheers
Steve
ID: 7855 · Report as offensive     Reply Quote
old_user30025

Send message
Joined: 15 Nov 04
Posts: 19
Credit: 35,499
RAC: 0
Message 8728 - Posted: 6 Feb 2005, 12:32:18 UTC

I do not have a Beowulf but I was looking into running this beast on an openMosix.org cluster. I have started a bit of discussion on the NG comp.distributed and the boinc ML@berkeley. Quite understandably the devs said they have other priorities first and unfortunately I am not enough of a programmer to do this myself. But I am certainly interested to hear from you on how you come along.

PS: openMosix has the advantage that it should run Boinc OOTB and you can add and remove clients at will. Have a look and let us know what you think.
ID: 8728 · Report as offensive     Reply Quote
Profile old_user43408

Send message
Joined: 28 Jan 05
Posts: 7
Credit: 14,244
RAC: 0
Message 8754 - Posted: 6 Feb 2005, 22:20:56 UTC - in response to Message 8728.  

G'day Leggewie

Yeah. I've got the option of switching to OpenMOSIX (good to have your own cluster ;) It's been on the cards as something to look at. I think you're right about Boinc OOTB. I'm not sure if it'd be worth the complication though. Might just be as easy to let each node run its own model :/

Understandable on the devs' part too. I was curious, as a distributed wx model could be a nice app for a Beowulf. ...All those cells to be processed and all ;)

I'll post back on this forum if I take it further :)

Cheers
Steve

ID: 8754 · Report as offensive     Reply Quote
old_user30025

Send message
Joined: 15 Nov 04
Posts: 19
Credit: 35,499
RAC: 0
Message 8870 - Posted: 7 Feb 2005, 23:53:52 UTC - in response to Message 8754.  

> right about Boinc OOTB. I'm not sure if it'd be worth the complication though.
> Might just be as easy to let each node run its own model :/

What I do like about the idea, even without being able to run just a single model on the cluster, is that it becomes possible to have something like a "boinc server" where processes are started. Then you can add and remove nodes to the cluster (for example computers that only run during the daytime) and the "server" distributes the load on the cluster.

As far as complications go that does not seem to be such a biggy. AFAIK it does not involve much more than patching and recompiling the kernel.

> I'll post back on this forum if I take it further :)

Please do!
ID: 8870 · Report as offensive     Reply Quote
Les Bayliss
Volunteer moderator

Send message
Joined: 5 Sep 04
Posts: 7226
Credit: 23,113,811
RAC: 4,388
Message 8871 - Posted: 8 Feb 2005, 1:17:21 UTC
Last modified: 8 Feb 2005, 1:21:58 UTC

quote
.... does not seem to be such a biggy.

Over 1 million lines of fortran developed and evolved over 20-30 years by a long line of
scientists / programmers, and running on a 64 bit supercomputer. And then ported to desktops.

If you're comfortable working with that sort of thing, Oxford Uni had a vacancy advertised (http://www.climateprediction.net/newsb.php?id=4) in mid-November.

Are you going to be providing your own computer / compiler?
And a lot of users are desperate to get hold of a 64 bit version for AMD and Intel
processors if you have the time.

Les
ID: 8871 · Report as offensive     Reply Quote
Profile old_user43408

Send message
Joined: 28 Jan 05
Posts: 7
Credit: 14,244
RAC: 0
Message 8875 - Posted: 8 Feb 2005, 2:09:01 UTC - in response to Message 8871.  

The complications in this case relate to having more than one processor act on the same model. You need to divvy up the work amongst the processors to try and keep them all busy at the same time.

As it explains in the excellent intro on this site, your typical wx model divides the atmosphere into "cells". You then apply the required physics to those cells and for fun - also add some neighbourly cell interaction.

That's the "magic" of a parallel wx model. Multiple processors each working on their own cell and sharing the results with the cell's neighbours.
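
A minimal sketch of that cell-and-neighbour idea, assuming a toy 1-D ring of cells and a made-up relaxation step in place of real physics. None of the function names come from the CPDN model or from MPICH; the "processors" here are just list chunks, and the halo values stand in for what an MPI program would send between nodes.

```python
# Illustrative only: a 1-D ring of cells with a toy "physics" step.
# Nothing here is taken from the CPDN model; all names are hypothetical.


def step_serial(cells):
    """Each cell relaxes toward the mean of its two neighbours (periodic ring)."""
    n = len(cells)
    return [
        0.5 * cells[i] + 0.25 * (cells[(i - 1) % n] + cells[(i + 1) % n])
        for i in range(n)
    ]


def split(cells, workers):
    """Divide the cells into contiguous chunks, one per worker."""
    n = len(cells)
    bounds = [n * w // workers for w in range(workers + 1)]
    return [cells[bounds[w]:bounds[w + 1]] for w in range(workers)]


def step_chunk(chunk, left_halo, right_halo):
    """Update one worker's chunk using one halo cell from each neighbour."""
    padded = [left_halo] + chunk + [right_halo]
    return [
        0.5 * padded[i] + 0.25 * (padded[i - 1] + padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


def step_parallel(cells, workers):
    """One time step: exchange halos, update each chunk, stitch the result."""
    chunks = split(cells, workers)
    new_chunks = []
    for w, chunk in enumerate(chunks):
        left = chunks[(w - 1) % workers][-1]   # last cell of the left neighbour
        right = chunks[(w + 1) % workers][0]   # first cell of the right neighbour
        new_chunks.append(step_chunk(chunk, left, right))
    return [value for chunk in new_chunks for value in chunk]
```

The point of the sketch is that each worker only needs its own cells plus one boundary value from each side per step, which is the part that has to travel over the cluster interconnect. It assumes there are at least as many cells as workers.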

As I understand it, OpenMOSIX environments work best when the app and the data can be readily separated. Something I suspect might not be the case in the "vanilla" CPnet model.

I think that maybe we're looking for a smart scheduler. The workflow is not dissimilar to how they render 3DCG movies - multiple compute nodes talking to a central data store. If a node is free it asks the scheduler for a model to work on and then mounts the relevant model's data. That node then "restarts" the model and adds to the data. No biggie if a node bombs... it's the equiv of an unexpected hup. I believe the model app can cope with such problems without too much drama.

...shouldn't require any mods to the core wx model...
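
A rough sketch of that scheduler workflow, assuming an in-memory dictionary in place of the central data store and a step counter in place of real model state. The `Scheduler` class, `run_on_node` function, and the `fail_at` switch are all hypothetical and exist only to show the requeue-and-restart behaviour.

```python
from collections import deque


class Scheduler:
    """Hands out model runs; a run whose node dies goes back in the queue."""

    def __init__(self, model_ids):
        self.pending = deque(model_ids)
        # Stands in for the central data store: last saved step per model.
        self.checkpoints = {model_id: 0 for model_id in model_ids}

    def request_work(self):
        """Called by a free node. Returns (model_id, last_saved_step) or None."""
        if not self.pending:
            return None
        model_id = self.pending.popleft()
        return model_id, self.checkpoints[model_id]

    def save_checkpoint(self, model_id, step):
        self.checkpoints[model_id] = step

    def node_failed(self, model_id):
        """Requeue the model; progress up to the last checkpoint is kept."""
        self.pending.append(model_id)


def run_on_node(scheduler, total_steps, checkpoint_every, fail_at=None):
    """Simulate one node: fetch a model, resume it, optionally crash at fail_at."""
    job = scheduler.request_work()
    if job is None:
        return None
    model_id, step = job
    while step < total_steps:
        step += 1
        if fail_at is not None and step == fail_at:
            scheduler.node_failed(model_id)   # the "unexpected hup" case
            return model_id, "failed"
        if step % checkpoint_every == 0 or step == total_steps:
            scheduler.save_checkpoint(model_id, step)
    return model_id, "done"
```

For example, a node that dies at step 8 with checkpoints every 3 steps loses only steps 7 and 8; the next free node picks the model up again from step 6.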

But then again, I'm sure this idea isn't anything new. Most likely it's either been done or dismissed in this or other BOINC projects.

Stevo
ID: 8875 · Report as offensive     Reply Quote
Profile old_user43408

Send message
Joined: 28 Jan 05
Posts: 7
Credit: 14,244
RAC: 0
Message 8877 - Posted: 8 Feb 2005, 2:17:44 UTC - in response to Message 8875.  


Oh and in case you really want to do your head in... Check out PUMA - Portable University Model of the Atmosphere

I spent many an enjoyable rainy day watching my martian model shake itself apart =)

http://puma.dkrz.de/puma/

Cheers
Stevo
ID: 8877 · Report as offensive     Reply Quote
Les Bayliss
Volunteer moderator

Send message
Joined: 5 Sep 04
Posts: 7226
Credit: 23,113,811
RAC: 4,388
Message 8887 - Posted: 8 Feb 2005, 3:30:49 UTC

Look at Top Teams, then the team at top of the list, then the person at the top of THAT list, then at his computers.

222 computers. No special programming, no clusters, just LOTS of machines.

Les
ID: 8887 · Report as offensive     Reply Quote
Profile old_user43408

Send message
Joined: 28 Jan 05
Posts: 7
Credit: 14,244
RAC: 0
Message 8888 - Posted: 8 Feb 2005, 3:48:53 UTC - in response to Message 8887.  
Last modified: 8 Feb 2005, 3:51:24 UTC

Oh yes indeedy.

This relates to the thread how?

I was looking for technical input on a parallel version of the model.

S
ID: 8888 · Report as offensive     Reply Quote
Les Bayliss
Volunteer moderator

Send message
Joined: 5 Sep 04
Posts: 7226
Credit: 23,113,811
RAC: 4,388
Message 8898 - Posted: 8 Feb 2005, 8:12:28 UTC

quote:
This relates to the thread how?

I was looking for technical input on a parallel version of the model.



NO parallel version! At all! Only single computers, a lot of them with HT.

Closest you'll get to fast processing is the person with the 3 computers at the top of the Top Hosts list.
Servers with 8 HT processors; so, 16 models at a time.

Les
ID: 8898 · Report as offensive     Reply Quote
old_user30025

Send message
Joined: 15 Nov 04
Posts: 19
Credit: 35,499
RAC: 0
Message 8907 - Posted: 8 Feb 2005, 11:09:50 UTC - in response to Message 8871.  

> quote
> .... does not seem to be such a biggy.
>
> Over 1 million lines of fortran developed and evolved over 20-30 years by a long line of
> scientists / programmers, and running on a 64 bit supercomputer. And then ported to desktops.
>
> If you're comfortable working with that sort of thing, Oxford Uni had a vacancy advertised
> (http://www.climateprediction.net/newsb.php?id=4) in mid-November.

Les Bayliss, if you quote, please try to understand what I write and please do not quote out of context! Thank you.

What I quoted and what I was referring to is the following from steve_vmwx:

> right about Boinc OOTB. I'm not sure if it'd be worth the complication though.
> Might just be as easy to let each node run its own model :/

Steve said that it might not be worth the complication for him to run Boinc on openMosix OOTB, i.e. as it is today. I *specifically* said that even *without* changes to Boinc and/or CPDN to make it run just one model per cluster instead of one per node, there are some benefits to be had even today, and that installing an openMosix cluster does not involve much more than recompiling the kernel. This *clearly* relates to the benefits vs. costs of switching with what is available software-wise ATM.

In any case, getting a program to run on openMosix is still much easier than with one of the other cluster types. CPDN can be run on it ATM and OOTB, it just does not yet benefit too much from it.
ID: 8907 · Report as offensive     Reply Quote
old_user30025

Send message
Joined: 15 Nov 04
Posts: 19
Credit: 35,499
RAC: 0
Message 8911 - Posted: 8 Feb 2005, 12:23:37 UTC - in response to Message 8875.  

> As I understand it, OpenMOSIX environments works best when the app and the
> data can be readily separated.

Where do you infer that from? I am not sure that is the case.

Take a look at http://howto.x-tend.be/openMosixWiki/index.php/FAQ, especially "Generally, how do I write an openMosix-aware program?".
ID: 8911 · Report as offensive     Reply Quote
Profile old_user43408

Send message
Joined: 28 Jan 05
Posts: 7
Credit: 14,244
RAC: 0
Message 8945 - Posted: 8 Feb 2005, 21:46:33 UTC - in response to Message 8911.  

Sorry Leggewie, it's been a while since I looked into OpenMOSIX. You're probably right :)

I just vaguely recall that there are some apps that use a memory or IO pattern that won't "migrate". If push comes to shove I'll have to do more homework ;)

With the source available anybody with the skills can have a go at it.

I'm keeping an email "eye" on this thread so we'll see if anyone takes the bait.

BTW, *I* got your context re "not a biggie" and agree that giving it a burl isn't a hard ask.

Cheers
Steve
ID: 8945 · Report as offensive     Reply Quote
Profile old_user32502

Send message
Joined: 9 Dec 04
Posts: 3
Credit: 10,123
RAC: 0
Message 9378 - Posted: 16 Feb 2005, 1:38:03 UTC - in response to Message 8945.  

Having used openmosix in the past, I remember that anything using shared memory doesn't migrate. That includes things like 'X' (the core server) as well as mozilla. Not sure whether boinc would migrate.

You'd still be limited to running N separate boinc processes on your 'central' machine and hoping they'd migrate to N other machines. No speedup over logging on to each machine manually and starting boinc on each one.

OpenMosix only migrates work to other machines at the process level (not e.g. thread level) as I recall. So since boinc is one process, that still hogs 100% of 1 machine.

The only advantage I see (if you can get it to work) is ease of administration - log into a single machine and control many. But then, couldn't you achieve all that with a few 'ssh remotemachine /usr/local/boinc/boinc_xxxx' commands?
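
As a hedged illustration of that last point, the "few ssh commands" could be scripted along these lines. The host names are made up, the remote path is copied from the paragraph above with its `boinc_xxxx` placeholder left as is, and passwordless ssh to each node is assumed. The `runner` parameter is only there so the command lists can be inspected without launching anything.

```python
import subprocess


def build_ssh_commands(hosts, remote_command):
    """Return one ssh argv list per host, e.g. ['ssh', 'node01', '<command>']."""
    return [["ssh", host, remote_command] for host in hosts]


def launch_all(hosts, remote_command, runner=subprocess.Popen):
    """Start the remote command on every host without waiting for it to finish.

    runner defaults to subprocess.Popen; pass a different callable for a dry run.
    """
    return [runner(argv) for argv in build_ssh_commands(hosts, remote_command)]


# Dry run with hypothetical host names: print the commands instead of running them.
if __name__ == "__main__":
    hosts = ["node01", "node02", "node03"]          # assumed names
    command = "/usr/local/boinc/boinc_xxxx"          # placeholder path from the post
    for argv in build_ssh_commands(hosts, command):
        print(" ".join(argv))
```

Each node still runs its own independent BOINC process, so this gives the same throughput as the openMosix setup described above, just with the placement made explicit.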

Will Smith
Banbury, UK
ID: 9378 · Report as offensive     Reply Quote
Profile old_user43408

Send message
Joined: 28 Jan 05
Posts: 7
Credit: 14,244
RAC: 0
Message 9379 - Posted: 16 Feb 2005, 1:49:14 UTC - in response to Message 9378.  

Thanks Will.

Always good to have confirmation from someone that actually used the thang! (OpenMOSIX).

I'll wait for the MPI port ;)

Cheers
Stevo
ID: 9379 · Report as offensive     Reply Quote
old_user30025

Send message
Joined: 15 Nov 04
Posts: 19
Credit: 35,499
RAC: 0
Message 9380 - Posted: 16 Feb 2005, 1:57:33 UTC - in response to Message 9378.  

> You'd still be limited to running N separate boinc processes on your 'central'
> machine and hoping they'd migrate to N other machines. No speedup over
> logging on to each machine manually and starting boinc on each one.
>
> OpenMosix only migrates work to other machines at the process level (not e.g.
> thread level) as I recall. So since boinc is one process, that still hogs
> 100% of 1 machine.
>
> The only advantage I see (if you can get it to work) is ease of administration
> - log into a single machine and control many. But then, couldn't you achieve
> all that with a few 'ssh remotemachine /usr/local/boinc/boinc_xxxx' commands.

You made explicit the points I was referring to when saying "CPDN can be run on it ATM and OOTB, it just does not yet benefit too much from it." Your analysis is absolutely correct. There are only a few benefits to be had today, but it would be great if BOINC/CPDN were actually written such that it spawned several processes. How much work would that involve? I have no idea, but I do know that the change could be gradual. And if it does not happen, then it just does not happen.
ID: 9380 · Report as offensive     Reply Quote

©2020 climateprediction.net