UK Met Office HADAM3P (global only) with MOSES II landsurface scheme v7.03

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1019
Credit: 5,511,676
RAC: 2
Message 49018 - Posted: 2 May 2014, 13:57:00 UTC

The run-time estimates should now be better for new models. Any feedback on that would be appreciated.

Thanks.
MyLittleBoinc
Joined: 31 Mar 13
Posts: 44
Credit: 6,950,896
RAC: 0
Message 49019 - Posted: 2 May 2014, 14:12:16 UTC - in response to Message 49018.  
Last modified: 2 May 2014, 14:14:35 UTC

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=16607393

This one has an estimated run-time of 259 hours. Much better than 1800 hours!
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1019
Credit: 5,511,676
RAC: 2
Message 49020 - Posted: 2 May 2014, 14:27:01 UTC - in response to Message 49019.  

Quote (Message 49019):
"http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=16607393

This one has an estimated run-time of 259 hours. Much better than 1800 hours!"

Great. That should return HADAM3PM2 to model BOINC citizen status, missing Zips notwithstanding.
tullus
Joined: 16 May 13
Posts: 48
Credit: 447,908
RAC: 0
Message 49026 - Posted: 2 May 2014, 22:21:54 UTC - in response to Message 49020.  

Except that all of the ~60 000 tasks now seem to have been pulled?
http://ob.cakebox.net/cpdn_status/server_status.html

Or is this just temporary while they are updated to this new version?
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7253
Credit: 23,154,247
RAC: 0
Message 49027 - Posted: 2 May 2014, 23:06:56 UTC - in response to Message 49026.  

I posted about it here in Jim's thread.

The run-time estimates were just one issue.
Now it's serious surgery for at least one of the others. Which means more testing, and another wait.

M0CZY
Joined: 17 Jul 08
Posts: 9
Credit: 337,264
RAC: 0
Message 49028 - Posted: 3 May 2014, 4:23:13 UTC

I started 2 MOSES II work units, but after 7.5 hours they had only reached 0.213% complete.
I worked out that at that rate they would take nearly 150 days to finish!
After reading in this thread that you must run them continuously without ever rebooting your computer or they would fail, I made the decision to abort them.
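For anyone wanting to check the arithmetic, here is a minimal sketch of the extrapolation in Python. The function name is made up for illustration, and the calculation assumes that the percentage shown advances linearly with run time, which the replies below indicate was not true for this batch.

```python
def naive_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linear extrapolation: total run time = elapsed time / fraction done."""
    if percent_done <= 0:
        raise ValueError("percent_done must be positive")
    return elapsed_hours / (percent_done / 100.0)


# Figures from the post above: 7.5 hours elapsed, 0.213% shown as complete.
total_hours = naive_total_hours(7.5, 0.213)
total_days = total_hours / 24
print(f"{total_hours:.0f} hours, about {total_days:.0f} days")
```

That comes out at roughly 3500 hours, which matches the "nearly 150 days" figure.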
The biggest threat to public safety and security is not terrorism, it is Government abuse of authority.
Bitcoin Donations: 1Le52kWoLz42fjfappoBmyg73oyvejKBR3
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7253
Credit: 23,154,247
RAC: 0
Message 49029 - Posted: 3 May 2014, 4:57:56 UTC - in response to Message 49028.  

That's the run time estimate problem that we've been talking about. It's now been fixed. As it was, just before the models finished, they were still showing 8-9% completed. They actually took about 9 days to complete.

Now working on the next problem.

M0CZY
Joined: 17 Jul 08
Posts: 9
Credit: 337,264
RAC: 0
Message 49030 - Posted: 3 May 2014, 5:21:24 UTC

Quote (Message 49029): "just before the models finished, they were still showing 8-9% completed. They actually took about 9 days to complete."
I understand now that you've explained it. In that case I may have been too hasty in aborting the work units. Sorry about that!
Dave Roberts
Joined: 15 Jan 11
Posts: 167
Credit: 5,744,442
RAC: 0
Message 49032 - Posted: 4 May 2014, 10:38:39 UTC

Thought I'd report that I've just received the task hadam3pm2_e934_1991_10_008714819_2, and its estimated run time is 1188 hours. I'm wondering whether this means the problem hasn't been fixed. I'm assuming from what's been posted here that I should just let it run to completion (which shouldn't take that long?).
Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 49033 - Posted: 4 May 2014, 11:10:58 UTC - in response to Message 49032.  

It probably doesn't matter either way, Dave.

I note from the "server status" page that the admins have deleted the bulk of this batch of hadam3pm2s.

I'm guessing that they'll fix the two major problems (the wildly wrong, far too long run-time estimate, and the failure with an error if the computer or BOINC is ever restarted while the models run) and then re-issue the batch.

With luck there'll be a Windows version as well, and a fix for the cross-project Boinc stats too ... but maybe those are two fixes too far at this stage. ;-)

When they re-issue the work, though, the admins will probably delete any results dating from before the re-issue, in order to have a clean data set for the scientists.

So: run it, or don't run it. If you do, you'll get credits (eventually), but the scientists probably won't use the results. If you are supporting other projects, perhaps they could use the CPU cycles instead. (I note that there are a few HadCM3N tasks available from CPDN, too. I don't know what that's about.)
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1019
Credit: 5,511,676
RAC: 2
Message 49053 - Posted: 6 May 2014, 12:32:43 UTC
Last modified: 7 May 2014, 23:40:21 UTC

The batch has indeed been withdrawn to fix some HADAM3PM2-specific problems, but it has also suffered from two general problems: machines with BOINC Manager 7.2.39 have been unable to download models correctly, and many Linux machines do not have the required 32-bit libraries installed. Users can fix both of these general problems themselves. BOINC Manager (BM) versions will naturally be replaced as time passes and new versions appear, but the library fix requires manual intervention (and as more machines go 64-bit, the failure rate may get worse).
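Whether a particular executable or library is a 32-bit or a 64-bit build can be read from the first few bytes of the file, since Linux binaries use the ELF format and the fifth byte of the header records the class. A minimal sketch in Python follows; the function names are illustrative only and are not part of BOINC or CPDN.

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4  # offset of the class byte in the ELF identification header


def elf_class(header: bytes) -> str:
    """Classify a file from its first five bytes."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return "not ELF"
    # ELF defines class 1 as 32-bit objects and class 2 as 64-bit objects.
    return {1: "32-bit", 2: "64-bit"}.get(header[EI_CLASS], "unknown")


def elf_class_of_file(path: str) -> str:
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as f:
        return elf_class(f.read(5))
```

Calling elf_class_of_file() on a model executable that reports "32-bit" on a 64-bit Linux installation would suggest that the 32-bit compatibility libraries are needed.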

The situation is illustrated in the following charts, which show the progress of ~1200 models in 500 work units. The first chart shows all models and the second chart shows only those models that have reached at least the first trickle.

[Chart: HADAM3PM2 - all models]

Looking at the Mac models (i.e. Darwin-Intel, dashed blue line), the number reaching at least the first trickle is ~55%, which is normal for CPDN batches. The number is somewhat reduced compared with normal Mac rates because of the BOINC Manager download problem. However, it is apparent that the Linux starting rate is severely depressed by the combined effect of the BM and library problems.

[Chart: HADAM3PM2 - models reaching first trickle]

For those models that start properly, the attrition rate shows an early annual effect, every 12 trickles, but then becomes more constant until something goes very wrong entering the final year. In the work unit range scanned, no models completed; however, in other WU ranges there have been completions on both Linux and Mac.

One proviso on these charts is that the models were only issued recently and would not normally be expected to have finished by now, so the progress looks worse than it will be in a few weeks' time. Some users, noting the withdrawal of the batch, have also aborted their models.

One way and another, not a very happy batch of models.

PS The charts have been updated from 200 to 500 units.
PPS The charts will likely disappear at some random point as my Web hosting is being shifted.
Eirik Redd
Joined: 31 Aug 04
Posts: 368
Credit: 137,914,252
RAC: 55
Message 49057 - Posted: 7 May 2014, 5:19:25 UTC
Last modified: 7 May 2014, 5:29:12 UTC

Sometime soon, I think, for CPDN on Linux, it will make sense to compile the CPDN models linked to the 64-bit libs, and abandon the volunteers who are running 32-bit.
Maybe that time is already past.
The evidence Iain presents seems to show that the (probable) majority of Linux users running 64-bit versions can't be expected to know how to install the backward-compatible 32-bit libs.
It seems that the choices for the Linux apps are:

A) As now: link with 32-bit. This works on old Linux machines, but any Linux installed in the last 5 years also needs the backward-compatible 32-bit libs (hey, even cellphones are about to go 64-bit, if they haven't already).
It requires users to do an obscure install of backward-compatible libs that most volunteers either don't understand or don't notice; even those who do notice have to post for help to get the 32/64 thing working.

B) Just link the apps with 64-bit, and lose whatever few ancient slow 32-bit Linux boxes are successfully running CPDN models. The fact that some versions of some of the CPDN models run on current Linux on supercomputers makes it obvious that linking 64-bit works, and works well.

C) Figure out some way for BOINC to assess volunteer hosts' capabilities and automatically assign a custom model. Hah hah.
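To make option C a little more concrete, here is a rough sketch in Python of the kind of check a host could run on itself. It is only an illustration under stated assumptions: the category names and function names are invented, the loader path is the conventional location of the 32-bit x86 dynamic loader on Linux, and finding the loader does not guarantee that every 32-bit library a model needs is installed. Nothing here describes how BOINC actually assigns work.

```python
import os
import platform

# Assumption: conventional location of the 32-bit x86 dynamic loader on Linux.
LOADER_32 = "/lib/ld-linux.so.2"

X86_32 = {"i386", "i486", "i586", "i686"}
X86_64 = {"x86_64", "amd64"}


def classify_host(machine: str, has_loader32: bool) -> str:
    """Map an architecture name and loader presence to a rough category."""
    machine = machine.lower()
    if machine in X86_32:
        return "32-bit only"
    if machine in X86_64:
        if has_loader32:
            return "64-bit, 32-bit loader present"
        return "64-bit only"
    return "unknown"


def classify_this_host() -> str:
    """Classify the machine this script runs on."""
    # platform.machine() reports the kernel's architecture, e.g. "x86_64".
    return classify_host(platform.machine(), os.path.exists(LOADER_32))
```

A host in the "64-bit only" category is the one that fails under option A and would be served by option B.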


It's time to dump the 32-bit. Big win.
Go 64-bit all the way.

B) is the obvious winner.
tullio
Joined: 6 Aug 04
Posts: 264
Credit: 965,476
RAC: 0
Message 49058 - Posted: 7 May 2014, 5:48:39 UTC - in response to Message 49057.  

I am running 7 BOINC projects on 32-bit Linux on 2 real machines and one virtual machine, also including VirtualBox. I don't have to install any compatibility libraries. If CPDN goes 64-bit I shall abandon it. I also have a 64-bit Solaris virtual machine on my 64-bit Opteron 1210, but no BOINC project is available on Solaris.
Tullio
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2772
Credit: 3,449,240
RAC: 58
Message 49060 - Posted: 7 May 2014, 7:10:20 UTC

Quote (Message 49053): "...and many Linux machines do not have the required 32-bit libraries installed."


While waiting for this model type to reappear, is there any way for me to check whether additional 32-bit libs are needed on top of those I already need for the regional and full-resolution ocean models? Alternatively, is there a way to download the files and check before starting computation? That would be easy if I still had tasks computing, but if I have run out of work by then...
Eirik Redd
Joined: 31 Aug 04
Posts: 368
Credit: 137,914,252
RAC: 55
Message 49061 - Posted: 7 May 2014, 7:31:59 UTC - in response to Message 49058.  

Quote (Message 49058):
"I am running 7 BOINC projects on 32-bit Linux on 2 real machines and one Virtual Machine, including also Virtual Box. I don't have to install any compatibility library. If CPDN goes 64-bit I shall abandon it. I have also a 64-bit Solaris Virtual Machine on my 64-bit Opteron 1210 but no BOINC project is available on Solaris.
Tullio"


What you say - exactly true -

There's no way for the project to guess how many reliable 32-bit contributors it will lose if it goes 64-bit, and no way to guess how many 64-bit contributors it will gain.

Is a puzzlement.
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 1939
Credit: 41,514,751
RAC: 18
Message 49063 - Posted: 7 May 2014, 12:33:44 UTC - in response to Message 49060.  

Quote (Message 49060):
"and many Linux machines do not have the required 32-bit libraries installed.

While waiting for this model type to reappear, is there any way for me to check if additional 32bit libs on top of those I need for the regional and full resolution ocean models are needed? Alternatively, is there a way to download the files and check before starting computation. Easy if I still have tasks computing but if I have run out of work by then..........."


You should be fine, as I didn't need to download any new 32-bit libraries for the hadam3pm2 tasks. But you can check them out at:

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/download/mirror.php?file=/hadam3pm2_7.03_i686-pc-linux-gnu
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/download/mirror.php?file=/hadam3pm2_se_7.03_i686-pc-linux-gnu.zip
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/download/mirror.php?file=/hadam3pm2_um_7.03_i686-pc-linux-gnu.zip
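One way to check a downloaded executable before any work arrives is to run ldd on it and look for libraries reported as "not found". The Python sketch below wraps that idea; the function names are made up for illustration, and it assumes ldd is installed and that the file being checked is a dynamically linked Linux executable (the .zip files would need unpacking first). ldd should only be run on files from a source you trust.

```python
import subprocess
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


def check_binary(path: str) -> List[str]:
    """Run ldd on the file at `path` and list its unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    # If ldd cannot handle the file at all (it may say "not a dynamic
    # executable" when the 32-bit loader itself is absent), this returns
    # an empty list, so inspect result.stderr by hand in that case.
    return missing_libraries(result.stdout)
```

An empty list from check_binary() suggests that every library the executable links against was found; any names returned are the ones still to install.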
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2772
Credit: 3,449,240
RAC: 58
Message 49064 - Posted: 7 May 2014, 12:43:51 UTC - in response to Message 49063.  

Thanks.

Being at times rather paranoid, I will download them and check.
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2772
Credit: 3,449,240
RAC: 58
Message 49065 - Posted: 7 May 2014, 18:37:28 UTC

I notice that the odd Moses task still appears. Out of interest, are these failed tasks being reissued because, once out there, they can't be pulled so easily?
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 1939
Credit: 41,514,751
RAC: 18
Message 49066 - Posted: 7 May 2014, 19:52:19 UTC - in response to Message 49065.  

Quote (Message 49065): "I notice that the odd Moses task still appears. Out of interest will they be ones that have failed being reissued because once out there they can't be pulled so easily?"


Yes, unfortunately. I just had 4 tasks on one PC crash after the year 9 upload (a common failure point according to Iain's graph). All of those will be downloaded by someone else.
Jean-David Beyer
Joined: 5 Aug 04
Posts: 381
Credit: 3,690,501
RAC: 0
Message 49069 - Posted: 8 May 2014, 1:16:50 UTC - in response to Message 48831.  

I am running Red Hat Enterprise Linux Server release 6.5 (Santiago), which I believe is up to date as of today. This is the real paid thing from Red Hat, not something derived from it. I cannot find that library where you say, but what I have is shown below:

$ strings /usr/lib/libstdc++.so.6 | grep GLIBCXX
GLIBCXX_3.4
GLIBCXX_3.4.1
GLIBCXX_3.4.2
GLIBCXX_3.4.3
GLIBCXX_3.4.4
GLIBCXX_3.4.5
GLIBCXX_3.4.6
GLIBCXX_3.4.7
GLIBCXX_3.4.8
GLIBCXX_3.4.9
GLIBCXX_3.4.10
GLIBCXX_3.4.11
GLIBCXX_3.4.12
GLIBCXX_3.4.13
GLIBCXX_FORCE_NEW
GLIBCXX_DEBUG_MESSAGE_LENGTH

This is the 32-bit compatibility version. The regular 64-bit stuff is in

/usr/lib64/libstdc++.so.6
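The listing above gives the symbol versions this copy of libstdc++ provides, the highest being GLIBCXX_3.4.13. Picking out the highest one automatically needs a numeric comparison, because plain text sorting would rank 3.4.9 above 3.4.13. A small Python sketch, with an invented function name, that takes the text produced by the strings command:

```python
import re
from typing import Optional, Tuple

# Matches lines such as "GLIBCXX_3.4.13" but not "GLIBCXX_FORCE_NEW".
VERSION_RE = re.compile(r"^GLIBCXX_(\d+(?:\.\d+)*)$")


def highest_glibcxx(strings_output: str) -> Optional[Tuple[int, ...]]:
    """Return the highest GLIBCXX version as a tuple of ints, or None."""
    versions = []
    for line in strings_output.splitlines():
        match = VERSION_RE.match(line.strip())
        if match:
            versions.append(tuple(int(p) for p in match.group(1).split(".")))
    # Tuples compare element by element, so (3, 4, 13) > (3, 4, 9).
    return max(versions) if versions else None
```

If a program complains that a particular GLIBCXX version is not found, comparing that version with this result shows whether the installed library is too old.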

P.S.: I am not having any climateprediction problems with this.


©2020 climateprediction.net