UK Met Office HADAM3P (global only) with MOSES II landsurface scheme v7.03

klepel

Joined: 9 Oct 04
Posts: 18
Credit: 48,185,713
RAC: 10,543
Message 48816 - Posted: 17 Apr 2014, 0:12:29 UTC


ID: 48816
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2738
Credit: 3,388,112
RAC: 2,679
Message 48817 - Posted: 17 Apr 2014, 5:54:01 UTC - in response to Message 48816.  

I notice the other tasks in the work unit have also errored out. It's a bit early to say whether it is a universal problem with the current batch or not. Hopefully, if it is, things will be sorted out soon.
ID: 48817
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7233
Credit: 23,154,247
RAC: 341
Message 48818 - Posted: 17 Apr 2014, 8:09:56 UTC - in response to Message 48816.  

I was not able to find any information!
HYDRA
There's a link on the front page.

ID: 48818
JIM
Joined: 31 Dec 07
Posts: 1121
Credit: 20,460,788
RAC: 4,016
Message 48829 - Posted: 17 Apr 2014, 20:12:48 UTC

I see that hadam3pm2 with MOSES II has been released, but only in versions for Mac and Linux. Is there a version for Windows anywhere in the pipeline, or are Windows users going to have to wander around in the desert for 40 years waiting to get to the promised land?

ID: 48829
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7233
Credit: 23,154,247
RAC: 341
Message 48830 - Posted: 17 Apr 2014, 20:20:58 UTC - in response to Message 48829.  

No Windows version was tested, so none will be available.
And everything was Rush Rush Rush. Hence the No Graphics part, which caused problems, and forced a return to an earlier version.

Someday perhaps.

ID: 48830
Belfry
Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 48831 - Posted: 17 Apr 2014, 22:47:36 UTC - in response to Message 48816.  

Klepel, you might be running into an old issue that keeps cropping up with RHEL-derived distributions like CentOS. Unfortunately, CPDN's application developers target a distribution with newer libraries than those shipped in these distributions. See this sticky.

If you run:
strings /usr/lib/i386-linux-gnu/libstdc++.so.6 | grep GLIBCXX

(modify the path for your libstdc++.so.6 location)

...the most recent GLIBCXX version listed should be 3.4.10 or greater.
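For anyone who prefers a script to eyeballing the grep output, here is a rough Python sketch of the same check. It scans the raw bytes of the library for GLIBCXX_x.y.z tags and compares the highest one against 3.4.10. The function names (max_glibcxx, new_enough, check_library) are made up for illustration, and the default path is just the one from the command above, so adjust it for your system.

```python
import re


def max_glibcxx(data: bytes):
    """Return the highest GLIBCXX_x.y.z tag in raw library bytes as a tuple, or None."""
    versions = {
        tuple(int(part) for part in match.group(1).decode().split("."))
        for match in re.finditer(rb"GLIBCXX_(\d+(?:\.\d+)*)", data)
    }
    return max(versions) if versions else None


def new_enough(data: bytes, required=(3, 4, 10)):
    """True if the highest tag found is at least `required` (3.4.10, per the post above)."""
    found = max_glibcxx(data)
    return found is not None and found >= required


def check_library(path="/usr/lib/i386-linux-gnu/libstdc++.so.6"):
    """Read the library at `path` (assumed location) and report (highest tag, ok?)."""
    with open(path, "rb") as handle:
        blob = handle.read()
    return max_glibcxx(blob), new_enough(blob)
```

Calling check_library() with your own libstdc++.so.6 path returns the highest tag found and whether it meets the threshold. Tags are compared as tuples of integers, so 3.4.10 correctly ranks above 3.4.9, which a plain string comparison would get wrong.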
ID: 48831
pvh
Joined: 9 Apr 14
Posts: 14
Credit: 1,962,018
RAC: 0
Message 48840 - Posted: 18 Apr 2014, 17:13:02 UTC

I am running one of the new WUs (hadam3pm2_b8q0_1967_10_008669491_1) under openSUSE 13.1 and it has been running fine for 24 hours now. The projected total run time seems extreme, though. After 24 hours only 0.5% of the WU has completed. I hope that speeds up later on, since a 200-day run time really ties up your computer for a long time...
ID: 48840
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1019
Credit: 5,511,676
RAC: 4,392
Message 48841 - Posted: 18 Apr 2014, 17:20:46 UTC - in response to Message 48840.  

I am running one of the new WUs (hadam3pm2_b8q0_1967_10_008669491_1) under openSUSE 13.1 and it has been running fine for 24 hours now. The projected total run time seems extreme, though. After 24 hours only 0.5% of the WU has completed. I hope that speeds up later on, since a 200-day run time really ties up your computer for a long time...

Thanks for pointing this out, pvh. The issue of excessive run-time estimates was identified during beta testing and I am surprised that no correction has been made, if this is indeed a general problem and not some peculiarity of that particular machine. Your comment has been passed on to the project team, as run-time estimates can affect work flow: CPDN should be a good BOINC citizen in this regard.

Welcome!
ID: 48841
JIM
Joined: 31 Dec 07
Posts: 1121
Credit: 20,460,788
RAC: 4,016
Message 48843 - Posted: 18 Apr 2014, 17:57:54 UTC


ID: 48843
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1019
Credit: 5,511,676
RAC: 4,392
Message 48844 - Posted: 18 Apr 2014, 18:07:16 UTC
Last modified: 18 Apr 2014, 18:10:15 UTC

Don't worry, Jim. They don't take seven months. If the beta testing was anything to go by, the estimate of run time and the percentage progress were both wrong. However, there were so many version changes that I got thoroughly confused: my two-year Mac model took 90 hours as I recall, so these ten-year models should be multiplied proportionally. I believe some Linux users did finish their ten-year models so they may be able to offer a more authoritative estimate.
ID: 48844
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 1931
Credit: 41,487,636
RAC: 4,640
Message 48845 - Posted: 18 Apr 2014, 18:21:10 UTC

The hadam3pm2 ten year model took about 125 hours on my i7 3770 running Linux Mint 5 in a virtual machine on Win7. Yes, the time estimates and percent done are WAY off, useless and misleading.

ID: 48845
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7233
Credit: 23,154,247
RAC: 341
Message 48847 - Posted: 18 Apr 2014, 20:17:15 UTC - in response to Message 48845.  

It's not the run time that's the problem. It's the zips. They're BIG.

I'm going to make a News post.

************

Ten-year beta models on my Haswell 4770K processor took 218 hours, which is just over 9 days.

At 189 hours of run time, they still had "344 hours to go", and were at "Progress = 6.7%".
The best indicator, I think, is the number of zips uploaded compared to the amount of time run so far. There are 10 zips.
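As a rough illustration of the zip-counting approach, the Python sketch below scales the time run so far by the fraction of zips uploaded. It assumes the 10 zips are produced at roughly even intervals, which is an assumption rather than something confirmed by the project. The function name and the example figures are hypothetical.

```python
def estimate_from_zips(hours_run, zips_uploaded, total_zips=10):
    """Estimate (total hours, hours remaining) from the number of zips uploaded so far.

    Assumes zips are uploaded at roughly even intervals through the run.
    """
    if not 0 < zips_uploaded <= total_zips:
        raise ValueError("zips_uploaded must be between 1 and total_zips")
    hours_per_zip = hours_run / zips_uploaded
    total_hours = hours_per_zip * total_zips
    return total_hours, total_hours - hours_run


# Hypothetical example: 3 zips uploaded after 66 hours of run time.
total, remaining = estimate_from_zips(66, 3)
print(f"Estimated total: {total:.0f} h, remaining: {remaining:.0f} h")
```

The estimate is only as good as the even-spacing assumption, and it cannot be used before the first zip has gone up.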


ID: 48847
Helmer Bryd
Joined: 16 Aug 04
Posts: 136
Credit: 4,700,070
RAC: 0
Message 48853 - Posted: 19 Apr 2014, 14:49:42 UTC

The progress bar in BOINC is calculated for a 120-year run, blah

My i5 3570K @ 4.4 GHz (normal, I guess) ran a ten-year model (7.02) on the beta site, and it took 5 days to complete.
The newer version 7.03 runs at the same speed, and I have a finished two-year beta test that took 22 hours to run (the beta site crapped out).

A problem in beta tests with 7.02 was stop/restart: monthly zips were never built when stopping/restarting. Try that.

ID: 48853
Eirik Redd
Joined: 31 Aug 04
Posts: 368
Credit: 137,835,684
RAC: 35,634
Message 48855 - Posted: 20 Apr 2014, 6:20:18 UTC

I've only got a few of the MOSES WUs - got a big backlog of ANZs :) Not seen so much work for a long, long time. Suspended all other projects to run CPDN.

A few concerns -- after less than 3 days running these MOSES models

1- The totally wrong BOINC estimate of run time and percent completion -- it's a BOINC problem and was implausible from the get-go. Glad others have confirmed it's a non-problem (except if one is trying to guess how long the model will run :).
Based on the first few WUs on my machines, I'd guess 23 hours*10 to 30 hours*10. About 1+ to 2+ weeks. Not bad. And I'm loading every hyperthread save one on most of my boxes. Certainly not the 1000 hours BOINC was misestimating at first. I expect this will settle down after a few weeks or months of client BOINC experience.

2- Thanks Melvyn and the other beta testers -- especially for the warning about restart problems. As luck would have it, 2 of my boxes hit the infamous "exited with zero status but no 'finished' file" errors - possibly network related - possibly caused by the ultra-low nice 19 that BOINC tasks run at by default. I reduced the load on the problem machines and it has not happened again.
After this happened, 2 tasks just kept on eating CPU but not trickling for over a day. A clean shutdown and restart got one of them going; the other still seems stuck. The other box kept trickling, but the first upload is nowhere in my logs. But the second upload is in the logs. Huh?
OTOH clean shutdowns for backups haven't caused any problems yet; the tasks keep on OK after restart.

3- Looking at the wingmen for the tasks that failed before my machines downloaded them -- and this is an ongoing problem with Linux users - see the Unix-Linux thread --
Missing 32-bit libs - This problem is, I think, something the Linux distros should look into. When I try to run a non-existent program, Ubuntu gives me a whole list of possible things I might have meant, but a missing system library leaves me totally wondering and googling.
The other error I've seen, also discussed on the Unix-Linux thread, is where ancient Linux 2.6 distros have libstdc++6 or some such lib more than half a decade old, which doesn't work with code compiled in the last few years. Sorry, but OSs depending on ancient system libs nearly a decade and 8 or so point releases old? Does not compute. Ask WIN XP or original WIN NT users. Stability is good, obsolescence not so good.

Hope this helps.

Keep on crunching

e
ID: 48855
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7233
Credit: 23,154,247
RAC: 341
Message 48856 - Posted: 20 Apr 2014, 6:50:47 UTC - in response to Message 48855.  

Hi Eirik

The other box kept trickling, but the first upload is nowhere in my logs. But the second upload is in the logs. Huh?
Something similar happened to me:

I shut down the computer at one point, (no graphics, so hard to tell when it's safe), and when I restarted, all 4 restarted OK.
BUT ...
One of them failed to produce zip 8. I'm guessing that one was running a bit behind the others, and got caught at a critical point when I shut down.
But, unlike in the past, the model started again! It went on to produce zips 9 and 10, and then all 4 finished, 3 OK, the 4th with an error message.

Then the beta server broke before I could upload, and came partly back while I was running main site ANZ models. (What's there is several years old.)
During the next ANZ upload, BOINC said: Oh, there's the server, I'll start uploading. The zips for the failed model were aborted by BOINC, and then it was reported. Where to though is a mystery.

I guess we're in the fast lane again. :(

ID: 48856
pvh
Joined: 9 Apr 14
Posts: 14
Credit: 1,962,018
RAC: 0
Message 48874 - Posted: 23 Apr 2014, 16:40:16 UTC

I have just finished 6 WUs, but BOINC is refusing to download any new work because it thinks that it doesn't need any work. This is undoubtedly a result of the excessive time remaining estimate for the hadam3pm2 WU. Is there any way of tricking BOINC into downloading new work despite this?
ID: 48874
Eirik Redd
Joined: 31 Aug 04
Posts: 368
Credit: 137,835,684
RAC: 35,634
Message 48882 - Posted: 24 Apr 2014, 14:00:30 UTC - in response to Message 48874.  

I have just finished 6 WUs, but BOINC is refusing to download any new work because it thinks that it doesn't need any work. This is undoubtedly a result of the excessive time remaining estimate for the hadam3pm2 WU. Is there any way of tricking BOINC into downloading new work despite this?


If you have an empty slot with no work for the cpu's you've allotted and BOINC isn't grabbing a download - that's a problem.
If you just want BOINC to grab some new work before the old work is done, not likely to happen.

The miscalculation of work remaining and progress on the MOSES wu's has been commented on before.

I have noticed that BOINC starts to compensate (somehow, no clue as to how) after running the MOSES models for a few days, in that the absurdly high completion estimate drops by a few hundred hours after a few days (not on old work, but on new downloads).

For now, I just multiply the "percent completion" that BOINC estimates by 9 or 10 or 11 or so.

For me, when a cpu slot empties, BOINC always fills it, but not always before time.
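The multiply-by-9-to-11 rule of thumb above can be written as a small Python helper. This is only a sketch: the factors 9, 10 and 11 come from the post, while the function name, the cap at 100% and the example reading are illustrative additions.

```python
def corrected_progress(boinc_percent, factors=(9, 10, 11)):
    """Scale BOINC's reported percent-complete by each rule-of-thumb factor, capped at 100%."""
    return [min(100.0, boinc_percent * factor) for factor in factors]


# Hypothetical reading: BOINC shows 2.0% done.
low, mid, high = corrected_progress(2.0)
print(f"Real progress is probably between {low:.0f}% and {high:.0f}%")
```

It gives a range rather than a single figure because the right factor is not known exactly.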

ID: 48882
MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 48895 - Posted: 25 Apr 2014, 10:42:10 UTC - in response to Message 48816.  

I notice Les's comment that it's unlikely a Windows version of this model will be developed, and must admit I thought BOINC was BOINC and didn't realise that different model versions had to be developed for each OS.

Given that this model is the largest release by far for some time (I think), I was wondering about the rationale behind the decision. To get an idea of the split I looked at the top 200 hosts: we have 27 Linux, 22 Darwin and 151 Windows boxes of varying shades.

Assuming this split is fairly representative, it seems odd to be limiting the processing to 25% of available units?
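The 25% figure follows from the host counts quoted above. A quick Python check, using only those three numbers (the variable and function names are illustrative):

```python
# Host counts from the top-200 list quoted above.
hosts = {"Linux": 27, "Darwin": 22, "Windows": 151}


def share_percent(counts, names):
    """Percentage of all hosts in `counts` that belong to the OS families in `names`."""
    total = sum(counts.values())
    return 100.0 * sum(counts[name] for name in names) / total


supported = share_percent(hosts, ["Linux", "Darwin"])
print(f"Linux + Darwin: {supported:.1f}% of {sum(hosts.values())} hosts")
```

This only describes the top 200 hosts, so it says nothing certain about the wider population of machines.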
ID: 48895
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7233
Credit: 23,154,247
RAC: 341
Message 48896 - Posted: 25 Apr 2014, 10:59:22 UTC - in response to Message 48895.  

Perhaps I need to clarify.

The different versions for the different OSs are due to the separate compilers.
But Macs and Linux both use the same compiler.

"No Windows available" is only until such time as a Windows version can be "arranged". I don't know if it will need separate testing or not.
As the testing was very hurried, only a single OS type, Linux, was tested and debugged. Apparently this also works on Macs.

Andy is now talking about a Windows version. So, once again, Patience. (In large friendly letters, as Douglas Adams said about the Hitchhiker's Guide to the Galaxy.)

ID: 48896
JIM
Joined: 31 Dec 07
Posts: 1121
Credit: 20,460,788
RAC: 4,016
Message 48903 - Posted: 26 Apr 2014, 1:06:05 UTC - in response to Message 48895.  


ID: 48903

©2020 climateprediction.net