climateprediction.net
UK Met Office HADAM3P (global only) with MOSES II landsurface scheme v7.03

Message boards : Number crunching : UK Met Office HADAM3P (global only) with MOSES II landsurface scheme v7.03

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 1919
Credit: 40,598,866
RAC: 6,597
Message 49070 - Posted: 8 May 2014, 3:25:16 UTC - in response to Message 49069.  

Jean-David,

The too-old version of GLIBCXX that causes problems is in RHEL 5 and its clones. RHEL 6 should work fine as long as the 32-bit libraries are installed.
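One way to see which GLIBCXX versions a given libstdc++ provides is to look for the `GLIBCXX_x.y.z` version tags embedded in the library (the same thing `strings libstdc++.so.6 | grep GLIBCXX` shows). The Python sketch below does that. The default path is an assumption (a usual location of the 32-bit library on 64-bit RHEL-style systems), the function names are made up for illustration, and the thread does not say which GLIBCXX version the model actually needs.

```python
from __future__ import annotations

import re
import sys

# Matches version tags such as b"GLIBCXX_3.4.9" embedded in the library binary.
# Tags without a numeric version (e.g. GLIBCXX_FORCE_NEW) are ignored.
_GLIBCXX_RE = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(data: bytes) -> list[tuple[int, ...]]:
    """Return the distinct GLIBCXX version tags found in `data`, sorted ascending."""
    found = {
        tuple(int(part) for part in m.group(1).decode("ascii").split("."))
        for m in _GLIBCXX_RE.finditer(data)
    }
    return sorted(found)


def newest_glibcxx(data: bytes) -> tuple[int, ...] | None:
    """Return the highest GLIBCXX version tag in `data`, or None if there are none."""
    versions = glibcxx_versions(data)
    return versions[-1] if versions else None


if __name__ == "__main__":
    # Assumed default path; pass another library path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/usr/lib/libstdc++.so.6"
    try:
        with open(path, "rb") as f:
            newest = newest_glibcxx(f.read())
    except OSError as exc:
        print(f"could not read {path}: {exc}")
    else:
        print("newest GLIBCXX tag:", ".".join(map(str, newest)) if newest else "none found")
```

Comparing the result on a RHEL 5 box with a RHEL 6 box would show whether the older library is missing the newer tags.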

Jean-David Beyer
Joined: 5 Aug 04
Posts: 376
Credit: 3,595,519
RAC: 201
Message 49073 - Posted: 9 May 2014, 0:12:11 UTC - in response to Message 49070.  

I think that was the reason I upgraded to RHEL 6. I normally skip the even-numbered releases, as I hate doing the upgrades.

DadX
Joined: 30 Aug 06
Posts: 24
Credit: 1,245,326
RAC: 3
Message 49100 - Posted: 14 May 2014, 17:30:12 UTC

Will the results of these tasks have any value, or should we just abort them as they appear, to help flush them from the system?

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1013
Credit: 5,146,611
RAC: 1,975
Message 49101 - Posted: 14 May 2014, 17:58:58 UTC - in response to Message 49100.  

Will the results of these tasks have any value, or should we just abort them as they appear, to help flush them from the system?

As far as I know, a full set of results would be perfectly valid from a scientific point of view - and something of a surprise for the project scientists. However, the difficulty is getting a model to complete with a full complement of Zip file uploads. I'm running one model as a challenge at the moment: to give it a sporting chance the machine has been disconnected from the Internet to prevent any disturbance at all - not a suitable approach for most computers ...

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 1919
Credit: 40,598,866
RAC: 6,597
Message 49102 - Posted: 14 May 2014, 18:24:23 UTC - in response to Message 49101.  

Iain, you must be running these models under a different user ID? How did that happen?

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1013
Credit: 5,146,611
RAC: 1,975
Message 49103 - Posted: 14 May 2014, 18:35:54 UTC - in response to Message 49102.  


Conan
Joined: 6 Jul 06
Posts: 98
Credit: 1,637,534
RAC: 0
Message 49108 - Posted: 14 May 2014, 22:58:01 UTC
Last modified: 14 May 2014, 23:00:59 UTC

I must be one of the lucky ones, as I have managed to finish a MOSES model without error (as far as I can tell).
See WU 8804573

Just waiting for the validation and credits to catch up.

It took just over 311 hours of run time.

Conan

Belfry
Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 49115 - Posted: 15 May 2014, 14:26:04 UTC

TWO! (16546960 and 16525908)

Regarding my remaining MOSES II tasks, is there a way I can reboot my machine without these erroring?

Belfry
Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 49116 - Posted: 15 May 2014, 15:16:18 UTC

...and maybe the too-heavily-weighted points situation could be fixed with the next batch? It's just that, with my predilection for AMD machines, there's no way I merit a spot in the top 30 hosts. (Pause here to acknowledge that AMD is synonymous with slow.)

Helmer Bryd
Joined: 16 Aug 04
Posts: 136
Credit: 4,700,070
RAC: 0
Message 49117 - Posted: 15 May 2014, 16:51:29 UTC

Nice to see some completing successfully.

But I wonder about the long list in the stderr output:
oa.pc|xxxx.nc
and
oa.pe|xxxx.nc
for every month.

Is that OK? I don't know what it is; it was the same in the beta test.

I've just started a resent one here, but I don't want to run it in vain if they are not good.

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 1919
Credit: 40,598,866
RAC: 6,597
Message 49123 - Posted: 15 May 2014, 22:15:56 UTC - in response to Message 49117.  

Looking at the ones that have completed through 10 years, success or error, they seem to be in work units issued on April 17th or before.

Tasks from work units issued on April 19th or later can't seem to make it to the first trickle in the 10th year, no matter what. Perhaps there is an input file error in those later work units. The stderr for the ones that fail between the 9th-year zip upload and the first 10th-year trickle doesn't have anything obvious in it, just some gibberish.

.....
oa.pe|0nov.nc
Model crashed: æM
Model crashed: æM
Model crashed: æM
Model crashed: æM
Model crashed: æM
Model crashed: æM
Sorry, too many model crashes! :-(
08:48:10 (2408): called boinc_finish

</stderr_txt>
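When comparing many failed tasks, it can help to reduce each stderr to a few facts: the last monthly output file written, the number of "Model crashed" lines, and whether the model gave up. The rough Python sketch below does that, assuming only the line formats visible in the stderr excerpts posted in this thread; `summarise_stderr` and `StderrSummary` are hypothetical names, not part of the project's tools.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Monthly output-file lines such as "oa.pe|0nov.nc" (format taken from this thread).
_OUTPUT_RE = re.compile(r"^oa\.p[ce]\|\S+\.nc$")


@dataclass
class StderrSummary:
    last_output: str | None  # last oa.pc/oa.pe line seen, if any
    crash_count: int         # number of "Model crashed" lines
    gave_up: bool            # True if the "too many model crashes" message appears


def summarise_stderr(text: str) -> StderrSummary:
    """Scan a task's stderr text and summarise how far it got and how it ended."""
    last_output = None
    crashes = 0
    gave_up = False
    for raw in text.splitlines():
        line = raw.strip()
        if _OUTPUT_RE.match(line):
            last_output = line
        elif line.startswith("Model crashed"):
            crashes += 1
        elif "too many model crashes" in line:
            gave_up = True
    return StderrSummary(last_output, crashes, gave_up)
```

Running it over the stderr of each failed task would show quickly whether they all stop at the same month.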

Belfry
Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 49124 - Posted: 16 May 2014, 0:38:55 UTC

Can these survive a reboot?

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 1919
Credit: 40,598,866
RAC: 6,597
Message 49125 - Posted: 16 May 2014, 4:02:40 UTC - in response to Message 49124.  

Can these survive a reboot?

If they are removed from memory, trickles will stop for that model year and the zip upload for that year won't be generated. In the following year, trickles and zip uploads will resume. At the end, since at least one yearly upload wasn't generated, the task will be marked as an error, even though you will get all the credit if you reach the end. I do not know whether the output is useful at that point.
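The rule described above can be restated as a small function. This is a hypothetical Python sketch of the reported behaviour, not project code: `run_outcome` and its parameters are invented for illustration, it assumes the task runs to the end, and it assumes a 10-year run because that is the model length discussed in this thread.

```python
from __future__ import annotations


def run_outcome(total_years: int, interrupted_years: set[int]) -> dict[str, object]:
    """Hypothetical model of the reported behaviour, assuming the task reaches the end.

    A year in which the model was removed from memory produces no zip upload;
    later years carry on normally.
    """
    zips = [year for year in range(1, total_years + 1) if year not in interrupted_years]
    # Any missing yearly zip means the finished task is reported as an error...
    status = "success" if len(zips) == total_years else "error"
    # ...although, per the post, full credit is still granted for reaching the end.
    return {"zips": zips, "status": status, "full_credit": True}


if __name__ == "__main__":
    print(run_outcome(10, set()))  # all ten zips, status "success"
    print(run_outcome(10, {4}))    # zip 4 missing, status "error"
```

The point it makes explicit is that a single interrupted year is enough to turn the final status into an error, regardless of how many other years complete cleanly.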

Belfry
Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 49129 - Posted: 16 May 2014, 13:10:27 UTC - in response to Message 49125.  

Thanks. I see now from your earlier post that I made you repeat yourself, so I apologize. I already rebooted because of a kernel update, so I'll see if I can restart these from the beginning.

Helmer Bryd
Joined: 16 Aug 04
Posts: 136
Credit: 4,700,070
RAC: 0
Message 49134 - Posted: 16 May 2014, 22:05:42 UTC - in response to Message 49123.  

Looking at the ones that have completed through 10 years, success or error, they seem to be in work units issued on April 17th or before.

Tasks from work units issued on April 19th or later can't seem to make it to the first trickle in the 10th year, no matter what.


Oh, my WU 8861247 was issued on April 19, and an earlier task from it crashed in the last year, but that was on a Mac:
Model crashed:
Sorry, too many model crashes! :-(
error: zipfile probably corrupt (segmentation violation)
hadam3pm2_7.03_i686-apple-darwin(87014,0xa12831a8) malloc: *** error for object 0x500000: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug

</stderr_txt>

Mine is Linux with a fresh PSU; hope dies last...

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1013
Credit: 5,146,611
RAC: 1,975
Message 49251 - Posted: 29 May 2014, 11:57:32 UTC - in response to Message 49101.  

Will the results of these tasks have any value, or should we just abort them as they appear, to help flush them from the system?

As far as I know, a full set of results would be perfectly valid from a scientific point of view - and something of a surprise for the project scientists. However, the difficulty is getting a model to complete with a full complement of Zip file uploads. I'm running one model as a challenge at the moment: to give it a sporting chance the machine has been disconnected from the Internet to prevent any disturbance at all - not a suitable approach for most computers ...

That experiment has failed: despite being locked in a darkened room and disconnected from the Internet, the model created 99 trickles and 9 Zip files, but also exited with error code 9 (as in the beta). So the trickles were uploaded, but the ~500 MB of Zip files were deleted immediately on reconnection.

Eirik Redd
Joined: 31 Aug 04
Posts: 367
Credit: 131,194,567
RAC: 105,667
Message 49278 - Posted: 31 May 2014, 22:24:45 UTC

I got two of those reissued MOSES things, luckily on my fastest machine. One has just uploaded file number 8; the other is at number 3. I won't interrupt the processing in any way.
We'll see what happens. Que sera, sera.
Hope the results are useful, as always.

Eirik Redd
Joined: 31 Aug 04
Posts: 367
Credit: 131,194,567
RAC: 105,667
Message 49288 - Posted: 3 Jun 2014, 9:06:14 UTC

One of those things just finished OK. I have one more that may finish in a few days.
Hoping we get lots more of these models soon.

I hope the newer edition works on Windows and doesn't require the model to run without ever being stopped.


Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 2680
Credit: 3,259,434
RAC: 2,149
Message 49289 - Posted: 3 Jun 2014, 9:56:11 UTC - in response to Message 49288.  

Well done, Eirik!
Good to know they will finish if allowed to. I haven't had any yet, but maybe by the time my running and queued models have finished, the reworked ones will be out.

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1013
Credit: 5,146,611
RAC: 1,975
Message 49290 - Posted: 3 Jun 2014, 10:06:55 UTC

One of those things just finished OK. I have one more that may finish in a few days.
Hoping we get lots more of these models soon.

I hope the newer edition works on Windows and doesn't require the model to run without ever being stopped.


... that's interesting. The full complement of trickles appears to be 111: the first trickle is at 2,948, followed by 10 sets of 11 trickles (1 + 10 × 11 = 111). The first ten trickles of each set are at intervals of 2,880, with the eleventh at twice that interval (i.e. 5,760).
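The trickle arithmetic can be checked with a short Python sketch. It encodes one reading of the description (within each set the ten 2,880 gaps come first and the doubled gap last); the post gives no units, so the positions are treated as plain counts, and `trickle_schedule` is a hypothetical name. One set spans 10 × 2,880 + 5,760 = 34,560 = 12 × 2,880, which would fit one set per model year if each 2,880 interval were a month, but that is an inference the post itself does not make.

```python
def trickle_schedule(first: int = 2_948, sets: int = 10,
                     short_gap: int = 2_880, trickles_per_set: int = 11) -> list[int]:
    """Positions of every trickle under one reading of the description:
    a first trickle at `first`, then `sets` sets of `trickles_per_set` trickles,
    the first ten of each set `short_gap` apart and the eleventh at twice that gap."""
    gaps = [short_gap] * (trickles_per_set - 1) + [2 * short_gap]
    positions = [first]
    for _ in range(sets):
        for gap in gaps:
            positions.append(positions[-1] + gap)
    return positions


if __name__ == "__main__":
    pos = trickle_schedule()
    print(len(pos))  # 111
    print(pos[:3], pos[-1])
    # Span of one full set: 10 * 2,880 + 5,760
    print(sum([2_880] * 10 + [5_760]))
```

Under the other reading (the doubled gap falling after the eleventh trickle), the count of 111 and the span per set are unchanged; only the exact positions within each set shift.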

©2020 climateprediction.net