UK Met Office HADAM3P (global only) with MOSES II landsurface scheme v7.03

DadX

Joined: 30 Aug 06
Posts: 27
Credit: 1,480,698
RAC: 694
Message 49100 - Posted: 14 May 2014, 17:30:12 UTC

Will the results of these tasks have any value or should we just abort them as they appear to help flush them from the system?

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1079
Credit: 6,902,393
RAC: 6,787
Message 49101 - Posted: 14 May 2014, 17:58:58 UTC - in response to Message 49100.  

Will the results of these tasks have any value or should we just abort them as they appear to help flush them from the system?

As far as I know, a full set of results would be perfectly valid from a scientific point of view - and something of a surprise for the project scientists. However, the difficulty is getting a model to complete with a full complement of Zip file uploads. I'm running one model as a challenge at the moment: to give it a sporting chance the machine has been disconnected from the Internet to prevent any disturbance at all - not a suitable approach for most computers ...

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2167
Credit: 64,403,322
RAC: 5,085
Message 49102 - Posted: 14 May 2014, 18:24:23 UTC - in response to Message 49101.  

Iain, You must be running these models under a different userID? How'd that happen?

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1079
Credit: 6,902,393
RAC: 6,787
Message 49103 - Posted: 14 May 2014, 18:35:54 UTC - in response to Message 49102.  

Iain, You must be running these models under a different userID? How'd that happen?

I had a run-in with a climate-change sceptic team member a few years ago and thought that perhaps being a moderator was a part of the attraction in taking a swing at me, so Milo very kindly created this account and transferred the moderator privileges. So the models that I run are here. Wearing two virtual hats shouldn't make any difference, but it feels better this way. The credits go to the team: render therefore to Caesar the things that are Caesar's!

Conan
Joined: 6 Jul 06
Posts: 141
Credit: 3,511,752
RAC: 144,072
Message 49108 - Posted: 14 May 2014, 22:58:01 UTC
Last modified: 14 May 2014, 23:00:59 UTC

I must be one of the lucky ones as I have managed to finish a MOSES without error (as far as I can tell).
See WU 8804573

Just waiting for the validation and credits to catch up.

Took just over 311 hours run time.

Conan

Belfry
Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 49115 - Posted: 15 May 2014, 14:26:04 UTC

TWO! (16546960 and 16525908)

Regarding my remaining MOSES II tasks, is there a way I can reboot my machine without these erroring?

Belfry
Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 49116 - Posted: 15 May 2014, 15:16:18 UTC

...and maybe the too-heavily-weighted points situation could be fixed with the next batch? With my predilection for AMD machines, there's just no way I merit a spot in the top 30 hosts. (And pause with the recognition that AMD is synonymous with slow.)

Helmer Bryd
Joined: 16 Aug 04
Posts: 147
Credit: 7,748,561
RAC: 8,366
Message 49117 - Posted: 15 May 2014, 16:51:29 UTC

Nice to see some completing successfully.

But I wonder about the long list in stderr output:
oa.pc|xxxx.nc
and
oa.pe|xxxx.nc
for every month

OK? don't know what that is, was the same in beta test.

Just started a re-sent one here but don't want to run it in vain if they are not good.

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2167
Credit: 64,403,322
RAC: 5,085
Message 49123 - Posted: 15 May 2014, 22:15:56 UTC - in response to Message 49117.  

Looking at the ones that have completed through 10 years, success or error, they seem to be in work units issued on April 17th or before.

Tasks from work units issued April 19th or later can't seem to make it to the first trickle in the 10th year no matter what. An input file error on those later work units, perhaps. The stderr for the ones that fail between the 9th-year zip upload and the first 10th-year trickle doesn't have anything obvious in it, just some gibberish.

.....
oa.pe|0nov.nc
Model crashed: æM
Model crashed: æM
Model crashed: æM
Model crashed: æM
Model crashed: æM
Model crashed: æM
Sorry, too many model crashes! :-(
08:48:10 (2408): called boinc_finish

</stderr_txt>
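
For anyone triaging a batch of failed tasks, the crash pattern in the excerpt above can be checked mechanically. This is only a minimal Python sketch, assuming the stderr text has already been saved locally; the function name `summarize_stderr` is made up for illustration, and the only strings it looks for are the two that appear in the log above.

```python
def summarize_stderr(text):
    """Return (crash_count, gave_up) for one task's stderr text."""
    lines = [line.strip() for line in text.splitlines()]
    # Count the repeated "Model crashed:" lines, whatever gibberish follows them.
    crash_count = sum(1 for line in lines if line.startswith("Model crashed:"))
    # This line appears in the log after the run of repeated crashes.
    gave_up = any(line.startswith("Sorry, too many model crashes!") for line in lines)
    return crash_count, gave_up


# Sample copied from the stderr excerpt in this post.
sample = "\n".join(
    ["oa.pe|0nov.nc"]
    + ["Model crashed: æM"] * 6
    + ["Sorry, too many model crashes! :-(", "08:48:10 (2408): called boinc_finish"]
)

crash_count, gave_up = summarize_stderr(sample)
print(crash_count, gave_up)
```

Run over several stderr files, this would show whether the failing tasks all share the same crash count.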

Belfry
Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 49124 - Posted: 16 May 2014, 0:38:55 UTC

Can these reboot?

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2167
Credit: 64,403,322
RAC: 5,085
Message 49125 - Posted: 16 May 2014, 4:02:40 UTC - in response to Message 49124.  

Can these reboot?

If they are removed from memory, trickles will stop for that model year and the zip upload for that year won't be generated. The next year trickles will resume and zip uploads will resume. At the end, since at least one yearly upload wasn't generated, the status of the task will be marked as an error, even though you will get all credits if you get to the end. I do not know if the output is useful at that point.
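
The behaviour described above can be written down as a small model. This is a rough Python sketch of that description only, not of the actual client or server logic: `expected_outcome` is a hypothetical helper, and the default of 10 model years is an assumption taken from the "completed through 10 years" remark earlier in the thread.

```python
def expected_outcome(interrupted_years, total_years=10):
    """Model the description above: a model year in which the task was
    removed from memory produces no zip upload, and any missing zip
    means the finished task is marked as an error."""
    interrupted = set(interrupted_years)
    # Years whose zip upload would still be generated.
    zips_generated = [y for y in range(1, total_years + 1) if y not in interrupted]
    status = "success" if len(zips_generated) == total_years else "error"
    return {"zips_generated": zips_generated, "status": status}


print(expected_outcome([]))   # no interruptions
print(expected_outcome([4]))  # removed from memory once, during year 4
```

Under these assumptions a single reboot in any year is enough to turn an otherwise complete run into an error.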

Belfry
Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 49129 - Posted: 16 May 2014, 13:10:27 UTC - in response to Message 49125.  

Thanks. I see now from your earlier post that I made you repeat yourself, so I apologize. I already rebooted because of a kernel update, so I'll see if I can restart these from the beginning.

Helmer Bryd
Joined: 16 Aug 04
Posts: 147
Credit: 7,748,561
RAC: 8,366
Message 49134 - Posted: 16 May 2014, 22:05:42 UTC - in response to Message 49123.  

Looking at the ones that have completed through 10 years, success or error, they seem to be in work units issued on April 17th or before.

Tasks from work units issued April 19th or later can't seem to make it to the first trickle in the 10th year no matter what.


Oh, my WU 8861247 was issued April 19 and earlier crashed in the last year, but that was on a Mac;
Model crashed:
Sorry, too many model crashes! :-(
error: zipfile probably corrupt (segmentation violation)
hadam3pm2_7.03_i686-apple-darwin(87014,0xa12831a8) malloc: *** error for object 0x500000: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug

</stderr_txt>

Mine is Linux with a fresh PSU, hope is the last thing...

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1079
Credit: 6,902,393
RAC: 6,787
Message 49251 - Posted: 29 May 2014, 11:57:32 UTC - in response to Message 49101.  

Will the results of these tasks have any value or should we just abort them as they appear to help flush them from the system?

As far as I know, a full set of results would be perfectly valid from a scientific point of view - and something of a surprise for the project scientists. However, the difficulty is getting a model to complete with a full complement of Zip file uploads. I'm running one model as a challenge at the moment: to give it a sporting chance the machine has been disconnected from the Internet to prevent any disturbance at all - not a suitable approach for most computers ...

That experiment has failed: despite being locked in a darkened room and disconnected from the Internet, the model created 99 trickles and 9 Zip files - but also an error exit code 9 (as in beta). So the trickles were uploaded but the ~500 MB of Zip files were immediately deleted on reconnection.

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 49278 - Posted: 31 May 2014, 22:24:45 UTC

Got two of those re-issued MOSES things, luckily on my fastest machine. One just uploaded the number 8 file, one is at number 3. Won't interrupt the processing at all in any way.
We'll see what happens. Que sera, sera.
Hope the results are useful, as always.

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 49288 - Posted: 3 Jun 2014, 9:06:14 UTC

One of those things just finished OK. Got one more that may finish in a few days.
Hoping we get lots more of these models soon.

Hope the newer edition works for Windows and doesn't require that the model never be stopped.



Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4314
Credit: 16,376,846
RAC: 3,590
Message 49289 - Posted: 3 Jun 2014, 9:56:11 UTC - in response to Message 49288.  

Well done Eirik!
Good to know they will finish if allowed to. I haven't had any yet, but maybe by the time my running and queued models have finished the reworked ones will be out.

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1079
Credit: 6,902,393
RAC: 6,787
Message 49290 - Posted: 3 Jun 2014, 10:06:55 UTC

One of those things just finished OK. Got one more that may finish in a few days.
Hoping we get lots more of these models soon.

Hope the newer edition works for Windows and doesn't require that the model never be stopped.


... that's interesting. The full complement of trickles appears to be 111. The first trickle is at 2,948 followed by 10 sets of 11 trickles. The first ten trickles of each set are at intervals of 2,880 with the eleventh at twice that interval (i.e. 5,760).
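
The arithmetic checks out: 1 + 10 x 11 = 111. A short Python sketch of the schedule, using only the figures quoted above; the function name and parameter names are illustrative, and the values are in whatever units the trickle counter reports.

```python
def trickle_schedule(first=2948, interval=2880, sets=10, per_set=11):
    """Cumulative trickle positions: one initial trickle, then `sets`
    groups of `per_set` trickles, the last of each group arriving after
    a double interval."""
    points = [first]
    for _ in range(sets):
        for i in range(per_set):
            # The eleventh trickle of each set comes at twice the usual interval.
            step = interval * 2 if i == per_set - 1 else interval
            points.append(points[-1] + step)
    return points


schedule = trickle_schedule()
print(len(schedule))   # 111
print(schedule[-1])    # 348548
```

Each set therefore spans 10 x 2,880 + 5,760 = 34,560, i.e. twelve of the basic intervals.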

Belfry
Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 49293 - Posted: 3 Jun 2014, 16:04:50 UTC

So I restarted two from scratch and ran them in lock-step along with one newly started.

16603602: restart--finished!
16608984: restart--failed.
16611240: new--failed.

The failures show this message repeated 5 to 6 times near the end in stderr: "Model crashed: æM".

Temperamental things, for sure.

(I learned to restart when working with Iain's slab model analysis--just involves some careful file deletion and xml editing.)

Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,888,554
RAC: 1,481,373
Message 49311 - Posted: 6 Jun 2014, 7:31:26 UTC

Yup, one of the re-issues my machine got finished; the other ran through uploading the 9.gz and then died with a totally useless error code.
Got two more re-issues. Inclined to let them run, as they are on my more stable machines; one even has ECC memory.
Que sera.
If letting the re-issues run is a waste, please let me know.