climateprediction.net home page
FAMOUS SUCCESS/FAILURE RATIO

Message boards : Number crunching : FAMOUS SUCCESS/FAILURE RATIO

Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 39991 - Posted: 21 Jun 2010, 22:27:39 UTC

Core i3 530 at 2.93 GHz, 2 GB Kingston ValueRAM, Gigabyte H55M UD2H motherboard, Arch Linux (kernel 2.6.33), running 100% CPDN.

Crashed (3):
    u0d9_0599   negative pressure      42,999 s
    upij_0799   invalid theta         271,931 s
    u0s5_1999   negative pressure     155,591 s

Completed (2):
    u0s4_1799                       1,029,859 s
    u0sp_1799                       1,029,843 s

In progress (1): u089_0599, 90%

Mystery (1): u0ch_1999 (shown as in progress on the web page, but not on the PC)
[B^S] mavau
Joined: 30 Aug 04
Posts: 142
Credit: 9,936,132
RAC: 0
Message 40007 - Posted: 24 Jun 2010, 12:00:08 UTC

One correction to my previous post: only 6 completed models, and 7 crashes.
The latest:
famous_uow2_1799_200_006665101

famous_uoxh_1799_200_006665152

famous_uowz_1799_200_006665134


JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,077,407
RAC: 1,835
Message 40028 - Posted: 26 Jun 2010, 17:09:15 UTC
Last modified: 26 Jun 2010, 17:12:42 UTC

Famous_u0na_1799_200_006633689_6 crashed at 96% completion. OS is Windows 7 32-bit running on an Intel Core 2 Duo 1.5 GHz processor with 2 GB of RAM, at 1.06 s/TS. RIP :(
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,077,407
RAC: 1,835
Message 40033 - Posted: 27 Jun 2010, 14:30:31 UTC
Last modified: 27 Jun 2010, 14:34:04 UTC

Famous_u0mw_1799_200_006634055_6 completed successfully. OS is Windows 7 32-bit running on an Intel Core 2 Duo 1.5 GHz processor with 2 GB of RAM, at 1.05 s/TS. :) I seem to be getting about a 50% success rate with this type.
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 40034 - Posted: 27 Jun 2010, 15:30:49 UTC
Last modified: 27 Jun 2010, 15:46:44 UTC

I've looked at how some of the top computers are doing, adding together results for FAMOUS 6.10 and 6.11. I've not counted models with downloading errors as that was a server problem.

Peter, Linux: 6 completed, 5 errored

Ian Rees, Windows: 5 completed, 5 errored

Montes, Mac: 2 completed, 7 errored

Mike Koehler, Mac: 2 completed, 6 errored

Anonymous, Windows: 1 completed, 6 errored


This is lower than the approximately 50% success rate you estimate, but two factors make the above figures not entirely reliable:

* Models that crash take less computing time than completions.

* The list doesn't include partly processed models, and the further a model has progressed, the less likely it must be to crash, i.e. the more likely it is to succeed.

So I think the success ratio of these computers will probably increase as they have time to finish more models.

A more accurate estimate could be obtained by trawling through many workunits to see how many succeed on all platforms and how many crash on one, two or three. But this would be extraordinarily time-consuming. Because some computers crash models for non-model-related reasons, one would need to look at the stderr of every model failure, apart from those that couldn't get started because of a computer misconfiguration.

I will not be doing this.

The % of workunits that complete on all platforms must be lower than the average success % on members' computers.
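A back-of-envelope illustration of that point, as a minimal Python sketch. It assumes, purely for simplicity, that each copy of a workunit succeeds independently with the same probability p. Real FAMOUS failures are not independent (they depend on the parameter set and start year, as discussed later in the thread), so this shows only the direction of the effect. The function names are made up for this example.

```python
from math import comb


def p_all_succeed(p: float, copies: int) -> float:
    """Probability that every copy of a workunit succeeds, ASSUMING the
    copies are independent and each succeeds with probability p."""
    return p ** copies


def crash_count_distribution(p: float, copies: int) -> list[float]:
    """P(exactly k copies crash) for k = 0..copies under the same
    independence assumption (a binomial distribution)."""
    q = 1.0 - p  # per-copy crash probability
    return [comb(copies, k) * q**k * p**(copies - k) for k in range(copies + 1)]


if __name__ == "__main__":
    p = 0.5  # roughly the "fifty-fifty" figure mentioned in this thread
    for copies in (1, 2, 3):
        print(f"copies={copies}: P(all succeed) = {p_all_succeed(p, copies):.3f}")
    # Probability of crashing on 0, 1, 2 or 3 of three platforms:
    print(crash_count_distribution(p, 3))
```

With p = 0.5 and three copies, only 0.125 of workunits (one in eight) would complete on every copy under this assumption. Positively correlated failures (the same parameter set crashing everywhere) would push that fraction above p^3, but it still cannot exceed the per-copy success rate p.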

One of us could look at those very stable top computers again after say another month.
old_user596405
Joined: 4 Oct 09
Posts: 73
Credit: 7,242,427
RAC: 0
Message 40039 - Posted: 28 Jun 2010, 6:28:23 UTC

One more crash on my i7 920 system (at 3.4 GHz, Windows 7 Home x64) at 51.5%.
famous_upfd_1799_200_006665796_0 - Invalid Theta Detected.

3 completed, 3 crashed and 5 still running on this machine.
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1081
Credit: 7,002,728
RAC: 4,238
Message 40040 - Posted: 28 Jun 2010, 8:30:21 UTC - in response to Message 40039.  

Invalid Theta Detected.

Just in case anyone is wondering what 'theta' is: potential temperature.
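For reference, the textbook definition: potential temperature is the temperature an air parcel would reach if brought adiabatically to a reference pressure p0 (conventionally 1000 hPa), θ = T (p0/p)^(R/c_p), with R/c_p ≈ 0.286 for dry air. A minimal Python sketch of that formula follows, using approximate dry-air constants. The thread doesn't say what check FAMOUS applies before reporting "Invalid Theta Detected", so nothing here should be read as the model's own test. Rejecting non-positive pressure is an assumption of this sketch, included because θ is undefined there (compare the "negative pressure" crashes reported earlier).

```python
# Textbook dry-air potential temperature; constants are approximate.
R_D = 287.05       # specific gas constant for dry air, J/(kg K)
C_P = 1004.0       # specific heat of dry air at constant pressure, J/(kg K)
KAPPA = R_D / C_P  # about 0.286
P0_HPA = 1000.0    # reference pressure, hPa


def potential_temperature(t_kelvin: float, p_hpa: float) -> float:
    """theta = T * (p0 / p) ** (R/c_p).

    Pressure must be positive: p == 0 would divide by zero, and in Python
    a negative base raised to a fractional power yields a complex number.
    """
    if p_hpa <= 0.0:
        raise ValueError(f"non-positive pressure: {p_hpa} hPa")
    return t_kelvin * (P0_HPA / p_hpa) ** KAPPA


if __name__ == "__main__":
    print(potential_temperature(288.15, 1000.0))  # equals T at the reference pressure
    print(potential_temperature(250.0, 500.0))    # colder air aloft, higher theta
```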
[B^S] mavau
Joined: 30 Aug 04
Posts: 142
Credit: 9,936,132
RAC: 0
Message 40051 - Posted: 29 Jun 2010, 12:53:07 UTC

Two more successes:

famous_up1h_1399_200_006665296

famous_uoxz_1799_200_006665170

8 completed models, 7 crashes, 8 running on the Core i7 and 1 on the Inspiron.


old_user733
Joined: 9 Aug 04
Posts: 25
Credit: 4,756,979
RAC: 0
Message 40052 - Posted: 29 Jun 2010, 14:36:18 UTC
Last modified: 29 Jun 2010, 14:37:18 UTC

Invalid Theta on this task: famous_r100_799_200_006666899_1.

So far, five completions, and one other Invalid Theta. All on Win7_x64.
[B^S] mavau
Joined: 30 Aug 04
Posts: 142
Credit: 9,936,132
RAC: 0
Message 40080 - Posted: 5 Jul 2010, 8:23:23 UTC

I'm now on 13 completed models and 10 crashes.

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,077,407
RAC: 1,835
Message 40095 - Posted: 9 Jul 2010, 18:11:23 UTC

Famous_u0qu_1799_200_006667114_2 completed successfully. OS is Windows 7 64-bit running on an Intel Core 2 Duo 2.2 GHz with 4 GB of RAM.

Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 40112 - Posted: 11 Jul 2010, 22:28:38 UTC - in response to Message 40034.  

I've had a look in a little more depth at the FAMOUS success/failure stats from the first two pages of the 'Top Computers' list.

I tried to pick computers with at least 700,000 credits, so not "drive-bys". Compute errors only, as before.
Computer  OS               Pend+Invalid  Error  Error%  Overall Fail%
976458    Darwin           11            29     73
1013254   Darwin           4             29     88
1001600   Darwin           0             9      ALL
978938    Darwin           4             12     75
1063866   Darwin           3             27     90
                                                        83% Darwin
          (excluding 1001600)                           82% Darwin

1000554   W7               2             3      60
961681    WSv2008          7             12     63
882224    WXP X64          5             2      29
                                                        55% Windows

1036870   Lin 2.6.16       16            8      33
1072992   Lin 2.6.32       6             7      54
1047400   Lin 2.6.32 FC12  7             6      46
                                                        42% Linux
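The percentages in this table are consistent with Error% = Error / (Pend+Invalid + Error), with each OS subtotal pooling the counts of its computers. The Python sketch below recomputes them from the posted counts. The grouping into Darwin/Windows/Linux follows the table's own subtotals; the half-up rounding is an assumption, chosen because it reproduces the posted 73% for 976458 (exactly 72.5%), where Python's built-in round() would give 72.

```python
import math

# (computer ID, OS group, Pend+Invalid, Error) copied from the snapshot table.
SNAPSHOT = [
    (976458, "Darwin", 11, 29),
    (1013254, "Darwin", 4, 29),
    (1001600, "Darwin", 0, 9),
    (978938, "Darwin", 4, 12),
    (1063866, "Darwin", 3, 27),
    (1000554, "Windows", 2, 3),
    (961681, "Windows", 7, 12),
    (882224, "Windows", 5, 2),
    (1036870, "Linux", 16, 8),
    (1072992, "Linux", 6, 7),
    (1047400, "Linux", 7, 6),
]


def error_percent(ok: int, err: int) -> float:
    """Error tasks as a percentage of Pend+Invalid plus Error tasks."""
    return 100.0 * err / (ok + err)


def round_half_up(x: float) -> int:
    """Round halves upwards (valid for the non-negative values used here)."""
    return math.floor(x + 0.5)


def overall_by_os(rows, exclude=frozenset()):
    """Pool counts per OS group, optionally skipping some computers,
    and return the rounded overall error percentage for each group."""
    totals: dict[str, list[int]] = {}
    for computer, os_group, ok, err in rows:
        if computer in exclude:
            continue
        pooled = totals.setdefault(os_group, [0, 0])
        pooled[0] += ok
        pooled[1] += err
    return {os_group: round_half_up(error_percent(ok, err))
            for os_group, (ok, err) in totals.items()}


if __name__ == "__main__":
    for computer, os_group, ok, err in SNAPSHOT:
        print(computer, os_group, round_half_up(error_percent(ok, err)))
    print(overall_by_os(SNAPSHOT))
    print(overall_by_os(SNAPSHOT, exclude={1001600}))
```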


Of course this is a snapshot, so you won't see these numbers now, or not all of them anyway. And it's early days, and all that. However...

Is it possible there is a problem with the MacOS code, especially since most of the Darwin computers have relatively few failures with the other types of models?

Edit: I will cross-post on the CPDN board, as this board seems to ignore the "pre" tag, which makes the table hard to follow.
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2167
Credit: 64,528,572
RAC: 6,474
Message 40115 - Posted: 11 Jul 2010, 23:22:57 UTC - in response to Message 40112.  
Last modified: 11 Jul 2010, 23:24:42 UTC

On my systems here at cpdn...

Core i7 920 in Linux
6 completed, 7 failed, 4 in progress
Phenom II X4 940 in Linux
7 completed, 5 failed, 4 in progress
Core 2 E6420 in Windows
2 completed, 0 failed, 1 in progress
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 40116 - Posted: 12 Jul 2010, 0:05:27 UTC - in response to Message 40112.  

There's always the possibility of faulty data files, but ALL types of climate model are tested for months on our beta site.

It's possible that your comparisons are too simplistic.

As I said near the start of this thread, it's known that some of the series of models with "early label names" were being "pushed hard" with their forcing values, making them more unstable. (Some of the models that I have now are up to the "u" series.)

And I also said there that the models with a start year of 599 are 'spinups', which are also more unstable than any of the later start years. As these later years use data from models of the previous year that completed (which will allow these 2 years to be "stitched together" to form a longer year), it's more likely that the parameter values used are from a stable part of parameter space.
And they will definitely be using a spinup that was stable. :)

So your comparison would need to take into account these 2 items: the series name, and the start year of the models.
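One way to apply that split when tallying results is to read the series letter and start year from each task name. The Python sketch below assumes names are laid out like those posted in this thread (e.g. famous_u0na_1799_200_006633689_6): the first character of the four-character code is the series letter, and the next field is the start year. The remaining fields are left uninterpreted. The pattern and function names are hypothetical, not a documented CPDN naming scheme.

```python
import re

# ASSUMED layout: famous_<4-char code>_<start year>_<further fields...>
TASK_NAME = re.compile(r"^famous_(?P<code>[a-z0-9]{4})_(?P<start>\d+)_", re.IGNORECASE)


def series_and_start(name: str) -> tuple[str, int]:
    """Return (series letter, start year) parsed from a FAMOUS task name."""
    m = TASK_NAME.match(name.strip())
    if m is None:
        raise ValueError(f"unrecognised task name: {name!r}")
    return m.group("code")[0].lower(), int(m.group("start"))


def is_u_series_non_spinup(name: str) -> bool:
    """True for "u" series tasks that are not 599-start spinups."""
    series, start = series_and_start(name)
    return series == "u" and start != 599


if __name__ == "__main__":
    names = [
        "Famous_u0na_1799_200_006633689_6",
        "famous_r149_799_200_006666483_5",
        "famous_up1h_1399_200_006665296",
    ]
    for n in names:
        print(n, series_and_start(n), is_u_series_non_spinup(n))
```

Grouping tallies by (series, start year) before comparing platforms keeps the Darwin/Windows/Linux comparison within like-for-like batches.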


Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 40117 - Posted: 12 Jul 2010, 0:22:29 UTC - in response to Message 40115.  

On my own machine, Core i3 Linux, I have had 3 complete and 5 failed, a failure rate of 63%.

I have my suspicions about my computer's memory (Kingston ValueRAM), even though it passes memtest86+. I have underclocked the memory by 10%, and the latest 4 models are running fine so far. Time will tell.

In case you can't decipher the messed-up table in my previous post, the essence was:

Darwin failure rate 82%, Windows failure rate 55%, Linux failure rate 42%. Darwin seems to be an outlier.
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2167
Credit: 64,528,572
RAC: 6,474
Message 40120 - Posted: 12 Jul 2010, 1:07:42 UTC
Last modified: 12 Jul 2010, 1:12:08 UTC

If I recall correctly from beta, the FAMOUS application for Darwin uses a higher optimization level because they couldn't compile it without it. That may or may not have anything to do with the failure rate.

As Les said, however, some of these sets will be inherently more unstable than others due to parameter choices. It's difficult to accept only a 50% success rate when it has previously been over 95%, but that's the nature of running this FAMOUS experiment.
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,077,407
RAC: 1,835
Message 40121 - Posted: 12 Jul 2010, 1:53:45 UTC

Famous_r149_799_200_006666483_5 completed successfully. OS is Windows 7 64-bit running on an Intel Core 2 Duo 2.2 GHz processor with 4 GB of RAM.

I don’t know if it is just luck, but this is 2 for 2 with the FAMOUS models with the new graphics.


Greg van Paassen
Joined: 17 Nov 07
Posts: 142
Credit: 4,271,370
RAC: 0
Message 40125 - Posted: 12 Jul 2010, 3:47:16 UTC - in response to Message 40116.  

More detailed investigation as suggested by Les.

Ignoring anything that is not "famous_uxxx_", and everything with a _599_ start year, i.e. looking at just "u series and not 599":

Darwin Xeon (3 computers): 20 succeeded, 70 failed.

Darwin i7 (1 computer): 9 succeeded, 7 failed.

Win Opteron (1 computer): 6 succeeded, 6 failed.

Linux Xeon (2 computers): 15 succeeded, 9 failed.

Linux i7 (1 computer): 5 succeeded, 7 failed.

All of these are compatible with the "about fifty-fifty chance of failure" warning, except for Darwin Xeon. It could be just chance... but it might not.

(And actually, the r series and the "599s" don't make much difference to the percentages, in the tiny sample of computers I looked at.)

I'm not comparing the failure rate to anything--I've been away from the project for a few years, and only had about 10 SM3s before starting on famouses. I don't have Darwin, or a Xeon--more's the pity ;-). I'm just saying that there might be something to look into, using proper statistical methods.
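As a first step in that direction, each tally above can be compared with a 50% success rate using an exact two-sided binomial test. The Python sketch below uses only the standard library; doubling the smaller tail is one common two-sided convention (others exist). It treats every task as an independent trial, which the parameter-set effects Les describes make only approximately true, and testing five groups at once would call for a multiple-comparison correction.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); returns 0 for k < 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def two_sided_binom_p(successes: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided p-value: twice the smaller tail, capped at 1."""
    lower = binom_cdf(successes, n, p)            # P(X <= successes)
    upper = 1.0 - binom_cdf(successes - 1, n, p)  # P(X >= successes)
    return min(1.0, 2.0 * min(lower, upper))


# (succeeded, failed) as posted above: "u" series, excluding 599 start years.
TALLIES = {
    "Darwin Xeon (3 computers)": (20, 70),
    "Darwin i7": (9, 7),
    "Win Opteron": (6, 6),
    "Linux Xeon (2 computers)": (15, 9),
    "Linux i7": (5, 7),
}

if __name__ == "__main__":
    for group, (ok, failed) in TALLIES.items():
        n = ok + failed
        print(f"{group}: {ok}/{n} succeeded, p = {two_sided_binom_p(ok, n):.2g}")
```

On these counts the Darwin Xeon tally (20 of 90) gives a p-value far below 0.001, while the other four groups all give p-values well above 0.05, matching the impression in the post above.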

Geophi - compiler (option) problems were my first guess. FAMOUS models seem to be smaller than others, only about 30 MB resident rather than 100+ MB; the CPUs seem to spend less time moving data in and out of memory and more time computing. Maybe the FAMOUS code has flushed out a very obscure intermittent bug.

And maybe it's just chance.

This is about as much investigation as I'm prepared to do without writing scripts, and it'd be better for someone who has direct access to the database to do that. So: leaving it there, thanks for listening. ;-)
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1081
Credit: 7,002,728
RAC: 4,238
Message 40128 - Posted: 12 Jul 2010, 8:05:58 UTC

On the Darwin thing: I have 5 succeeded and 3 failed on beta. On main-project Windows, 1 succeeded and 3 failed. (Plus, the current beta WUs are apparently exploring a different parameter range - just to add to the confusion over success/failure ratios.)
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,077,407
RAC: 1,835
Message 40195 - Posted: 21 Jul 2010, 0:34:50 UTC

Famous_u0il_1799_200_006667077_3 finished successfully.
OS is Windows 7 64-bit running on an Intel Core 2 Duo 2.2 GHz processor with 4 GB of RAM.
THREE IN A ROW AND COUNTING.

©2024 climateprediction.net