climateprediction.net
Posts by Iain Inglis

21) Message boards : Cafe CPDN : Scotland team (Message 38532)
Posted 16 Dec 2009 by Iain Inglis
Post:
See that now... and a very good reason for pausing. :-)

Duly received and processed!
22) Message boards : Number crunching : Iceworld Appeal (Message 38527)
Posted 14 Dec 2009 by Iain Inglis
Post:
David Glogau has added another model to the mix. It freezes at a new point - near the Straits of Gibraltar (top-right) on the Atlantic side.

So here's an update of the relevant map.

PS This thread is getting a bit graphics-heavy - I should perhaps start a new one at some point.
23) Message boards : Number crunching : Iceworld Appeal (Message 38523)
Posted 12 Dec 2009 by Iain Inglis
Post:
[Lockleys wrote:] Thanks, Les. I'll do as you suggest.

That WU looks like a Windows/Intel iceworld - three people are stuck at that point - so if you're happy to re-run the five days then I'll be very interested to get the '.cpdn' file. You will lose five days' processing on four CPUs, but you will have nailed another iceworld.

In your situation I would:

1. Abort the iceworld and report it (i.e. press project 'Update').

2. Back up the installation (call this the 'good' backup).

3. Restore the 5-day backup and turn network activity off (this will stop the models being marked on the Web site as 'client detached' - the message is benign but annoying).

4. Run the 5-day backup with only the model that will become an iceworld.

5. Start recording a day or so before you expect the freeze.

6. Send the '.cpdn' file at the freeze point.

7. Restore the 'good' backup and carry on as before.
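Once recording starts (step 5), every timestep writes its own 100-120 kB '.cpdn' file, so it is worth checking the disk space before starting early. A minimal sketch, assuming about 120 kB per file (the upper end of the range quoted in this thread) and that you know roughly how many timesteps you intend to record; the function name is made up for illustration:

```python
# Rough disk-space estimate for a '.cpdn' graphics recording.
# ASSUMPTION: ~120 kB per timestep file (upper end of the 100-120 kB range).
KB_PER_CPDN_FILE = 120


def recording_size_mb(timesteps_recorded: int, kb_per_file: float = KB_PER_CPDN_FILE) -> float:
    """Approximate size in MB of the '.cpdn' files written while recording.

    One file is written per model timestep, so the total is simply
    timesteps x file size.
    """
    return timesteps_recorded * kb_per_file / 1024


if __name__ == "__main__":
    # Hypothetical example: recording starts 2000 timesteps before the freeze.
    print(f"{recording_size_mb(2000):.0f} MB")
```

With those assumptions, 2000 recorded timesteps come to roughly 234 MB.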

Thanks.
24) Message boards : Number crunching : Iceworld Appeal (Message 38508)
Posted 11 Dec 2009 by Iain Inglis
Post:
[karfixer wrote:] It looks to be frozen (no pun intended). I'm pulling the plug.

Here is a graph of relative model speed vs trickle number for that work unit.

As you can see a number of models have hit the same problem. You were right to abort the model.

Better luck next time!
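A graph like that can be approximated from the trickle records. The sketch below assumes each trickle gives a model timestep and the cumulative CPU time at that point (this data layout and the 0.5x/2x thresholds are hypothetical). It computes the speed between successive trickles relative to the model's median speed and flags intervals that are far off: a slow-down is the Windows/Intel symptom, a speed-up the fast-processing one.

```python
from statistics import median


def relative_speeds(trickles):
    """Speed between successive trickles relative to the median speed.

    trickles: list of (timestep, cpu_seconds) pairs, both strictly increasing.
    Speed is measured in model timesteps per CPU second.
    """
    speeds = [
        (ts1 - ts0) / (t1 - t0)
        for (ts0, t0), (ts1, t1) in zip(trickles, trickles[1:])
    ]
    baseline = median(speeds)
    return [s / baseline for s in speeds]


def suspect_intervals(rel, low=0.5, high=2.0):
    """Indices of intervals running much slower (< low) or faster (> high)
    than usual. The thresholds are illustrative, not project values."""
    return [i for i, r in enumerate(rel) if r < low or r > high]


if __name__ == "__main__":
    # Made-up data: steady progress, then the last interval takes 4x as long.
    trickles = [(0, 0), (1000, 2000), (2000, 4000), (3000, 6000), (4000, 14000)]
    rel = relative_speeds(trickles)
    print(rel, suspect_intervals(rel))
```

The median is used as the baseline so that a few frozen intervals at the end do not drag the reference speed with them; it stops working if the model has been frozen for most of its run.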
25) Message boards : Number crunching : Iceworld Appeal (Message 38503)
Posted 10 Dec 2009 by Iain Inglis
Post:
Here is a summary of the steps needed to submit an iceworld:

[Les Bayliss wrote:]

1) Back up your current position and make sure to put it somewhere safe, with an appropriate label! You're going to need this to continue later!
2) Restore the pre-iceworld backup.
3) Make sure that the project is set to 'No new tasks' in the Projects tab.
4) Make sure that BOINC is set to Network activity suspended.
5) Suspend all models except for the 'iceworld'.
6) Press the \'Show graphics\' button in BOINC.
7) Press <Ctrl> Q to start recording. (Or run the model for a while first, to get closer to the failure point. It took me a while to do this, because I kept missing it.)
8) Close the graphics window (the recording will carry on).
9) Save the relevant .cpdn file. (I saved half a dozen before and after. To be sure; to be sure.)
10) Copy the GOOD backup back to the working location.
11) Continue from where you were. (You can abort the 'iceworld' model.)
12) Get the address from Iain for sending the file.
13) Send it.


Thanks to Les for that. I added a couple of steps to start/stop the graphics.
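Steps 1, 2 and 10 are whole-directory copies. A minimal sketch in Python, assuming BOINC has been shut down first (exit the Manager and stop the client) and that you supply the path of your own BOINC data directory; the function names and the label-plus-timestamp naming scheme are hypothetical, not part of BOINC:

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_boinc_dir(data_dir, backup_root, label):
    """Copy the whole BOINC data directory to backup_root/<label>_<timestamp>.

    Run this only while BOINC is stopped, so no files change during the copy.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{label}_{stamp}"
    shutil.copytree(data_dir, dest)
    return dest


def restore_boinc_dir(backup_dir, data_dir):
    """Replace the BOINC data directory with a backup copy.

    WARNING: this deletes the current data directory first, so take a
    'good' backup of it (step 1) before restoring an older one.
    """
    data_dir = Path(data_dir)
    if data_dir.exists():
        shutil.rmtree(data_dir)
    shutil.copytree(backup_dir, data_dir)


if __name__ == "__main__":
    # Demo on throwaway directories, not a real BOINC installation.
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "boinc_data"
        data.mkdir()
        (data / "state.txt").write_text("current")
        good = backup_boinc_dir(data, tmp, "good")
        (data / "state.txt").write_text("changed")
        restore_boinc_dir(good, data)
        print((data / "state.txt").read_text())
```

The timestamp in the folder name is there to satisfy the "appropriate label" advice in step 1, so the 'good' and pre-iceworld copies cannot be confused.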
26) Message boards : Number crunching : Iceworld Appeal (Message 38448)
Posted 3 Dec 2009 by Iain Inglis
Post:
Two more iceworlds to report, one from Belfry and one from peterfilla - to whom thanks are due! Both models freeze on the west coast of North America, though in different places. The map earlier in this thread has been updated accordingly.

The fast-processing iceworld from Belfry is the first Linux/AMD model to be analysed in this way. In common with iceworlds on Windows/Intel (slow) and Windows/AMD (fast) the freeze:

* appears in the second timestep of a group of six

* happens at a grid point adjacent to land in a restricted latitude band.
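Both observations are easy to check mechanically once a freeze point has been located. A sketch, assuming a land-sea mask held as a list of rows (True = land) with rows indexed by latitude and columns by longitude that wrap round the globe, and assuming timesteps are counted from 1 so that the first group of six is 1-6. The mask layout, counting convention, latitude band and function names are all hypothetical:

```python
def position_in_group(timestep, group_size=6):
    """1-based position of a timestep within consecutive groups of group_size.

    ASSUMES timestep 1 is the first step of the first group.
    """
    return (timestep - 1) % group_size + 1


def has_land_neighbour(land, i, j):
    """True if any of the 8 grid points around (i, j) is land.

    land[i][j] is True for land. Longitude (j) wraps around; latitude (i)
    does not, so points off the top or bottom row are ignored.
    """
    n_lat, n_lon = len(land), len(land[0])
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            ii, jj = i + di, (j + dj) % n_lon
            if 0 <= ii < n_lat and land[ii][jj]:
                return True
    return False


def fits_freeze_pattern(land, lats, i, j, timestep, band):
    """Check a freeze point against both observations above.

    lats[i] is the latitude (degrees) of row i; band is a (low, high) pair
    you supply, since the exact latitude range is not given here.
    """
    low, high = band
    return (
        position_in_group(timestep) == 2
        and has_land_neighbour(land, i, j)
        and low <= lats[i] <= high
    )
```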

I take this as good evidence that models on all three platforms freeze for the same reason. Keep the models coming, on any platform, as the geographical distribution may give a clue as to the underlying cause.

Now, what we need is a Mac user!
27) Message boards : Number crunching : Iceworld Appeal (Message 38443)
Posted 2 Dec 2009 by Iain Inglis
Post:
WU: 6690551 Iceworld at TS 202.622 (also from backup); generated .cpdn files; have a backup (savepoint) immediately before that point.

Thanks for that: there are several other Windows/Intel machines stuck in that WU, so it's definitely a proper iceworld and not a PC problem. I've sent a PM with the e-mail address for the '.cpdn' file - it should be 100-120 kB.
28) Questions and Answers : Unix/Linux : Graphics recording (Message 38427)
Posted 30 Nov 2009 by Iain Inglis
Post:
@Iain...

I've never seen any symptoms of an iceworld on my Linux AMDs. No significant speedup or slowdown at any point (other than what happens when adding/subtracting the number of models run, or running the slab with different model types). I've also never seen the odd end-of-phase graphics on the webpage where the temp suddenly changes and goes off the scale. Crashes happen, but they are probably 1 or 2 out of 100. How are you determining an iceworld on AMD/Linux? I'm sure you might have posted about this somewhere else, but if so, I wasn't paying attention at the time.

Then again, since my AMD/Linux PCs are utilizing SSE2, and most others are not, perhaps that puts my PCs in the same boat as Intel/Linux?

The brute-force scan of WU s/ts graphs is an attempt to gather circumstantial evidence linking the behaviour of the different platforms with regard to iceworlds. For example, Intel/Darwin has the same rate of iceworlds as Intel/Windows: it could therefore be argued indirectly that Mac and Windows iceworlds are examples of the same phenomenon. (If a sympathetic Mac user gives me a '.cpdn' file then I would count that as direct evidence.)

However, there's another way of getting the rate and that is to estimate the 'reliable processor rate' - i.e. the rate at which iceworlds occur for a user who tries to finish everything and doesn't have other types of crash. That's my policy, and the rate at which I get iceworlds is the same as the brute-force estimate, which validates my brute-force iceworld detection method for Windows/Intel. I did then try to find users on other platforms who might operate as I do - and started with you for Linux/AMD! However, as you say, your rate is effectively zero and I gave up rather discouraged: the SSE2 thing perhaps explains why - so perhaps I should have persevered (or sent a PM). There are, of course, plenty of users who do a lot of crunching, but it's hard for them to babysit lots of models and so I suspect that they just let BOINC get on with it - it's not then possible for me to tell whether a crash in their record is an iceworld or a PC upset.
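The 'reliable processor rate' is a simple proportion, and with the counts one user accumulates it helps to attach an uncertainty to it before comparing it with the brute-force figure. A sketch, using the Wilson score interval (a standard choice for binomial proportions, not something used in this thread); the counts in the example are made up:

```python
from math import sqrt


def iceworld_rate(iceworlds, completed):
    """Iceworlds as a fraction of models that either froze or finished.

    Models lost to other kinds of crash are deliberately left out.
    """
    return iceworlds / (iceworlds + completed)


def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion k out of n."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical record: 13 iceworlds and 87 completed models.
    rate = iceworld_rate(13, 87)
    low, high = wilson_interval(13, 100)
    print(f"rate {rate:.2f}, 95% interval {low:.3f}-{high:.3f}")
```

For that hypothetical 100-model record the interval runs from roughly 8% to 21%, so agreement between two rate estimates becomes more convincing as the counts grow.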
29) Questions and Answers : Unix/Linux : Graphics recording (Message 38420)
Posted 29 Nov 2009 by Iain Inglis
Post:
Has Iain actually discovered that iceworlds can occur on Linux?
On AMD, yes. On Intel, no. When I get to the end of analysing a brute-force scan of 1000 WUs that's underway, it will be a surprise if the Linux/AMD rate differs much from Windows/Intel, Windows/AMD and Mac (when corrected for platform penetration rates and other factors) - i.e. ~13%.

Does anyone know whether graphics recording slows the model processing much? I imagine it must slow the model about as much as running the graphics.
That's exactly what I would expect, but it doesn't seem to work that way (on Windows/Intel). I've found no significant slowdown in models when recording: strange but true.

Nice work, Belfry. That\'ll go into the tool bag for when Linux users pitch up here with malfunctioning graphics.
30) Message boards : Number crunching : Iceworld Appeal (Message 38408)
Posted 28 Nov 2009 by Iain Inglis
Post:
Thanks, Eric.

This effort is initially concerned with some pretty basic questions: in particular, 'are fast-processing iceworlds on Windows/AMD, Linux/AMD and Mac the same thing as slow-processing iceworlds on Windows/Intel?' It'll be very surprising if the answer is 'no', but since I've only got one iceworld that isn't Windows/Intel there's still a bit of work to be done even on the basics.

So, a Linux/AMD model would be a big step forward, not only because it would extend coverage to a third platform but because the one Windows/AMD model that has been analysed 'froze' in an unusual place - so your model might add to the variety of freeze points, which must be a significant clue to the cause (coastal location, restricted latitude range).

And what's this "recording" that everyone is doing?
The 'recording' is from the graphics display, where pressing Ctrl-Q toggles the recording on and off. (This is the graphics display that appears after pressing 'Show graphics' in BOINC Manager, not the screensaver.) The recording generates a 100-120 kB '.cpdn' file per timestep in the model's 'tmp' folder. The '.cpdn' playback file is a compressed binary file, which means that it isn't necessary to stare at the graphics waiting for an iceworld to happen (which is how I started!) - just set the recording going and look for the change in '.cpdn' file size that occurs at the freeze point. The file I need is the last one before the size reduction becomes noticeable (i.e. the one with just one frozen grid point).
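Looking for the size change can be scripted rather than done by eye. A sketch, assuming the '.cpdn' files in the model's 'tmp' folder sort into timestep order by modification time, and using a hypothetical 5% threshold for a 'noticeable' drop (the normal size variation between timesteps isn't known here, so tune it on your own files). It returns the last file before the first such drop; it is still wise to keep several files either side.

```python
from pathlib import Path


def list_cpdn_files(tmp_dir):
    """(name, size in bytes) for each '.cpdn' file, oldest first."""
    files = sorted(
        Path(tmp_dir).glob("*.cpdn"),
        key=lambda p: (p.stat().st_mtime, p.name),
    )
    return [(p.name, p.stat().st_size) for p in files]


def freeze_candidate(sizes, drop=0.05):
    """Name of the last file before the first size drop of more than `drop`.

    sizes: list of (name, size) in timestep order. Each drop is measured
    relative to the previous file. Returns None if no such drop is found.
    """
    for (prev_name, prev_size), (_, size) in zip(sizes, sizes[1:]):
        if size < prev_size * (1 - drop):
            return prev_name
    return None
```

Typical use would be `freeze_candidate(list_cpdn_files(tmp_path))`, where `tmp_path` is a placeholder for the model's own 'tmp' folder.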

I only have the one backup though; how do I start the task over from scratch with the original zip file?
I don't know the answer to that: from your single-model restore I would guess you're way ahead of me in file editing. However, my backup policy now is to download a new model before the old model finishes, finish and report the old model, back up the 'raw' new model (i.e. still in Zip file format), and then start it. This allows small backups of uncontaminated models to be moved from machine to machine. However, it is a long way back if the freeze point is missed, so I sometimes make phase-end backups as well.

If you do get the model going again from the backup, then send me a PM and I'll reply with an e-mail address to which the file can be sent. (It would also be an interesting footnote to find out whether the two executables freeze at the same point: the most likely explanation for platform differences is some arcane instruction set variation in the run-time library. However, that's a lot to ask!)
31) Message boards : Number crunching : Iceworld Appeal (Message 38402)
Posted 27 Nov 2009 by Iain Inglis
Post:
It looks like it has finished, Iain. I won't be in to the office until after it reports. Does that mean the cpdn files won't be saved for it?

The trickles are already on the Web site and the temperature/precipitation graphs too. So the model looks like a conventional success - and no evidence of an iceworld from the graphs, just the same cool and dry climate reported by the other finisher in the work unit.

I'm not sure at exactly what point the '.cpdn' files are tidied up, but they will certainly have gone by the time the model reports. It is a bit of a problem having to know that an iceworld is coming before turning the recording on: I try to download new models as far ahead of time as possible (the BOINC maximum is ten days), so that someone else can get ahead of me. Otherwise, the best method is to wait for an iceworld and then re-run it from backup with the recording switched on some time before the freeze (but I know that's a bit of a hassle).
32) Message boards : Number crunching : Iceworld Appeal (Message 38399)
Posted 26 Nov 2009 by Iain Inglis
Post:
Thanks for looking at that, Rick.

The complete model in that work unit is showing a marked decline in precipitation (here) but has a complete set of temperature and precipitation data and doesn't slow down. So, I'd bet that your model will finish OK, though it would appear to have an odd climate.

Tell us how it develops. Even if it finishes successfully, you could treat this model as a dry run (no pun intended) for an iceworld - see if you can find where the playback '.cpdn' files are stored: there should be thousands by now. They're cleaned up when the model finishes.

Iain
33) Message boards : Number crunching : Is it worth continuing when someone has already succeeded ? (Message 38392)
Posted 25 Nov 2009 by Iain Inglis
Post:
... as we have different OS, I\'d better keep on crunching this WU.

That\'s right. Since the completed model is Intel/Linux and they don\'t agree even with apparently similar computers, my advice would have been to continue even if you had a Linux/Intel computer.

A few duplicates could be useful to the project, as they would help exclude unstable machines - but only a few. Since CPDN models are long relative to those of other BOINC projects, CPDN doesn't use the WU parameters (quorum etc.) very intelligently; it would be easier for us if it did, but I suspect that would simply make the project's experiments longer and longer (i.e. the time taken to get a useful ensemble of individual models).
34) Message boards : Number crunching : Is it worth continuing when someone has already succeeded ? (Message 38390)
Posted 25 Nov 2009 by Iain Inglis
Post:
Welcome to the CPDN message board.

Computers are machines, and identical machines will produce identical results. However, computers differ and the results they produce therefore differ.

Generally speaking, Windows/Intel machines will agree with each other, Windows/AMD machines will agree with Windows/AMD machines and Darwin/Intel with Darwin/Intel. Linux/Intel machines tend not to agree with each other, Linux/AMD agree more. Careless overclocking will reduce agreement too.

So, different BOINC hosts will only agree if the operating system and processing chip are the same.
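The rule of thumb above can be written down as a quick filter for spotting which results in a work unit might be true duplicates. A sketch only: the host records are hypothetical dictionaries, and the code encodes nothing more than the generalisations in this post (same OS and CPU vendor needed; Linux/Intel hosts often disagree even then; overclocking is not modelled).

```python
from itertools import combinations


def expect_identical(host_a, host_b):
    """Rule of thumb: same OS and CPU vendor, excluding Linux/Intel pairs."""
    platform_a = (host_a["os"], host_a["cpu"])
    platform_b = (host_b["os"], host_b["cpu"])
    if platform_a != platform_b:
        return False
    return platform_a != ("Linux", "Intel")


def likely_duplicate_pairs(hosts):
    """Pairs of host ids whose results would be expected to match."""
    return [
        (a["id"], b["id"])
        for a, b in combinations(hosts, 2)
        if expect_identical(a, b)
    ]
```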

It has been shown by statistical analysis of the results that the variations between different machines have the same properties as variations in initial conditions (see here, Knight et al.). So, a WU that produces a variety of results is perfectly reasonable from a physical point of view.

I can't personally see the point of having more than two complete duplicates of any model. But you won't know whether the results are actually the same until the model has finished, so you might as well finish it...

(Your computers are hidden, so it's not possible to see whether your model duplicates, or is likely to duplicate, any other.)
35) Message boards : Number crunching : Iceworld Appeal (Message 38359)
Posted 22 Nov 2009 by Iain Inglis
Post:
Two more iceworlds have now been processed, points #22 and 23 on the west coast map - one from Dave Peachey and one of mine, both phase 2 slabs. The map earlier in the thread has been updated.

Thanks Dave!
36) Message boards : Number crunching : NO WU'S ? (Message 38357)
Posted 22 Nov 2009 by Iain Inglis
Post:
This problem now seems to have been corrected: I've got some new slabs.
37) Message boards : Number crunching : Iceworld Appeal (Message 38343)
Posted 21 Nov 2009 by Iain Inglis
Post:
Thanks Dave. I've been out of circulation for a week or so and may not be entirely reliable for a while. However, I've now got login access to this board again and will send you a PM with my e-mail address. The '.cpdn' file can then be analysed and added to the collection: I'm sure a pattern will emerge in time that will be significant to the project physicists.

(Sorry, Don, for not being able to respond more quickly: I see you've aborted the model now.)
38) Message boards : Number crunching : Iceworld Appeal (Message 38260)
Posted 5 Nov 2009 by Iain Inglis
Post:
By now I\'m at Timestep 102366 of 259248 - Phase 3 of 3

Matthias,

Welcome to the CPDN message board.

From what you say, it does seem to be an iceworld. The rate of progress has slowed dramatically and, since the model is in the final phase, it will not recover.

If this is your first iceworld and you have a backup, then you could restore the backup to see whether the model freezes again at the same place: they usually do. Otherwise, my advice is to abort the model and download another that will then progress at normal speed.

Iain
39) Message boards : Number crunching : Requesting work .. "Project has no new jobs available." But-- (Message 38255)
Posted 4 Nov 2009 by Iain Inglis
Post:
I have now got a model of the required type: two requests provoked a red 'no work' message, then third time lucky. Perhaps something has been fixed, or perhaps I should have waited a little longer before.

[Edit: it is apparently an intermittent Apache/BOINC server problem, which should now be fixed.]
40) Message boards : Number crunching : Iceworld Appeal (Message 38253)
Posted 4 Nov 2009 by Iain Inglis
Post:
PS I don't know whether the new batch of slabs turn into iceworlds. I guess we'll find out.

That question is now answered by two adjacent iceworlds from Les Bayliss, u71b and u71c, both from the new batch.

Both are west coast crashes (points #20, #21).

©2024 climateprediction.net