Posts by glaesum

21) Message boards : Number crunching : Long shutdowns and keeping HADSM model alive (Message 38162)
Posted 21 Oct 2009 by glaesum
Post:
thanks Iain, Les and Mo - each of you can be relied on to add some extra piece of information and advice.

so, in summary, I take it that all means it's ok to go ahead! ;-)

I just had a blast at Einstein@H to get myself into the top quartile of crunchers - job done so back to cpdn again!
22) Message boards : Number crunching : Long shutdowns and keeping HADSM model alive (Message 38148)
Posted 20 Oct 2009 by glaesum
Post:
the question is a little subtler than I could put in the title:

I've just finished a slab model (shortly to report), the first for a while after that run of HADAM models, and I'll be having to shut down in a few weeks for well over a month.

Now I noticed with the recent slab model that only a few replicants were issued to begin with, then more were sent out gradually and even now 2 wus are unsent. This seems a good policy of reducing a bit of the duplication of the high IR needed in cpdn's long tasks.

Unusually I was the first to finish this last model, as I'm near the end of the life-cycle of this pc, and it's not so fast relatively speaking. Anyway I'll be shutting down for a period when the next slab model would be only half or at best 2/3rds complete.

I have a vague recollection of seeing a post saying that after 6 weeks of 'not calling home' the models are deemed inactive; I suppose without progress more replicants will be issued to other computers, so my question is whether this is unnecessarily inefficient and whether I would do better to go NNW now and concentrate on other projects meanwhile. In the new year I'll let boinc run cpdn models on both cores and catch up a bit giving cpdn higher priority for a couple of months. (and with luck a new multi-core pc soon after.)

It's a pity there are no short HADAM models around at the moment because I put an extra 1GB of 2nd-hand memory from a tech friend in the old tub during the summer and it ran them very well until the well ran dry.

/pg
23) Message boards : Number crunching : Orphaned models (Message 37849)
Posted 20 Aug 2009 by glaesum
Post:
thanks Les and Mo, I did try to hold the task from finishing during the worst of the server outages - it did seem ok at the time so I let it finish. The reporting 'trickle' followed the zips upload by about an hour and a half triggered by the boinc mgr - I didn't force it manually. Sorry I haven't been doing back-ups of these shorter models - the 80yr and 160yr ones were a different matter, I did try to copy the boinc folder about every 10 model yrs with those.

it's still odd how many hadam3p models only seem to register 72000 timesteps rather than the full 72096.

/pg
24) Message boards : Number crunching : Orphaned models (Message 37841)
Posted 19 Aug 2009 by glaesum
Post:
I saw a slightly similar "An extra model?" thread in the Q&A: Windows forum but thought this was best under "Number Crunching".

I shoe-horned in another gig of RAM in July so have been receiving just HADAM3p models since then.

A hadam3p model was approaching its final day or two and I struggled to get a second model downloaded (I'm set to run dual core) even when I shut all other projects down. I checked the forums about the various outages and then noticed this orphaned model and wondered if this extra 'active' model was somehow blocking the request for more work.

With the task I started on Aug 3rd, #9280478, wu #6498292, it seems to have returned its final trickle (72000 timesteps?)

12/08/2009 11:54:18|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
12/08/2009 11:54:23|climateprediction.net|Scheduler request completed: got 0 new tasks

which matches with the online task details 11:57:07.
then it does another upload at 13:44, there are the final zips but it looks like another trickle unless this is the header for the zips. later there is the closing report at 15:25 and the model cleared from my Boinc manager.

but the model is not showing as finished in the database.

12/08/2009 13:38:48|climateprediction.net|Resuming task hadam3p_nawx_1977_2_006232235_2 using hadam3p version 606
12/08/2009 13:44:03|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
12/08/2009 13:44:05|climateprediction.net|Scheduler request completed: got 0 new tasks
12/08/2009 13:44:21|climateprediction.net|Computation for task hadam3p_nawx_1977_2_006232235_2 finished
12/08/2009 13:44:24|climateprediction.net|Started upload of hadam3p_nawx_1977_2_006232235_2_1.zip
12/08/2009 13:44:24|climateprediction.net|Started upload of hadam3p_nawx_1977_2_006232235_2_2.zip
12/08/2009 14:04:41|climateprediction.net|Finished upload of hadam3p_nawx_1977_2_006232235_2_2.zip
12/08/2009 14:04:41|climateprediction.net|Started upload of hadam3p_nawx_1977_2_006232235_2_3.zip
12/08/2009 14:05:00|climateprediction.net|Finished upload of hadam3p_nawx_1977_2_006232235_2_3.zip
12/08/2009 14:09:02|climateprediction.net|Finished upload of hadam3p_nawx_1977_2_006232235_2_1.zip
12/08/2009 15:25:50|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 1 completed tasks
12/08/2009 15:25:51|climateprediction.net|Scheduler request completed: got 0 new tasks
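For anyone wanting to check the upload timings in an excerpt like this, here is a minimal sketch that pairs each "Started upload" line with its "Finished upload" line. It assumes the pipe-delimited `timestamp|project|message` layout and the day/month/year date order seen in the pasted lines; that is read off the excerpt, not taken from BOINC documentation, and the function name `upload_durations` is made up for illustration.

```python
from datetime import datetime

# Upload lines copied from the excerpt above.
LOG = """\
12/08/2009 13:44:24|climateprediction.net|Started upload of hadam3p_nawx_1977_2_006232235_2_1.zip
12/08/2009 13:44:24|climateprediction.net|Started upload of hadam3p_nawx_1977_2_006232235_2_2.zip
12/08/2009 14:04:41|climateprediction.net|Finished upload of hadam3p_nawx_1977_2_006232235_2_2.zip
12/08/2009 14:04:41|climateprediction.net|Started upload of hadam3p_nawx_1977_2_006232235_2_3.zip
12/08/2009 14:05:00|climateprediction.net|Finished upload of hadam3p_nawx_1977_2_006232235_2_3.zip
12/08/2009 14:09:02|climateprediction.net|Finished upload of hadam3p_nawx_1977_2_006232235_2_1.zip
"""

START = "Started upload of "
FINISH = "Finished upload of "


def upload_durations(log_text):
    """Return {filename: seconds between its start and finish lines}."""
    started = {}
    durations = {}
    for line in log_text.splitlines():
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # skip anything that is not timestamp|project|message
        # Assumed date order: day/month/year, as in the pasted excerpt.
        stamp = datetime.strptime(parts[0], "%d/%m/%Y %H:%M:%S")
        message = parts[2]
        if message.startswith(START):
            started[message[len(START):]] = stamp
        elif message.startswith(FINISH):
            name = message[len(FINISH):]
            if name in started:
                durations[name] = int((stamp - started[name]).total_seconds())
    return durations


DURATIONS = upload_durations(LOG)
for name, secs in sorted(DURATIONS.items()):
    print(f"{name}: {secs // 60}m {secs % 60}s")
```

This only shows that the client logged each zip as finished; it cannot say whether the server recorded them.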

I held it at 99.9% until it seemed the uploader server was working.
so my question is - does this matter, have the final uploads completed or got lost with all the server problems, can anything be done, will I continue to struggle to have overlapping working models? One wingman has errored and the other seems to have active trickles. (I did see a slight slowing down when models overlapped earlier this week but it wasn't very serious)

The next model really laboured to download on Aug 11th, that was when all the new Facebook users hammered the server but that eventually d/l'ed fine and is showing fully completed and valid now too.

/pg

Ed: looking at the discussion in the thread "No trickles in task details" running alongside mine - it seems only one of my models has completed a full 72096 timesteps, the others only reported 72000 steps.
25) Message boards : Number crunching : hadcm slight deceleration of secs/TS (Message 33590)
Posted 26 Apr 2008 by glaesum
Post:

It sounds as if your processor was being 'thermally throttled' (automatically slowed down when overheating is detected by the Bios), well done for solving the issue :-)

hi mike - yes I agree, the most probable explanation; that's why I thought it worth mentioning as those of us in the northern hemisphere go into 'summer mode' putting more heat strain on our rigs. Actually I'll be stopping treating my pcs as heaters now and shutting them down more often. I'm also going to try and cadge the tail end of a can of pressurised air off a tecchie friend to give the heat sink another blast.
26) Message boards : Number crunching : hadcm slight deceleration of secs/TS (Message 33583)
Posted 26 Apr 2008 by glaesum
Post:
a small news update since I last reported:

within a few days of the slight improvement that I got from the file clean up and defrag things turned rapidly for the worse and the timestep interval climbed back to 4.75s/TS within about 4 days so I finally got down to cleaning the dust out. it was pretty dreadful in there and took ages to suck out - the fan on the video processor card was probably the worst; also without having a jet blower the fine fins on the main cpu are not quite fully clear but anyway everything immediately improved.

at a rough estimate, it's now crunching about 15% faster and, if you check the trickle list, I'm managing about 3 years every four days (apart from one shut down day) and this has now been pretty consistent for a few weeks so the deceleration problem looks pretty well fixed. this rate is close to 4s/TS and the cumulative trickles are gradually approaching this - it did go a bit faster with the previous model (more like 28hrs/year than 32hrs/year) but at least things are now more respectable.

the moral is definitely to clean out the dust periodically!

/pg
27) Message boards : Number crunching : hadcm slight deceleration of secs/TS (Message 33048)
Posted 21 Mar 2008 by glaesum
Post:
Conan - perhaps something for others to watch out for, I do everything to prevent the hadcm task switch out as the lost time cannot be recovered on such long work units; I leave the second core to share the remaining time between other projects.
_

meanwhile... ...I've cleared out 5-6GB of media files, redundant or backed-up elsewhere, and done a couple of defrags (it says 15% free space is preferred) and rebooted.
the model continued to slow down to 4.726s/TS but - short of coincidence - this last clean-up has turned things around; the latest trickle reported at 4.69 and the graphic screen is currently quoting 4.68.
because we are talking overall averages and I'm on year 1936 this is an improvement from ~123k-126ksecs per trickle-year to ~109ksecs per year. that is still only 4.2s/TS in the long run but hopefully at least I've stopped the rot. I probably should reboot the system more often as it can get a bit sludgier after a week or so.
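The arithmetic linking seconds per trickle-year to s/TS can be sketched as below. The figure of 25,920 timesteps per model year is an assumption: it is not stated anywhere in the thread and is only inferred from the post's own pairing of ~109k seconds per year with ~4.2s/TS (109,000 / 4.2 is about 25,950). The function names are made up for illustration.

```python
# Assumed, not confirmed: timesteps in one model year, inferred from the
# post's pairing of ~109,000 s per year with ~4.2 s/TS.
TIMESTEPS_PER_MODEL_YEAR = 25_920


def sec_per_ts_from_year(seconds_per_year):
    """Average seconds per timestep implied by one model year's run time."""
    return seconds_per_year / TIMESTEPS_PER_MODEL_YEAR


def seconds_per_model_year(sec_per_ts):
    """Run time in seconds for one model year at a given s/TS."""
    return sec_per_ts * TIMESTEPS_PER_MODEL_YEAR


def hours_per_model_year(sec_per_ts):
    """Run time in hours for one model year at a given s/TS."""
    return seconds_per_model_year(sec_per_ts) / 3600.0


# Per-year figures quoted in the post, before and after the disk clean-up.
for label, secs in [("before (low)", 123_000),
                    ("before (high)", 126_000),
                    ("after", 109_000)]:
    print(f"{label}: {sec_per_ts_from_year(secs):.2f} s/TS")

# The other direction: how long a model year takes at a given s/TS.
print(f"4.0 s/TS is {hours_per_model_year(4.0):.1f} hours per model year")
```

Under that assumption the "before" figures come out near the 4.7 to 4.9 s/TS range, which is at least in line with the rates quoted in this thread. The reported s/TS is a cumulative average, so it moves much more slowly than the per-year figure.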

perhaps cleaning out the dust is the next project for the holiday weekend!! aitchooooo... :)
28) Message boards : Number crunching : hadcm slight deceleration of secs/TS (Message 32968)
Posted 14 Mar 2008 by glaesum
Post:
>snip<

The current 5.44 model can be 30% slower than the 5.15 models at the BBC. So, your previous 2 sec/timestep becomes 2.6 s/ts.

So, the numbers are about right - but always slightly worse ...

thanks, interesting point about v.5.44 against the older v.5.15 which I didn't know.
the calculations make sense, however we still have the two puzzles of why the models are decelerating fairly systematically and why the second model (160yrs rather than the previous 80yr one) started off at the speed the first one finished at.
29) Message boards : Number crunching : hadcm slight deceleration of secs/TS (Message 32955)
Posted 14 Mar 2008 by glaesum
Post:
hi again Iain and everyone,

I let things run for a while after the recent server problems settled down and the old model had finished. The model continued to slow down right to the end. More worrying is that the new model started up matching the same slow speed as the old one (~4s/TS) and then decelerated quite sharply while the old was paused on suspend. Even now that it's crunching on its own, it is still slowing although not as markedly. It has now done 10yrs dropping down to 4.64s/TS with another trickle due this afternoon which should show about 4.66s/TS. Each year is now taking over 36hrs and it was only a little over a day at the beginning of the year - once the weather warms up I'm unlikely to crunch 24/7 so it could take all year at this rate.

I looked at my old BBC CCE account too and, while only the last few dozen trickles are available to view (on a model that failed with negative pressure c.2050), the machine was performing at 1.98s/TS at that time last May. Admittedly that was before I discovered how to set both cpu virtual cores running which will account for quite a bit of this change. The only other comparison to look at is the slab model run last October.

Despite all this, it's still pleasing to have completed my first coupled model on the main cpdn project.

/Pete
30) Message boards : Number crunching : Trickles not showing. (Message 32916)
Posted 11 Mar 2008 by glaesum
Post:
the latest trickle (a couple of hours ago) showed on the database within a minute and there was a general catch up of more trickles yesterday, the 10th. so I think that is a cue to run out the old model to completion. credits seem to follow within the day.

it wasn\'t impatience merely for the trickles, just wanting to follow up on the other thread as my new model is running even slower and I\'d like it to crunch on its own for a while first.

31) Message boards : Number crunching : Trickles not showing. (Message 32874)
Posted 6 Mar 2008 by glaesum
Post:
I've run my 80yr coupled model to within 3hrs of completion and, with the end of the week coming, I wondered how things were getting on and whether it is deemed safe now to send the final trickle and result upload zips?
32) Message boards : Number crunching : Trickles not showing. (Message 32852)
Posted 4 Mar 2008 by glaesum
Post:
Ok thanks, since I'm running various projects, suspending the task is the better option here although I did suspend network activity for a couple of days last week when the server probs were at their worst. Indeed I also backed up the model at 98% to protect the final lunge to the finish post.

I can see there is still quite a delay before trickles are showing in the model results. Keep us posted with Milo's progress - we're all cheering him on.

/pg
33) Message boards : Number crunching : Trickles not showing. (Message 32836)
Posted 3 Mar 2008 by glaesum
Post:
I'm in Oct 2079 on a model with two final trickles to complete it. Is it safer to hold back the final trickle of the model until the current problems are completely cleared up? My trickles have caught up now and all the credits exported except one lot on each model. Plenty more to crunch meanwhile on the replacement model which is only at 1925!
34) Message boards : Number crunching : hadcm slight deceleration of secs/TS (Message 32782)
Posted 28 Feb 2008 by glaesum
Post:
interesting and very pretty graphic!! the big gap was obviously xmas when I was shut down.

we might learn a bit more when the model finally finishes (and see how fast the new model runs on its own); the old one is set to complete in 109 hours but I've just set a timer against the clock as it's clearly running down much slower than that prediction.

I'll turn network activity back on when all the server problems are resolved. It's a pity we can't look at the last couple of trickles - I'd also like to see my new model successfully get its first trickle showing and check it's happy.
35) Message boards : Number crunching : Any Beta Work? (Message 32780)
Posted 28 Feb 2008 by glaesum
Post:
if you look on your account page, in the cpdn preferences section, you'll see there are 3 applications. to see how a production task runs normally you could try running a Slab (hadsm) model only, these are a bit smaller and you might be able to knock it out in 3wks or so depending on your engine size. that might help give a feel when comparing with any test jobs. just a thought... :)
36) Message boards : Number crunching : hadcm slight deceleration of secs/TS (Message 32774)
Posted 28 Feb 2008 by glaesum
Post:
I've been pondering the various answers - thanks to all.

it's difficult to elicit a cause that would gradually and systematically produce the slowing down symptoms.

I don't do the obvious no-nos: I don't use the screensaver and only occasionally peek at the graphics perhaps to check near the end of a year or decade or for the savepoint. I don't shutdown much (during the heating season) and anyway the loss of time crunched from last checkpoint would be random and not linked to %age of task completed. There are no signs of rewinding or looping - not that I've seen.

on the matter of its partner processing in the other core, apart from the overlap of two CM models this week, I don't normally do two cpdn tasks. otherwise it has been a fairly consistent mix of rosetta, malaria, einstein (and spasmodic lhc, somewhat more this month) - I haven't changed the resource share much since I set them up. The cpu is indeed a virtual not a true dual core as Richard guessed.

point taken on disk maintenance: I've made a tentative start on a bit of clearout, first out were the temp internet files and I found one dead aborted hadcm folder. The old disused BBC-CE folder was also given the heave-ho - it was archived anyway! Next on the list is re-organising a load of media files on to a USB drive...
I'll do the full defrag when the 80yr model finishes next week when I'll shut down all processes and let it get on with it.

I did look at cpu usage and I guess it would help one %age point or two if I shut a couple of dozen Firefox tabs down! The DVD media player takes 20% or a little more when running - that's the only thing I've been able to see that might be resource hungry.

since the original post, of course no fresh trickles are showing on the web-database during the current cpdn server problems for you to see - the models are now saying 3.99s/TS and 4.02s/TS respectively, even slower than two days ago.

has anyone attempted to put together a very rough guidance table of expected processing speed of the most popular cpus of various vintages? This P4 is now abt 2.5yrs old.
37) Message boards : Number crunching : hadcm slight deceleration of secs/TS (Message 32736)
Posted 26 Feb 2008 by glaesum
Post:
my 80yr coupled model is running fine, under a week away from finishing. I'm just slightly curious at noticing on the full trickles results page that the seconds per timestep (s/TS) is gradually creeping up throughout the model, i.e. it is slowing down. It started at abt 3.4s/TS then settled around 3.79s/TS for much of the long haul and during the last month it has slipped further to 3.96s/TS.

do models typically slow up slightly or could my pc have built up sludge processes using up crunching resources? I'm not very tecchie about 'looking under the hood'... lol

for the power of my pc it also seems a bit slower than others who typically report speeds under 3s/TS.

I've opened the gate and let a new model download (a 160yr one this time) and run in the other core; it's kicked out the other projects and started crunching - probably some long term debt on cpdn. This is going a similar speed to the old 80yr model - currently 3.98s/TS.
38) Message boards : Number crunching : Why is Climate Prediction grabbing all the CPU time (Message 31308)
Posted 8 Nov 2007 by glaesum
Post:
is the formula behind the number in the public domain?

Yup.
http://krijnen.com/time.php

If it's 3:30 AM (I'm assuming it is) then the Unix time stamp works out to be 1344411015.

you're all going to have to laugh at this:

I checked your conversion page - 'human to unix time' and back - what a nice expression!
but got that 1927 date yet again in my BOINC manager, aaargh...
poked around a bit, putting one of my Rosetta end dates in to convert in the opposite direction which looked ok apart from an exact 5hour offset.
final inspiration was to put the weird 26-05-1927 date into the unix date converter and out popped "-1344393015" it had gone negative!!
I'd read the dash as a hyphen not a minus sign! I've no idea whether I inserted the dash myself or whether this was the effect of the bug, either way the problem is fixed. There is that 5hr offset again, something unlikely to be of consequence. (This also explains why reducing the number, in fact making it less negative i.e. increasing it, made a small improvement when I was experimenting.)

If one halves the 80yr difference between 2007 and 1927, you get 1967 although the zero point for the Unix timestamp seems to be five hours before the turn of 1970. I suspect this recurrent 5hr offset is something to do with the difference between GMT and EST.
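The sign mix-up can be checked with a few lines of Python. This is a minimal sketch using only the two numbers quoted in this post; the helper name `from_unix` is made up for illustration, and it adds a `timedelta` to the 1970 epoch rather than calling `datetime.fromtimestamp`, since the latter can refuse negative values on some platforms.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_unix(seconds):
    """Convert Unix seconds (negative values allowed) to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


quoted = 1344411015      # value quoted in the reply
negative = -1344393015   # value the converter gave back for the 1927 date

print(from_unix(quoted))    # 2012-08-08 07:30:15+00:00
print(from_unix(negative))  # 1927-05-26 21:29:45+00:00

# The two magnitudes differ by exactly five hours.
print((quoted + negative) / 3600)  # 5.0

# Side note, and an assumption about the cause rather than anything
# confirmed in the thread: the most negative 32-bit signed value lands
# on the 13 Dec 1901 date that the faulty deadlines showed.
print(from_unix(-2**31))    # 1901-12-13 20:45:52+00:00
```

So a positive value lands in 2012 and the same magnitude with a minus sign lands the same distance before 1970, which is why the manager showed 1927.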

matter closed, the new model is now past its first 1% 'milepebble', thanks all.
hope the narrative helps any others. /Pete
39) Message boards : Number crunching : Why is Climate Prediction grabbing all the CPU time (Message 31304)
Posted 8 Nov 2007 by glaesum
Post:
thanks to all above: so, we're now running 5.10.28 installed in /program files/BOINC/ and I did Les's clock test disconnecting from the net. things are back crunching with CPDN status now saying "running, high priority", but the deadline remained 1901, probably not surprising as the client state file already had the faulty date code in it.
at least the manager's log doesn't shout back in red text which I guess is a small step forward. :-)

so I returned to Thyme Lawn\'s instructions above:
You can fix it by stopping BOINC and editing the file client_state.xml as described here - you'll need to search for the string hadcm3iozn_cpdf_2000_80_95898652_0 and change the deadline value to 1342515406 to reset to the July 2012 deadline shown on the result's page.
(ed. actually searching on the word \"deadline\" is easier. /pg)

but this only shifted the date to 1927. even using the advice in Dagorath's help page on the 1901 issue it only shifted to 1930. I found reducing rather than increasing the number in fact helped and it's currently on 18/12/1936 with date code "1042515406.000000". I wondered if there might have been a typo error in Thyme's post - is the formula behind the number in the public domain? The model itself is of course now a different one with a true deadline on the results page of 8 Aug 2012 3:30:15 UTC.
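On the question of the formula: the number is just the count of seconds since 1 Jan 1970 00:00 UTC. A minimal sketch of the conversion in both directions follows, using only the dates and codes quoted in this post. The helper names are made up for illustration, and the six-decimal formatting simply copies the "1042515406.000000" style quoted above. The reading that the stored value carried a minus sign is an inference from the arithmetic, not something the post states.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix(dt_utc):
    """Seconds from 1 Jan 1970 00:00 UTC to the given UTC datetime."""
    return int((dt_utc - EPOCH).total_seconds())


def from_unix(seconds):
    """Inverse of to_unix; negative values give dates before 1970."""
    return EPOCH + timedelta(seconds=seconds)


# Deadline shown on the results page: 8 Aug 2012 3:30:15 UTC.
deadline = datetime(2012, 8, 8, 3, 30, 15, tzinfo=timezone.utc)
value = to_unix(deadline)
print(f"{value:.6f}")          # 1344396615.000000

# The date code quoted above, read as positive and as negative.
print(from_unix(1042515406))   # 2003-01-14 03:36:46+00:00
print(from_unix(-1042515406))  # 1936-12-18 20:23:14+00:00
```

Only the negative reading matches the 18/12/1936 shown in the manager, which would also explain why a smaller number moved the date forward.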

/pg
40) Message boards : Number crunching : Why is Climate Prediction grabbing all the CPU time (Message 31293)
Posted 7 Nov 2007 by glaesum
Post:
sigh... ...back to the original question!

my slab model will complete within the hour (Wey hey!) so I let the manager allow new tasks and it has given me yet another CM model with a 1901 deadline. actually it tried to give me two but I aborted the download on the second before it could complete.

I thought this would also be a good time to upgrade my (5.4.11) BOINC version as well when there was nothing of accumulated crunched value to lose. I've cleared out any remaining Rosetta tasks and done a backup both today and yesterday as I'd heard of model crashes on 99% just prior to final upload. I have install files for 5.8.16 and 5.10.20 plus the latest 5.10.28 on my hd in case of strong opinions on the version number.

I'd quite like to get everything into the default /boinc/ folder rather than the /BBC_CCE/ one - I think creating a fresh folder and copying all the contents across may be a better way than just renaming the old folder - a belt & braces back-up.

Perhaps the old version of BOINC is part of my completion date problem...
either way, any more thoughts on fixing the date question?

Do I kill off this further 13dec1901 CM 80yr model before doing the upgrade and hope for the best I get a normal one afterwards? It\'s yet to crunch for even an hour.

thanks again /pg



©2024 climateprediction.net