Posts by MikeMarsUK

61) Message boards : Number crunching : failed upload: can't resolve hostname (Message 47018)
Posted 12 Sep 2013 by Profile MikeMarsUK
Post:
I wonder if that is supposed to be 'rapid-watch' rather than 'apid-wattch'. I have sent a query off to the admins.

If it turns out to be a typo, then it may be possible for the Rutherford Appleton labs to set up a redirect on that address.


-- Edit:

Andy has confirmed - it is a typo in the model definition (it should be rapid-watch).
62) Message boards : Number crunching : Virtualisation (running CPDN inside virtual PCs) (Message 47016)
Posted 12 Sep 2013 by Profile MikeMarsUK
Post:


A very nice idea, but somewhat beyond my interest in getting BOINC to behave. I just solve it by running only one type of CPU project at a time, though I also run GPU projects. Attempts to go beyond that simple scenario always cause me trouble.


One thing which helps slightly is to increase the 'switch time' from 60 minutes to something higher. Usually mine is set to 2 hours but I have changed it to 1440 minutes / 24 hours.

But ultimately, if you are running a mix of small CPU jobs and big CPU jobs, the model won't get a solid unbroken run at the CPU.
63) Message boards : Number crunching : Virtualisation (running CPDN inside virtual PCs) (Message 47015)
Posted 12 Sep 2013 by Profile MikeMarsUK
Post:

... while 7.0.X clients have some problems with the latest Virtual Box release (4.2.18). ...

Interesting. I have been unable to run test4theory since they changed the VM they are using (something to do with running 32-bit Boinc on a 64-bit PC).
64) Message boards : Number crunching : New Tasks not being snapped up (Message 47013)
Posted 12 Sep 2013 by Profile MikeMarsUK
Post:



I have moved the virtualisation posts into a new thread:
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=7660
65) Message boards : Number crunching : Virtualisation (running CPDN inside virtual PCs) (Message 47012)
Posted 12 Sep 2013 by Profile MikeMarsUK
Post:


Probably one of us should start a new thread on the virtualization subject.

(I have created this thread & moved the virtualisation posts)
66) Message boards : Number crunching : Virtualisation (running CPDN inside virtual PCs) (Message 47011)
Posted 12 Sep 2013 by Profile MikeMarsUK
Post:

Presumably task prioritisation does not work across a VM boundary? (i.e., will CPDN running at idle inside the VM take processor time on the host as if it was running at normal priority?)

Some years ago I was running an Oracle server inside a VM on my home PC. I did find that it would cause temporary network problems when starting up or shutting down the VM (so I had to suspend Boinc briefly each time to avoid no-heartbeat errors).
67) Message boards : Number crunching : New Tasks not being snapped up (Message 47001)
Posted 11 Sep 2013 by Profile MikeMarsUK
Post:


...BTW There are several in the BOINC tables around me that picked up credits on 17/8, but only on that day. Any relevance?


I believe that this was residual credit calculated on the 30th July (just before the old server went bang), which got exported for the first time when the credit export process was turned back on.

68) Message boards : Number crunching : New Tasks not being snapped up (Message 46995)
Posted 10 Sep 2013 by Profile MikeMarsUK
Post:
It's primarily because the number of tasks created was so large that it immediately swamped the pool of available computers. And there are still tasks being added - it has gone up by 1000 or so since yesterday.
69) Questions and Answers : Windows : Optimise PC build for CPDN (Message 46992)
Posted 9 Sep 2013 by Profile MikeMarsUK
Post:
Yes, the INVALID THETA errors are different. They can be caused either by the model's initial parameters resulting in an implausible climate, or by floating point errors creeping in. They also tend to turn up at the 25%/50%/75% boundaries because that is when the model validation takes place.

You've only had the one of these, and your machine has passed long stability checks, so in your case I think the model itself is to blame. If you were getting lots of INVALID THETAs, while other people running the same models were not, then there would be a cause for concern, but that isn't the case.

Even if you saw a number of THETAs turning up, they may be related to a particular batch of models, so one of the things to look at would be if they were generated at similar times or different times.

I would suggest ramping up the number of models & seeing what happens.
70) Message boards : Number crunching : Why do I keep getting a 'Computation Error'? (Message 46983)
Posted 7 Sep 2013 by Profile MikeMarsUK
Post:
Which one of your computers is it? (Could you post a link to both the computer & the task in question?) There is one with a lot of 'aborted by user'...




| Task ID | Work unit ID | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Claimed credit | Granted credit | Application |
|---|---|---|---|---|---|---|---|---|---|
| 15999324 | 8540636 | 1 Sep 2013 15:45:42 UTC | 5 Sep 2013 4:51:14 UTC | Aborted by user | 11,918.64 | 11,759.16 | 0.00 | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07 |
| 15973944 | 8591527 | 30 Aug 2013 9:33:20 UTC | 1 Sep 2013 15:45:42 UTC | Aborted by user | 43,380.69 | 32,350.35 | 0.00 | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07 |
| 15928546 | 8557817 | 20 Aug 2013 9:18:11 UTC | 30 Aug 2013 9:33:20 UTC | Aborted by user | 130,565.45 | 107,276.60 | 0.00 | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07 |
| 15925569 | 8339457 | 18 Aug 2013 22:03:05 UTC | 19 Aug 2013 1:48:48 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P Pacific North West v6.09 |
| 15855491 | 8540503 | 23 Jun 2013 14:15:45 UTC | 16 Aug 2013 6:55:51 UTC | Aborted by user | 583,111.53 | 450,435.70 | 0.00 | 933.12 | UK Met Office Coupled Model Full Resolution Ocean v6.07 |
| 15815532 | 8474790 | 1 Jun 2013 11:06:45 UTC | 16 Jun 2013 20:41:25 UTC | Aborted by user | 58,233.55 | 48,268.72 | 0.00 | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07 |
| 15798768 | 8473794 | 27 May 2013 11:36:30 UTC | 1 Jun 2013 1:15:04 UTC | Aborted by user | 13,557.45 | 13,256.55 | 0.00 | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07 |
| 15796716 | 8469378 | 26 May 2013 12:06:13 UTC | 27 May 2013 5:45:59 UTC | Aborted by user | 5,081.69 | 4,825.03 | 0.00 | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07 |
71) Questions and Answers : Windows : Optimise PC build for CPDN (Message 46976)
Posted 6 Sep 2013 by Profile MikeMarsUK
Post:
... currently set at 8 tasks. When Mike first mentioned 16 threads, I took that as meaning CPU threads as hyperthreading is on. ...


Yes, I was looking at the 'processor' count on your computer page (= actually the number of CPU threads). 8 models is easier on the machine than 16 :-) I actually run 6 models on 4 cores / 8 threads; any more than that a) makes my machine struggle, and b) does not increase throughput anyway. The best individual processing speed comes from running one model per core.
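
One way to cap the number of models is through a processor-percentage limit. The post does not say how the limit was set, so treat that mechanism as an assumption. A minimal sketch of the arithmetic follows; `processors_percent` is a hypothetical helper name, and how the client rounds the percentage to a whole number of tasks is not modelled here.

```python
def processors_percent(models, threads):
    """Percentage of logical processors matching `models` concurrent tasks.

    `threads` is the 'processor' count shown on the computer page,
    i.e. the number of CPU threads, not physical cores.
    """
    if not 0 < models <= threads:
        raise ValueError("models must be between 1 and the thread count")
    return 100.0 * models / threads


if __name__ == "__main__":
    # 6 models on a 4-core / 8-thread machine, as described in the post.
    print(processors_percent(6, 8))
    # One model per physical core on the same machine.
    print(processors_percent(4, 8))
```

For the 6-models-on-8-threads case this works out to three quarters of the processors; one model per core on the same machine is half.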

Let us know if you have any failures after the above changes. (Fingers crossed...)
72) Questions and Answers : Windows : Optimise PC build for CPDN (Message 46972)
Posted 5 Sep 2013 by Profile MikeMarsUK
Post:
One more thing:

This is the model running on your machine:
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=15998767


This is the same model running on someone else's machine.
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=15816061


Note that they both crashed at the same point. It might simply be coincidence (the risky 50% point), or perhaps the model itself was doomed to die then anyway.
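
To check whether a set of crashes clusters at the risky points, something along these lines could be used. It is only a sketch: the 25%/50%/75% boundaries come from this thread, while the tolerance, the function name and the sample progress values are made up for illustration.

```python
# Boundaries mentioned in the thread, as fractions of total model progress.
BOUNDARIES = (0.25, 0.50, 0.75)


def nearest_boundary(progress, tolerance=0.01):
    """Return the boundary within `tolerance` of `progress`, or None.

    `progress` is the fraction of the model completed when it crashed.
    The default tolerance of one percentage point is an arbitrary choice.
    """
    for boundary in BOUNDARIES:
        if abs(progress - boundary) <= tolerance:
            return boundary
    return None


if __name__ == "__main__":
    # Hypothetical crash points, not taken from any real task.
    for crash_at in (0.501, 0.62, 0.249):
        print(crash_at, nearest_boundary(crash_at))
```

If most crashes map to a boundary, the model or the file handling at that point is the more likely suspect; crashes scattered at random progress values would point more towards general instability.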
73) Questions and Answers : Windows : Optimise PC build for CPDN (Message 46971)
Posted 5 Sep 2013 by Profile MikeMarsUK
Post:

... It sounds like it will be active by default on Intel 68 and up even without the configuration program installed (the driver is part of Win7/8).
...


It won't use ISRT by default unless it is set up in both the BIOS & Windows... it took me several days and much swearing before I could get ISRT working with my PC! (I set it up on my main disk, and have a secondary disk for Boinc which is not cached).


... How do we know for certain this is where BOINC builds the zipped files? I notice a file in c:/windows/temp/ called DMI3FCD.tmp of 0KB and timed at 17:37 (NZ time), which was the time task 15998767 crash. Could be it? ...


Well, it is probably associated with the model, but whether it was actually the starting point for a .Zip or not I cannot tell. That's a typical name for a temporary file when requested by something using the Windows API. But try excluding it, and see if that helps.


... PC had a 40 hour burn before it left the shop, and a 30 hour one after I had installed all the extra gear and software, but could run it again in the weekend if it was going to be beneficial. ...


Well, if it's already had a stress-test done, there is little point in doing it again.


... the drive that houses BOINC ...


That sort of suggests that you have other drives available. Just as an experiment, it may be worth moving the Boinc Data folder over to a different drive (as long as it isn't an SSD), to see if you can see a difference.



Speaking of which, in the (very) old days it used to be possible to set up a RAM disk at system startup, and store it at shutdown. While that wouldn't be a good idea for the models (since you would risk losing progress if the system shuts down unexpectedly), it may also be worth experimenting with.



The overall impression I am getting from this is that the problem is not stability (otherwise the crashes would be along the lines of NEGATIVE THETA etc), but something to do directly or indirectly with disk access (which is why I mentioned antivirus software earlier).
74) Questions and Answers : Windows : Optimise PC build for CPDN (Message 46967)
Posted 5 Sep 2013 by Profile MikeMarsUK
Post:
Two minutes waiting for shutdown tasks to write sounds pretty out-of-whack. ...


Well, he has 16 threads so there are an awful lot of models to shut down.




75) Questions and Answers : Windows : Optimise PC build for CPDN (Message 46966)
Posted 5 Sep 2013 by Profile MikeMarsUK
Post:

I would recommend against an SSD for CPDN. I calculated that my 64 GB Intel 520 would, in theory, only survive around 6 weeks (each model generates something like a terabyte of writes over its life), so I moved the Boinc data directory onto a mechanical hard disk. A bigger SSD, or single-level-cell flash, would last longer.
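
The sort of estimate behind that figure can be sketched as below. Only the "terabyte of writes per model" comes from the post; the rated endurance, the number of models running at once and the days each model takes are hypothetical inputs picked for illustration, and the function name is made up.

```python
def ssd_lifetime_days(endurance_tb, tb_per_model, models_in_parallel, days_per_model):
    """Days until the drive's rated write endurance is used up.

    endurance_tb       -- total writes the drive is rated for, in TB (assumed)
    tb_per_model       -- writes generated by one model over its life, in TB
    models_in_parallel -- number of models running at the same time (assumed)
    days_per_model     -- wall-clock days one model takes to finish (assumed)
    """
    if min(endurance_tb, tb_per_model, models_in_parallel, days_per_model) <= 0:
        raise ValueError("all inputs must be positive")
    tb_written_per_day = tb_per_model * models_in_parallel / days_per_model
    return endurance_tb / tb_written_per_day


if __name__ == "__main__":
    # Illustrative numbers only; the post does not state them.
    days = ssd_lifetime_days(
        endurance_tb=36.0,
        tb_per_model=1.0,
        models_in_parallel=6,
        days_per_model=7.0,
    )
    print(f"{days:.0f} days, roughly {days / 7:.0f} weeks")
```

Doubling the rated endurance (a bigger drive or more durable flash) doubles the estimate, which is why a larger or single-level-cell drive would last longer.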

I see both 'signal 11' and 'code 193' in the status of those jobs. Does the time of the crashes correspond to anything particular?

As a starting point:

* Change your settings to 'Leave tasks in memory when suspended' = Y, 'suspend if CPU usage is above %' to 0%, 'Use at most ... % of CPU' to 100.00. This will prevent the model being swapped out of memory.

* Make sure that the Boinc data directories are excluded from any antivirus scans.

If the crash always happens at the moment that the zip files are generated (25%, 50%, 75%) then I would look at the antivirus software on that PC first, as it may be interfering. Obviously you shouldn't turn off antivirus, but you can exclude the appropriate directories (which may include c:/temp/ if the files are generated there).
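
The three preference changes in the first bullet can also be written to a local override file instead of being set through the Manager or the website. The sketch below generates such a file's contents. The file name `global_prefs_override.xml` and the tag names are my understanding of the BOINC client's conventions rather than something stated in this thread, so check them against the BOINC documentation before relying on them.

```python
import xml.etree.ElementTree as ET

# Assumed tag names for the three settings listed above (verify before use).
SETTINGS = {
    "leave_apps_in_memory": "1",  # 'Leave tasks in memory when suspended' = Y
    "suspend_cpu_usage": "0",     # 'suspend if CPU usage is above %' = 0
    "cpu_usage_limit": "100",     # 'Use at most ... % of CPU' = 100
}


def build_override(settings):
    """Return the XML text for a preferences override with the given settings."""
    root = ET.Element("global_preferences")
    for tag, value in settings.items():
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print rather than write, so nothing is changed until you have checked it.
    print(build_override(SETTINGS))
```

The output would go into the Boinc data directory under the assumed file name, and the client would need to re-read its local preferences afterwards. Setting the same three values through the preferences dialog is the safer route if in doubt.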


Here is a good post regarding error numbers:
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=7592&nowrap=true#46161


If neither of these helps, try running a 'stress test' for 24 or 48 hours on the PC. I use prime95 (one copy per thread, to test the CPU), and memtest86+ (to test the memory).
76) Message boards : Number crunching : News and Announcements (Message 46955)
Posted 3 Sep 2013 by Profile MikeMarsUK
Post:


The credit generation system is still not working after the climateapps2 server rebuild; the administrators are aware & have been investigating for quite some time. Once it is resolved, everyone will get the outstanding credit for work done since the original server failed (30th July).
77) Message boards : Number crunching : More Work (Message 46947)
Posted 2 Sep 2013 by Profile MikeMarsUK
Post:

I have asked the admins about the credit situation & they're still trying to figure out where the problem is. The CPDN credit system is weird & complex internally.

78) Message boards : Number crunching : More Work (Message 46944)
Posted 2 Sep 2013 by Profile MikeMarsUK
Post:
Oh, sorry, I thought you meant this, I'm 4 miles from it. Lost my power and I'm running on my generator now.

http://www.mymotherlode.com/news/index.php


Hope they get it under control soon, fires are pretty scary. When I was in Australia one got to about 1/2 a mile away, we ran for the hills...

79) Message boards : Number crunching : More Work (Message 46941)
Posted 2 Sep 2013 by Profile MikeMarsUK
Post:
Yeah exactly, running from crisis to crisis with little time to do anything else.
80) Message boards : climateprediction.net Science : Energy Efficiency: Combi Boiler (Message 46940)
Posted 2 Sep 2013 by Profile MikeMarsUK
Post:
... I'm having trouble unpicking this, unless you live pretty much exclusively in the room with the stove!!


Presumably due to the 7.5 kW hot water heating which comes from his wood-burner. That's quite a lot of heat!!




©2024 climateprediction.net