climateprediction.net home page
Posts by Greg van Paassen

1) Questions and Answers : Unix/Linux : Lots of tasks end with "Error While Computing". Is there a problem at my end? (Message 49565)
Posted 16 Jul 2014 by Greg van Paassen
Post:
WB8ILI: use the `ldd' command as described in this thread.

That tells you which shared library files CPDN applications are using.

Then use `dpkg-query --search filename' to find the owning package, and check for updates.

(No doubt there's a quicker way to do this, but I don't know it off the top of my head.)
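For what it's worth, the two steps can be chained together. This is only a sketch, assuming a Debian-family system such as Ubuntu; `ldd_paths` is a helper name made up for this example, and the application path in the usage comment is a placeholder, not a real CPDN file name.

```shell
# ldd_paths: read ldd output on stdin, print the resolved library paths.
# Lines such as "linux-gate.so.1 => (0x...)" or "libfoo.so => not found"
# have no absolute path in the third field and are skipped.
ldd_paths() {
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Usage (path/to/cpdn_app is a placeholder):
#   ldd path/to/cpdn_app | ldd_paths | sort -u | xargs -n1 dpkg-query --search
```

Each output line from dpkg-query should name the package that owns one library, which is the package to check for updates.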

By the way, I'm also using 32-bit 12.04 LTS. It's fine; there's no need to upgrade to 14.04 yet, if you don't want to.
2) Message boards : Number crunching : Credit updates? (Message 49504)
Posted 5 Jul 2014 by Greg van Paassen
Post:
I've actually realized that I've only got a couple of computers that I can run this project on - I don't think most of my computers would be able to finish WUs before the deadlines.

No, you're fine. CPDN accepts work that finishes after deadline.

Deadlines are a BOINC feature that CPDN uses for two things:-

  • To re-issue work to another computer, if no results have come back yet. (Unfortunately many computers accept tasks and then "go dark".)
  • As a crude prioritisation mechanism. For instance, the RAPID-RAPIT/CHAAOS researchers wanted to get some results in a hurry last year (I'm guessing for IPCC AR5--but I'm not confident about that), so those units have a very short deadline. But I understand that they are still taking and using results that come in after deadline.


So don't worry about the deadlines. Keep crunching!

3) Message boards : Number crunching : Credit updates? (Message 49352)
Posted 13 Jun 2014 by Greg van Paassen
Post:
Fox, the credit calculation script hasn't run for about ten days now. When it does run, you'll get your credits.
4) Message boards : Number crunching : HADAM3P not getting credits (Message 49241)
Posted 29 May 2014 by Greg van Paassen
Post:
hi Nigel,

please copy and paste your event log to a pastebin and post the URL here - or in a private message to Les, if you think here is too public.

The next step is to turn on debugging in your cc_config.xml. Searching this message board via Google ("site:climateapps2.oerc.ox.ac.uk cc_config.xml debugging") ought to provide you with several sets of instructions on how to do that.
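As a rough illustration of what such instructions usually boil down to, here is a minimal sketch that writes a cc_config.xml with one debug log flag switched on. The choice of `task_debug` as the useful flag is my assumption, `write_debug_cc_config` is a made-up helper name, and the function overwrites whatever file you point it at, so back up any existing cc_config.xml first.

```shell
# write_debug_cc_config: write a minimal cc_config.xml that enables
# task-level debug logging. The target path is supplied by the caller.
# WARNING: this replaces the file at that path.
write_debug_cc_config() {
    cat > "$1" <<'EOF'
<cc_config>
  <log_flags>
    <task_debug>1</task_debug>
  </log_flags>
</cc_config>
EOF
}

# Usage (the BOINC data directory varies between installs):
#   write_debug_cc_config /path/to/boinc/data/cc_config.xml
```

The client has to re-read its configuration (or be restarted) before the extra messages show up in the event log.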
5) Questions and Answers : Windows : Optimise PC build for CPDN (Message 49112)
Posted 15 May 2014 by Greg van Paassen
Post:
Martin-

Jim has the right of it. With 8 hadcm3ns, my i7-2600 maxed out around 3400 RAC. With other models--FAMOUS, Weather at Home--it could get over 6000. There's some technical reason why they can't adjust the credits per trickle for hadcm3n. I see you're running 5 hadcm3ns; if in the past you were running all Weather At Home models, that would account for the drop in RAC.

I currently have a similar situation. My PC is running only 7 models, 4 hadam3pm2s and 3 hadcm3ns, and its RAC is over 10,000: it's no. 4 on the "top computers" list as I write, which is just silly. I think the credit allocation for hadam3pm2 is also wrong, in the opposite direction.

But as Les says, getting a Windows version of hadam3pm2 out the door and responding to scientists' demands are probably higher priorities than adjusting credits. We don't run CPDN work because of the project's focus on PR and community interaction. ;-)
6) Message boards : Number crunching : UK Met Office HADAM3P (global only) with MOSES II landsurface scheme v7.03 (Message 49033)
Posted 4 May 2014 by Greg van Paassen
Post:
It probably doesn't matter either way, Dave.

I note from the "server status" page that the admins have deleted the bulk of this batch of hadam3pm2s.

I'm guessing that they'll fix the two major problems (the wildly wrong, overlong running-time estimate, and the failure with an error if the computer or BOINC is ever restarted while the tasks run) and then re-issue the batch.

With luck there'll be a Windows version as well, and a fix for the cross-project Boinc stats too ... but maybe those are two fixes too far at this stage. ;-)

When they re-issue the work, though, the admins will probably delete any results dating from before the re-issue, in order to have a clean data set for the scientists.

So: run it, or don't run it. If you do, you'll get credits (eventually), but the scientists probably won't use the results. If you are supporting other projects, perhaps they could use the CPU cycles instead. (I note that there are a few HadCM3N tasks available from CPDN, too. I don't know what that's about.)
7) Message boards : Number crunching : hadam3p eu WU segfault (Message 49011)
Posted 1 May 2014 by Greg van Paassen
Post:
pvh, try a couple of sysctls (in sysctl.conf):-

# vm.swappiness controls how aggressively the kernel swaps out inactive processes. Default 60; lower = less quick to swap things out.
vm.swappiness=10

# vm.vfs_cache_pressure controls how fast the kernel reclaims memory used to cache
# copies of inodes and directory entries. Default 100, lower -> slower to reclaim them.
vm.vfs_cache_pressure=10


Reasoning:

1. Boinc is sensitive to its processes being swapped out. (Whether or not its developers agree.)

2. CPDN processes are sensitive to disk timing issues, especially at zip creation time. Keeping core filesystem records in RAM reduces waits for disk access.

These two steps reduced the number of mysterious crashes I had on one of my boxes. I still got a few at zip creation time, though. See next step.

Try changing the disk scheduler to deadline. CFQ is used by default in a lot of distros. CFQ is known to cause problems with some workloads; maybe CPDN and/or BOINC is one. Since I changed to deadline, I haven't had any inexplicable crashes. Example code (run as root):

# enumerate disks, set scheduler for each (needs root)
for dev in /sys/block/sd[a-z]; do
   # skip the unexpanded pattern if no sd* disks exist
   [ -e "$dev/queue/scheduler" ] || continue
   echo deadline > "$dev/queue/scheduler"
done


You could also add this to the kernel command line in your bootloader configuration, so that the setting survives a reboot:

... elevator=deadline ...
(where the dots stand for other stuff already present)
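
To confirm that the change took effect, you can read the scheduler file back: the kernel lists the available schedulers and puts the active one in square brackets. A small sketch follows; `active_scheduler` is just a name I made up, and `sda` in the usage comment stands for whichever disk you are checking.

```shell
# active_scheduler: print the scheduler currently in use, given the path
# to a "scheduler" file. The kernel marks the active one with brackets,
# e.g. "noop deadline [cfq]".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# Usage:
#   active_scheduler /sys/block/sda/queue/scheduler
```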
8) Message boards : Number crunching : CONVERTING TO LINUX (Message 48927)
Posted 28 Apr 2014 by Greg van Paassen
Post:
I ran CPDN on 32-bit (not 64-bit!) Ubuntu in a VirtualBox session and it worked right out of the box. No problems at all.


Thanks for the advice, but I am not sure that I want to install a 32-bit OS and throw away half of my RAM. Does anyone know of a 64-bit Linux distro that comes with 32-bit compatibility libraries already installed?

Jim, most current 32-bit Linux distributions can use all the RAM available (up to 64GB) because of a feature called PAE. Certainly all of the popular distros have PAE.

For example, I am running 32-bit Ubuntu on a box with 12 GB of RAM, and all of the RAM is being used.
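
If you want to check a particular machine, the CPU advertises PAE in the flags line of /proc/cpuinfo. The sketch below only tests for that CPU flag; it says nothing about whether the installed kernel was built with PAE, and `has_pae` is a helper name invented for this example.

```shell
# has_pae: read /proc/cpuinfo-style text on stdin; succeed (exit 0) if a
# "flags" line lists pae as one of its words, fail (exit 1) otherwise.
has_pae() {
    awk '/^flags/ { for (i = 1; i <= NF; i++) if ($i == "pae") found = 1 }
         END { exit !found }'
}

# Usage:
#   has_pae < /proc/cpuinfo && echo "CPU supports PAE"
```

Running `free -m` afterwards shows how much RAM the kernel actually sees.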

PAE should in theory slow down the box, but if it does, the effect is not very noticeable - less than half a percent difference running CPDN work. 32 bit Linux is another 1% or 2% slower than 64-bit, but with a 32-bit system you don't have the hassle of manually installing the correct 32-bit libraries, and losing CPDN models until you get it right.

I agree with Les: Mint is best if your Linux install is going to be used for various things. But if your box is dedicated to CPDN work (or BOINC work in general), I'd recommend 32 bit Lubuntu. Lubuntu is Ubuntu with the 'LXDE' desktop, which is very light on system resources, so it leaves more CPU for CPDN to use. LXDE is similar to Windows 95 or 98 in how it "feels". (It looks a bit more modern, though.) Lubuntu is based on Ubuntu, so you will get security updates for it. For most people it's easier to set up than Crunchbang (which is even more lightweight, but is kind of a "hot-rod" distribution - you have to be an enthusiast to want to drive it).

I also agree with Eirik and Les about using a virtual machine. If you have lots of RAM, set up a virtual machine with a few GB of RAM (0.75 GB to 1 GB per virtual processor) and maybe 50 GB of disk, and install Lubuntu in that. The performance penalty of the VM compared to "bare metal" is maybe 1% or 2%, but you gain a lot of flexibility.
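
To put numbers on that rule of thumb, here is a tiny sketch that turns a virtual processor count into a RAM range in MB, taking 0.75 GB as 768 MB and 1 GB as 1024 MB. The function name `vm_ram_range` is made up for illustration.

```shell
# vm_ram_range: print the low and high RAM allocation in MB for a VM,
# using 768 MB to 1024 MB per virtual processor.
vm_ram_range() {
    vcpus=$1
    echo "$((vcpus * 768)) $((vcpus * 1024))"
}

# Usage: a VM with 4 virtual processors
#   vm_ram_range 4
```

For 4 virtual processors this prints "3072 4096", i.e. allocate between 3 GB and 4 GB.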

So all up, with Lubuntu 32 bit in a VirtualBox virtual machine, you lose maybe 2% - 5% of the power of the CPU, but you save the time otherwise spent setting up and using a dual-boot environment (shutting down and rebooting), you gain peace of mind, and you gain the ability to move that virtual machine to another box if you wish. You don't lose the use of any of your RAM - if 1GB per virtual processor isn't enough, simply increase the amount of RAM allocated to the virtual machine.

Oh: the direct answer to your question is: no, I don't think so. Not a distribution that updates them reliably after version upgrades. That's why I'm using a 32 bit distribution - thrice bitten, fourth time shy.
9) Message boards : Number crunching : hadam3p eu WU segfault (Message 48926)
Posted 27 Apr 2014 by Greg van Paassen
Post:
Thanks, Bonsai911! :-)
10) Message boards : Number crunching : hadam3p eu WU segfault (Message 48916)
Posted 27 Apr 2014 by Greg van Paassen
Post:
Segfaults happen occasionally (rarely), with all model types.

Why? Who knows? A misbehaving driver for one of the computer's peripherals. Cosmic rays flipping a bit in the computer's RAM. Electrical 'noise' on the power supply from (for example) the refrigerator turning off. Static electricity building up and then discharging. Oxide films building up on connectors in the computer. A failing component or solder joint. Most likely: a very obscure bug in the model code that can never be reproduced by the developers, because it only shows up after a certain pattern of disk accesses, with model data in a specific place in RAM. Or something like that.

If segfaults happen frequently with one computer, then it might be worth investigating further, for example checking that all the connectors to the motherboard and disk drives are properly seated, and running Memtest86+ for 72 hours or so.

But a one-off segfault is nothing to worry about.
11) Message boards : Number crunching : No Tasks Available (Message 48272)
Posted 4 Mar 2014 by Greg van Paassen
Post:
My experience has been that the HadAM3P code is less sensitive to conditions on your computer than is the HadCM3N.

The usual advice applies still: ensure that your virus checker ignores the Boinc data folder.
12) Message boards : climateprediction.net Science : New project launch tomorrow: Weather@home 2014: the causes of the UK winter floods (Message 48248)
Posted 3 Mar 2014 by Greg van Paassen
Post:
Welcome to the forums, Hannah!

It looks like the new models are being issued already. My machine has picked up four.
13) Questions and Answers : Windows : No new tasks (Message 47580)
Posted 14 Nov 2013 by Greg van Paassen
Post:
From the looks of the web page for your computers, you disconnected and reconnected to the project many times from December 2012 to February 2013. Each time you did that, the project allocated your computer a new ID number. In effect, the project considered it a different computer.

The "in progress" tasks showing on your computer relate to the older IDs.

You could try using the "merge computers" option shown on the "my computers" page, to tell the project that those old computer IDs are really the same computer as your latest one. That may (or may not) help with the problem -- please let us know how you got on.

Edit: It may take a few days after you perform the merge before any effects are apparent.
14) Questions and Answers : Unix/Linux : Workunit "stuck" in the middle of calculation. (Message 47527)
Posted 11 Nov 2013 by Greg van Paassen
Post:
Bob, the 25% / 50% / 75% problem is one where the tasks crash and terminate themselves at those points. They may leave behind directories named after themselves in the boinc-client/projects/climateprediction.net directory. That particular problem is less common than it was in 2012.
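
If you want to see whether any such directories are lying around, a sketch like the one below lists the subdirectories of the project directory so that you can compare the names against the tasks BOINC Manager shows. It deletes nothing. `list_task_dirs` is a made-up helper name, and the full path to boinc-client depends on your install.

```shell
# list_task_dirs: print the immediate subdirectories of the given
# project directory, one per line. Plain files are ignored.
list_task_dirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d
}

# Usage (prefix the path with wherever boinc-client lives on your system):
#   list_task_dirs boinc-client/projects/climateprediction.net
```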

Your problem sounds like a "zombie" task. It's dead, but won't lie down. In all cases that I know of, the only possible action for tasks that stop advancing is to terminate them.
15) Message boards : Cafe CPDN : 50:1 Project (Message 47202)
Posted 29 Sep 2013 by Greg van Paassen
Post:
16) Message boards : Number crunching : hadcm3n failed at 1% (Message 47153)
Posted 23 Sep 2013 by Greg van Paassen
Post:
As Mo says, BOINC is designed to allow frequent suspension of work. I'm not so sure about all of CPDN's models, though. In my experience---this is of course anecdotal---Weather At Home models, and the now retired FAMOUS models are quite robust to frequent suspension; but the older HadCM3Ps, and now HadCM3Ns: not so much. It does seem to vary a lot between machines, though.

But HadCM3N seems to have the most trouble with disk contention, that is, its files being in use by other software when HadCM3N wants to write to them. The culprit is usually antivirus or backup software. In the case of Macs, I guess that means Time Machine.

And I second Mo's advice about exiting from BOINC before shutting down the computer.
17) Message boards : Number crunching : hadcm3n failed at 1% (Message 47117)
Posted 19 Sep 2013 by Greg van Paassen
Post:
Hi Jane.

Code 193 is a bit of a catch-all error.

If you have upgraded BOINC from an older version, the upgrade may have caused a permissions problem. Probably not that, though.

The problem may be caused by not having selected the processing option "Leave tasks in memory while suspended". (BOINC's default is unselected, but that doesn't work well for CPDN.)

It's also best to set the limit of CPU use by other programs quite high, i.e. to not suspend BOINC too frequently. (The work runs at low priority and Macs are good at prioritising work, so leaving BOINC to run mostly has no effect on other work. The exceptions are recording sound or editing movies, and some games.)

Both of those options are in the "Computing preferences..." menu option, available from Boinc Manager's "advanced" view.

It's also best to exclude the BOINC folder from backups, as the CPDN programs are "touchy" about other programs trying to access their files.

The Mac section of this Board may be a source of other things to try if those don't fix the problem, and it has a post detailing how to fix the permission problem.
18) Questions and Answers : Windows : Optimise PC build for CPDN (Message 47085)
Posted 17 Sep 2013 by Greg van Paassen
Post:
PrimoCache (formerly FancyCache) looks interesting.

There's also this (from 2008, that's why it talks about Vista):

https://www.pokertracker.com/forums/viewtopic.php?f=45&t=10489 (the "fsutil" command is discussed on Microsoft Technet, so this seems to be OK.)
Enlarge Write-Ahead Cache

This option is configurable in Vista or Windows 2003 server only.

Windows gives the NTFS filesystem a default cache to use for information, but if you are opening and closing a lot of different files in rapid succession, this cache can be exhausted, causing reads and writes to take longer than necessary. There are two setting sizes: normal, and large. From the Microsoft Documentation:

Increasing physical memory does not always increase the amount of paged pool memory available to NTFS. Setting memoryusage to 2 raises the limit of paged pool memory. This might improve performance if your system is opening and closing many files in the same file set and is not already using large amounts of system memory for other applications or for cache memory. If your computer is already using large amounts of system memory for other applications or for cache memory, increasing the limit of NTFS paged and non-paged pool memory reduces the available pool memory for other processes. This might reduce overall system performance.

To set the cache to its larger size, click Start --> Run, type 'cmd' and hit enter. Then type:

fsutil behavior set memoryusage 2

In the event of any issue, or degradation of performance as a result of this change, you can set the cache back to its normal size. To revert to the default configuration, click Start --> Run, type 'cmd' and hit enter. Then type:

fsutil behavior set memoryusage 1
(underline added).
19) Questions and Answers : Windows : Optimise PC build for CPDN (Message 47074)
Posted 17 Sep 2013 by Greg van Paassen
Post:
the recommendations I've read say stay around 20C below TjMax for stability and long life.

This fits with what I read when I was building audio amps. Audio amps we like to keep running for decades. But what's long life for a computer? Hmmm... maybe I should get a better cooler, too.

(Audio is all different now--that was class B bipolar transistors; now hi-fi amps are mostly class D, and the things barely get warm at all, while providing much better sound. And with high-frequency switching power supplies, like those in computers, they barely weigh anything either. Hurray for modern power MOSFETs.)

Your power supply sounds good, so that's blown that theory. Your earlier post sounded like you live a bit out of town, so the voltage at your house may vary quite a bit. The UPS should help considerably with that.

20) Questions and Answers : Windows : Optimise PC build for CPDN (Message 47067)
Posted 16 Sep 2013 by Greg van Paassen
Post:
Hi Martin,

I don't think you said what brand and model of power supply is in the machine? I found that the power supply makes a difference. Also, sizing. The rule of thumb is to aim for full-on processing, including graphics card, to be no more than 2/3 of the rated power of the supply. (And no less than 1/2, for optimum efficiency.) I'd estimate a 550W-650W class supply for your machine.
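
The arithmetic behind that rule of thumb: if the full load should sit between 1/2 and 2/3 of the rated power, then the rating should sit between 1.5 times and 2 times the load. A quick sketch, with `psu_rating_range` being a name invented for this example and the load figure something you would have to measure or estimate yourself:

```shell
# psu_rating_range: given the full-load draw in watts, print the smallest
# and largest supply rating that keep the load between 1/2 and 2/3 of
# the rated power. The lower bound is rounded up to a whole watt.
psu_rating_range() {
    load=$1
    echo "$(( (load * 3 + 1) / 2 )) $(( load * 2 ))"
}

# Usage: a machine drawing 350 W flat out
#   psu_rating_range 350
```

For a 350 W load this prints "525 700", so a supply rated somewhere between 525 W and 700 W fits the rule.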

FYI on my Sandy Bridge Core i7, i7z (a Linux CPU reporting tool) reports temperatures of 83 - 85 degrees Celsius with 8 models running, and the machine's been stable for the last couple of years (... touch wood). (I do need to vacuum out the CPU heatsink fins six-monthly.) Ivy Bridge CPUs may be more touchy, of course.

If you still have problems even after your UPS and water(!) cooling, the last resort (before a different motherboard) is to underclock a few percent and see if that helps.

I feel for you. This must be frustrating.


©2020 climateprediction.net