Posts by Michael Goetz

1) Questions and Answers : Unix/Linux : *** Running 32bit CPDN from 64bit Linux - Discussion *** (Message 62022)
Posted 24 Jan 2020 by Michael Goetz
Post:
Hi
I'm using Debian GNU/Linux 10 (buster).
Can you help me load the 32-bit libs needed to run CPDN?

Regards
Christian


Sure. These are my notes for my Buster installations. This worked for me:

Instructions for running 32-bit apps on Debian Stretch and Buster (run these as root):

dpkg --add-architecture i386
dpkg --print-foreign-architectures
apt update
apt install zlib1g:i386 libncurses5:i386 libbz2-1.0:i386 libstdc++6:i386 -y
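
To confirm that the steps above took effect, a check along these lines may help. This is only a sketch: the function names `has_i386` and `missing_packages` are made up for illustration, and the package list is simply the one from the notes above.

```shell
#!/bin/sh
# Sketch: verify the i386 multiarch setup described in the notes above.
# has_i386 and missing_packages are illustrative helper names, not part of dpkg.

# Succeeds if "i386" appears in a list of foreign architectures (one per line).
has_i386() {
    printf '%s\n' "$1" | grep -qx 'i386'
}

# Prints every package in the argument list that dpkg does not report as installed.
missing_packages() {
    for pkg in "$@"; do
        status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null) || status=""
        case "$status" in
            *"install ok installed"*) ;;          # present, nothing to report
            *) printf '%s\n' "$pkg" ;;            # absent or only partly installed
        esac
    done
}

# Only query the real system when dpkg is available (i.e. on Debian-like hosts).
if command -v dpkg >/dev/null 2>&1; then
    if has_i386 "$(dpkg --print-foreign-architectures)"; then
        echo "i386 architecture is enabled"
    else
        echo "i386 architecture is NOT enabled"
    fi
    missing_packages zlib1g:i386 libncurses5:i386 libbz2-1.0:i386 libstdc++6:i386
fi
```

If the last command prints nothing, all four packages are installed.
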
2) Questions and Answers : Unix/Linux : *** Running 32bit CPDN from 64bit Linux - Discussion *** (Message 61469)
Posted 5 Nov 2019 by Michael Goetz
Post:
To install 32-bit libraries on recent Debian versions, you will need to add the i386 architecture to Apt, using the multiarch mechanism:...


Thanks!

I can confirm that this works on both Buster and Stretch.
3) Message boards : Number crunching : Slow progress rate for HadAM4 at N216 (Message 61419)
Posted 28 Oct 2019 by Michael Goetz
Post:
Jean-David Beyer wrote:
I guess there would be a lot more of them were it not for the tasks crashing due to lack of those libraries.


Are there a lot of these? Would not all work units, not just hadam4* work units, crash because of this?

Is there any way for the BOINC server for ClimatePrediction to detect that libraries are absent (perhaps by analysis of failures), and to refrain from sending 32-bit work units to machines lacking 32-bit libraries?


You probably don't want to do that:

1) Project-wise, this isn't a big problem. These tasks error out almost immediately, get sent back to the server, and are quickly turned around to go out to other hosts. This doesn't affect the project's overall throughput significantly, nor does it significantly impact the ability of good hosts to get work.

2) This is a problem users can, and do, fix. You don't want to block the host permanently. You don't even want to block it temporarily, because the inability to get tasks makes it impossible for a user to fix the problem. If you lock out such a host, you're actually contributing to the problem by making it harder for users to correct it!
4) Questions and Answers : Unix/Linux : *** Running 32bit CPDN from 64bit Linux - Discussion *** (Message 61371)
Posted 24 Oct 2019 by Michael Goetz
Post:
Any guidance on which packages need to be added to Debian 9 (Stretch) or Debian 10 (Buster)? I've been having a devil of a time getting 32-bit BOINC apps (not just yours) to work on Stretch or Buster. Eventually I took the path of least resistance and created a 32-bit Debian VM. It would still be nice to know what's needed for the 64-bit installs. The Ubuntu instructions don't seem to work anymore for Debian.
5) Message boards : Number crunching : New work Discussion (Message 61345)
Posted 22 Oct 2019 by Michael Goetz
Post:
Yes, if it is running, you are good. But don't reboot or suspend the CPDN work units too often.
It doesn't like that, and will eventually fail if done too often.

If you need to reboot, then I would suggest manually suspending BOINC first, but how you do that in your arrangement is another matter.
I just use straight Linux (or Windows).


With a VM, there's no need to suspend the "task" at all. Or to reboot. Or anything like that. You can simply suspend the VM itself -- which freezes the VM in place. It can be stopped and started like that without risk. As far as the CPDN app is concerned, or, for that matter, anything running inside the VM, the VM never stopped running at all. The fact that the VM was suspended is not detectable from inside (unless you're measuring the time).

Note that this means suspending the VM from the VM control panel, NOT by executing any Linux commands from inside the VM.

In VMware Player, click on the "pause symbol", and then select "Suspend Guest". Do NOT choose "Shut Down Guest" or "Restart Guest".
6) Message boards : Number crunching : New work Discussion (Message 61342)
Posted 22 Oct 2019 by Michael Goetz
Post:
In an attempt to get new work, I have done the following:
Installed VMWare Workstation Player 15 on my Windows 10 computer.
Installed LinuxMint Tina 19.2 32-bit version in that Virtual Machine (I hope that avoids the 32bit libraries problem).
Installed BOINC in Mint
Linked it to my account

It downloaded 1 task (Virtual Machine is only using 1 core) and according to BOINC Manager, it appears to be crunching OK.

Looking in my CPDN account, a virtual machine has been added to my list of computers,

Since I am a complete Linux newbie, I ask: have I missed anything, or should it crunch OK and report OK?


That's pretty much all there is to it. Presumably, you gave the VM enough RAM to run (since it's running). 2 GB is enough.
7) Message boards : Number crunching : New work Discussion (Message 61320)
Posted 22 Oct 2019 by Michael Goetz
Post:
Alan -

Here is what I do to run other projects (for example WCG) and also get work from CPDN when it is available.

1) In Computer Preferences, set your queue to "Store 1 days of work".
2) Update WCG (to download 1 days of work).
3) Set WCG to "No new tasks"
4) In Computer Preferences, set your queue to "Store 10 days of work" and "10 additional days of work".

Now your computer will be busy for the next 24 hours on WCG tasks.
In the meantime, the computer will check CPDN every hour for new CPDN tasks.

You do have to repeat the process once your WCG tasks are finished - In other words, every 24 hours.
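
For anyone who prefers the command line to the BOINC Manager, the four steps above could be approximated roughly as follows. This is a sketch under assumptions: `prefs_override_xml` is a made-up helper name, and `BOINC_DIR` and `WCG_URL` are placeholders to be set to your own BOINC data directory and to the project URL exactly as BOINC lists it. The commands that touch a running client are left as comments.

```shell
#!/bin/sh
# Sketch: script the "fill the queue, then stop fetching" routine.
# prefs_override_xml is an illustrative helper; BOINC_DIR and WCG_URL are placeholders.

# Prints a global_prefs_override.xml body for the given work-buffer settings.
# $1 = days of work to store, $2 = additional days of work
prefs_override_xml() {
    cat <<EOF
<global_preferences>
   <work_buf_min_days>$1</work_buf_min_days>
   <work_buf_additional_days>$2</work_buf_additional_days>
</global_preferences>
EOF
}

# Intended sequence (uncomment once BOINC_DIR and WCG_URL are set):
#   prefs_override_xml 1 0 > "$BOINC_DIR/global_prefs_override.xml"    # step 1
#   boinccmd --read_global_prefs_override
#   boinccmd --project "$WCG_URL" update                               # step 2
#   boinccmd --project "$WCG_URL" nomorework                           # step 3
#   prefs_override_xml 10 10 > "$BOINC_DIR/global_prefs_override.xml"  # step 4
#   boinccmd --read_global_prefs_override
```

To start the next cycle, `boinccmd --project "$WCG_URL" allowmorework` lets the project fetch work again before the sequence is repeated.
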


I have a different strategy that has the advantage of solving two problems at once.

Step 1: Turn on VT-X in your BIOS.
Step 2: Download the VBOX version of BOINC, which gives you the full VBOX product. You could also download something like VMware Player, but you need VBOX for some BOINC projects, so you might as well use that.
Step 3: Run VBOX standalone, and create a 32-bit Linux VM. This solves the 32-bit library problem.
Step 4: Install BOINC in this VM. Set it to run only CPDN, plus any NCI apps you might want to run.

This has no problems with 32 bit libraries, is always asking CPDN for work, and if you want to pause the task, just suspend the entire VM. No worrying about checkpointing or keeping stuff in memory or anything like that.
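
For VBOX, suspending and resuming the whole VM can also be done from the host's command line. The sketch below assumes a VM named `cpdn32`, which is a hypothetical name, and wraps the two `VBoxManage` calls in illustrative functions. Setting `DRY_RUN=1` prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: suspend/resume a VirtualBox VM from the host.
# The VM name "cpdn32" and the function names are assumptions for illustration.

# Save the VM's complete state to disk and stop it (VirtualBox's form of "suspend").
suspend_vm() {
    ${DRY_RUN:+echo} VBoxManage controlvm "$1" savestate
}

# Start the VM without a GUI window; a saved state is restored automatically.
resume_vm() {
    ${DRY_RUN:+echo} VBoxManage startvm "$1" --type headless
}

# Example (run on the host, not inside the VM):
#   suspend_vm cpdn32
#   resume_vm cpdn32
```

As with the control-panel route, this is done on the host, so nothing inside the VM is asked to shut down.
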
8) Questions and Answers : Wish list : Badges (Message 57494)
Posted 18 Dec 2017 by Michael Goetz
Post:
While I wholeheartedly agree (the BOINC project I run has had badges for many years, and they absolutely, positively are a significant factor in bringing in new users and keeping them around), these days it seems like CPDN suffers from a lack of work rather than a lack of users. I don't think badges will help with that problem!
9) Questions and Answers : Wish list : Using GPUs for number crunching (Message 36385)
Posted 14 Mar 2009 by Michael Goetz
Post:
It's also a question of the manpower available, which depends entirely on funding. CPDN has two full-time programmers and a small number of researchers who help out to some extent with the model programming. Last year we spent at least 6 months beta-testing models to make them and their graphics BOINC 6-compatible. As a result CPDN is well behind schedule with one big new research project and there's another queuing up.

The models have to be released and crunched to tight deadlines so the PhD students get their data in time to analyse it and write up their research within the 3-year limit for PhD grants. The programmers already have a hard time meeting some of the deadlines set for them by the researchers.

There's just no slack in the system at CPDN, at least at the moment, to redesign anything from zero. Everybody's already working flat out.


Thanks for the informative reply! I didn't realize CPDN had an issue with processing the incoming results. I can certainly understand how the very last thing you would want to do would be to speed up the client-side processing!

And I fully understand that the effort involved would be very large. I've personally done the Fortran to C translation on a very large program, and the two languages are simply from very different eras. They don't map to each other very well, at least not if you want to end up with understandable C code when you're done. A lot of effort is required to do it right.

Once again, thanks for the explanation.

Regards,
Mike
10) Questions and Answers : Wish list : Using GPUs for number crunching (Message 36382)
Posted 14 Mar 2009 by Michael Goetz
Post:

These climate programs are a million or so lines of Fortran, written by many people over several decades, to run on the supercomputers at the UK's Met Office.

The maths uses very high-precision numbers because of the need to encompass the range of things like the large amount of water vapour in the air, while at the same time including the minuscule amounts of things like sulphate particles.

The graphics for computer games don't have anywhere near the same range of values.


A year has passed since this post, so I think I'll revisit it.

My understanding (I'm not an expert on this) is that the latest Nvidia cards support double precision math. (As you stated, most older cards only did single precision math, since that was more than enough for graphics.)

Does CPDN use precision higher than double? If so, that's not supported in the CPU hardware either, so it would have to be emulated in the Fortran libraries. That emulation could be ported to the GPU.

If double precision is used in the calculations, that's directly supported by the newer GPUs.

Considering those GPUs run an order of magnitude faster than even the fastest CPUs, and the length of the CPDN workunits, this would be a superb project to be GPU'd.

All that aside, there's the Fortran thing. Porting massive programs from Fortran to C is not a trivial task, and that, in and of itself, might prohibit CPDN from ever being available on a GPU.

As for converting the app to run on a parallel processor, I would think that CPDN would be a great fit for this. The supercomputers used to run many weather forecasting systems are (or were) supercomputers because of their specialized vector (i.e., parallel) math capabilities. I'd be surprised if the original code written for the supercomputers wasn't written to take advantage of the parallel processing abilities of those computers.

An awful lot of coding would be involved. Not only do you have to translate the programming language, but rewriting the application to take good advantage of the specific type of parallel processing available in a GPU is something that requires some skill, thought, and time.

I guess it depends on where the project wants to put its resources. There's a ton of GPU processing ability out there. Is it worth it to get results back much faster, meaning many more WUs get processed? Imagine crunch times for CPDN models that are measured in hours, instead of days. (The latest GPUs could probably complete the smaller CPDN models in less than a day.)

Mike
11) Questions and Answers : Windows : User avg trickling off to nothing (Message 36348)
Posted 9 Mar 2009 by Michael Goetz
Post:
So my prefs for 80% to CP are still being ignored, unless BOINC is smarter than I realise and is cumulatively expecting to run SETI a quarter of all the time spent running CP and is therefore trying to catch up.
Thanks again


That's exactly what's happening.

BOINC goes into 'panic' mode (aka running at high priority) when it thinks a task is going to miss its deadline. It will then ignore your resource-sharing preferences and run the at-risk tasks until they either finish or are no longer in danger of missing their deadlines.
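
The deadline test can be pictured with a deliberately simplified rule: a task is at risk if, running at only its project's resource share, it would need more wall-clock time than remains before the deadline. The sketch below is an illustration of that idea only, not BOINC's actual scheduler logic, and `misses_deadline` is a made-up name.

```shell
#!/bin/sh
# Sketch: simplified "will this task miss its deadline?" check.
# This is an illustrative model, NOT the real BOINC scheduling algorithm.

# Succeeds (exit 0) if the task would miss its deadline at its normal share.
# $1 = remaining CPU hours, $2 = resource share in percent, $3 = hours until deadline
misses_deadline() {
    # Wall-clock hours needed = remaining * 100 / share.
    # Compare remaining * 100 against share * hours_left to avoid integer division.
    [ $(( $1 * 100 )) -gt $(( $2 * $3 )) ]
}

# A task with 20 CPU hours left, a 20% share, and 48 hours to its deadline
# would need 100 wall-clock hours at its normal share, so it counts as at risk.
if misses_deadline 20 20 48; then
    echo "task is at risk: run at high priority"
fi
```

Under this model the same task with an 80% share would need only 25 wall-clock hours and would not trigger high-priority mode.
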

There's no such thing as a free lunch, however. BOINC remembers that SETI got extra time, and both its short term debt and long term debt are debited as a result. In the end, what happens (assuming you don't intervene and nothing else affects the scheduling) is that after the SETI tasks are done, BOINC won't download any more for a while and will just download and run tasks from other projects. Once SETI has been in the penalty box for a while, to let CPDN catch up, BOINC will start downloading SETI again.

Bottom line: Assuming your projects always have work available, which is generally true for both SETI and CPDN, if you don't mess with the scheduler, over the long run it WILL honor your preferences. In the short term, however, it won't when tasks may miss deadlines. That's normal (and preferable to missing deadlines, since most projects, except for CPDN, usually discard work returned late).

On my big computer, I have a dozen projects running, about half of which rarely have work to do. BOINC keeps the work queue filled with tasks from the projects that have work continuously. When one of those intermittent projects suddenly has lots of work available, this is frequently what happens:

1) Normally, BOINC keeps a supply of work on hand from projects A, B, C, D, and E.

2) Work becomes available for project F, and BOINC downloads several days' worth of work for F.

3) Every project on the computer now takes longer to complete tasks, because there are now 6 projects instead of 5 contending for processing time on the four cores in the CPU.

4) Two of the projects, B and C, have tasks with short deadlines. With the (unexpected and unpredictable) increase in expected completion times due to the addition of project F's tasks, B and C now have some tasks that are going to miss their deadlines if they play nice and only use their fair share of CPU time.

5) BOINC puts B and C into panic mode, and their at-risk tasks run at high priority, to the exclusion of all other projects. In addition, no projects will download any new work, so as not to exacerbate the problem.

6) Once B and C finish, BOINC goes out of panic mode and starts running A, D, E, and F again. New tasks will once again be downloaded for A, D, E, and F.

7) B and C won't download new tasks for a while until, by virtue of not running anything, they've paid back the extra CPU time they used. Once that happens, they resume downloading new work and processing.

That's somewhat simplified, but hopefully it helps explain what's happening.

Bottom line advice: Either don't attempt to micro-manage BOINC's scheduler and let it do what it's supposed to, or tell SETI not to give you Astropulse tasks. I think you can select which tasks to run on their website.

Trying to micro-manage BOINC's scheduler often yields unexpected (and undesired) results. Been there, done that.

Mike




©2022 climateprediction.net