Posts by old_user156196

1) Questions and Answers : Wish list : Processor specific optimization? (Message 20091)
Posted 10 Feb 2006 by Profile old_user156196
Post:
Hello Les,

I joined CPDN a few days ago and it annoys me to no end how long a work unit takes and how much processing power seems to be wasted by the way it was programmed. (OK, the last part is a guess and may be wrong.)

You confirm in that post what I feared - using a supercomputer program on a desktop. That's like trying to pull a 40-ton truck trailer with a VW Beetle.

FORTRAN is certainly way faster than C. Still, I do not understand why you do not release optimized code for AMD- and Intel-based as well as 32- and 64-bit CPUs.
After all, this would greatly improve the return rate. Still, I think that's not the biggest opportunity for optimization.

50 MB of code? Yes, I saw the process is that big, but that is exactly what does not work well on a desktop CPU. I do not know the memory size, access speed, and bandwidth of your supercomputer, but my Athlon 64 3000+ @ 1.8 GHz does not have 50 MB of cache (not even L2). This means in most cases it will go to RAM, which by my calculations is at least 15 cycles away from the CPU (probably more - up to 300, we learned). That's a lot of waiting time if you ask me. Then there are those constant HDD accesses. Does the program read or write, and does it have to wait for the access to finish? I have 1 GB of RAM, but with such huge programs I would not be surprised if at least part of it was moved to the page file. So I tried switching off the paging file, but the HDD accesses were still there.
Is this really necessary? My HDD has an 8 ms average access time -> for my system that means 14,400,000 CPU cycles of waiting time.

Here are my questions/suggestions.

1. Is it really necessary to have all 5 phases done in one WU?
If all phases took the same processing time, that would cut each unit to 10 days instead of 50 on my machine, without any other optimization.
I assume not all 50 MB are used in each phase, so cutting it into parts would make the process smaller and less likely to be paged out to disk.

2. Does each phase have to cover the whole time period? Would it not be possible to calculate just a year or even just a month (so a work unit would not take more than about 5 hours, hopefully less)? The result would be sent out again to be processed further until the phase is finished, and then again until all phases are finished. If you could cut it into small functional pieces, so that the program code and the data mostly stay in cache, this would speed up the whole processing immensely (a rough sketch of the idea follows below).
I also believe a lot of people get scared away by the long processing time. While for a supercomputer user this may seem fine, it looks just horrible to a desktop user. So with smaller WUs you would likely get more people contributing to your project.
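Just so the idea is concrete, here is a toy sketch of what I mean (the file name and the numbers are made up, and this has nothing to do with the real model code): each run computes one small slice, writes a checkpoint, and stops, and the next work unit picks up from there.

    /* Toy sketch: one model month per run, then checkpoint and stop.
     * The next (small) work unit resumes from the checkpoint file.   */
    #include <stdio.h>

    #define TOTAL_MONTHS 540           /* e.g. 45 model years          */
    #define CHECKPOINT   "state.chk"   /* made-up checkpoint file name */

    static void simulate_one_month(double *state, int month)
    {
        *state += month * 0.001;       /* stand-in for the real physics */
    }

    int main(void)
    {
        double state = 0.0;
        int    month = 0;

        FILE *in = fopen(CHECKPOINT, "rb");   /* resume if a checkpoint exists */
        if (in) {
            fread(&month, sizeof month, 1, in);
            fread(&state, sizeof state, 1, in);
            fclose(in);
        }
        if (month >= TOTAL_MONTHS) {
            printf("run already finished\n");
            return 0;
        }

        simulate_one_month(&state, month);
        month++;

        FILE *out = fopen(CHECKPOINT, "wb");  /* save progress for the next unit */
        if (!out)
            return 1;
        fwrite(&month, sizeof month, 1, out);
        fwrite(&state, sizeof state, 1, out);
        fclose(out);

        printf("finished month %d of %d\n", month, TOTAL_MONTHS);
        return 0;
    }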

3. Is the animation part of the 50 MB? My guess is it's the other 7 MB of C code, but why is it in memory when I am not running the animation? I would like to help the research, but I do not really understand it, so to me it's just a nice animation - something I can live without. But even for those who want it, it should not be part of the model. I'll just say: Model-View-Controller design pattern.
Do not load what is not needed at a certain processing stage. Make the whole thing more modular so the OS can work with it better.
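Roughly what I am picturing (just a generic illustration, not how the real client is built): the model loop only touches the drawing code through an optional hook, so when nobody asks for the animation, no graphics code is loaded or run.

    /* Generic illustration of keeping the view out of the model:
     * the physics loop calls drawing code only through an optional hook. */
    #include <stdio.h>

    typedef void (*view_hook)(double state, int step);

    static void run_model(int steps, double *state, view_hook draw)
    {
        for (int step = 0; step < steps; step++) {
            *state += 0.001;            /* stand-in for the real model step */
            if (draw)                   /* the view is optional             */
                draw(*state, step);
        }
    }

    static void text_view(double state, int step)
    {
        printf("step %d: %f\n", step, state);
    }

    int main(void)
    {
        double state = 0.0;
        run_model(10, &state, NULL);       /* headless: no view code involved */
        run_model(10, &state, text_view);  /* with a (trivial) view attached  */
        return 0;
    }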

4. I wrote to nVidia to ask if they are aware of BOINC, and to point out that their GPUs are quite well suited to the kind of problems research applications present. A GPU may be up to 10 times as fast as a CPU given the right tasks.
Most users do not play games all the time but still have a fast 3D GPU in their system, so this GPU is mostly doing nearly nothing. Maybe it would be possible to write graphics card drivers that would allow BOINC and its applications to run in the GPU's idle time as well as the CPU's idle time. GPUs should be good at vector processing, and I guess you use vectors (a small example of the kind of loop I mean follows further below).
Here is the answer I got:

Hi Holly,

I will pass this along. Good idea.

You should lob it in to this group, too.
http://www.gpgpu.org/

Brian Burke
NVIDIA Corp.
12331 Riata Trace Parkway, Suite 300
Austin, TX 78727

Well, I have no idea if they will decide that the possible costs are worth the possible PR effect. So it is not clear if they will actually try to make it work, but it's cool that they are even considering it.
Maybe you want to see if you could tap into those resources as well.
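For what it's worth, the kind of loop I have in mind looks like this (a generic example, not CPDN code): every element is independent of every other, so a GPU could compute thousands of them at the same time.

    /* A data-parallel "vector" loop of the sort GPUs are built for:
     * no element depends on any other (generic example, not model code). */
    #include <stddef.h>

    void saxpy(size_t n, float a, const float *x, float *y)
    {
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }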

Ok that was more than enough I guess.
I would like to hear your answers. What is possible? Is it mainly a question of financing such changes? (That would not surprise me - sadly.)

One thing I want to say to finish: it's great that you work on this and show how the climate is developing. Hopefully the governments will see that they need to act now, and hopefully it's not too late already. Germany is not that bad at trying to be less hard on the environment, but when I heard Sweden's commitment yesterday I have to say we are light years behind, since money is still the stronger drive. Let's not even think about the USA's point of view. But since you live there you know better than I do ;)

So please speed up your research and show what the consequences are.

All the best

Holly
2) Questions and Answers : Windows : How do I compute faster? (Message 19986)
Posted 5 Feb 2006 by Profile old_user156196
Post:
Hi,

I would like to bring your attention to another aspect:
cache size.

If you really look at the whole CPU-L1-L2-L3-RAM-HDD chain, you will see that even the cache is already a few cycles away.
RAM can be as much as 300 cycles away, and let's not even talk about the disk with its average access time of 8 ms.
One cycle at 1.7 GHz is 0.59 ns long, so it takes 13,600,000 cycles until the data from the disk starts to come in ...
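Here is the same arithmetic spelled out, in case anyone wants to plug in their own clock speed (illustrative numbers only):

    /* Back-of-the-envelope: how many CPU cycles one disk access costs. */
    #include <stdio.h>

    int main(void)
    {
        double clock_hz = 1.7e9;     /* 1.7 GHz CPU              */
        double disk_s   = 8.0e-3;    /* 8 ms average disk access */

        printf("cycle time : %.2f ns\n", 1e9 / clock_hz);   /* ~0.59 ns    */
        printf("cycles lost: %.0f\n", disk_s * clock_hz);   /* ~13,600,000 */
        return 0;
    }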

What you want is very few cache misses, so that you do not need to access RAM often and hopefully need to access the HDD even less often.

Now cache size is something you can only influence indirectly:
buy a CPU that has lots of fast cache.
But even then it may not help, since the cache is not visible to the program, and therefore you can never tell if the data you need is really there.

All you can do is hope the cache has a good strategy.

What could be done is for the CPDN programmers to try to cut the problem into smaller pieces so it hopefully fits, and stays, entirely in cache.

Usually 90% of accesses are cache hits, but if the data is too big - which I assume applies to CPDN - you will have lots of cache misses, and therefore your CPU will wait for data most of the time.
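To show what "cutting the problem into smaller pieces" can look like in code, here is a generic blocked loop (nothing to do with the actual CPDN model): instead of sweeping the whole grid at once, you work on one small tile at a time so that tile's data stays in cache.

    /* Generic cache-blocking example: process the grid tile by tile so the
     * working set of each tile fits in cache (not the real CPDN code).    */
    #define N     1024
    #define BLOCK 64     /* tile size chosen so a tile's data fits in cache */

    void smooth_blocked(const double grid[N][N], double out[N][N])
    {
        for (int bi = 1; bi < N - 1; bi += BLOCK)
            for (int bj = 1; bj < N - 1; bj += BLOCK)
                /* finish one cache-sized tile before moving to the next */
                for (int i = bi; i < bi + BLOCK && i < N - 1; i++)
                    for (int j = bj; j < bj + BLOCK && j < N - 1; j++)
                        out[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                            grid[i][j-1] + grid[i][j+1]);
    }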

Btw, this has nothing to do with the operating system but with how aware the programmers are of hardware limitations. Still, even if they are aware, how much they can take it into account depends on the problem, the programming language, and the actual CPU. Since you do not know the CPU, and it is abstracted away from the programmer by several layers (usually 5+), there is not much they can do but cut the problem into small pieces and hope the OS and CPU get the most out of the actual setup.

One last thought - current desktop CPUs can spend up to 90% of their time waiting for data from some level of the memory hierarchy, and it will only get worse, since CPUs have become faster much more quickly than memory has, and so far this gap will only grow.

Best you can do now:
at least 512 MB for XP
at least 1 GB for 64-bit CPUs
More may be good, but it also means more work for the memory management unit, so 4 GB is not necessarily better.

This all applies to single cores.
If you have a dual-core CPU you should have more memory, since both cores will hunger for data to process ...

Ok enough tech stuff.

It's not really something you can change, but I wanted to make you aware that you cannot really compare CPUs just by their clock speed.
You cannot really compare AMD to Intel or IBM;
it depends on the problem, the RAM, and the architecture of the CPU and OS,
just to name a few factors.

all the best to you

Holly



