climateprediction.net home page
Using GPUs for number crunching

Questions and Answers : Wish list : Using GPUs for number crunching

Profile Milo Thurston
Volunteer moderator
Volunteer developer

Joined: 2 Mar 06
Posts: 253
Credit: 363,646
RAC: 0
Message 36620 - Posted: 3 Apr 2009, 21:05:56 UTC - in response to Message 36619.  

I assume he does these searches on the climateapps2 server because when he's extracting data this forum slows down.


I need to do some queries using the cpdnboinc databases, but they are done on the slave server and serve to populate some tables in the results database (the one behind results.cpdn.org). Researchers can then download data directly from the upload servers, and that might result in a performance hit here if they're downloading a lot of data from the trickle server.

Concerning available space, we have approx. 30TB in total if we include all our active upload servers and also the storage servers used by the results system. Last time I checked (a couple of weeks ago) a little over 21TB of this was in use.

I expect CPDN provides some similar program for its research staff and students.


As far as I know they rely upon custom-written IDL scripts.

ID: 36620
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7529
Credit: 23,810,259
RAC: 3,408
Message 36621 - Posted: 3 Apr 2009, 21:16:19 UTC
Last modified: 3 Apr 2009, 21:17:19 UTC

Actually I was referring to the fact that the project's upload servers often fill up with returned data and stop accepting more data, and Milo then has to offload a few hundred megabytes to "somewhere else". There's usually a message of some sort about problems with space, followed by questions from crunchers asking what's going on.

So people who think that using GPUs is a good idea, because the project can then get more data even faster, will have to come up with a different reason for expecting this project to spend several years re-writing the code.

However, Mo is correct in that every now and then the upload server climateapps2 (which, as you can see from the URL at the top of this page, also holds these BOINC-style forums as well as the users' account pages) slows down for hours at various times during the day. This, as Mo says, is when Milo is running search scripts for "certain data" required by the physicists for their research.

But I think that this part may now be 'past tense', as Milo has created the researchers' area, and this may automatically be filled with links to the data, wherever it gets stored from time to time. (Milo has said that when he's moving the data off the upload server, it's also necessary to change the links that point to it, so that the research area still works.)

If you look at the bottom of every page for a model, you will see: View this result on results.cpdn.org, which will take you to this newish area.

As for volunteering to help with the research, the steps would be:

Locate a research group somewhere around the planet with a vacancy.
Apply for a position with them.
Sign a comprehensive secrecy agreement.
Work long hours for a mere pittance in appalling working conditions. ***

The model type currently being tested is the Regional model.
It appears in two types:
South Africa
Pacific North West
and when it's released, the results will be used by physicists in these two places.
So you may be able to get a job with the one in your area, and commute to their offices.


The research done at Oxford is by pre- and post-grad students there. Linked from the front page of this site: Atmospheric, Oceanic and Planetary Physics

*** Unless I'm confusing this with artists, who are supposed to live in garrets, and never make any money from their work until after they're dead.

edit
A bit too slow this morning. :(
ID: 36621
Profile Milo Thurston
Volunteer moderator
Volunteer developer

Joined: 2 Mar 06
Posts: 253
Credit: 363,646
RAC: 0
Message 36622 - Posted: 3 Apr 2009, 21:42:30 UTC - in response to Message 36621.  


Sign a comprehensive secrecy agreement.


This bit is necessary because the climate model code is proprietary and belongs to the Met Office. Anyone who has access to it, including an installation of the PUM on their machine that's needed for model development, has to sign the Met Office's license.
ID: 36622
Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 36623 - Posted: 4 Apr 2009, 13:46:07 UTC
Last modified: 4 Apr 2009, 20:01:27 UTC

There's also an elephant in the room.

On a project that has apps for both CPU and GPU, this is what's going on. Until and unless BOINC and the projects find robust ways of dealing with the credit disparities between tasks run on CPU and GPU, the moderators, programmers and researchers here would have no time or inclination to expose CPDN to these shenanigans.

CPDN is already receiving large numbers of carefully-crunched models. Almost all results are of extremely high quality. All the effort is going into developing new experiments. In my view the project\'s priorities are absolutely right.

Edit to correct link
Cpdn news
ID: 36623
Profile old_user27607

Joined: 28 Oct 04
Posts: 64
Credit: 34,444,555
RAC: 0
Message 36633 - Posted: 6 Apr 2009, 18:42:09 UTC

Thanks to all who provided the explanations of the bottleneck.

It appears that it is a search performance problem: Massive DB plus complex searches can take a long time. It is also obvious (I think) that the staff has already considered using a Postgres (or other) DB with indexing on the search parameters.
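As a minimal illustration of the indexing idea (the real CPDN database layout is not public, so the table and column names below are entirely hypothetical, and sqlite3 stands in for Postgres or MySQL; the principle is the same):

```python
import sqlite3  # stdlib stand-in for Postgres/MySQL

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical results table; the real CPDN tables will differ.
cur.execute("""
    CREATE TABLE results (
        id INTEGER PRIMARY KEY,
        model_type TEXT,
        start_year INTEGER,
        status TEXT
    )
""")
cur.executemany(
    "INSERT INTO results (model_type, start_year, status) VALUES (?, ?, ?)",
    [("hadam3p", 1960, "complete"),
     ("hadcm3", 1920, "running"),
     ("hadam3p", 1980, "complete")],
)

# A composite index on the search parameters lets the database seek
# directly instead of scanning every row on each research query.
cur.execute("CREATE INDEX idx_search ON results (model_type, status)")

cur.execute(
    "SELECT start_year FROM results"
    " WHERE model_type = ? AND status = ? ORDER BY start_year",
    ("hadam3p", "complete"),
)
rows = [r[0] for r in cur.fetchall()]
print(rows)  # [1960, 1980]
```

Whether this would help in practice depends, as the post says, on staff time to design and maintain it.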

Either it was not the right approach, or perhaps they are just too short of staff. From my limited POV, this might solve some problems, but unless they become critical, the staff already seem oversubscribed. :-}

Unless a DB expert can do the design and setup, and the staff can install and run it, there doesn't seem to be an early solution, unfortunately. :-{

The good news is that the compute side is working well. Hope the data side can catch up. I\'ll be available to help if needed.

BillN

ID: 36633
Jord
Joined: 5 Aug 04
Posts: 250
Credit: 93,274
RAC: 0
Message 36647 - Posted: 8 Apr 2009, 10:40:55 UTC

The argument that CPDN is Fortran code, and thus porting it over to C isn't feasible, isn't an issue anymore, as future versions of CUDA will support Fortran, C++ and OpenCL.

The bigger problem is that the code doesn't actually run on the GPU alone: the program will always run on the CPU, as that's the main processor. The GPU is only a coprocessor; the operating system doesn't know about it, so it's impossible to move programs to it and start them in the GPU's memory.

The other problem is that the code run on the GPU isn't the same code that runs on the CPU. GPUs run the code as kernels, with one kernel for every multiprocessor on the GPU. So for each kernel you have to have a certain amount of memory, which easily translates to 4 to 6 times the memory needed to run the code on the GPU compared to running it on the CPU.

For comparison, if the CPDN application runs in 500MB of RAM on the CPU, it can take up 2 to 3GB of VRAM on the GPU. But by the time that CUDA X supports Fortran, C++ and OpenCL, the video cards supporting that standard will probably be 8GB monsters. ;-)
Jord.
ID: 36647
Simplex0

Joined: 7 Sep 05
Posts: 12
Credit: 601,646
RAC: 0
Message 36683 - Posted: 11 Apr 2009, 10:35:02 UTC

I am just speculating here, and I have not done any programming in a long time, not since the Commodore 64 days. :)

But is it not possible, as a first step, to concentrate on the most time-consuming routines in the program and have them processed by the GPU?
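For what it's worth, that "hotspot" idea can be sketched in Python. Everything here is made up for illustration: `radiation_step` is a hypothetical stand-in name for some expensive routine, and the "GPU" variant is an ordinary function marking where an accelerated implementation would be swapped in:

```python
def radiation_step_cpu(temps):
    # Hypothetical expensive per-cell calculation (illustration only).
    return [t + 0.1 * (15.0 - t) for t in temps]

def radiation_step_gpu(temps):
    # A GPU version would do the same maths on the device; the caller
    # cannot tell the difference as long as the results match.
    return [t + 0.1 * (15.0 - t) for t in temps]

def run_model(temps, steps, radiation_step):
    # The main timestep loop stays on the CPU; only the chosen
    # hotspot routine is delegated to whichever processor.
    for _ in range(steps):
        temps = radiation_step(temps)
    return temps

cpu = run_model([10.0, 20.0], 3, radiation_step_cpu)
gpu = run_model([10.0, 20.0], 3, radiation_step_gpu)
assert cpu == gpu  # same physics, different processor
```

Whether any single routine dominates CPDN's runtime enough to justify this is a question only the project's profilers could answer.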

Tomas

ID: 36683
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7529
Credit: 23,810,259
RAC: 3,408
Message 36686 - Posted: 11 Apr 2009, 14:54:37 UTC

Possibly.
The present models took 2 software engineers over 2 years to port from the Met Office computers to desktops and get them stable.
So those people who will only run climate models if they can do so on GPUs should do something else for 5 years or so, and then come back to see what is happening here.

In the meantime, the rest of us will continue churning out models. I can get through 4 of the new hadam3p models in about 4 days on a quad using just the CPUs. :)

ID: 36686
Simplex0

Joined: 7 Sep 05
Posts: 12
Credit: 601,646
RAC: 0
Message 36690 - Posted: 11 Apr 2009, 15:40:50 UTC


The fact that they can't do it does not mean that it can't be done.
A lot of the best optimized code written so far is done by volunteers.
Currently I know of SETI@home, MilkyWay@home and GPUGRID.net that have
GPU-optimized code, and there are more projects on the way, like The Lattice Project.
And you have Folding@Home, which can use both ATI and NVIDIA cards.

The argument that the extra speed is of no use is most likely nonsense.
If less of our computer time needs to be spent on climateprediction.net,
it can be used by other projects.

And as I said before: do you need to rewrite all of the code?
Is it not possible to start with a few GPU subroutines that run
under the main program?

Tomas





ID: 36690
Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 36691 - Posted: 11 Apr 2009, 16:08:06 UTC
Last modified: 11 Apr 2009, 16:09:21 UTC

One might have a model type that develops its climate in timesteps every 10 minutes of model time, and the longest calculations are at every hour, i.e. every sixth timestep. The calculations at each timestep depend on the values produced by the model at the previous timestep. If the long hourly calculations are removed to the GPU, I don't see where they would get their input data from (i.e. the model values at 50 minutes past each hour or during the entire previous hour). And the other part of the model on the CPU would have no input data at 10 minutes past each hour.

The models are designed to run as a single continuous process, with the values at each timestep being calculated from the values generated at the previous timestep. I don't see how this continuous process can be broken up into two separate processes.
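That sequential dependence can be made concrete with a toy loop. The "climate" below is a single number rather than a huge state vector, but the dependency structure is the same: each step's input is the previous step's output, so the hourly calculation cannot start early:

```python
def step(state, minutes):
    # Toy timestep: the output of step t is the only input to step t+1,
    # so step t+1 cannot begin until step t has finished.
    return state + 0.01 * minutes * (1.0 - state)

state = 0.5
for t in range(6):          # six 10-minute timesteps = one model hour
    state = step(state, 10)
    # Even if the "long" calculation at t == 5 ran on a GPU, it would
    # still have to wait for the five CPU timesteps before it.
print(state)
```

This is why splitting one model run between CPU and GPU gains nothing unless the work *within* a single timestep can be parallelised.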
Cpdn news
ID: 36691
Simplex0

Joined: 7 Sep 05
Posts: 12
Credit: 601,646
RAC: 0
Message 36692 - Posted: 11 Apr 2009, 16:08:17 UTC

Les Bayliss!
Just to get an idea of the speed gain, take a look at this link.

http://www.youtube.com/watch?v=nnsW-zB95Is
ID: 36692
Profile mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 36693 - Posted: 11 Apr 2009, 16:38:15 UTC

Hi again Tomas

I know that the speed gains are phenomenal. I've looked at what several members of MilkyWay are doing on both their CPUs and GPUs. I already knew how fast their best computers can process climate models because I chose people who are active members of both projects.

The climate Unified Model code will not be released to volunteers. It belongs to the Met Office, who allow CPDN to use it. All the new model types have to comply with internationally-agreed climate model standards. It's already difficult enough to ensure this with the model development solely in the hands of people with years of experience with the UM.

We are not sucking our thumbs. We know what is happening on other projects. We know what the BOINC alpha testers are doing. We know what is being developed at CPDN Beta. We know that the latest CPDN model in beta testing is so complex that it's been in development for over a year and is still not ready for public release. We know the (two) project programmers, and we know they haven't got enough time to deal with all the current (and old) outstanding issues, at least one of which is a major problem.

This is the current situation and is the best that can be done with the currently available funding.
Cpdn news
ID: 36693
Belfry

Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 36839 - Posted: 30 Apr 2009, 21:52:03 UTC
Last modified: 30 Apr 2009, 22:17:01 UTC

SETI and F@H are multiple-instance number crunchers, whereas the CPDN apps are models. Models try to accurately replicate physical processes across time and space, and their calculations are highly interdependent and conditional. Consider a computer model of a highway bridge: could the math be divided across two processors? If we assign the north end to one processor and the south end to another, then there's no way to describe the strain when both ends pull in opposite directions. SETI and F@H can exploit GPU stream processors because the result of a single star-system search or protein formation has no bearing on the next.

Interestingly, when the first dual-core processors came out, some rumors popped up about reverse-hyperthreading--two processors working on a single thread. This article suggests we can expect Santa Claus before we see this technical feat.
ID: 36839
Simplex0

Joined: 7 Sep 05
Posts: 12
Credit: 601,646
RAC: 0
Message 36841 - Posted: 1 May 2009, 15:44:19 UTC
Last modified: 1 May 2009, 15:46:35 UTC

The GPU can be, and IS, used as a cooperating processor: it processes a given task and delivers the processed data back to the CPU.

I do not know anything regarding the mathematics in this project, but to give you an idea of how it could be used, I'll take the following example.

Say that you want to multiply two n*n matrices with each other. The matrix M itself is too large to be handled by the GPU, but the matrix is built up of vectors v1, v2, ..., vn, and the vectors are built up of elements e1, e2, ..., en. The elements can be processed by the GPU, two at a time, and the processed data can be delivered back to the CPU.

So you see, this is a linear process, and the GPU is only used to process given tasks that it can handle much faster than the CPU.
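Tomas's matrix example, sketched in plain Python: each output element is an independent dot product, which is exactly the shape of work that maps onto GPU threads (in principle one element per thread), while the surrounding program stays on the CPU:

```python
def matmul(A, B):
    n = len(A)
    # Each C[i][j] depends only on row i of A and column j of B,
    # never on any other element of C, so all n*n dot products
    # could in principle be computed simultaneously on a GPU.
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

The catch, as the moderators note above, is that a climate model's timesteps are not independent in this way.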
ID: 36841
Simplex0

Joined: 7 Sep 05
Posts: 12
Credit: 601,646
RAC: 0
Message 36842 - Posted: 2 May 2009, 7:30:26 UTC - in response to Message 36839.  
Last modified: 2 May 2009, 7:32:54 UTC


Interestingly, when the first dual-core processors came out, some rumors popped up about reverse-hyperthreading--two processors working on a single thread. This article suggests we can expect Santa Claus before we see this technical feat.


Did you also read this?

"
With a queue of
1024 entries, DCE outperforms a single-thread core by
41% (up to 232%) on average and achieves better
performance than run-ahead execution on every
benchmark we studied (24% on average).
"
ID: 36842
Belfry

Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 36843 - Posted: 2 May 2009, 20:50:53 UTC - in response to Message 36842.  
Last modified: 2 May 2009, 20:59:13 UTC

Hello Tomas,

If you believe every academic paper produces real-world applications, then I have a metal-ceramic combustion engine I'd like to sell you.

I also don't know anything about the CPDN / Hadley Center code, but I suspect lots of conditional branching and looping, such that it's hard to discern where a calculation begins or ends--making it extremely difficult to offload math to another thread and/or processor. But I am not a professional programmer, so maybe someone else can offer their thoughts here.
ID: 36843
Simplex0

Joined: 7 Sep 05
Posts: 12
Credit: 601,646
RAC: 0
Message 36844 - Posted: 2 May 2009, 22:11:29 UTC - in response to Message 36843.  

Hello Tomas,

If you believe every academic paper produces real-world applications, then I have a metal-ceramic combustion engine I'd like to sell you.


Hi Belfry
Hmm... I see, so we should only believe the guys who say it can't be done?
The information was taken from the link you provided.

Anyway, it should not be that hard to repeat the experiment to see if it is true, right?

It seems that the main reason is that there are very few people, if any, at the different universities who are capable of writing a program that runs on an ATI card. I had no idea that it was so difficult.

That makes me see game programmers in a whole new light.

ID: 36844
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7529
Credit: 23,810,259
RAC: 3,408
Message 36845 - Posted: 2 May 2009, 22:30:32 UTC

The programs being used have been written and developed at the UK Met Office over many decades for use on their supercomputers.
The work being done here is part of an attempt to improve the understanding of how climate works by producing massive numbers of slightly different models for various scenarios.

It's unlikely that an attempt will be made by the programmers of this project to alter the basic underlying code. Their task is to use it to create different models for climate researchers from several UK Unis.

If the UK's Met Office produces new code that will run on GPUs and makes it available to their registered users around the globe, then it would probably get used here eventually.

ID: 36845
BarryAZ

Joined: 13 Jul 05
Posts: 125
Credit: 11,778,421
RAC: 935
Message 36857 - Posted: 4 May 2009, 23:12:18 UTC - in response to Message 36623.  

One of the issues over in MilkyWay is that they are the ONLY BOINC project with GPU support for ATI cards. That code was done by volunteers -- MilkyWay users run 3rd-party optimized clients for both CPU and GPU workunits; in fact, at the moment, the same workunits can be done via ATI 38xx and 48xx cards as well as CPUs. Because MilkyWay is the only BOINC project with ATI GPU support, anyone interested in doing BOINC projects who has one of the supported ATI cards is going to gravitate over to MilkyWay.

There are a few projects which support CUDA applications -- in some ways this is easier, since the BOINC client from 6.4.5 and up supports CUDA -- but not ATI. So a large piece of the 'pressure' on MilkyWay is from folks running ATI cards, since they are, for now, unwanted stepchildren as far as the Anderson BOINC core client code folks are concerned.

I have a few of the supported (double precision) nVidia cards -- I joined GPUGrid to utilize them in a BOINC project.



There's also an elephant in the room.

On a project that has apps for both CPU and GPU, this is what's going on. Until and unless BOINC and the projects find robust ways of dealing with the credit disparities between tasks run on CPU and GPU, the moderators, programmers and researchers here would have no time or inclination to expose CPDN to these shenanigans.



ID: 36857
Profile tullio

Joined: 6 Aug 04
Posts: 264
Credit: 965,476
RAC: 0
Message 36861 - Posted: 5 May 2009, 12:37:11 UTC

I think one of the reasons behind the SETI daily loss of users is the invasion of the aliens, that is, people having CUDA-capable graphics cards. People with no card, like myself, or with non-capable cards (there are many), exit from the project, feeling they are being treated unfairly.
Tullio
ID: 36861

©2021 climateprediction.net