New work Discussion

Author	Message
Dave Jackson (Volunteer moderator) Joined: 15 May 09 Posts: 4529 Credit: 18,633,388 RAC: 13,220	Message 66031 - Posted: 1 Sep 2022, 17:13:08 UTC

I got no ClimatePrediction tasks for my Linux machine in August even though it was up the whole time. The last Linux batch to be released went out on 21st July. Those tasks were all gone by the first of August, bar the odd re-send. There was a Windows batch on the 11th August for those running that OS and/or WINE.

Glenn Carver Joined: 29 Oct 17 Posts: 1044 Credit: 16,192,170 RAC: 13,524	Message 66032 - Posted: 2 Sep 2022, 14:59:02 UTC - in response to Message 66031. Last modified: 2 Sep 2022, 15:17:58 UTC

I can give you a bit of a heads-up after visiting Oxford this week.

There will be more OpenIFS work coming, as soon as we can get the server config done. We will be testing higher resolutions, so expect to see tasks with significant memory requirements and multiple cores: roughly 10 GB, 20 GB & 30 GB RAM with 2, 3 & 4 cores. We know there are still 1000s of machines that can potentially run these; the idea is to see what's workable and what isn't, in order to offer scientists the best scientific capability possible. Hoping for feedback on these.

Given the resource requirements, the idea will be to use 'credit multipliers' to account for the extra resource use. I'm used to a supercomputer environment where computer time is charged by resource (CPU+MEMORY+STORAGE). I think the credit for these more demanding tasks will work in a similar way. If there are any thoughts on this I can pass them on (it's not something I will be doing).

I also wondered about 'badges', which I've seen on other projects. I don't know if this is of interest to everyone.

There's also another OpenIFS project starting next month, so tasks from that should be appearing in the next 6 months. I'm not so familiar with the Hadley model work, but I think there's more coming before the end of the year.

Comments welcome.

Cheers, Glenn

p.s. at 25 pages, I wonder if it's time to close this thread and start a new one?

Alan K Joined: 22 Feb 06 Posts: 490 Credit: 30,749,549 RAC: 10,262	Message 66035 - Posted: 2 Sep 2022, 22:25:02 UTC - in response to Message 66032.

> We will be testing higher resolutions, so expect to see tasks with significant memory requirements and multiple cores: roughly 10 GB, 20 GB & 30 GB RAM with 2, 3 & 4 cores. We know there are still 1000s of machines that can potentially run these; the idea is to see what's workable and what isn't, in order to offer scientists the best scientific capability possible. Hoping for feedback on these.
>
> Comments welcome.
>
> Cheers, Glenn
>
> p.s. at 25 pages, I wonder if it's time to close this thread and start a new one?

Just wondering how my machine with 4 cores and 24 GB will fit with these. Will we still be able to limit the number of processors (cores) for the tasks?

Bill F Joined: 17 Jan 09 Posts: 124 Credit: 1,987,228 RAC: 3,976	Message 66036 - Posted: 2 Sep 2022, 22:27:20 UTC - in response to Message 66032. Last modified: 2 Sep 2022, 22:28:19 UTC

Your update email was wonderful, and the information is helpful to those who follow the forums and try to stay in touch.

On your question of badges: yes, a great many crunchers do appreciate them as a visible reflection of their activities on various projects. I will go as far as to recommend that the format used by the Milkyway project be emulated: one badge for years on the project and one for credit milestones, using clean and simple images.

Bill F

In October 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic; There was no expiration date.

Glenn Carver Joined: 29 Oct 17 Posts: 1044 Credit: 16,192,170 RAC: 13,524	Message 66041 - Posted: 4 Sep 2022, 20:34:50 UTC - in response to Message 66036.

@Alan K: I did raise with the CPDN team the issue of not having a project preference to limit the number of cores; it's a good point, and it's something they will look at. To start with, though, we'll only give out 2-core jobs to gain some experience before we go any further. I'm not an expert on the boinc side, but as I understand it the server will only send tasks to machines that have the capability (available RAM etc.) defined for the task.

@Bill F: We thought badges for particular projects would make sense (similar to World Community Grid). That's because different projects will have different computing requirements, e.g. long seasonal-timescale forecasts, short hi-res forecasts with high RAM, etc. But it's open to discussion.

The model is ready to go; it's just the boinc side that takes time.

Cheers, Glenn
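To make the capability check concrete, here is a minimal Python sketch of which of the announced test tiers a given host could take. It is only an illustration: the tier figures are the rough ones quoted in this thread, while `TaskTier`, `eligible_tiers` and the simple cores-and-RAM comparison are assumptions made for the example, not the actual BOINC server logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskTier:
    """One hypothetical OpenIFS test configuration (name is illustrative)."""
    cores: int
    ram_gb: float


# Rough figures from the thread: 10, 20 and 30 GB RAM with 2, 3 and 4 cores.
TIERS = [TaskTier(2, 10), TaskTier(3, 20), TaskTier(4, 30)]


def eligible_tiers(host_cores: int, ram_for_boinc_gb: float) -> list[TaskTier]:
    """Return the tiers whose core count and RAM both fit the host.

    Assumption: a tier fits if the host has at least that many cores and
    BOINC is allowed to use at least that much RAM.
    """
    return [
        tier
        for tier in TIERS
        if tier.cores <= host_cores and tier.ram_gb <= ram_for_boinc_gb
    ]


if __name__ == "__main__":
    # The machine asked about above: 4 cores and 24 GB.
    for tier in eligible_tiers(host_cores=4, ram_for_boinc_gb=24):
        print(f"{tier.cores} cores / {tier.ram_gb} GB fits")
```

Under these assumed rules a 4-core machine that lets BOINC use 24 GB would fit the 2-core and 3-core tiers but not the 30 GB one. In practice the RAM that BOINC may use is set in the computing preferences and is usually less than the installed total.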

Alan K Joined: 22 Feb 06 Posts: 490 Credit: 30,749,549 RAC: 10,262	Message 66042 - Posted: 4 Sep 2022, 22:39:37 UTC - in response to Message 66041.

Thanks for the info on the cores front. Badges could be interesting - task types, credits and years on project come to mind.

Jean-David Beyer Joined: 5 Aug 04 Posts: 1119 Credit: 17,189,662 RAC: 2,777	Message 66043 - Posted: 4 Sep 2022, 23:11:45 UTC - in response to Message 66035.

> Will we still be able to limit the number of processors (cores) for the tasks?

Which way do you mean that?

My machine claims to have 16 cores, and I can tell the Boinc client to use only 8 cores for Boinc work units (and I do). I assume that will work for the IFS work units as well.

The only project I am running that sometimes runs multiple cores for individual work units is MilkyWay, and I tell it to run only 4 cores per work unit. I forget what happens if I do not tell it; I suspect it might use 8 cores. There has been some discussion (here?) about whether the same approach could be used for the new IFS work units.

I have seen none of the IFS work units yet, and none of the normal ones in over a month.

Dave Jackson (Volunteer moderator) Joined: 15 May 09 Posts: 4529 Credit: 18,633,388 RAC: 13,220	Message 66044 - Posted: 5 Sep 2022, 6:39:41 UTC

> I have seen none of the IFS work units yet, and none of the normal ones in over a month.

And the two-core ones will go on the testing programme first, though when tests are successful it is often only a few days before the move to main-site work.

Dave Jackson (Volunteer moderator) Joined: 15 May 09 Posts: 4529 Credit: 18,633,388 RAC: 13,220	Message 66045 - Posted: 5 Sep 2022, 10:22:57 UTC

I am not sure if there is a way to limit the number of CPUs per task in BOINC. Possibly through editing of cc_config.xml? It would be nice to have that option through options>computing preferences. I will suggest it over on the BOINC notice boards.

Glenn Carver Joined: 29 Oct 17 Posts: 1044 Credit: 16,192,170 RAC: 13,524	Message 66046 - Posted: 5 Sep 2022, 10:47:18 UTC - in response to Message 66045.

> I am not sure if there is a way to limit the number of CPUs per task in BOINC. Possibly through editing of cc_config.xml? It would be nice to have that option through options>computing preferences. I will suggest over on BOINC notice boards.

Dave, it's not a change on the boincmgr or client side, it's on the project pages. It's already possible on some projects. Milkyway for example has a 'project preferences' page under the user account which allows you to limit the number of cores in workunits sent to you. CPDN doesn't support this at present because until now they have not done any multicore work.

To answer the previous message, this is just about limiting what multicore workunits the server sends to you, regardless of how many cores you are allowing your boinc client to use.

The CPDN team are quite aware of the need for users to control things at their end and not overload volunteer machines. That's why we're engaging with everyone now and taking this a step at a time.

Cheers, Glenn

Jean-David Beyer Joined: 5 Aug 04 Posts: 1119 Credit: 17,189,662 RAC: 2,777	Message 66047 - Posted: 5 Sep 2022, 12:37:40 UTC - in response to Message 66046.

> Milkyway for example has a 'project preferences' page under the user account which allows you to limit the number of cores in workunits sent to you. CPDN doesn't support this at present because until now they have not done any multicore work.

True. For all projects, I believe <project_max_concurrent>4</project_max_concurrent> will work. It certainly works for all of mine. You can pick a different number for each project.

For MilkyWay, the app_version stuff, especially <avg_ncpus>4</avg_ncpus>, limits the number of processors per work unit. I would be very surprised if current ClimatePrediction tasks look at this. If you put something like this in for CPDN, I do not know if it would cause errors, but it would almost certainly be ignored.

[/var/lib/boinc/projects/milkyway.cs.rpi.edu_milkyway]$ cat app_config.xml
<app_config>
    <project_max_concurrent>4</project_max_concurrent>
    <app_version>
        <app_name>milkyway_nbody</app_name>
        <plan_class>mt</plan_class>
        <avg_ncpus>4</avg_ncpus>
    </app_version>
</app_config>

[/var/lib/boinc/projects/climateprediction.net]$ cat app_config.xml
<app_config>
    <project_max_concurrent>4</project_max_concurrent>
</app_config>

Bryn Mawr Joined: 28 Jul 19 Posts: 148 Credit: 12,830,559 RAC: 228	Message 66048 - Posted: 5 Sep 2022, 12:56:34 UTC - in response to Message 66047.

> > Milkyway for example has a 'project preferences' page under the user account which allows you to limit the number of cores in workunits sent to you. CPDN doesn't support this at present because until now they have not done any multicore work.
>
> True. For all projects, I believe <project_max_concurrent>4</project_max_concurrent> will work. It certainly works for all of mine. You can pick a different number for each project.
>
> For MilkyWay, the app_version stuff, especially <avg_ncpus>4</avg_ncpus>, limits the number of processors per work unit. I would be very surprised if current ClimatePrediction tasks look at this. If you put something like this in for CPDN, I do not know if it would cause errors, but it would almost certainly be ignored.
>
> [/var/lib/boinc/projects/milkyway.cs.rpi.edu_milkyway]$ cat app_config.xml
> <app_config>
>     <project_max_concurrent>4</project_max_concurrent>
>     <app_version>
>         <app_name>milkyway_nbody</app_name>
>         <plan_class>mt</plan_class>
>         <avg_ncpus>4</avg_ncpus>
>     </app_version>
> </app_config>
>
> [/var/lib/boinc/projects/climateprediction.net]$ cat app_config.xml
> <app_config>
>     <project_max_concurrent>4</project_max_concurrent>
> </app_config>

That would quite happily limit Milky Way to running 4 WUs, each using 4 cores, and CPDN to running 4 WUs at any time, but it would not limit the number of cores that each CPDN WU used.

It would be interesting to see how much work would need to be done on the CPDN server to implement average CPUs and total CPUs - it might just be filling a data field within each WU with the number of CPUs it is set up to grab; the rest of the checking might be part of the standard Boinc server software.
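The arithmetic behind "4 WUs each using 4 cores" can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: the XML is the MilkyWay and CPDN configuration quoted above, while `worst_case_cores` is a hypothetical helper that simply multiplies `project_max_concurrent` by `avg_ncpus`. It describes what the BOINC client would budget for, not what a science app actually uses.

```python
import xml.etree.ElementTree as ET

# The two app_config.xml files quoted in the post above.
MILKYWAY_APP_CONFIG = """
<app_config>
    <project_max_concurrent>4</project_max_concurrent>
    <app_version>
        <app_name>milkyway_nbody</app_name>
        <plan_class>mt</plan_class>
        <avg_ncpus>4</avg_ncpus>
    </app_version>
</app_config>
"""

CPDN_APP_CONFIG = """
<app_config>
    <project_max_concurrent>4</project_max_concurrent>
</app_config>
"""


def worst_case_cores(app_config_xml: str, app_name: str) -> float:
    """Return max concurrent tasks times the cores budgeted per task.

    Hypothetical helper: if no matching <app_version> sets <avg_ncpus>,
    one core per task is assumed.
    """
    root = ET.fromstring(app_config_xml.strip())
    limit = root.findtext("project_max_concurrent")
    if limit is None:
        raise ValueError("no <project_max_concurrent> in this app_config")
    avg_ncpus = 1.0
    for version in root.findall("app_version"):
        if version.findtext("app_name") == app_name:
            avg_ncpus = float(version.findtext("avg_ncpus", default="1"))
    return int(limit) * avg_ncpus


if __name__ == "__main__":
    print(worst_case_cores(MILKYWAY_APP_CONFIG, "milkyway_nbody"))  # 4 tasks x 4 cores
    print(worst_case_cores(CPDN_APP_CONFIG, "any_cpdn_app"))        # 4 tasks x 1 core
```

For the CPDN file the helper reports four single-core tasks because nothing in that file says otherwise, which is the gap pointed out above: the setting caps the task count, not the cores each task takes.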

Glenn Carver Joined: 29 Oct 17 Posts: 1044 Credit: 16,192,170 RAC: 13,524	Message 66049 - Posted: 5 Sep 2022, 14:27:44 UTC - in response to Message 66048. Last modified: 5 Sep 2022, 14:28:49 UTC

> For MilkyWay, the app_version stuff, especially <avg_ncpus>4</avg_ncpus>, limits the number of processors per work unit.

I'm not so sure about avg_ncpus. I am still somewhat confused about its use.

On this page (at the very bottom) https://boinc.berkeley.edu/trac/wiki/AppPlanSpec it notes that avg_ncpus is NOT intended for compute-intensive workunits (which makes no sense to me, because surely that's what they all are - OpenIFS certainly is).

And further still, here: https://boinc.berkeley.edu/wiki/Client_configuration it says that avg_ncpus can be fractional and refers to the number of CPU instances. The terminology 'instances' is a little vague, though the point of using a fractional number suggests it informs the client about the efficiency of the model. For 4 threads, I know that OpenIFS will give a speedup of 3.5, so I would presume to set <avg_ncpus>3.5</..> for the 4-core app version. It would also be fractional in the case of an app with a GPU component. I guess this allows the client to work out elapsed/remaining times, credit etc.

I'm not so sure it's intended that the end user modifies this variable - though it probably depends on exactly how the project has decided to implement avg_ncpus & friends.

I'll leave this to the CPDN team and just work on the models :D
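As a worked illustration of the 3.5x figure, the short Python sketch below turns it into a parallel efficiency and, purely as an assumption, fits Amdahl's law to that single data point to guess what 2 and 3 threads might give. Only the 3.5x speedup on 4 threads comes from the post; the Amdahl model and the function names are illustrative and are not measured OpenIFS behaviour.

```python
def parallel_efficiency(speedup: float, threads: int) -> float:
    """Fraction of ideal scaling achieved: speedup divided by thread count."""
    return speedup / threads


def amdahl_parallel_fraction(speedup: float, threads: int) -> float:
    """Solve Amdahl's law S = 1 / ((1 - p) + p / n) for the parallel fraction p."""
    return (1 - 1 / speedup) / (1 - 1 / threads)


def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Speedup predicted by Amdahl's law for the given parallel fraction."""
    return 1 / ((1 - parallel_fraction) + parallel_fraction / threads)


if __name__ == "__main__":
    # Figure quoted in the post: 4 threads give a speedup of 3.5.
    print(f"efficiency on 4 threads: {parallel_efficiency(3.5, 4):.3f}")
    p = amdahl_parallel_fraction(3.5, 4)
    print(f"implied parallel fraction (Amdahl assumption): {p:.3f}")
    for n in (2, 3, 4):
        print(f"{n} threads -> predicted speedup {amdahl_speedup(p, n):.2f}")
```

If the model held, the predicted speedups could serve as candidate avg_ncpus values for the 2-core and 3-core app versions, but real scaling would need to be measured rather than inferred from one number.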

Dave Jackson (Volunteer moderator) Joined: 15 May 09 Posts: 4529 Credit: 18,633,388 RAC: 13,220	Message 66050 - Posted: 5 Sep 2022, 15:58:01 UTC

> I'll leave this to the CPDN team and just work on the models :D

Good idea! Some of the discussion threads relating to ncpus over on the BOINC boards have left me no clearer, though that may be because I have never used it. If there isn't already one, I may put in a request on GitHub for being able to set or limit the number of CPUs via the manager, but if that produces a result it won't be quick!

Richard Haselgrove Joined: 1 Jan 07 Posts: 1058 Credit: 36,566,547 RAC: 15,982	Message 66051 - Posted: 5 Sep 2022, 17:29:48 UTC - in response to Message 66050.

Trying to summarise the various controls and options. You also need to consider the placement - the context - of each individual control in BOINC's various files and applications.

First, at the top level, there's

    <ncpus>N</ncpus>
    Act as if there were N CPUs; e.g. to simulate 2 CPUs on a machine that has only 1. Zero means use the actual number of CPUs. Don't use this to limit CPU usage; use computing preferences instead.

That goes in cc_config.xml, and makes the BOINC client behave, in every respect, like a computer with N cores. That's what it will tell the server, and the server will respond appropriately. Useful for exploring how well the client is behaving once the first test multi-threaded app is deployed, but not much else.

Next come preferences. Again, this applies to the whole machine.

    Usage limits
    Use at most N % of the CPUs: Keeps some CPUs free for other applications. Example: 75% means use 6 cores on an 8-core CPU.
    Use at most N % CPU time: Suspend/resume computing every few seconds to reduce CPU temperature and energy usage. Example: 75% means compute for 3 seconds, wait for 1 second, and repeat.

The second of those is mostly useful for keeping temperatures under control, especially when crunching on a laptop. These go in global_prefs.xml (which is managed through project websites), or global_prefs_override.xml (which you can set manually on an individual machine).

And finally, there are a couple that apply to a single project, or even a single task-type within a project. That's what we're mainly discussing here.

Both versions live in a single file called app_config.xml, and the full formal specification looks like this:

<app_config>
    [<app>
        <name>Application_Name</name>
        <max_concurrent>1</max_concurrent>
        [<report_results_immediately/>]
        [<fraction_done_exact/>]
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.4</cpu_usage>
        </gpu_versions>
    </app>]
    ...
    [<app_version>
        <app_name>Application_Name</app_name>
        [<plan_class>mt</plan_class>]
        [<avg_ncpus>x</avg_ncpus>]
        [<ngpus>x</ngpus>]
        [<cmdline>--nthreads 7</cmdline>]
    </app_version>]
    ...
    [<project_max_concurrent>N</project_max_concurrent>]
    [<report_results_immediately/>]
</app_config>

The first thing to note is that

THESE SETTINGS CONTROL BOINC ONLY

They DON'T control the behaviour of the project's own science applications (with one exception): the idea is that they *describe* how the science app is going to behave after it's launched, so that BOINC can leave enough space free for it to run efficiently.

The fractional values for <cpu_usage>.4</cpu_usage> and <gpu_usage>.5</gpu_usage> (app section), and <avg_ncpus>x</avg_ncpus> and <ngpus>x</ngpus> (app_version section), do much the same thing: they allow BOINC to launch more copies of the app, to run concurrently, until the device is "full". They're really aimed at GPUs, and assume that GPU apps won't use the CPU as intensively as a standard native CPU app would.

The odd one out is <cmdline>--nthreads 7</cmdline>. That one is designed to be passed to a multi-threaded science app: <avg_ncpus> can be used to ensure the required resources are available and not in use, while --nthreads is supposed to keep the science app to its allotted space.

I've talked about this value earlier in this thread: it transpired that the MilkyWay project had found a way of combining both these controls into a single project setting. Good for them, and if CPDN can follow their lead, it makes life easier for all of us.
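The way the machine-wide limit and the per-app description combine can be shown with a small Python sketch. It is a deliberately simplified model and not the real client scheduler: `usable_cpus` assumes the percentage is rounded down with a minimum of one CPU, and `tasks_that_fit` assumes tasks are started until their budgeted avg_ncpus no longer fit. Both function names are made up for this example.

```python
def usable_cpus(ncpus: int, max_ncpus_pct: float) -> int:
    """CPUs BOINC may use under 'Use at most N % of the CPUs'.

    Assumption: the result is rounded down, but never below one CPU.
    """
    return max(1, int(ncpus * max_ncpus_pct / 100))


def tasks_that_fit(cpus_available: int, avg_ncpus: float) -> int:
    """How many tasks fit if each one is budgeted avg_ncpus CPUs.

    Simplified model: whole tasks only, no overcommitting.
    """
    return int(cpus_available / avg_ncpus)


if __name__ == "__main__":
    cpus = usable_cpus(ncpus=8, max_ncpus_pct=75)  # the 8-core, 75% example above
    print(f"usable CPUs: {cpus}")
    for avg in (1, 2, 4):
        print(f"avg_ncpus={avg}: {tasks_that_fit(cpus, avg)} task(s) at once")
```

In this model nothing stops a task from starting more threads than its avg_ncpus budget, which is why a separate control such as --nthreads is needed to keep the science app within its share.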

Glenn Carver Joined: 29 Oct 17 Posts: 1044 Credit: 16,192,170 RAC: 13,524	Message 66054 - Posted: 5 Sep 2022, 21:46:43 UTC - in response to Message 66051.

Richard, that's a nice summary.

I think what milkyway do is define an app version for each core count, i.e. a 2-core app, a 3-core app, etc. That way the threads argument becomes irrelevant, because that version of the app knows how many threads to use. That's how I was thinking of implementing it on the server. Anyway, this is getting a bit technical. I'll PM you if we have questions.

---
CPDN Visiting Scientist

Mr. P Hucker Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918	Message 66057 - Posted: 6 Sep 2022, 0:22:47 UTC - in response to Message 66046. Last modified: 6 Sep 2022, 0:27:13 UTC

> > I am not sure if there is a way to limit the number of CPUs per task in BOINC. Possibly through editing of cc_config.xml? It would be nice to have that option through options>computing preferences. I will suggest over on BOINC notice boards.
>
> Dave, it's not a change on the boincmgr or client side, it's on the project pages. It's already possible on some projects. Milkyway for example has a 'project preferences' page under the user account which allows you to limit the number of cores in workunits sent to you. CPDN doesn't support this at present because until now they have not done any multicore work.
>
> To answer the previous message, this is just about limiting what multicore workunits the server sends to you, regardless of how many cores you are allowing your boinc client to use.
>
> The CPDN team are quite aware of the need for users to control things at their end and not overload volunteer machines. That's why we're engaging with everyone now and taking this a step at a time.
>
> Cheers, Glenn

Why is this needed? I control my machines locally since they're all different. I assume your programs will obey instructions in app_config? This is what I use for Milkyway multicore tasks on one of my machines:

<app_version>
    <app_name>milkyway_nbody</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>8</avg_ncpus>
    <cmdline>--nthreads 16</cmdline>
</app_version>

avg_ncpus tells the Boinc scheduler how many cores the task uses on average; nthreads limits the program to a maximum of 16. It's quite simple: one variable goes to the program, one variable goes to the scheduler. You just need to make sure your program obeys nthreads.
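For anyone maintaining several machines with different core counts, a per-host file like the one above could be generated rather than typed. The following is a minimal Python sketch, assuming Python 3.9 or later for `ET.indent`. The element names and the `--nthreads` flag are the ones quoted in this thread; `build_app_config` is a hypothetical helper, and it wraps the fragment in the `<app_config>` root element.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, plan_class: str,
                     avg_ncpus: float, nthreads: int) -> str:
    """Return app_config.xml text with a single <app_version> entry.

    avg_ncpus is what the BOINC client budgets for; nthreads is passed
    to the science app on its command line.
    """
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)
    ET.SubElement(version, "cmdline").text = f"--nthreads {nthreads}"
    ET.indent(root, space="    ")  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Values from the post above: budget 8 CPUs, allow up to 16 threads.
    print(build_app_config("milkyway_nbody", "mt", avg_ncpus=8, nthreads=16))
```

The output would be saved as app_config.xml in the project's directory, after which the client has to re-read its config files or be restarted. Whether a given science app honours `--nthreads` is up to that app.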

Jean-David Beyer Joined: 5 Aug 04 Posts: 1119 Credit: 17,189,662 RAC: 2,777	Message 66059 - Posted: 6 Sep 2022, 2:25:42 UTC - in response to Message 66054.

> I think what milkyway do is define an app version for each core count i.e. a 2 core app, 3 core app, etc. That way the threads argument becomes irrelevant because that version of the app knows how many threads to use.

I do not think so. IIRC, they can go up to 16 simultaneous machines, so they would need 16 slightly different file names. Consider where my MilkyWay stuff is. Here are the "interesting" files:

[/var/lib/boinc/projects/milkyway.cs.rpi.edu_milkyway]$ ls -l
total 25608
-rw-r--r--. 1 root  root      222 May 21 15:06 app_config.xml
-rwxr-xr-x. 1 boinc boinc  578472 May 13 23:39 milkyway_1.46_x86_64-pc-linux-gnu
-rwxr-xr-x. 1 boinc boinc 7896168 May 13 23:38 milkyway_nbody_1.82_x86_64-pc-linux-gnu__mt

Notice that there is no hint in the nbody filename as to the number of cores to be used, and furthermore, I do not think the contents of the app_config.xml file are transmitted to the MilkyWay server to tell them what I want.

I believe the app_config.xml file is read by the Boinc client just prior to starting a process, to decide whether to start one at all. In the case of the nbody processes, either 1.) the client had better know how many simultaneous ones to start, or 2.) it starts one and tells it how many processes it is allowed to start. Lacking knowledge of how it is actually done, I favor option 1.)

Here is the entire command line given to a running 4-cpu nbody task; I do not see a 4 in there. Could it be in the nbody_parameters.lua file, if that is a file? And if so, where might it be?

boinc 62235 16183 99 21:59 ? 00:17:27 ../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.82_x86_64-pc-linux-gnu__mt -f nbody_parameters.lua -h histogram.txt --seed 74337709 -np 11 -p 3.63399 1 0.247069 0.289298 1.13001 0.0248842 1 1 1 1 1
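One way to check how many threads a running task has actually started, whatever its command line shows, is to read the `Threads:` field that Linux exposes in `/proc/<pid>/status`. The Python sketch below does only that. It assumes a Linux host, the helper names are illustrative, and a thread count is not quite the same thing as the number of cores kept busy.

```python
from pathlib import Path


def parse_thread_count(status_text: str) -> int:
    """Extract the value of the 'Threads:' line from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1])
    raise ValueError("no 'Threads:' field found")


def thread_count(pid: int) -> int:
    """Return the number of threads of a running process (Linux only)."""
    return parse_thread_count(Path(f"/proc/{pid}/status").read_text())


if __name__ == "__main__":
    import os

    # Example: inspect this script's own process. For a BOINC task,
    # pass the PID shown by ps instead.
    print(thread_count(os.getpid()))
```

A count that includes helper threads may come out slightly higher than the configured number of worker threads, so the figure is best treated as a rough check.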

Mr. P Hucker Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918	Message 66060 - Posted: 6 Sep 2022, 5:30:49 UTC - in response to Message 66059.

I agree with Jean-David Beyer - I have Milkyway tasks and I've adjusted the number of cores on ones I've already downloaded.