Multithread - why not?

Message boards : Number crunching : Multithread - why not?

wujj123456

Joined: 14 Sep 08
Posts: 87
Credit: 32,981,759
RAC: 14,695
Message 70232 - Posted: 30 Jan 2024, 5:06:28 UTC - in response to Message 70231.  
Last modified: 30 Jan 2024, 5:07:43 UTC

But the OS will page the inactive tasks in preference.

There won't be inactive tasks when the memory spike happens. The smoothed 30-second monitoring window won't reflect the spike quickly enough for the BOINC client to preempt tasks, so the initial swapping will hit more or less all tasks. If we assume OpenIFS doesn't keep unused memory around for long, then whatever gets paged out is likely in an active working set, just not the most recently used pages. Because of the smoothing, BOINC preemption may never happen at all if the spikes are short enough, but any time the combined memory usage exceeds system memory, the OS will either swap or OOM-kill, no matter how brief the spike, because it has no other choice.

PS: I just realized my initial reply should have said "smoothed" rather than "average". The formula is an exponential decay, not an average.
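
For what it's worth, here is a minimal sketch of the kind of smoothing I mean (not the actual BOINC source; the 0.5 decay factor and the sample values are made up for illustration):

#include <cstdio>

// Exponentially smoothed working-set estimate, updated once per poll.
struct TaskMem {
    double smoothed = 0;  // smoothed working set, bytes
    void on_poll(double current_ws) {
        // Blend the old estimate with the newest sample; a brief
        // spike is heavily damped and never shows at full size.
        smoothed = 0.5 * smoothed + 0.5 * current_ws;
    }
};

int main() {
    TaskMem t;
    double samples[] = {4e9, 4e9, 12e9, 4e9};  // one 12 GB spike
    for (double ws : samples) {
        t.on_poll(ws);
        printf("smoothed: %.2f GB\n", t.smoothed / 1e9);
    }
    // The 12 GB spike peaks at only ~7.5 GB in the smoothed series,
    // which is why the client can miss short-lived peaks entirely.
}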

The 40 limit will never apply, since ATLAS can use 8 threads, so 40x8 is a lot more than any machine has with current technology.

We are currently at 384 threads per 2-socket commercial server, excluding the more exotic 8-socket servers. In addition, running ATLAS with 8 threads is quite wasteful given the fixed 20-30 minute idle window, due to the same Amdahl's law mentioned above, if we care about efficiency. The only relevant part here is that, from my observation, the LHC team did set some limit for ATLAS. Until now, though, I didn't know it was there as a memory restriction; I always thought it was to prevent someone from grabbing too many tasks at once.
ID: 70232
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 70233 - Posted: 30 Jan 2024, 5:26:23 UTC - in response to Message 70232.  

But the OS will page the inactive tasks in preference.
There won't be inactive tasks when the memory spike happens. The smoothed 30-second monitoring window won't reflect the spike quickly enough for the BOINC client to preempt tasks, so the initial swapping will hit more or less all tasks. If we assume OpenIFS doesn't keep unused memory around for long, then whatever gets paged out is likely in an active working set, just not the most recently used pages. Because of the smoothing, BOINC preemption may never happen at all if the spikes are short enough, but any time the combined memory usage exceeds system memory, the OS will either swap or OOM-kill, no matter how brief the spike, because it has no other choice.
Boinc should have preempted it before it hit 100% memory usage. You can always tell Boinc to use something like 80% of memory.

PS: I just realized my initial reply should have said "smoothed" rather than "average". The formula is an exponential decay, not an average.
It shouldn't be averaging; it should be looking at peaks. If we have an average of 90% RAM usage, but every so often one task adds 30% for a second, paging is going to happen with active tasks.
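
To illustrate (a toy example only, not BOINC code): tracking the observed peak alongside the smoothed value would catch exactly the case I mean:

#include <algorithm>
#include <cstdio>

struct MemStats {
    double smoothed = 0, peak = 0;
    void on_poll(double ws) {
        smoothed = 0.5 * smoothed + 0.5 * ws;  // damped estimate
        peak = std::max(peak, ws);             // never forgets a spike
    }
};

int main() {
    MemStats m;
    double samples[] = {4e9, 4e9, 12e9, 4e9};  // brief 12 GB spike
    for (double ws : samples) m.on_poll(ws);
    printf("smoothed %.2f GB vs peak %.2f GB\n",
           m.smoothed / 1e9, m.peak / 1e9);
    // A scheduler comparing 'peak' rather than 'smoothed' against the
    // RAM budget would have seen the 12 GB spike.
}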

The 40 limit will never apply, since ATLAS can use 8 threads, so 40x8 is a lot more than any machine has with current technology.
We are currently at 384 threads per 2-socket commercial server, excluding the more exotic 8-socket servers. In addition, running ATLAS with 8 threads is quite wasteful given the fixed 20-30 minute idle window, due to the same Amdahl's law mentioned above, if we care about efficiency. The only relevant part here is that, from my observation, the LHC team did set some limit for ATLAS. Until now, though, I didn't know it was there as a memory restriction; I always thought it was to prevent someone from grabbing too many tasks at once.
Not many people have those; I've only heard Boinc users mention 192 threads. I use 8 threads, although on some machines I tell Boinc to use an average of 5. If ATLAS won't give out any more, will it give you some of the other subprojects at the same time?
ID: 70233
Richard Haselgrove

Joined: 1 Jan 07
Posts: 943
Credit: 34,360,365
RAC: 9,337
Message 70234 - Posted: 30 Jan 2024, 9:33:46 UTC - in response to Message 70228.  

I heard LHC maintains the client code now?
Not quite. LHC oversees the final testing and release/deployment of finished server packages, but the raw code and client package releases are still under the direction of David Anderson in Berkeley.
ID: 70234
Glenn Carver

Joined: 29 Oct 17
Posts: 819
Credit: 13,695,469
RAC: 6,516
Message 70235 - Posted: 30 Jan 2024, 10:20:53 UTC - in response to Message 70228.  
Last modified: 30 Jan 2024, 10:22:16 UTC

1. Boinc client does respect rsc_memory_bound set in the task spec, only when deciding whether it can start this specific task. This can be trivially verified by setting allowed boinc memory below rsc_memory_bound and the task will never start.

Yes, exactly, that's the problem. I tell the client the <rsc_memory_bound>, which is the peak memory, but the client doesn't respect that value when starting the task: it doesn't take the sum of the memory_bound values of the currently running tasks into account. I could have 4 OIFS tasks start at exactly the same time on a quiet machine, and the last one will fail when the first 3 allocate their peak (or close to peak) memory. Had the client considered the total memory_bound of the first 3 tasks, it would have known not to start the 4th.
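
In rough C++ terms, the check I'd like to see is something like this (just a sketch, not the actual client code; the names are illustrative):

#include <cstdio>
#include <vector>

struct Task {
    double rsc_memory_bound;  // peak RAM declared in the task spec, bytes
    bool running;
};

// Could this task start without the combined declared peaks
// exceeding the RAM boinc is allowed to use?
bool can_start(const std::vector<Task>& tasks, const Task& candidate,
               double ram_allowed) {
    double committed = 0;
    for (const auto& t : tasks) {
        if (t.running) committed += t.rsc_memory_bound;  // reserve peaks
    }
    // Today the client effectively only checks the candidate's own
    // bound; summing the running tasks' bounds is the missing part.
    return committed + candidate.rsc_memory_bound <= ram_allowed;
}

int main() {
    std::vector<Task> tasks = {{6e9, true}, {6e9, true}, {6e9, true}};
    Task fourth{6e9, false};
    printf("start 4th? %s\n",
           can_start(tasks, fourth, 20e9) ? "yes" : "no");  // no: 24 > 20 GB
}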

Anyway, we're going in circles. It is what it is, and we have a way to work around it, even though it's not ideal.

Going back to the original thread topic, yes we do intend to roll out a multicore app for OpenIFS, but not the Hadley models.
ID: 70235
Glenn Carver

Joined: 29 Oct 17
Posts: 819
Credit: 13,695,469
RAC: 6,516
Message 70236 - Posted: 30 Jan 2024, 10:24:19 UTC - in response to Message 70234.  

I heard LHC maintains the client code now?
Not quite. LHC oversees the final testing and release/deployment of finished server packages, but the raw code and client package releases are still under the direction of David Anderson in Berkeley.
Andy spoke with David about this issue with large-memory tasks and the client. David's response was 'it can be handled on the server side', which told me he didn't really understand the point. It's a client issue, because only the client knows what else is running on the computer (e.g. tasks from other projects).
ID: 70236
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4353
Credit: 16,598,247
RAC: 6,156
Message 70237 - Posted: 30 Jan 2024, 11:19:44 UTC - in response to Message 70236.  

I heard LHC maintains the client code now?
Not quite. LHC oversees the final testing and release/deployment of finished server packages, but the raw code and client package releases are still under the direction of David Anderson in Berkeley.
Andy spoke with David about this issue with large-memory tasks and the client. David's response was 'it can be handled on the server side', which told me he didn't really understand the point. It's a client issue, because only the client knows what else is running on the computer (e.g. tasks from other projects).


Shame no one is doing a fork of the code. It's beyond me, but making the client respect the maximum memory usage rather than some sort of average can't be the most difficult of jobs. Maybe put it in as a request on GitHub rather than going direct to David? (I am probably showing my ignorance of the politics of this, but hey ho.)
ID: 70237
Richard Haselgrove

Joined: 1 Jan 07
Posts: 943
Credit: 34,360,365
RAC: 9,337
Message 70239 - Posted: 30 Jan 2024, 11:52:57 UTC - in response to Message 70237.  

A programmer could have a quick read-through of comment lines 18 - 52 of https://github.com/BOINC/boinc/blob/master/client/cpu_sched.cpp.

In particular, we could look at the implementation of lines 44, 49 - 52:

//      Don't run a job if
//      - its memory usage would exceed RAM limits
//          If there's a running job using a given app version,
//          unstarted jobs using that app version
//          are assumed to have the same working set size.
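
In outline, that policy amounts to something like the following (my paraphrase, not the real cpu_sched.cpp code; the names and values are made up):

#include <map>
#include <string>

// Estimate an unstarted job's working set from a running job that
// uses the same app version, per the comment above.
double estimated_wss(const std::string& app_version,
                     const std::map<std::string, double>& running_wss,
                     double default_wss) {
    auto it = running_wss.find(app_version);
    return it != running_wss.end() ? it->second : default_wss;
}

// Don't run the job if its (estimated) memory usage would push the
// total past the RAM limit.
bool may_run(double ram_in_use, double job_wss, double ram_limit) {
    return ram_in_use + job_wss <= ram_limit;
}

int main() {
    std::map<std::string, double> running = {{"oifs_example_app", 5.2e9}};
    double wss = estimated_wss("oifs_example_app", running, 4.0e9);
    return may_run(10e9, wss, 16e9) ? 0 : 1;  // 10 + 5.2 <= 16 GB: ok
}

Note it keys off the measured working set of a job that's already running, not the project's declared peak, which is the gap Glenn described.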
ID: 70239
wujj123456

Joined: 14 Sep 08
Posts: 87
Credit: 32,981,759
RAC: 14,695
Message 70243 - Posted: 31 Jan 2024, 1:49:27 UTC - in response to Message 70237.  
Last modified: 31 Jan 2024, 1:58:41 UTC

Shame no one is doing a fork of the code. It's beyond me, but making the client respect the maximum memory usage rather than some sort of average can't be the most difficult of jobs. Maybe put it in as a request on GitHub rather than going direct to David? (I am probably showing my ignorance of the politics of this, but hey ho.)

I'm also naive about the politics, but even leaving that aside, once forked, the new code base has to be maintained by someone forever. It's probably not a great investment to fork an entire code base just for a single feature request...
ID: 70243
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 70247 - Posted: 31 Jan 2024, 8:21:11 UTC - in response to Message 70235.  

1. Boinc client does respect rsc_memory_bound set in the task spec, only when deciding whether it can start this specific task. This can be trivially verified by setting allowed boinc memory below rsc_memory_bound and the task will never start.

Yes, exactly, that's the problem. I tell the client the <rsc_memory_bound>, which is the peak memory, but the client doesn't respect that value when starting the task: it doesn't take the sum of the memory_bound values of the currently running tasks into account. I could have 4 OIFS tasks start at exactly the same time on a quiet machine, and the last one will fail when the first 3 allocate their peak (or close to peak) memory. Had the client considered the total memory_bound of the first 3 tasks, it would have known not to start the 4th.
Can you take into account how much RAM a client has when sending them out? Restricting a 16 GB and a 128 GB machine to the same number of tasks seems silly. If you can't do that (although the server does know how much RAM my clients have; I can see it on my account page), perhaps have the default as 2, but allow users to change it in the website preferences, with a note saying not to allocate more than 1 task per 8 GB.
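
Something like this on the server side, say (purely my suggestion, not an existing CPDN setting; the 8 GB per task and the default of 2 are the numbers from above):

#include <algorithm>
#include <cstdio>

// Per-host task cap: the user's preference, capped at 1 task per
// 8 GB of host RAM, but never below a default of 2.
int max_tasks_for_host(double host_ram_gb, int user_pref) {
    int by_ram = static_cast<int>(host_ram_gb / 8.0);
    return std::max(2, std::min(user_pref, by_ram));
}

int main() {
    printf("16 GB host:  %d tasks\n", max_tasks_for_host(16.0, 10));   // 2
    printf("128 GB host: %d tasks\n", max_tasks_for_host(128.0, 10));  // 10
}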
ID: 70247
Glenn Carver

Joined: 29 Oct 17
Posts: 819
Credit: 13,695,469
RAC: 6,516
Message 70260 - Posted: 1 Feb 2024, 12:00:28 UTC - in response to Message 70243.  

Getting back to multicore apps: I ran some tests of OpenIFS (Linux) with the machine both quiet (green curve) and busy (orange curve). 'Quiet' is with no other boinc tasks running; 'busy' is with 80% of the physical cpus in use by boinc. Here's the speedup curve:

[speedup curve plot: number of threads vs speedup, quiet (green) and busy (orange) runs, ideal speedup shown as a dotted line]

The dotted line shows the ideal speedup, i.e. with 2 threads we get twice the performance and the model finishes in half the time; with 4 threads it finishes in a quarter of the time, etc.

-- There is a noticeable impact from how busy the machine is (no surprise).
-- For 2 threads, we get close to the ideal speedup: 1.8x (quiet), 1.7x (busy).
-- For 3 threads, the speedup is 2.4 (quiet), which might be acceptable, but on a busy machine it drops to 2.1, i.e. on average we are wasting one of the threads assigned to the task.

This result is for the resolutions used so far for OpenIFS. At higher resolutions the speedup will improve with more threads, as there are more gridpoints in use and hence more time is spent in computation. But the general picture of the curve 'flattening' as the number of threads increases would be the same.
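
As a sanity check, the measured numbers fit Amdahl's law quite well. A quick calculation (the parallel fraction p is inferred from the quiet 2-thread result; everything else follows):

#include <cstdio>

// Amdahl's law: S(n) = 1 / ((1 - p) + p / n), p = parallel fraction.
double amdahl(double p, int n) { return 1.0 / ((1.0 - p) + p / n); }

int main() {
    double s2 = 1.8;                    // measured quiet speedup, 2 threads
    double p = 2.0 * (1.0 - 1.0 / s2);  // solve S(2) = s2 for p: ~0.89
    printf("parallel fraction p = %.2f\n", p);
    int threads[] = {2, 3, 4, 8};
    for (int n : threads) {
        printf("predicted S(%d) = %.2f\n", n, amdahl(p, n));
    }
    // Predicts S(3) ~ 2.45 (measured: 2.4) and only ~4.5 at 8 threads,
    // which is the 'flattening' of the curve described above.
}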
---
CPDN Visiting Scientist
ID: 70260


©2024 climateprediction.net