UK Met Office HadAM4 at N216 resolution

Author	Message
Les Bayliss Volunteer moderator Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 61259 - Posted: 18 Oct 2019, 7:21:59 UTC My 3.50GHz Haswell looks like taking about 14 days for these, even though BOINC is saying about 3.3 days. ID: 61259 · Reply Quote

Dave Jackson Volunteer moderator Send message Joined: 15 May 09 Posts: 4389 Credit: 16,822,336 RAC: 5,993	Message 61260 - Posted: 18 Oct 2019, 7:30:47 UTC - in response to Message 61259. My 3.50GHz Haswell looks like taking about 14 days for these, even though BOINC is saying about 3.3 days. Similar percentage difference here. If the figure in the task files that determines the estimate is the same one that determines credit, we may need to mention this to the project? ID: 61260 · Reply Quote

Jim1348 Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653	Message 61269 - Posted: 18 Oct 2019, 19:34:36 UTC - in response to Message 61260. The estimates for my i7-8700 are a bit strange. If you just add the Elapsed Time and Time Left, you get about 5 days. But if you look at the % completed (only 6.1%), it comes out to about 26 days. Normally that means the "Time Left" is wrong, and will adjust itself in due course by slowing down. But at the moment, it is still decreasing in real time. Eventually, one or the other will change to more consistent values. The "% completed" could be non-linear, and the final result somewhere in between. ID: 61269 · Reply Quote
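
The two estimates in the post above can be reproduced with a little arithmetic. The following is a minimal sketch, not anything BOINC itself runs: the function names are hypothetical, and the elapsed time of 1.6 days is an assumption back-calculated from the quoted 6.1% and 26 days, since the post does not state it.

```python
def projected_total_days(elapsed_days: float, fraction_done: float) -> float:
    """Linear extrapolation: total run time if progress continues at the same rate."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_days / fraction_done


def client_total_days(elapsed_days: float, time_left_days: float) -> float:
    """Total implied by simply adding Elapsed Time and Time Left."""
    return elapsed_days + time_left_days


# Assumed inputs (hypothetical): 1.6 days elapsed, 6.1% done, 3.4 days "Time Left".
elapsed = 1.6
print(f"From % completed: {projected_total_days(elapsed, 0.061):.1f} days")
print(f"From Time Left:   {client_total_days(elapsed, 3.4):.1f} days")
```

If the "% completed" figure is non-linear, as the post suggests it might be, the linear extrapolation will overestimate or underestimate accordingly.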

Les Bayliss Volunteer moderator Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 61270 - Posted: 18 Oct 2019, 20:03:05 UTC - in response to Message 61269. If you're running on the hyper cores, then it may be that. One of the researchers said some time back that doing that results in a lot of switching in the processor. I guess the code has something that likes/needs "real" cores. ID: 61270 · Reply Quote

Jim1348 Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653	Message 61271 - Posted: 18 Oct 2019, 20:08:26 UTC - in response to Message 61270. Last modified: 18 Oct 2019, 20:13:51 UTC Yes, it is on hyper cores. I can do real cores next, with a bit of memory juggling. I wanted to use my i7-9700 (8 real cores) anyway, but found that it was not stable with 64 GB of memory, at least not at the rated speed. But I now have new memory that might be more compatible. Or at least I can run the i7-8700 on real cores if need be. It should be ready by Christmas. EDIT: That much memory is not needed now, but I am planning for the OpenIFS. ID: 61271 · Reply Quote

Les Bayliss Volunteer moderator Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 61287 - Posted: 20 Oct 2019, 7:39:15 UTC I've just noticed something interesting. One of the 4 models running, which are batch 842, is now 35 minutes behind the others. Also about 0.15% behind. It was the last to start, about 1 minute behind the 3rd one to start. This is my "general use computer", and I've noticed it's slow to react, or even frozen for a few seconds. 11.5 hours until the first lot of zips. ID: 61287 · Reply Quote

Dave Jackson Volunteer moderator Send message Joined: 15 May 09 Posts: 4389 Credit: 16,822,336 RAC: 5,993	Message 61289 - Posted: 20 Oct 2019, 9:33:33 UTC - in response to Message 61287. This is my "general use computer", and I've noticed it's slow to react, or even frozen for a few seconds. I have noticed this on my slow general use computer. But mine only has 2GB/core which really isn't enough if much else is running at the same time with these tasks. I have restricted it to just one of the two cores which has sorted that out. ID: 61289 · Reply Quote

WB8ILI Send message Joined: 1 Sep 04 Posts: 161 Credit: 81,421,805 RAC: 1,225	Message 61290 - Posted: 20 Oct 2019, 13:50:15 UTC Dave and Les - I stumbled across a Linux package called xosview which shows some cool information about memory usage, paging, cache, and if a cpu is in a wio (waiting for I/O) state. Maybe you already knew of it. ID: 61290 · Reply Quote

Jean-David Beyer Send message Joined: 5 Aug 04 Posts: 1093 Credit: 16,774,056 RAC: 3,819	Message 61291 - Posted: 20 Oct 2019, 16:19:06 UTC - in response to Message 61290. I stumbled across a Linux package called xosview which shows some cool information about memory usage, paging, cache, and if a cpu is in a wio (waiting for I/O) state. https://sourceforge.net/projects/xosview/ I have not run this one. http://xosview.sourceforge.net/ ID: 61291 · Reply Quote

Jim1348 Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653	Message 61295 - Posted: 20 Oct 2019, 19:36:00 UTC - in response to Message 61271. Last modified: 20 Oct 2019, 19:37:00 UTC One thing I have learned by monitoring the writes is that enabling or disabling hyper-threading on my i7-8700 has no effect. The write rate stays exactly the same at 33.5 GB/day, on either six full cores or twelve virtual cores. So the total work output would be the same over a period of time in either case. So you might as well save memory and operate on six full cores, or in other words just set BOINC to run on 50% of the available cores. As for the times, that is still a bit of a mystery and I won't know until I complete some under a given set of circumstances, but probably around 13 days on full cores and twice that on virtual cores. My i7-9700 should do better, but it is still early. ID: 61295 · Reply Quote

Les Bayliss Volunteer moderator Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 61296 - Posted: 20 Oct 2019, 20:10:32 UTC Finally! My first lot of zips have shown up. A bit over 137 Megs. ID: 61296 · Reply Quote

22 Send message Joined: 14 Mar 15 Posts: 1 Credit: 970,308 RAC: 12,438	Message 61297 - Posted: 20 Oct 2019, 20:42:15 UTC Do you have a figure for time between checkpoints to disk? I guess 2 and a half hours? Preferable to keep those tasks in memory and not shut down too often... ID: 61297 · Reply Quote

Les Bayliss Volunteer moderator Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 61298 - Posted: 20 Oct 2019, 21:07:54 UTC - in response to Message 61297. You can work out the figure for yours from the BOINC Properties list. Click on a model in the Tasks tab, then click on Properties to the left. A third of the way down the list is the time of the last checkpoint. Start writing down/watching, and soon you'll get what you want. ****************** Yes, these models are big, so the longer a computer can be left running the better. ID: 61298 · Reply Quote
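
The "write it down and watch" approach above can be turned into a number once a few checkpoint times have been noted. This is a minimal sketch that assumes the times were recorded by hand as wall-clock strings in the format shown; the function name, the format, and the sample values are all hypothetical and are not taken from BOINC.

```python
from datetime import datetime


def checkpoint_intervals_minutes(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the gaps, in minutes, between consecutive noted checkpoint times."""
    times = sorted(datetime.strptime(t, fmt) for t in timestamps)
    return [(later - earlier).total_seconds() / 60
            for earlier, later in zip(times, times[1:])]


# Hypothetical hand-recorded checkpoint times for one task.
noted = [
    "2019-10-20 21:00:00",
    "2019-10-20 22:06:00",
    "2019-10-20 23:12:00",
]
gaps = checkpoint_intervals_minutes(noted)
print(gaps)
print(sum(gaps) / len(gaps), "minutes between checkpoints on average")
```

Three or four observations are usually enough to see whether the interval is steady.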

Les Bayliss Volunteer moderator Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 61299 - Posted: 20 Oct 2019, 21:25:44 UTC - in response to Message 61290. WB8ILI No, but then I haven't gone looking for anything. I just leave them to get on with it. Mostly, anyway. I'll have a look at that program later. Jean Thanks for the link. ID: 61299 · Reply Quote

geophi Volunteer moderator Send message Joined: 7 Aug 04 Posts: 2171 Credit: 64,598,708 RAC: 762	Message 61301 - Posted: 21 Oct 2019, 0:24:56 UTC - in response to Message 61297. Do you have a figure for time between checkpoints to disk? I guess 2 and a half hours? Preferable to keep those tasks in memory and not shut down too often... You can enable "Checkpoint debug" under "Event Log Diagnostic Flags" or "Event Log Options" (depending on version of boinc). You can get at that from the "Advanced" or "Options" menu of boinc manager (also depending on the version of boinc). Of course this is probably useless for the hadcm3s models which checkpoint much, much more frequently. Keeping these big models in memory and not interrupting them frequently is definitely the right idea. My Ryzen 3600X running 4 models checkpoints about every 66 minutes per task. My i7-4790K does so every 106 minutes, also running 4 at a time. ID: 61301 · Reply Quote

alanb1951 Send message Joined: 31 Aug 04 Posts: 35 Credit: 9,581,380 RAC: 3,853	Message 61303 - Posted: 21 Oct 2019, 1:57:25 UTC TL;DR - you probably don't want to run more than one of these per 4MB+ of L3 cache... Jim1348's time estimates for an i7-8700 prompted me to do some tests (see below) as my experience with the Microbiome application (MIP1) at WCG, which is also a memory hog, suggests that one should only run one instance of that per 4MB (or more[1]) of L3 cache; running more results in significant increases in cache misses, with a corresponding drop in overall CPU effectiveness (for any BOINC tasks running, not just the hogs!) -- indeed, running 4 at a time on a machine with 8MB cache resulted in CPU temperatures dropping by 10C or more and run times nearly double that of a single task (which I restricted using the max_concurrent mechanism). Testing on an i5-7600 (6MB L3 cache, 4 cores, no hyper-threading, 8GB RAM, 3.5GHz clock) has shown HadAM4@N216 to be a cache-wrecker as well (no surprise there). I did tests with 1 HadAM4 task, 2 HadAM4 tasks, 3 HadAM4 tasks, and my normal workload if I have a CPDN task - 1 CPDN, 2 WCG. Running a single HadAM4 task with no company yields a checkpoint every 81 minutes; running two at once yields checkpoints every 91 minutes; running three, checkpoints are about 110 minutes apart. This is consistent with changes in the number of instructions run in a fixed time interval, which I monitored with the perf stat command. As checkpoints seem to be taken once per model day and there are about 120 days per 4-month model I'd reckon these would complete in about 6.8 days (running 1 at a time), 7.6 days (running 2 at a time) or 9.2 days (3 at a time). By the way, under my usual workload [avoiding MIP1 tasks as they mess up the cache too!], checkpoints are about 83 minutes apart, so it can be seen that the WCG tasks aren't really getting in the way. (If MIP1 tasks get in there, the checkpoints are about 86 minutes apart.) 
There's one thing in favour of running lots of these on a multi-core machine - your power draw will drop (as evidenced by CPU temperatures!) as the cores end up waiting for memory accesses more and more often! But I suspect there comes a point where each task takes so long to run that it's just not worth it - I, for one, will continue to treat CPDN as minority work on my Intel machines in order to maximize throughput. I'm about to take delivery of a Ryzen 3700X (32MB L3 cache, though I gather access is constrained to 8MB per 2 cores (4 threads)); I'll be interested to see how that behaves as and when it gets some CPDN work to do (and will probably do some bulk tests with WCG MIP1 to get an idea if there's no CPDN work available!) Cheers - Al. [1] Someone over at WCG seemed to think 5MB cache was what a MIP1 job would like. The user offered no justification for that number but 4MB probably isn't enough for near-optimum performance. ID: 61303 · Reply Quote
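
The completion estimates in the post above follow directly from the checkpoint intervals. Here is a minimal sketch of that arithmetic, assuming (as the post does) one checkpoint per model day and about 120 model days per task; the function and constant names are hypothetical.

```python
MODEL_DAYS = 120          # about 120 model days in a 4-month model, per the post
MINUTES_PER_DAY = 24 * 60


def completion_days(minutes_per_checkpoint: float, model_days: int = MODEL_DAYS) -> float:
    """Wall-clock days per task, assuming one checkpoint per model day."""
    return minutes_per_checkpoint * model_days / MINUTES_PER_DAY


# Checkpoint intervals measured on the i5-7600 in the post above.
for concurrent, minutes in [(1, 81), (2, 91), (3, 110)]:
    print(f"{concurrent} at a time: {completion_days(minutes):.1f} days per task")
```

Run as written, this should give roughly the 6.8, 7.6 and 9.2 days quoted for one, two and three concurrent tasks.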

geophi Volunteer moderator Send message Joined: 7 Aug 04 Posts: 2171 Credit: 64,598,708 RAC: 762	Message 61304 - Posted: 21 Oct 2019, 4:26:56 UTC - in response to Message 61303. Last modified: 21 Oct 2019, 4:28:08 UTC The cache size definitely makes a difference as to how much the model speed slows down when loading more on. My 4790K has 8 MB of L3 cache and can run 1 N216 model at 13.9 sec/TS and 4 at 22 sec/TS. (58% slower) My 3600X has 32 MB of L3 cache and can run 1 N216 model at 11.2 sec/TS and 4 at 13.6 sec/TS. (21% slower) ID: 61304 · Reply Quote
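
The slowdown percentages above can be checked, and extended to combined throughput, with a few lines. This is a rough sketch with hypothetical function names; the throughput comparison assumes all concurrent models run at the quoted loaded speed and ignores anything else on the machine.

```python
def slowdown_percent(single_sec_per_ts: float, loaded_sec_per_ts: float) -> float:
    """How much slower each model runs when the machine is loaded, in percent."""
    return (loaded_sec_per_ts / single_sec_per_ts - 1.0) * 100.0


def throughput_gain(n_models: int, single_sec_per_ts: float, loaded_sec_per_ts: float) -> float:
    """Combined timesteps per second of n models, relative to one model alone."""
    return n_models * single_sec_per_ts / loaded_sec_per_ts


# sec/TS figures from the post above: (1 model, 4 models).
machines = {"4790K, 8 MB L3": (13.9, 22.0), "3600X, 32 MB L3": (11.2, 13.6)}
for name, (single, loaded) in machines.items():
    print(f"{name}: {slowdown_percent(single, loaded):.0f}% slower per model, "
          f"{throughput_gain(4, single, loaded):.2f}x combined throughput")
```

On these figures each model slows down, but four together still get through more timesteps per second than one alone; the larger cache simply loses less in the process.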

Les Bayliss Volunteer moderator Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 61305 - Posted: 21 Oct 2019, 5:43:56 UTC There's also this page: Xosview for downloading it in a terminal window. ID: 61305 · Reply Quote

Les Bayliss Volunteer moderator Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 61306 - Posted: 21 Oct 2019, 5:49:48 UTC It looks like the cache is the culprit. This will slow down those 64 and 128 core machines. Unless they're just crashing them because of the missing lib. ID: 61306 · Reply Quote

Jim1348 Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653	Message 61308 - Posted: 21 Oct 2019, 7:08:42 UTC - in response to Message 61303. Last modified: 21 Oct 2019, 7:28:06 UTC I'm about to take delivery of a Ryzen 3700X (32MB L3 cache, though I gather access is constrained to 8MB per 2 cores (4 threads)); I'll be interested to see how that behaves as and when it gets some CPDN work to do (and will probably do some bulk tests with WCG MIP1 to get an idea if there's no CPDN work available!) Cheers - Al. [1] Someone over at WCG seemed to think 5MB cache was what a MIP1 job would like. The user offered no justification for that number but 4MB probably isn't enough for near-optimum performance. Thanks a lot for the cache info. I was beginning to think that the issues were deeper than I had found. I just happen to have a Ryzen 3700X, and was wondering what its large L3 cache would do here. But I would need to add more memory. So let us know, and I could do it. EDIT: I have found that as I add more N216 to my i7-9700, the run time estimates increase, as manually calculated. The first one was 5.5 days, and the last one is now 15.5 days. So the cache is implicated, since they are all full cores and so hyper-threading is not an issue. (As for MIP1, I have found that I need to limit it to two running at a time on any of my machines - Intel or AMD. Cache could certainly play a role, or how it is accessed.) ID: 61308 · Reply Quote