New work Discussion

Author	Message
Jim1348 Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653	Message 63395 - Posted: 23 Jan 2021, 18:32:21 UTC - in response to Message 63391.
"Yes, my i5 finished 5 today and 1 in a few hours' time. But the newer chips like Ryzens are slower per core."
On the Linux ones, I found that my Ryzen 3950X was a little slower (21 sec/TS) than a Ryzen 3600 (18 1/2 sec/TS) when running two at a time. It is probably the difference in cache per core. I can run four on the Ryzen 3600 at that speed.
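To put "a little slower" in numbers, the two per-timestep figures can be turned into a relative slowdown. This is only arithmetic on the values in the post (21 and 18.5 seconds per timestep, two tasks at a time); the function name is illustrative.

```python
def relative_slowdown(slow_sec_per_ts: float, fast_sec_per_ts: float) -> float:
    """Fraction by which the slower CPU takes longer per model timestep."""
    return slow_sec_per_ts / fast_sec_per_ts - 1.0


# Figures from the post: Ryzen 3950X vs Ryzen 3600, two tasks running at once.
ryzen_3950x = 21.0
ryzen_3600 = 18.5

slowdown = relative_slowdown(ryzen_3950x, ryzen_3600)
print(f"3950X takes {slowdown:.1%} longer per timestep")
```

This compares per-task speed only; it says nothing about total throughput per machine, since the two CPUs were not necessarily running the same number of tasks.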

Mr. P Hucker Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918	Message 63396 - Posted: 23 Jan 2021, 19:01:35 UTC - in response to Message 63394. Last modified: 23 Jan 2021, 19:03:56 UTC
"[Peter Hucker wrote:]... Different programming?
The SAFR region is larger than the EU one, and the recent SAFR models run for 24 months instead of 13 months for the EU models. A factor going the other way is that the resolution of the recent EU models is double that of the SAFR models. The resulting estimated Gflops difference is listed below, with a correction factor based on my machines.
SAFR50/24 = 7,694,788 Gflops (/ 2.39)
EU25/13 = 2,061,502 Gflops (/ 0.67)
Thus, for example, the SAFR/EU ratio for CPU time on my machines is expected to be (7,694,788 / 2.39) / (2,061,502 / 0.67) = 1.05. The ratio for two models that finished on one of my machines from batch #890 (SAFR/24) and batch #894 (EU25/13) was 345,907.20 / 319,405.60 = 1.08, i.e. about as expected."
Thanks. I'm going to assume something in the EU ones disagrees with my old Xeons. Could be cache: they have 12MB shared between 12 cores, while my i5 has 9MB between 6 cores and probably a newer, better cache design. The Xeons are also using single-channel RAM.
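The correction-factor arithmetic in the quoted reply can be checked in a few lines. The numbers are the ones given in the post; the variable and function names are illustrative only.

```python
# Estimated work per model (Gflops) and per-machine correction factors,
# as quoted in the post.
safr_gflops, safr_factor = 7_694_788, 2.39  # SAFR50/24
eu_gflops, eu_factor = 2_061_502, 0.67      # EU25/13


def corrected(gflops: float, factor: float) -> float:
    """Scale a Gflops estimate by a machine-specific correction factor."""
    return gflops / factor


# Expected SAFR/EU CPU-time ratio from the Gflops estimates.
expected = corrected(safr_gflops, safr_factor) / corrected(eu_gflops, eu_factor)

# Observed CPU seconds for one SAFR (batch #890) and one EU (batch #894) model.
observed = 345_907.20 / 319_405.60

print(f"expected SAFR/EU CPU-time ratio: {expected:.2f}")  # 1.05
print(f"observed SAFR/EU CPU-time ratio: {observed:.2f}")  # 1.08
```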

geophi Volunteer moderator Joined: 7 Aug 04 Posts: 2167 Credit: 64,468,868 RAC: 3,594	Message 63397 - Posted: 23 Jan 2021, 19:16:43 UTC - in response to Message 63395.
"On the Linux ones, I found that my Ryzen 3950X was a little slower (21 sec/TS) than a Ryzen 3600 (18 1/2 sec/TS), when running two at a time. It is probably the difference in cache per core. I can run four on the Ryzen 3600 at that speed."
What else are you running alongside the two N216 models? Surely those times aren't from running them all by themselves?

Dave Jackson Volunteer moderator Joined: 15 May 09 Posts: 4341 Credit: 16,496,276 RAC: 6,460	Message 63398 - Posted: 23 Jan 2021, 23:08:36 UTC
On my 3700X, the five-month N216s take between 724,512.90 and 765,076.60 seconds of CPU time, the fastest mostly with just 2 tasks running and the slowest with 8 on the go at once.
Please do not private message myself or other moderators for help. This limits the number of people who are able to help and deprives others who may benefit from the answer.
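For a feel of what those figures mean, CPU seconds can be converted to days of CPU time per task. A trivial sketch on the two numbers from the post; note this is CPU time, not elapsed wall-clock time, and the helper name is illustrative.

```python
SECONDS_PER_DAY = 86_400


def cpu_days(cpu_seconds: float) -> float:
    """Convert a CPU time in seconds to days."""
    return cpu_seconds / SECONDS_PER_DAY


fastest = 724_512.90  # mostly with 2 tasks running
slowest = 765_076.60  # with 8 tasks running at once

print(f"fastest: {cpu_days(fastest):.2f} days")
print(f"slowest: {cpu_days(slowest):.2f} days")
print(f"spread:  {slowest / fastest - 1:.1%}")
```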

rbpeake Joined: 27 Feb 08 Posts: 41 Credit: 1,402,356 RAC: 0	Message 63399 - Posted: 24 Jan 2021, 0:11:38 UTC As a Windows user who has completed some units, does it make sense now to set “no more work” so that I can process work from other projects? Until the next batch of Windows units comes along, whenever that is. Regards, Bob P.

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 63400 - Posted: 24 Jan 2021, 0:37:51 UTC - in response to Message 63399. There doesn't seem to be anything in the pipeline at the moment, so OK. Just remember that new work may show up unexpectedly, and not take long to go.

Jim1348 Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653	Message 63401 - Posted: 24 Jan 2021, 2:35:05 UTC - in response to Message 63397. Last modified: 24 Jan 2021, 2:36:37 UTC
"On the Linux ones, I found that my Ryzen 3950X was a little slower (21 sec/TS) than a Ryzen 3600 (18 1/2 sec/TS), when running two at a time. It is probably the difference in cache per core. I can run four on the Ryzen 3600 at that speed.
What else are you running with the two N216 models? That can't be running them all by themselves?"
As a guess, it was probably Rosetta, or possibly QuChemPedIA on the 3600 (with all the cores loaded). More recently, I have been running WCG/OPN or ARP (among others) with less than the full number of cores. I am still trying to find an optimum.

Bryn Mawr Joined: 28 Jul 19 Posts: 147 Credit: 12,814,088 RAC: 261,385	Message 63402 - Posted: 24 Jan 2021, 2:52:32 UTC - in response to Message 63399.
"As a Windows user who has completed some units, does it make sense now to set “no more work” so that I can process work from other projects? Until the next batch of Windows units comes along, whenever that is."
What would be the advantage? Whilst there are no WUs to get, NNT (No New Tasks) will do nothing: your system will move on to other projects whether or not it is set, and it won’t stop you getting jobs that aren’t there. On the other hand, if some new tasks are released unexpectedly, having NNT set will stop you from getting any.

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 63403 - Posted: 24 Jan 2021, 4:41:12 UTC - in response to Message 63402. Last modified: 24 Jan 2021, 4:41:37 UTC I was thinking that Bob could avoid re-sends, so that he can do some work from elsewhere for a while.

Bryn Mawr Joined: 28 Jul 19 Posts: 147 Credit: 12,814,088 RAC: 261,385	Message 63404 - Posted: 24 Jan 2021, 12:00:05 UTC - in response to Message 63403.
"I was thinking that Bob could avoid re-sends, so that he can do some work from elsewhere for a while."
Work is work :-) I suppose that, having been unable to get any form of work for so long, I'll take anything.

Mr. P Hucker Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918	Message 63405 - Posted: 24 Jan 2021, 13:17:44 UTC - in response to Message 63402.
"As a Windows user who has completed some units, does it make sense now to set “no more work” so that I can process work from other projects? Until the next batch of Windows units comes along, whenever that is.
What would be the advantage? Whilst there are no WUs to get, NNT will do nothing: your system will move on to other projects whether or not it is set, and it won’t stop you getting jobs that aren’t there. On the other hand, if some new tasks are released unexpectedly, having NNT set will stop you from getting any."
I have CPDN (and other small, rarely available projects like Ralph) set to a much higher weighting than other projects, so if there's work, it gets it. If there isn't, then it does the other projects. Like you said, "no new work" is pointless.
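The "weighting" above is BOINC's per-project resource share. As a rough mental model only (the real client scheduler also accounts for recent usage and deadlines, so this is a simplification, not BOINC's algorithm), a project's portion of the machine is its share divided by the total share of the projects that currently have work. The project list and share values below are hypothetical.

```python
def effective_shares(resource_share: dict[str, float],
                     has_work: set[str]) -> dict[str, float]:
    """Split the machine among projects that currently have tasks, in
    proportion to their resource shares (simplified model, not BOINC's
    actual scheduler)."""
    active = {p: s for p, s in resource_share.items() if p in has_work}
    total = sum(active.values())
    if total == 0:
        return {}
    return {p: s / total for p, s in active.items()}


# Hypothetical settings: CPDN and Ralph weighted far above everything else.
shares = {"CPDN": 1000, "Ralph": 1000, "Rosetta": 100, "WCG": 100}

# CPDN has work: it takes most of the machine.
print(effective_shares(shares, {"CPDN", "Rosetta", "WCG"}))
# No CPDN or Ralph work: the remaining projects split the machine.
print(effective_shares(shares, {"Rosetta", "WCG"}))
```

In this model a large share only matters while the project has tasks; once it runs dry, it drops out and the others expand to fill the gap, which is the behaviour described in the post.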

rbpeake Joined: 27 Feb 08 Posts: 41 Credit: 1,402,356 RAC: 0	Message 63408 - Posted: 24 Jan 2021, 19:34:44 UTC My other project is Folding@home which is outside the Boinc ecosystem. But you have given me the idea to just reduce the core count a little on that project so I leave an opening for future potential CPDN work. Thanks. Regards, Bob P.

Mr. P Hucker Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918	Message 63409 - Posted: 24 Jan 2021, 20:04:54 UTC - in response to Message 63408. Last modified: 24 Jan 2021, 20:05:38 UTC
"My other project is Folding@home which is outside the Boinc ecosystem. But you have given me the idea to just reduce the core count a little on that project so I leave an opening for future potential CPDN work. Thanks."
I don't like computers sitting idle. I make sure something is running on everything all the time. I've never tried Folding, so I don't know how you get them to interact. But I'm guessing that if you just left Boinc running, when you noticed it had grabbed some CPDN work, you could turn the wick down on Folding a bit. Having Folding use all your cores shouldn't stop Boinc thinking they're all available to Boinc.

Dave Jackson Volunteer moderator Joined: 15 May 09 Posts: 4341 Credit: 16,496,276 RAC: 6,460	Message 63410 - Posted: 25 Jan 2021, 8:27:23 UTC - in response to Message 63362.
"The 3 Windows batches are showing early signs of failures, but no hard fails yet, so I can't see what the problem is."
Still only four hard fails across all three batches, and looking at the successes coming in, I think we can say these EU tasks don't have the same problem the SAFR ones did.

Jim1348 Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653	Message 63413 - Posted: 25 Jan 2021, 20:12:18 UTC - in response to Message 63409.
"I've never tried Folding so I don't know how you get them to interact. But I'm guessing if you just left Boinc running, when you noticed it grabbed some CPDN, you could turn the wick down on Folding a bit. Having Folding use all your cores shouldn't stop Boinc thinking they're all available to Boinc."
Yes, they operate independently, so BOINC will still get work even with Folding running. For that matter, you could run them both at the same time, and the operating system will split its resources more or less equally between them. But the overall efficiency drops a bit, so I would not do it for long. But I use Folding mainly on the GPUs, and just have to reserve a single CPU core in BOINC to support it.
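One way to reserve a core, as described above, is BOINC's "use at most N% of the CPUs" computing preference. The sketch below works out a percentage that leaves a chosen number of logical CPUs free. It assumes the client rounds the resulting CPU count down, which is an assumption rather than something stated in this thread, so check the number of running tasks after changing the preference; the helper names are hypothetical.

```python
def cpu_pct_leaving_free(logical_cpus: int, reserved: int = 1) -> float:
    """Smallest 'use at most N% of the CPUs' value (to two decimals) that
    still yields logical_cpus - reserved CPUs if the count is rounded down."""
    usable = logical_cpus - reserved
    if usable < 1:
        raise ValueError("must leave at least one CPU for BOINC")
    # Integer ceiling division in hundredths of a percent avoids float drift.
    hundredths = -(-usable * 10_000 // logical_cpus)
    return hundredths / 100


def cpus_used(logical_cpus: int, pct: float) -> int:
    """CPUs BOINC would use, ASSUMING it truncates logical_cpus * pct / 100."""
    hundredths = round(pct * 100)  # back to exact integer hundredths
    return logical_cpus * hundredths // 10_000


for n in (6, 12, 16, 32):
    pct = cpu_pct_leaving_free(n)
    print(f"{n} threads: use at most {pct}% -> {cpus_used(n, pct)} CPUs")
```

Rounding the percentage up rather than to the nearest value is the design choice here: under truncation, rounding down could silently reserve two cores instead of one.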

JIM Joined: 31 Dec 07 Posts: 1152 Credit: 22,054,149 RAC: 344	Message 63414 - Posted: 26 Jan 2021, 2:45:33 UTC - in response to Message 63410.
"The 3 Windows batches are showing early signs of failures, but no hard fails yet, so I can't see what the problem is.
Still only four hard fails across all three batches, and looking at the successes coming in, I think we can say these EU tasks don't have the same problem the SAFR ones did."
Yes, the EU ones seem to be a good batch. Had 2 WUs finish successfully this morning (Eastern Standard Time, U.S.). Three more should finish in a few hours (knock wood).

Les Bayliss Volunteer moderator Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0	Message 63415 - Posted: 26 Jan 2021, 4:02:23 UTC - in response to Message 63414. Thanks for those Jim. The stats seem good on these 3 batches.

Mr. P Hucker Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918	Message 63416 - Posted: 26 Jan 2021, 15:27:00 UTC - in response to Message 63410.
"The 3 Windows batches are showing early signs of failures, but no hard fails yet, so I can't see what the problem is.
Still only four hard fails across all three batches, and looking at the successes coming in, I think we can say these EU tasks don't have the same problem the SAFR ones did."
No problems here so far, except that two just failed, but that's because the computer inexplicably locked up (I can't tell why; it has no monitor) and I had to power it off. I don't think CPDN tasks like being rudely interrupted. They're fine with Boinc switching tasks (I have "leave applications in memory" ticked), but they can't stand a computer crash. Better checkpointing would help; it should have gone back to the previous known good stage. These are the offending ones:
https://www.cpdn.org/result.php?resultid=22000528
https://www.cpdn.org/result.php?resultid=21999670
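Falling back to the last known good stage is the general checkpoint/restart pattern. The thread does not say how the CPDN models checkpoint, so the sketch below is not their method; it only illustrates one common technique: write each checkpoint to a temporary file, flush it, and swap it into place with `os.replace`, so a crash mid-write leaves the previous checkpoint intact. The file name, JSON state and `run_model` loop are hypothetical.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temp file next to `path`, flush it to disk, then
    swap it into place so a reader never sees a half-written checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Discard the partial temp file; the old checkpoint is untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_checkpoint(path: str) -> Optional[dict]:
    """Return the last complete checkpoint, or None if none exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None


def run_model(path: str, total_steps: int) -> dict:
    """Hypothetical model loop: resume from the last checkpoint, then
    checkpoint after every step."""
    state = load_checkpoint(path) or {"timestep": 0}
    for step in range(state["timestep"], total_steps):
        state = {"timestep": step + 1}  # stand-in for real model state
        save_checkpoint(path, state)
    return state
```

If the machine dies during `save_checkpoint`, the next `run_model` call simply resumes from the previous complete checkpoint.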

Mr. P Hucker Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918	Message 63418 - Posted: 28 Jan 2021, 15:03:48 UTC - in response to Message 63416.
"The 3 Windows batches are showing early signs of failures, but no hard fails yet, so I can't see what the problem is.
Still only four hard fails across all three batches, and looking at the successes coming in, I think we can say these EU tasks don't have the same problem the SAFR ones did.
No problems here so far, except that two just failed, but that's because the computer inexplicably locked up (I can't tell why; it has no monitor) and I had to power it off. I don't think CPDN tasks like being rudely interrupted. They're fine with Boinc switching tasks (I have "leave applications in memory" ticked), but they can't stand a computer crash. Better checkpointing would help; it should have gone back to the previous known good stage. These are the offending ones:
https://www.cpdn.org/result.php?resultid=22000528
https://www.cpdn.org/result.php?resultid=21999670"
Same happened with two on a working machine, which I rebooted cleanly. Should Boinc not gracefully shut down running CPDN tasks itself?
https://www.cpdn.org/result.php?resultid=22000610
https://www.cpdn.org/result.php?resultid=21998804

Dave Jackson Volunteer moderator Joined: 15 May 09 Posts: 4341 Credit: 16,496,276 RAC: 6,460	Message 63419 - Posted: 28 Jan 2021, 15:35:51 UTC
"Same happened with two on a working machine, which I rebooted cleanly. Should Boinc not gracefully shut down running CPDN tasks itself?"
You are right: BOINC should restart the task from the last checkpoint reached. In the past, my recollection is that this was a bigger problem with Linux tasks, but I haven't had a problem with it recently, even when I have updated the Linux kernel, which requires a reboot. My experience a few years ago was that a kernel change combined with a reboot greatly increased the chances of tasks crashing.
To minimise the chances of tasks crashing, I suspend tasks individually, then exit the BOINC Manager and client before rebooting. On restarting, I resume tasks one at a time, allowing a couple of minutes between resuming individual tasks. I don't know whether this makes any difference with the most recent task types, but it used to.
I don't know what happens with other projects. For a fair comparison, you might need to look at something like LHC@home, which, like CPDN, has a large number of files open at once, all of which need closing down by BOINC when exiting. If you reboot without exiting BOINC first, tasks should again in theory resume from the previous checkpoint, but experience tells me that doing so dramatically increases the chances of failure, though the last time I had a power failure, all tasks survived. I am not really sure whether this is a BOINC issue or a CPDN one, which makes sorting it out difficult.
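The suspend, exit, reboot and staggered-resume routine above could be scripted with BOINC's `boinccmd` tool (`--get_tasks`, `--task <url> <name> suspend|resume`, `--quit`). This is a sketch under assumptions: the parser expects the `name:` and `project URL:` lines that `--get_tasks` prints (check against your client version), the `"cpdn"` URL filter and script name are hypothetical, and `boinccmd` may need to be run from the BOINC data directory or given `--passwd` to authenticate. The BOINC Manager window still has to be closed by hand.

```python
import subprocess
import sys
import time

# ASSUMPTION: substring used to pick out CPDN tasks by project URL.
CPDN_URL_FILTER = "cpdn"


def parse_tasks(get_tasks_output: str) -> list[tuple[str, str]]:
    """Extract (project_url, task_name) pairs from `boinccmd --get_tasks`.
    ASSUMES each task block prints a 'name:' line before 'project URL:'."""
    tasks, name = [], None
    for raw in get_tasks_output.splitlines():
        line = raw.strip()
        if line.startswith("name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("project URL:") and name is not None:
            tasks.append((line.split(":", 1)[1].strip(), name))
            name = None
    return tasks


def filter_tasks(tasks, url_filter=CPDN_URL_FILTER):
    """Keep only tasks whose project URL contains url_filter."""
    return [t for t in tasks if url_filter in t[0]]


def task_cmd(url: str, name: str, op: str) -> list[str]:
    """Build a `boinccmd --task <url> <name> <op>` command line."""
    return ["boinccmd", "--task", url, name, op]


def before_reboot(tasks, run=subprocess.run):
    """Suspend each task individually, then ask the client to quit."""
    for url, name in tasks:
        run(task_cmd(url, name, "suspend"), check=True)
    run(["boinccmd", "--quit"], check=True)


def after_reboot(tasks, gap_seconds=120, run=subprocess.run, sleep=time.sleep):
    """Resume tasks one at a time, pausing gap_seconds between them."""
    for i, (url, name) in enumerate(tasks):
        if i > 0:
            sleep(gap_seconds)  # "a couple of minutes" between resumes
        run(task_cmd(url, name, "resume"), check=True)


def current_tasks():
    out = subprocess.run(["boinccmd", "--get_tasks"], capture_output=True,
                         text=True, check=True).stdout
    return filter_tasks(parse_tasks(out))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python boinc_reboot.py before   (then reboot)
    #        python boinc_reboot.py after    (once the client is running again)
    if sys.argv[1] == "before":
        before_reboot(current_tasks())
    elif sys.argv[1] == "after":
        after_reboot(current_tasks())
    else:
        print("usage: boinc_reboot.py before|after")
```

The `run` and `sleep` parameters are there so the sequencing can be checked without touching a real BOINC client.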