Message boards : Number crunching : New work Discussion
Author | Message |
---|---|
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
I was getting Einstein and Milky Way to work OK on my Linux box. I set up to run Milky Way on my Linux box. According to the top command, one instance of Milky Way is running on one processor, but it is using 679.7% of the processor it is running on. How is that possible? And besides that, all the other Boinc processes are sleeping. In the Boinc Manager task list, one process is running but it says (8 CPUs). A bunch have finished: valid. They take about 1000 seconds each. Most of the rest are predicted to take about an hour each. More than one at a time of those run, and that allows other Boinc processes to run too. Amazing. How do they do that? |
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
Last batch of N216 testing tasks were successful and the testing of Mac HADCM3s tasks seems to be good so hoping that there will be some main site work soon. Yay! Hopefully I can get another VM or two running here soon. I just got some more AMD hardware to throw at problems. I am not paranoid enough to believe there is a conspiracy to shut down distributed computing, but I would not be surprised if some thought so. It's one of the weirder things to get some conspiracy to shut down. Reality, far as I'm concerned, is somewhat simpler: There's not much interest outside the legacy types who've been doing it since SETI was running on Pentiums, and it's a bit of a pain in the rear to handle in terms of how tasks have to be phrased in order for distributed computing to provide a good answer. When you can go spin up an AWS supercomputer for grant money, "attracting random people with a weird mishmash of computers to get work back to you at some point" becomes less appealing. I mean, is anyone under about 35 posting in this thread? |
Send message Joined: 12 Apr 21 Posts: 318 Credit: 14,977,739 RAC: 10,025 |
You're running MilkyWay N-Body Simulation. That sub-project is multi-threaded and will use the most threads available (up to 16) unless you set some controls using app_config.xml. The top and htop commands report CPU usage as a percentage of a single thread, so a multi-threaded task can show well over 100%; 679.7% means it is keeping nearly seven threads busy. It sounds like you might be running it on an 8 thread CPU and it's using all of the threads. |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
More HADCM3S in testing. No use to me as I haven't managed to get the virtualisation working. And of course no guarantees about when/if these will translate into main site work, though I think it likely they will at some point. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
This one worked for me on my Red Hat Enterprise Linux 8 machine. This was just before they set it to send these only to Mac machines. Most of the others of this set crashed in 30 seconds or less (IIRC). But this one worked OK. This is straight Linux, no VM. Task 22191699; Name hadcm3s_1k9d_200012_168_926_012129726_2; Workunit 12129726; Created 29 Jan 2022, 20:46:55 UTC; Sent 29 Jan 2022, 20:48:05 UTC; Report deadline 12 Jan 2023, 2:08:05 UTC; Received 1 Feb 2022, 13:43:03 UTC <---<<<; Server state Over; Outcome Success; Client state Done; Exit status 0 (0x00000000); Computer ID 1511241; Completed 211,754.62 210,243.20 4,354.56 |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
Yes, I had quite a few completed from that batch, then a bunch of them failed. No obvious reason why some worked and some didn't but the first eight or nine did complete. |
Send message Joined: 12 Apr 21 Posts: 318 Credit: 14,977,739 RAC: 10,025 |
Dave, do you also have an Intel PC? If so, have you tried setting up a macOS VM on it? For me, I couldn't get it to work on Ryzen 5900X but did on i7-4790 (both Windows 10). |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Dave, do you also have an Intel PC? If so, have you tried setting up a macOS VM on it? For me, I couldn't get it to work on Ryzen 5900X but did on i7-4790 (both Windows 10). Andrey, both SolarSyonyk and I have had success on AMD Ryzens with Mac guests on a Linux host. Whatever problem you are having with VirtualBox in Windows on AMD isn't translating to KVM on a Linux host. |
Send message Joined: 12 Apr 21 Posts: 318 Credit: 14,977,739 RAC: 10,025 |
Yes, I knew that and I bet that most people here running macOS VMs are on Linux. However, Dave isn't able to for some reason. I even thought of trying it myself via WSL2 Ubuntu or Hyper-V Ubuntu but found out that nested virtualization on AMD processors isn't available until Windows 11 and I didn't want to upgrade. Later someone posted that they were able to do it on Windows via VirtualBox so I tried on Ryzen first since it's a much more powerful machine but unfortunately it didn't work. It did work on an older Intel PC so that's what I've been using to crunch Mac tasks. |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
Dave, do you also have an Intel PC? If so, have you tried setting up a macOS VM on it? For me, I couldn't get it to work on Ryzen 5900X but did on i7-4790 (both Windows 10). I have a Ryzen7. I have no problems running another instance of Ubuntu in VB or KVM but haven't managed to get the MacOS VM to work on it. At some point, I will try a clean install and see if I can get it going on that. I haven't yet tried nesting via the Ubuntu VM. |
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
Don't bother with nested virt, it has a substantial impact on performance. I got it working on cloud machines with nested virt, but performance wasn't any better than an older Intel box. I've got an AMD 3700X and 3900X running MacOS VMs on the AMD side now, though I have to fiddle with the CPU allocations on the 3900X - MacOS doesn't like non-power-of-two CPU counts per socket, so I *think* the right approach is to tell it I have three sockets, each with 4 cores, if I want it to use all 12 cores. Or I can just run 8 tasks and let it have more L3 per task. No idea on Windows based stuff, sorry. :/ |
Send message Joined: 12 Apr 21 Posts: 318 Credit: 14,977,739 RAC: 10,025 |
SolarSyonyk, I was looking through your Mac tasks to see how fast you process them and noticed that a lot of your recent tasks (all of the ones returned in May) have an error I haven't seen before, even though the tasks are marked as successfully completed. What prompted me to look at how far it goes is that you weren't getting full credit for the completed tasks. Have you changed anything with your systems recently? Model crashed: INANCLA: Error opening file |
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
Huh... no, I haven't changed a thing since the older units that chewed through. I see they seem to be coming back abandoned, but have full trickles? I just saw them computing at my end and assumed that since they were running, things were fine. I don't particularly care about the credit, just that the work is actually being useful. It looks like one of my machines is working properly, and the 5775C is doing the work but somehow dropping the tasks before they're reported? Even though they have trickles? Feel free to reach out via email - I can dump the WUs out if they're not being completed usefully. I had some trouble very early on with some WUs being abandoned when I copied a VM around and the server couldn't disambiguate between machines. I suppose I should change the MAC addresses to help with that problem. I only have a couple "loads" left from this set of WUs, though. Sorry, I didn't think to check stuff, since it had been working fine previously. // EDIT: I'm seeing this on my working system as well. The full error message in stderr.txt, far enough down, has "tmp/pipe_dummy" as well - looks like that's the file it's trying to open? But stuff is running well enough, and it's generating trickles, so... ? I've changed the MAC addresses on the machines in the startup scripts, so hopefully they don't stomp on each other in the future. I'll re-join them to the project after they're drained out as well, so hopefully they'll get properly disambiguated. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
I'm getting the "Model crashed: INANCLA: Error opening file " error in stderr as well, but the tasks finish successfully and upload all zips and trickles. I brought that error up on the dev site and they said it wasn't an issue and everything was returned as expected. |
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
Awesome - so my only actual problem is that one of my machines got stomped and it's doing work for no actual credit. I'll let it run, maybe the trickles will be useful, and then I'll rejoin it after this batch is done. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The trickle_up files can be used to return data, but these days they're mostly just there to say: "I'm still running". It's the zip files that contain the calculation results. |
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
Ah, alright. I'll dump the tasks on that machine and hope someone else can figure them out. Looks like there were two that were still valid, so they're running now. Yeah, I figured out what happened. The new machine I brought online with the same MAC address (different host, so it doesn't matter on the LAN, but for remote ID purposes...) smashed the tasks the other one was running - it had the same ID as the other one in the requests. So, if you're doing the VM thing, randomize your MAC addresses, or you'll create problems. Sorry, again... waste of some perfectly good compute cycles. :/ |
Send message Joined: 9 Mar 22 Posts: 30 Credit: 1,065,239 RAC: 556 |
The new machine I brought online with the same MAC address (different host, so it doesn't matter on the LAN ... Well, this does matter if the interfaces are connected to the same network segment. Like IPs, MACs must be unique. Fortunately you already posted a (mostly) working solution: So, if you're doing the VM thing, randomize your MAC addresses, or you'll create problems. |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
Don't bother with nested virt, it has a substantial impact on performance. I got it working on cloud machines with nested virt, but performance wasn't any better than an older Intel box. I guessed there would be a substantial performance hit. If I do it, it will be more about proof of concept than anything. Certainly not going to be doing it regularly. |
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
The reason it doesn't matter (in terms of network function) is that my VMs are each on separate physical hosts, and the default network configuration for KVM (which I'm not changing) is to have NAT happening on each machine. So the VMs get an IP in the 192.168.122.0/24 range, while the actual host machines have their unique MAC addresses and are on the physical network segment out here. LAN -> VM Hosts (unique MACs) -> VMs (identical MACs) works, just leads to BOINC confusion. Apparently it can deal with some conditions, but not all of them, and if it can't tell the difference between two VMs, you end up with tasks getting abandoned. So if you're using the scripts to run the VMs, just twiddle a few bits in the MAC address line. I guessed there would be a substantial performance hit. If I do it, it will be more about proof of concept than anything. Certainly not going to be doing it regularly. It's certainly a fun little trick to get working! I ran a bunch of tasks on some cloud boxes (so nested virt with MacOS on Linux on presumably Linux), but they were sub-1%/hour progress, and not particularly cheap to run. |
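
For anyone following the "randomize your MAC addresses" advice above, here is a minimal sketch in Python of one way to generate a fresh address per VM. It assumes the 52:54:00 prefix that QEMU/KVM guests conventionally use; the function names are made up for illustration and are not taken from the VM scripts mentioned in this thread. Where the address gets pasted in depends on your own setup.

```python
import random

# Assumption: 52:54:00 is the prefix conventionally used for QEMU/KVM guests.
# Its first octet (0x52) has the "locally administered" bit set and the
# multicast bit clear, so it cannot clash with a vendor-assigned address.
QEMU_PREFIX = (0x52, 0x54, 0x00)


def random_vm_mac(rng=None):
    """Return a MAC string: fixed QEMU-style prefix plus three random octets."""
    if rng is None:
        rng = random.SystemRandom()  # OS entropy, so cloned VMs do not repeat
    tail = [rng.randrange(256) for _ in range(3)]
    return ":".join(f"{octet:02x}" for octet in (*QEMU_PREFIX, *tail))


def is_locally_administered_unicast(mac):
    """Check the two low bits of the first octet (0x02 set, 0x01 clear)."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02) and not (first_octet & 0x01)


if __name__ == "__main__":
    # Print one candidate address per VM you plan to run.
    for _ in range(3):
        print(random_vm_mac())
```

Three random octets give about 16.7 million possibilities, which is plenty for a handful of VMs. This only deals with the duplicate-MAC cause of abandoned tasks described in this thread, and as noted above, a machine that already got stomped may still need to be re-joined to the project.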
©2024 cpdn.org