climateprediction.net
New work Discussion

Message boards : Number crunching : New work Discussion

Jean-David Beyer

Joined: 5 Aug 04
Posts: 1054
Credit: 16,510,175
RAC: 433
Message 65438 - Posted: 14 May 2022, 4:17:49 UTC - in response to Message 65437.  

I was getting Einstein and Milky Way to work OK on my Linux box.


I set up my Linux box to run Milky Way. According to the top command, one instance of Milky Way is running on one processor, but it is using 679.7% of the processor it is running on. How is that possible? Besides that, all the other BOINC processes are sleeping. In the BOINC Manager task list, one task is running, but it says (8 CPUs).
A bunch have finished and validated. They take about 1000 seconds each. Most of the rest are predicted to take about an hour each. Several of those run at once, and that lets other BOINC processes run too.
Amazing. How do they do that?
SolarSyonyk

Joined: 7 Sep 16
Posts: 251
Credit: 31,560,817
RAC: 30,969
Message 65439 - Posted: 14 May 2022, 4:22:22 UTC - in response to Message 65431.  

The last batch of N216 testing tasks was successful and the testing of Mac HADCM3s tasks seems to be going well, so hoping that there will be some main site work soon.


Yay! Hopefully I can get another VM or two running here soon. I just got some more AMD hardware to throw at problems.

I am not paranoid enough to believe there is a conspiracy to shut down distributed computing, but I would not be surprised if some thought so.


It would be one of the weirder things for a conspiracy to target.

Reality, as far as I'm concerned, is somewhat simpler: there's not much interest outside the legacy types who've been doing it since SETI was running on Pentiums, and it's a bit of a pain in the rear to handle in terms of how tasks have to be structured for distributed computing to provide a good answer. When you can spin up an AWS supercomputer with grant money, "attracting random people with a weird mishmash of computers to get work back to you at some point" becomes less appealing.

I mean, is anyone under about 35 posting in this thread?
AndreyOR

Joined: 12 Apr 21
Posts: 247
Credit: 11,759,171
RAC: 18,771
Message 65440 - Posted: 14 May 2022, 7:16:38 UTC - in response to Message 65438.  

You're running MilkyWay N-Body Simulation. That sub-project is multithreaded and will use as many threads as are available (up to 16) unless you set limits in app_config.xml. top and htop report a multithreaded process's CPU usage summed across all the cores it is using, so 679.7% means that one task is keeping roughly 6.8 cores busy, not overloading a single core. It sounds like you are running it on an 8-thread CPU and it's using all of the threads.
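As a hedged sketch of the app_config.xml control mentioned above, the snippet below writes a file that caps MilkyWay N-Body at a chosen number of threads. The app name `milkyway_nbody`, the plan class `mt` and the `--nthreads` command-line switch are assumptions based on common volunteer setups, so check the actual names in your client_state.xml first. The resulting file goes in the MilkyWay project directory under the BOINC data directory, and the client reads it after Options → Read config files or a restart.

```python
import xml.etree.ElementTree as ET


def build_app_config(threads: int,
                     app_name: str = "milkyway_nbody",  # assumed app name
                     plan_class: str = "mt") -> str:     # assumed plan class
    """Return app_config.xml text that limits a multithreaded BOINC app."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # avg_ncpus tells the BOINC client how many cores to budget for each task...
    ET.SubElement(version, "avg_ncpus").text = str(threads)
    # ...and --nthreads (an assumed switch) tells the app how many threads to start.
    ET.SubElement(version, "cmdline").text = f"--nthreads {threads}"
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # e.g. leave half of an 8-thread CPU free for other BOINC tasks
    print(build_app_config(4))
```

Setting both values keeps the client's scheduling (avg_ncpus) consistent with what the application actually starts, which is what lets other BOINC tasks run alongside it.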
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4341
Credit: 16,485,487
RAC: 6,130
Message 65449 - Posted: 18 May 2022, 16:12:01 UTC

More HADCM3S in testing. No use to me as I haven't managed to get the virtualisation working. And of course no guarantees about when/if these will translate into main site work, though I think it likely they will at some point.
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1054
Credit: 16,510,175
RAC: 433
Message 65450 - Posted: 18 May 2022, 18:46:39 UTC - in response to Message 65449.  

This one worked for me on my Red Hat Enterprise Linux 8 machine. This was just before they set it to send these only to Mac machines. Most of the others in this set crashed in 30 seconds or less (IIRC), but this one worked OK. This is straight Linux, no VM.
Task 22191699
Name 	hadcm3s_1k9d_200012_168_926_012129726_2
Workunit 	12129726
Created 	29 Jan 2022, 20:46:55 UTC
Sent 	29 Jan 2022, 20:48:05 UTC
Report deadline 	12 Jan 2023, 2:08:05 UTC
Received 	1 Feb 2022, 13:43:03 UTC       <---<<<
Server state 	Over
Outcome 	Success
Client state 	Done
Exit status 	0 (0x00000000)
Computer ID 	1511241

Status 	Completed
Run time (sec) 	211,754.62
CPU time (sec) 	210,243.20
Credit 	4,354.56
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4341
Credit: 16,485,487
RAC: 6,130
Message 65451 - Posted: 18 May 2022, 20:03:16 UTC - in response to Message 65450.  

Yes, I had quite a few completed from that batch, then a bunch of them failed. No obvious reason why some worked and some didn't but the first eight or nine did complete.
AndreyOR

Joined: 12 Apr 21
Posts: 247
Credit: 11,759,171
RAC: 18,771
Message 65452 - Posted: 18 May 2022, 20:21:19 UTC - in response to Message 65451.  

Dave, do you also have an Intel PC? If so, have you tried setting up a macOS VM on it? For me, I couldn't get it to work on Ryzen 5900X but did on i7-4790 (both Windows 10).
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2167
Credit: 64,458,099
RAC: 2,895
Message 65453 - Posted: 18 May 2022, 22:56:57 UTC - in response to Message 65452.  

Dave, do you also have an Intel PC? If so, have you tried setting up a macOS VM on it? For me, I couldn't get it to work on Ryzen 5900X but did on i7-4790 (both Windows 10).

Andrey,

Both SolarSyonyk and I have had success running Mac guests on AMD Ryzens with a Linux host. Whatever problem you are having with VirtualBox in Windows on AMD isn't carrying over to KVM on a Linux host.
AndreyOR

Joined: 12 Apr 21
Posts: 247
Credit: 11,759,171
RAC: 18,771
Message 65454 - Posted: 19 May 2022, 7:43:26 UTC - in response to Message 65453.  

Yes, I knew that and I bet that most people here running macOS VMs are on Linux. However, Dave isn't able to for some reason. I even thought of trying it myself via WSL2 Ubuntu or Hyper-V Ubuntu but found out that nested virtualization on AMD processors isn't available until Windows 11 and I didn't want to upgrade. Later someone posted that they were able to do it on Windows via VirtualBox so I tried on Ryzen first since it's a much more powerful machine but unfortunately it didn't work. It did work on an older Intel PC so that's what I've been using to crunch Mac tasks.
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4341
Credit: 16,485,487
RAC: 6,130
Message 65455 - Posted: 19 May 2022, 9:51:25 UTC

Dave, do you also have an Intel PC? If so, have you tried setting up a macOS VM on it? For me, I couldn't get it to work on Ryzen 5900X but did on i7-4790 (both Windows 10).


I have a Ryzen 7. I have no problem running another instance of Ubuntu in VirtualBox or KVM, but I haven't managed to get the macOS VM to work on it. At some point I will try a clean install and see if I can get it going on that. I haven't yet tried nesting via the Ubuntu VM.
SolarSyonyk

Joined: 7 Sep 16
Posts: 251
Credit: 31,560,817
RAC: 30,969
Message 65456 - Posted: 19 May 2022, 18:00:35 UTC

Don't bother with nested virt; it has a substantial impact on performance. I got it working on cloud machines with nested virt, but performance wasn't any better than an older Intel box.

I've got an AMD 3700X and a 3900X running macOS VMs on the AMD side now, though I have to fiddle with the CPU allocation on the 3900X: macOS doesn't like non-power-of-two CPU counts per socket, so I *think* the right approach is to tell it I have three sockets with 4 cores each if I want it to use all 12 cores. Or I can just run 8 tasks and give each task more L3 cache.
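The socket workaround above can be expressed as a small calculation. This sketch takes the largest power of two that divides the requested core count as the per-socket core count and formats a QEMU `-smp` option from it. It assumes the VM is started with QEMU/KVM and one thread per core, and the power-of-two-per-socket rule itself is only the poster's working guess, not a documented macOS requirement.

```python
def smp_topology(total_cores: int) -> tuple[int, int]:
    """Split total_cores into (sockets, cores_per_socket) so that
    cores_per_socket is a power of two (the assumed macOS preference)."""
    if total_cores < 1:
        raise ValueError("total_cores must be positive")
    # n & -n isolates the lowest set bit: the largest power of two dividing n.
    cores_per_socket = total_cores & -total_cores
    return total_cores // cores_per_socket, cores_per_socket


def qemu_smp_arg(total_cores: int) -> str:
    """Format the topology as a QEMU -smp option, assuming one thread per core."""
    sockets, cores = smp_topology(total_cores)
    return f"-smp {total_cores},sockets={sockets},cores={cores},threads=1"


if __name__ == "__main__":
    for n in (8, 12, 16):
        print(n, qemu_smp_arg(n))
```

For the 12-core 3900X case, `qemu_smp_arg(12)` returns `-smp 12,sockets=3,cores=4,threads=1`, the three-sockets-of-four layout described above; 8 and 16 cores fit in a single socket.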

No idea on Windows based stuff, sorry. :/
AndreyOR

Joined: 12 Apr 21
Posts: 247
Credit: 11,759,171
RAC: 18,771
Message 65457 - Posted: 19 May 2022, 19:51:23 UTC - in response to Message 65456.  

SolarSyonyk, I was looking through your Mac tasks to see how fast you process them and noticed that a lot of your recent tasks (all of the ones returned in May) have an error I haven't seen before, even though the tasks are marked as successfully completed. What prompted me to check how far back it goes is that you weren't getting full credit for the completed tasks. Have you changed anything on your systems recently?
Model crashed: INANCLA: Error opening file  
SolarSyonyk

Joined: 7 Sep 16
Posts: 251
Credit: 31,560,817
RAC: 30,969
Message 65458 - Posted: 19 May 2022, 20:14:16 UTC
Last modified: 19 May 2022, 20:35:54 UTC

Huh... no, I haven't changed a thing since the older units that it chewed through. I see they seem to be coming back abandoned, but with full trickles? I just saw them computing at my end and assumed that since they were running, things were fine. I don't particularly care about the credit, just that the work is actually useful.

It looks like one of my machines is working properly, and the 5775C is doing the work but somehow dropping the tasks before they're reported? Even though they have trickles?

Feel free to reach out via email - I can dump the WUs out if they're not being completed usefully. I had some trouble very early on with some WUs being abandoned when I copied a VM around and the server couldn't disambiguate between machines. I suppose I should change the MAC addresses to help with that problem. I only have a couple "loads" left from this set of WUs, though.

Sorry, I didn't think to check stuff, since it had been working fine previously.

// EDIT:

I'm seeing this on my working system as well. The full error message in stderr.txt, far enough down, has "tmp/pipe_dummy" as well; it looks like that's the file it's trying to open? But stuff is running well enough, and it's generating trickles, so... ?

I've changed the MAC addresses on the machines in the startup scripts, so hopefully they don't stomp on each other in the future. I'll re-join them to the project after they're drained out as well, so hopefully they'll get properly disambiguated.
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2167
Credit: 64,458,099
RAC: 2,895
Message 65459 - Posted: 19 May 2022, 21:13:37 UTC

I'm getting the "Model crashed: INANCLA: Error opening file " error in stderr as well, but the tasks finish successfully and upload all zips and trickles. I brought that error up on the dev site and they said it wasn't an issue and everything was returned as expected.
SolarSyonyk

Send message
Joined: 7 Sep 16
Posts: 251
Credit: 31,560,817
RAC: 30,969
Message 65460 - Posted: 19 May 2022, 21:46:00 UTC

Awesome - so my only actual problem is that one of my machines got stomped and it's doing work for no actual credit. I'll let it run, maybe the trickles will be useful, and then I'll rejoin it after this batch is done.
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 65461 - Posted: 19 May 2022, 21:56:49 UTC - in response to Message 65460.  

The trickle_up files can be used to return data, but these days they're mostly just there to say: "I'm still running".
It's the zip files that contain the calculation results.
SolarSyonyk

Joined: 7 Sep 16
Posts: 251
Credit: 31,560,817
RAC: 30,969
Message 65462 - Posted: 19 May 2022, 23:19:24 UTC
Last modified: 19 May 2022, 23:28:45 UTC

Ah, alright. I'll dump the tasks on that machine and hope someone else can figure them out. Looks like there were two that were still valid, so they're running now.

Yeah, I figured out what happened. The new machine I brought online with the same MAC address (different host, so it doesn't matter on the LAN, but for remote ID purposes...) smashed the tasks the other one was running - it had the same ID as the other one in the requests.

So, if you're doing the VM thing, randomize your MAC addresses, or you'll create problems.
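One minimal way to follow that advice is to give each VM a random locally administered unicast MAC address. The sketch below sets the locally administered bit (0x02) and clears the multicast bit (0x01) in the first octet, so the result never overlaps a vendor-assigned address. How the address is then handed to the VM (for example in the MAC address line of a startup script) depends on your setup and is not shown.

```python
import secrets


def random_local_mac() -> str:
    """Return a random locally administered, unicast MAC address."""
    octets = bytearray(secrets.token_bytes(6))
    # Bit 0x01 of the first octet marks multicast: clear it (unicast).
    # Bit 0x02 marks a locally administered address: set it.
    octets[0] = (octets[0] & 0xFC) | 0x02
    return ":".join(f"{b:02x}" for b in octets)


if __name__ == "__main__":
    # One fresh address per VM, e.g. for three VMs:
    for _ in range(3):
        print(random_local_mac())
```

With 46 random bits per address, two VMs drawing the same value is vanishingly unlikely, which is the property that matters when copied VM images would otherwise share one MAC.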

Sorry, again... waste of some perfectly good compute cycles. :/
computezrmle

Joined: 9 Mar 22
Posts: 30
Credit: 963,113
RAC: 46,932
Message 65463 - Posted: 20 May 2022, 3:56:40 UTC - in response to Message 65462.  

The new machine I brought online with the same MAC address (different host, so it doesn't matter on the LAN ...

Well, this does matter if the interfaces are connected to the same network segment.
Like IPs, MACs must be unique.

Fortunately you already posted a (mostly) working solution:
So, if you're doing the VM thing, randomize your MAC addresses, or you'll create problems.
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4341
Credit: 16,485,487
RAC: 6,130
Message 65464 - Posted: 20 May 2022, 12:07:52 UTC

Don't bother with nested virt, it has a substantial impact on performance. I got it working on cloud machines with nested virt, but performance wasn't any better than an older Intel box.


I guessed there would be a substantial performance hit. If I do it, it will be more about proof of concept than anything. Certainly not going to be doing it regularly.
SolarSyonyk

Joined: 7 Sep 16
Posts: 251
Credit: 31,560,817
RAC: 30,969
Message 65465 - Posted: 20 May 2022, 15:23:51 UTC - in response to Message 65463.  


Well, this does matter if the interfaces are connected to the same network segment.
Like IPs, MACs must be unique.


The reason it doesn't matter (in terms of network function) is that my VMs are each on separate physical hosts, and the default network configuration for KVM (which I'm not changing) is to have NAT happening on each machine. So the VMs get an IP in the 192.168.122.0/24 range, while the actual host machines have their unique MAC addresses and are on the physical network segment out here.

LAN -> VM hosts (unique MACs) -> VMs (identical MACs) works; it just leads to BOINC confusion. Apparently BOINC can deal with some conditions, but not all of them, and if it can't tell two VMs apart, you end up with tasks getting abandoned.
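To illustrate why identical MACs behind separate NAT hosts still collide, here is a toy model that is explicitly not BOINC's code: it assumes a host identifier computed by hashing the machine's MAC addresses, which is consistent with the diagnosis in this thread that the two VMs sent the same ID in their requests. The `toy_host_id` helper and its hashing recipe are hypothetical.

```python
import hashlib


def toy_host_id(mac_addresses: list[str]) -> str:
    """Hypothetical host ID: MD5 of the sorted, lower-cased MAC addresses.
    Illustrative only; this is not BOINC's real algorithm."""
    normalised = sorted(m.lower() for m in mac_addresses)
    return hashlib.md5("".join(normalised).encode("ascii")).hexdigest()


if __name__ == "__main__":
    vm_a = ["52:54:00:12:34:56"]  # VM on physical host A
    vm_b = ["52:54:00:12:34:56"]  # copied VM on host B, same MAC
    vm_c = ["52:54:00:ab:cd:ef"]  # copied VM after its MAC was changed
    print(toy_host_id(vm_a) == toy_host_id(vm_b))  # True: indistinguishable
    print(toy_host_id(vm_a) == toy_host_id(vm_c))  # False: distinct again
```

The NAT layer never enters the calculation: the guest only sees its own (duplicated) MAC, so anything derived from it is duplicated too, regardless of the hosts' unique addresses on the LAN.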

So if you're using the scripts to run the VMs, just twiddle a few bits in the MAC address line.

I guessed there would be a substantial performance hit. If I do it, it will be more about proof of concept than anything. Certainly not going to be doing it regularly.


It's certainly a fun little trick to get working! I ran a bunch of tasks on some cloud boxes (so nested virt with MacOS on Linux on presumably Linux), but they were sub-1%/hour progress, and not particularly cheap to run.

©2024 climateprediction.net