climateprediction.net home page
If you have used VirtualBox for BOINC and have had issues, please can you share these?

wujj123456

Joined: 14 Sep 08
Posts: 87
Credit: 32,981,759
RAC: 14,695
Message 67271 - Posted: 4 Jan 2023, 1:04:20 UTC - in response to Message 67269.  
Last modified: 4 Jan 2023, 1:05:10 UTC

– Disk footprint: The VM images take a decent amount of space.

– Network transfers: The VM images need to be downloaded.

AFAIK, the image download happens only once and there is only one copy of the image. In my experience with LHC, the base VM image is not copied to each slot. The disk requirement is still larger because of the snapshot image the task generates while running. I believe the snapshot is an incremental difference from the base image and seems to be triggered by checkpoints, but I haven't looked in more detail. This one-time image download is likely negligible given OpenIFS's upload requirement.
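
A rough way to see how much space the base image and the per-task snapshots actually take is to list the VirtualBox disk files under the BOINC data directory. The sketch below is only an illustration: it assumes a Linux host, that the images use the .vdi extension, and that the data directory is /var/lib/boinc-client (the Debian/Ubuntu package default). The BOINC_DIR variable and the list_vdi helper are hypothetical names, not something from the post above.

```shell
#!/bin/sh
# list_vdi DIR: print size in KiB and path of every .vdi file below DIR,
# largest first. Unreadable directories are silently skipped.
list_vdi() {
    find "$1" -type f -name '*.vdi' -exec du -k {} + 2>/dev/null | sort -rn
}

# Assumed default location; override by exporting BOINC_DIR before running.
BOINC_DIR="${BOINC_DIR:-/var/lib/boinc-client}"

if [ -d "$BOINC_DIR" ]; then
    list_vdi "$BOINC_DIR"
fi
```

If the base image shows up once under the project directory and only smaller files appear under the slot directories, that would match the "one copy plus incremental snapshots" observation.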

– Network transfers control taken away from the boinc client: All vboxwrapper based applications which I have encountered so far perform network transfers from within the VM, completely outside of the control of the boinc client.

Very good point. It ignores the proxy configuration on my BOINC client, which I use to limit all BOINC traffic. This could be pretty problematic for OpenIFS if it uploads from within the VM and the upload server changes frequently. Since the VM can share a directory with BOINC, though, I feel this can be done properly by having the BOINC client do the upload. Perhaps it's just LHC, which depends on a distributed filesystem inside the VM, that needs network access from within the VM. Thankfully it doesn't do any uploads, but others might have hard requirements for a proxy and won't be happy with the VM ignoring it.

Glenn Carver
Joined: 29 Oct 17
Posts: 809
Credit: 13,612,636
RAC: 5,507
Message 67377 - Posted: 5 Jan 2023, 21:20:45 UTC - in response to Message 67271.  

– Network transfers control taken away from the boinc client: All vboxwrapper based applications which I have encountered so far perform network transfers from within the VM, completely outside of the control of the boinc client.

Very good point. It ignores the proxy configuration on my BOINC client, which I use to limit all BOINC traffic. This could be pretty problematic for OpenIFS if it uploads from within the VM and the upload server changes frequently. Since the VM can share a directory with BOINC, though, I feel this can be done properly by having the BOINC client do the upload. Perhaps it's just LHC, which depends on a distributed filesystem inside the VM, that needs network access from within the VM. Thankfully it doesn't do any uploads, but others might have hard requirements for a proxy and won't be happy with the VM ignoring it.

That's not the way I would expect to develop it. I would aim to have the vbox app treated as much like a non-vbox app as possible. So input AND output files would go in/out via the shared folder. As long as that is set up correctly, I don't see why the client shouldn't handle network in the normal way. Actually, I'm surprised that other vbox apps are allowed to even access the network. OpenIFS and its wrapper code only talk directly to the client and hand off uploads to it. Maybe there's some history there as people worked through the best way to create vbox apps. There are advantages coming late to the party sometimes.

wujj123456
Joined: 14 Sep 08
Posts: 87
Credit: 32,981,759
RAC: 14,695
Message 67380 - Posted: 5 Jan 2023, 22:04:08 UTC - in response to Message 67377.  

That's not the way I would expect to develop it. I would aim to have the vbox app treated as much like a non-vbox app as possible. So input AND output files would go in/out via the shared folder. As long as that is set up correctly, I don't see why the client shouldn't handle network in the normal way. Actually, I'm surprised that other vbox apps are allowed to even access the network. OpenIFS and its wrapper code only talk directly to the client and hand off uploads to it. Maybe there's some history there as people worked through the best way to create vbox apps. There are advantages coming late to the party sometimes.

Perfect. Vbox or native, LHC apps require the host to be always online due to the use of distributed cvmfs. It's certainly different from what most BOINC projects do, in my experience.

xii5ku
Joined: 27 Mar 21
Posts: 79
Credit: 78,302,757
RAC: 1,077
Message 67386 - Posted: 6 Jan 2023, 9:39:07 UTC - in response to Message 67380.  

There are two types of network access by existing vboxwrapper based applications:

– LHC@home's applications perform massive I/O through their cluster filesystem, cvmfs. That's common between their virtualized and native application. It would require a drastic change of the client-server architecture of LHC@home to move this network I/O into BOINC, hence it will obviously never happen.

– Cosmology@home's and (I think) Rosetta@home's virtualized applications only use network access in order to look up (and if applicable, side-load) some sort of updates. This I/O is very lightweight in comparison to LHC@home's. But: 1.) Same as LHC@home's, it circumvents BOINC's mechanisms and policies. 2.) It causes a period during startup of the application during which the host CPUs are idling. 3.) IME it's a fragile process which occasionally causes these applications to get stuck in this stage, resulting in never-ending tasks without CPU usage.
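
The stuck state described in point 3 (a VM that keeps running but uses no CPU) can be spotted from the host. The following is a minimal sketch, not a tested monitoring tool. It assumes a Linux host with the procps version of ps, that vboxwrapper VMs show up as VBoxHeadless processes, and an arbitrary threshold of 1.0 %CPU. The helper name low_cpu_vms is hypothetical. Note that the pcpu column of ps is CPU time divided by elapsed time over the life of the process, so a freshly started VM can also look idle.

```shell
#!/bin/sh
# low_cpu_vms THRESHOLD: read "PID %CPU ELAPSED" lines on stdin and print
# the lines whose %CPU value is below THRESHOLD.
low_cpu_vms() {
    awk -v limit="$1" '$2 + 0 < limit + 0 { print }'
}

# List VBoxHeadless processes with their lifetime-average CPU use and
# elapsed time, then keep only the near-idle ones.
LC_ALL=C ps -C VBoxHeadless -o pid=,pcpu=,etime= 2>/dev/null | low_cpu_vms 1.0
```

A VM that has been running for hours and still appears in this list is a candidate for the never-ending task described above.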

Richard Haselgrove
Joined: 1 Jan 07
Posts: 943
Credit: 34,325,575
RAC: 11,402
Message 67389 - Posted: 6 Jan 2023, 10:12:30 UTC - in response to Message 67386.  

LHC were very proud of what they'd managed to achieve in integrating BOINC and CERN's disparate requirements through VMs. I remember listening to Ben Segal's account of what was then a work-in-progress at the 2010 BOINC workshop in London.

Ben's presentation slides are still available online, and give a flavour of the constraints they were working under. Perhaps the key slide says:

Summary of the basic approach

Solve client application porting problems using VM’s
Use “VMwrapper” to link VM’s to BOINC core client & server
Provide a host <-> guest-VM communication/control layer

.. and in addition ..

Solve the image size problem and physics job production interfaces using the CernVM project together with the Co-Pilot adapter system.
Access the slides via https://boinc.berkeley.edu/trac/wiki/WorkShop10#Schedule

But even as he was talking, it was clear that certain problems hadn't been overcome. In particular, that "host <-> guest-VM communication/control layer" couldn't signal back to BOINC that the VM was idle and its compute resources could be released for another project to use. I think the fault there possibly lay in BOINC: it didn't then, and probably still can't now, dynamically adjust for hosts with variable resource availability.

Yeti
Joined: 5 Aug 04
Posts: 171
Credit: 10,329,626
RAC: 25,083
Message 67467 - Posted: 9 Jan 2023, 19:20:13 UTC
Last modified: 9 Jan 2023, 19:24:25 UTC

Okay, let me tell you my experience with vBox here.

First, please keep in mind that I'm entirely a Windows guy; I have never had anything to do with Unix/Linux.

When LHC@Home first started with vBox and Theory, I started using vBox and, let's say, it worked with no big problems.

When Atlas started running under vBox, I immediately started to run and support it. The trouble was that more and more problems arose, so I wrote a checklist for users on how to set up a working Atlas@Home system. Here you can take a look at Version 3 (!) of this checklist: https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161&postid=29359#29359

In this phase, the most common problems were missing BIOS settings (VT-x and others), not enough RAM in the box, and the computer getting sluggish if you ran too many vBox tasks, even when not all cores were in use.

For many, many years I ran this setup on Windows 10 and it was okay. It worked in an acceptable manner.

But with newer releases of vBox the whole system really became unmanageable. I got more and more tasks with the postponed status. We never could figure out the real reason for this, but the postponed tasks are dead: wasted crunching time.

So I, the Windows-only guy who had never done anything with Linux, set up one VM (still with vBox) with Ubuntu 20.04 (with a lot of help from colleagues) that runs Atlas native. This worked like a charm, and by now I have one VMware VM (Ubuntu 22.04) on every Windows PC. This VM uses as many cores and BOINC projects as I want and everything works fine. The hosts are not sluggish and I have no problems. I am running hundreds of Atlas native tasks without any problem.

The big problem with the way LHC@Home has implemented the vBox structure is that they run more than one VM at the same time. This costs a lot of memory and CPU cycles, and when the box is under enough CPU stress, small timeouts occur that make the VM unmanageable => postponed.

I have run Rosetta and other projects that use vBox VMs, but none of them was really flawless. I have lost 1/3 of the VMs to the postponed state.

So I won't run any project that forces me to run several vBox VMs.

Perhaps it is possible to build a setup for Windows in which you need only one VM (like I do now) and run several tasks inside it as if it were a real Linux system. I run 3x 4-core Atlas native tasks in most of my VMs.


Supporting BOINC, a great concept !

AndreyOR
Joined: 12 Apr 21
Posts: 247
Credit: 12,041,675
RAC: 20,255
Message 67494 - Posted: 10 Jan 2023, 12:19:33 UTC - in response to Message 67467.  
Last modified: 10 Jan 2023, 12:21:18 UTC

Yeti, just like the avatar, is a legend, when it comes to VBox and BOINC, just check out those troubleshooting guides! :-)

Yeti,
I know you have extensive experience with VBox, but have you ever thought about checking out WSL2? It's part of Windows and is a very lightweight way to virtualize Linux on Windows. I use it extensively for BOINC projects that work better on Linux or require Linux. It recently became Generally Available on Windows (10 & 11) and Microsoft streamlined the installation process. Even systemd is available on it now, which I'm just starting to explore. A graphical interface is also available, but I've never tried it as I control both the Windows and Linux BOINC clients via the Windows BOINC Manager. I use it for LHC too, to run native ATLAS & Theory with a Squid proxy. It has its quirks on LHC. To run Theory native you need to enable vsyscall emulation in the .wslconfig file. It's easy to do, and having that enabled doesn't break anything else that's running concurrently. ATLAS can only run single-threaded on WSL2 (I'm still not sure why), so you may not like that. I'm hoping that with systemd ATLAS will run multithreaded and vsyscall emulation won't be needed, but I haven't tested it yet.
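
For reference, a sketch of what the vsyscall setting mentioned above might look like. It assumes the usual WSL2 convention of a per-user file at %UserProfile%\.wslconfig and passes vsyscall=emulate to the WSL2 kernel through the kernelCommandLine key. The post does not give the exact lines, so treat this as an assumption and check the current Microsoft and LHC@home documentation before relying on it.

```ini
# %UserProfile%\.wslconfig (assumed location)
[wsl2]
kernelCommandLine = vsyscall=emulate
```

After saving the file, WSL has to be restarted (for example with "wsl --shutdown" from a Windows prompt) before the new kernel command line takes effect.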

Yeti
Joined: 5 Aug 04
Posts: 171
Credit: 10,329,626
RAC: 25,083
Message 67500 - Posted: 10 Jan 2023, 13:35:42 UTC - in response to Message 67494.  

Andrey,

For my servers I switched from an early Hyper-V to VMware, and to this day all my servers run as guests under VMware. So I'm used to and experienced with VMware, and I switched from vBox to VMware Workstation, which lets me move a VM from a client to a server and back.

So I never tried anything with WSL (1/2) and, to tell the truth, I don't want to learn it. That would cost me a lot of time again.


Supporting BOINC, a great concept !

Glenn Carver
Joined: 29 Oct 17
Posts: 809
Credit: 13,612,636
RAC: 5,507
Message 67576 - Posted: 11 Jan 2023, 21:07:16 UTC - in response to Message 67494.  

Yeti, just like the avatar, is a legend, when it comes to VBox and BOINC, just check out those troubleshooting guides! :-)
What troubleshooting guides? Can someone direct me? Might be useful for development purposes.

gemini8
Joined: 4 Dec 15
Posts: 52
Credit: 2,182,959
RAC: 836
Message 67579 - Posted: 11 Jan 2023, 21:22:12 UTC - in response to Message 67576.  

Yeti, just like the avatar, is a legend, when it comes to VBox and BOINC, just check out those troubleshooting guides! :-)

What troubleshooting guides? Can someone direct me? Might be useful for development purposes.

The checklist mentioned in this posting (Message 67467).
- - - - - - - - - -
Greetings, Jens

[SG]Felix
Joined: 4 Oct 15
Posts: 34
Credit: 9,069,332
RAC: 14,637
Message 67857 - Posted: 18 Jan 2023, 18:05:58 UTC

For me, VBox works on every project.

But there is one problem, which is not BOINC specific, but VBox related. If I want to do VBox BOINC WUs, I have to start the client with admin rights. Also, if I want to start a regular virtual machine, I have to do it with admin rights. If I don't, I get the following VBox error:

Critical error

COM-Object for VirtualBox couldn't be created

Errorcode: E_ACCESSDENIED (0x80070005)
Component: VirtualBoxClientWrap

But with admin rights, everything with VBox works.

At Cosmology, they use an old version of the vboxwrapper, but by manually changing it to a newer one, I can get an error rate of <1% instead of, if I recall correctly, about 60-70%.

So if the project admins keep it up to date, I do not have problems, at least none that BOINC is the cause of.

Greets
Felix

ncoded.com
Joined: 16 Aug 16
Posts: 73
Credit: 52,932,477
RAC: 8,823
Message 67858 - Posted: 18 Jan 2023, 18:29:28 UTC
Last modified: 18 Jan 2023, 18:31:48 UTC

Thank you [SG]Felix.

I think the more people come forward with issues, possible solutions, or observations, the more useful all of this information will be in the end.

And hopefully you will also get a solution, which can be documented, to help others at a later date.

xii5ku
Joined: 27 Mar 21
Posts: 79
Credit: 78,302,757
RAC: 1,077
Message 67931 - Posted: 21 Jan 2023, 12:08:18 UTC - in response to Message 67857.  

SG Felix wrote:
If I want to do VBox BOINC WUs, I have to start the client with admin rights. Also, if I want to start a regular virtual machine, I have to do it with admin rights. If I don't, I get the following VBox error:

Critical error

COM-Object for VirtualBox couldn't be created

Errorcode: E_ACCESSDENIED (0x80070005)
Component: VirtualBoxClientWrap

Check with "id" for your own user ID and with "id boinc" for the boinc user ID whether or not they are members of the vboxusers group.
If they are not, add them to the group:
sudo usermod -a -G vboxusers $USER
sudo usermod -a -G vboxusers boinc
To test if this solved your problem for your own user, 1. either simply open a new terminal with a login shell, or log out and back in entirely, 2. then try starting a VM without elevated privileges from the new login.

To test if this solved your problem for boinc, 1. shutdown and restart the client, 2. try starting a vbox based task with the client running normally with "boinc" user ID.
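
The membership check described above can be wrapped in a small script. This is a minimal sketch for a Linux host only: the account name "boinc" and the group name "vboxusers" are taken from the post, while the helper names has_group and check_user are hypothetical. The script only reports and never changes group membership.

```shell
#!/bin/sh
# has_group GROUP_LIST GROUP: succeed if GROUP is one of the
# space-separated names in GROUP_LIST (whole-word match).
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_user USER: print one status line for USER.
check_user() {
    group_list=$(id -nG "$1" 2>/dev/null) || { echo "$1: no such user"; return 0; }
    if has_group "$group_list" vboxusers; then
        echo "$1: already in vboxusers"
    else
        echo "$1: not in vboxusers (fix: sudo usermod -a -G vboxusers $1)"
    fi
}

check_user "$(id -un)"   # the account running this script
check_user boinc         # the BOINC service account, if it exists
```

Group changes only apply to new login sessions, which is why the steps above ask for a fresh login and a client restart before retesting.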

[SG]Felix
Joined: 4 Oct 15
Posts: 34
Credit: 9,069,332
RAC: 14,637
Message 67948 - Posted: 21 Jan 2023, 20:33:45 UTC

Thanks xii5ku, I should have mentioned that Windows 10 is my main system, on which VBox runs :)
So no sudo usermod :)

Greets
Felix

xii5ku
Joined: 27 Mar 21
Posts: 79
Credit: 78,302,757
RAC: 1,077
Message 67954 - Posted: 21 Jan 2023, 22:09:15 UTC - in response to Message 67948.  
Last modified: 21 Jan 2023, 22:12:35 UTC

SG Felix wrote:
Windows 10 is my main system, on which VBox runs :)
So no sudo usermod :)

Hm, not sure then. (The last time I used VBox on Windows myself was a while ago, on Win 7 Pro.) According to a superficial web search, uninstalling and reinstalling VBox, running the installer as admin while doing so, might help. Or perhaps overwriting the contents of C:\Users\%USERNAME%\.VirtualBox\VirtualBox.xml with those of VirtualBox.xml-prev. Or resetting the access permissions of the .VirtualBox folder and everything in it.

Purest Green
Joined: 22 Aug 05
Posts: 2
Credit: 1,522,278
RAC: 1,702
Message 68667 - Posted: 18 Apr 2023, 0:00:48 UTC

Issues with LHC tasks for very many days including a previous installation, but now resolved. Don't know if helpful but, since you ask the question -
2023-03-24: new motherboard (and hence new UEFI BIOS)
LHC tasks e.g. ATLAS & Theory Simulation 300.07 persistently terminated after 17 to 20 seconds.
Yeti's checklist was followed:
BIOS amended to permit hardware virtualisation; Leomoon CPU-V confirmed hardware virtualisation was supported and enabled. LHC tasks persistently terminated as before...
Hyper-V not enabled, Docker not installed
Ryzen 7 3700X; 32GB RAM; 70GB free disc space + 43GB reported "available to BOINC"
Windows 10 Pro v 22H2
BOINC Manager v. 7.16.11, wxWidgets version 3.0.1
VirtualBox 7.0.6
VirtualBox Extension Pack 7.0.6 installed 2023-04-13
(Note BOINC program and data are running on different drives.)
No apparent anti-virus conflicts advised... but LHC tasks persistently terminated.
Was unsure how to set the ports options advised in the checklist so took the 'nuclear' option of simple uninstall/reinstall of VB and BOINC;
LHC Theory Simulation 300.07 tasks, confirmed to be using VB, are STILL RUNNING now after many minutes in (currently on VirtualBox v 6.1.12, no Extension Pack downloaded), though one task stated 'Ready to report' after just 15 mins, while others continued to run to various times including 1hr 29 mins. One now has an estimated remaining time of 9 days, but the basic termination fault seems now to have been remedied by the reinstall.

Purest Green
Joined: 22 Aug 05
Posts: 2
Credit: 1,522,278
RAC: 1,702
Message 68741 - Posted: 15 May 2023, 21:20:34 UTC - in response to Message 68667.  

Update:
After the PC being turned off for 2 weeks and following a Windows update, the same fault recurred - that is cessation of LHC computing after a few seconds.
Surprisingly, Leomoon CPU-V reported that hardware virtualisation was now NEITHER supported NOR enabled.
BIOS settings showed that AMD-V was, in fact, still enabled.
For reasons I cannot remember I changed Windows Security's Device Security's Core Isolation from Memory Integrity ON to OFF, which required a restart.
Leomoon CPU-V confirmed hardware virtualisation was supported and enabled. LHC now runs.
Experimentally, turned Core Isolation from Memory Integrity OFF to ON. LHC continues to run, so far for about 30 mins.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4347
Credit: 16,541,921
RAC: 6,087
Message 69910 - Posted: 17 Oct 2023, 9:18:59 UTC

Stupidly, I didn't make a note of it. I recently installed tiny10 (a minimal Windows 10 version) under VB. I already had Ubuntu running as a guest under an Ubuntu host. The install kept failing until I found the right thing to change in the BIOS. (On my motherboard it had a different name from the one suggested when I looked up the issue in a web search.) Anyway, tiny10 can run on as little as 2GB RAM. I don't starve it, as I want to run tasks in the Windows version of BOINC! What I do find is that, compared to running tasks from the same batch using WINE, there is something like a 20% performance hit.

My next job will be to get a new task on it, then copy the relevant folder from one installation to the other and check whether or not there is any difference in the results of the task under WINE and that in the VB. I do know that tasks that crash under a native Windows installation with the sig11 fault more often than not succeed when WINE is used.
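
Since the failing install above came down to a BIOS virtualisation setting, a quick check on a Linux host is to look for the CPU virtualisation flags in /proc/cpuinfo. This is only a sketch: the helper name virt_flag is hypothetical, and the presence of a flag shows that the CPU advertises the feature, which does not by itself prove that the firmware setting is switched on.

```shell
#!/bin/sh
# virt_flag TEXT: print "vmx" (Intel VT-x), "svm" (AMD-V) or "none",
# depending on which flag appears as a whole word in TEXT.
virt_flag() {
    if printf '%s\n' "$1" | grep -qw vmx; then
        echo vmx
    elif printf '%s\n' "$1" | grep -qw svm; then
        echo svm
    else
        echo none
    fi
}

# Only meaningful on Linux, where /proc/cpuinfo lists the CPU flags.
if [ -r /proc/cpuinfo ]; then
    echo "CPU virtualisation flag: $(virt_flag "$(grep -m1 '^flags' /proc/cpuinfo)")"
fi
```

If this prints "none" on a CPU that is known to support virtualisation, the BIOS/UEFI setting is the first thing to revisit.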
©2024 climateprediction.net