uVirt: how to pass through raw disks or PCI devices?

I have tried several approaches now, but I am stuck:

In /etc/config/virt I added:

        list disks '/vm/MYIMG.qcow2 /dev/sda'

That did not work, so then I tried:

        list disks '/vm/MYIMG.qcow2'
        list disks '/dev/sda'

Did not work either.

Then I double-checked muvirt/files/muvirt.init in traversetech/muvirt-feed on GitLab and noticed that there is an option, pcidevice, for passing through a PCI device. My idea: pass through the entire SATA controller:

root@muvirt:/# lspci
0000:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev 10)
0001:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev 10)
0001:01:00.0 PCI bridge: Pericom Semiconductor Device b304 (rev 01)
0001:02:01.0 PCI bridge: Pericom Semiconductor Device b304 (rev 01)
0001:02:02.0 PCI bridge: Pericom Semiconductor Device b304 (rev 01)
0001:04:00.0 Unclassified device [0002]: MEDIATEK Corp. MT7915E 802.11ax PCI Express Wireless Network Adapter
0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev 10)
0002:01:00.0 SATA controller: JMicron Technology Corp. JMB58x AHCI SATA controller (rev ff)

So I added this to /etc/config/virt:

        option pcidevice '0002:01:00.0'

When I start the VM again, QEMU gets killed by the kernel's out-of-memory (OOM) killer:

root@muvirt:/# /etc/init.d/muvirt start vm
Starting VM vm
Disk 0: /vm/vm.qcow2
Network 0: lan
        MAC: 52:54:00:93:25:f9
root@muvirt:/# [ 1416.789560] br-lan: port 2(tap0) entered blocking state
[ 1416.794798] br-lan: port 2(tap0) entered disabled state
[ 1416.800238] device tap0 entered promiscuous mode
[ 1416.805263] br-lan: port 2(tap0) entered blocking state
[ 1416.810504] br-lan: port 2(tap0) entered listening state
[ 1418.684658] qemu-system-aar invoked oom-killer: gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
[ 1418.695887] CPU: 1 PID: 8706 Comm: qemu-system-aar Not tainted 5.10.46 #0
[ 1418.702669] Hardware name: traverse ten64/ten64, BIOS 2020.07-rc1-gb47b96d4 07/07/2021
[ 1418.710581] Call trace:
[ 1418.713024]  dump_backtrace+0x0/0x1b0
[ 1418.716681]  show_stack+0x18/0x30
[ 1418.719990]  dump_stack+0xdc/0x11c
[ 1418.723383]  dump_header+0x44/0x184
[ 1418.726867]  oom_kill_process+0x1d0/0x1d8
[ 1418.730869]  out_of_memory+0x1bc/0x560
[ 1418.734611]  __alloc_pages_slowpath.constprop.126+0x71c/0xa10
[ 1418.740350]  __alloc_pages_nodemask+0x25c/0x2a0
[ 1418.744876]  alloc_pages_vma+0x8c/0x220
[ 1418.748706]  handle_mm_fault+0x7b0/0x1048
[ 1418.752709]  __get_user_pages+0x1e4/0x388
[ 1418.756711]  __gup_longterm_locked+0x8c/0x4a8
[ 1418.761061]  __get_user_pages_remote+0x48/0x2b0
[ 1418.765584]  pin_user_pages_remote+0x18/0x30
[ 1418.769851]  0xffff800008cb9f50
[ 1418.772984]  0xffff800008cba238
[ 1418.776118]  0xffff800008cbd2d0
[ 1418.779254]  vfio_group_set_kvm+0xcc/0x590 [vfio]
[ 1418.783954]  __arm64_sys_ioctl+0x1cc/0x1120
[ 1418.788131]  do_el0_svc+0x80/0xe0
[ 1418.791440]  el0_svc+0x18/0x28
[ 1418.794486]  el0_sync_handler+0x90/0xc8
[ 1418.798316]  el0_sync+0x164/0x180
[ 1418.801759] Mem-Info:
[ 1418.804031] active_anon:3097 inactive_anon:1166263 isolated_anon:0
[ 1418.804031]  active_file:95 inactive_file:207 isolated_file:0
[ 1418.804031]  unevictable:0 dirty:0 writeback:0
[ 1418.804031]  slab_reclaimable:1524 slab_unreclaimable:9242
[ 1418.804031]  mapped:491 shmem:970 pagetables:2526 bounce:0
[ 1418.804031]  free:34204 free_pcp:577 free_cma:1000
[ 1418.821353] br-lan: port 2(tap0) entered learning state
[ 1418.836103] Node 0 active_anon:12388kB inactive_anon:4665052kB active_file:380kB inactive_file:1120kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2256kB dirty:0kB writeback:0kB shmem:3880kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 4562944kB writeback_tmp:0kB kernel_stack:3056kB all_unreclaimable? yes
[ 1418.869820] Node 0 DMA free:120028kB min:996kB low:2956kB high:4916kB reserved_highatomic:0KB active_anon:0kB inactive_anon:1825512kB active_file:112kB inactive_file:2304kB unevictable:0kB writepending:0kB present:2029568kB managed:1963964kB mlocked:0kB pagetables:1952kB bounce:0kB free_pcp:796kB local_pcp:244kB free_cma:4000kB
[ 1418.898853] lowmem_reserve[]: 0 0 29570 29570
[ 1418.903379] Node 0 Normal free:15276kB min:15384kB low:45664kB high:75944kB reserved_highatomic:0KB active_anon:12300kB inactive_anon:2839156kB active_file:924kB inactive_file:260kB unevictable:0kB writepending:0kB present:30932992kB managed:30281508kB mlocked:0kB pagetables:8152kB bounce:0kB free_pcp:1612kB local_pcp:164kB free_cma:0kB
[ 1418.933192] lowmem_reserve[]: 0 0 0 0
[ 1418.937017] Node 0 DMA: 0*4kB 3*8kB (UC) 2*16kB (UC) 4*32kB (C) 6*64kB (UMC) 3*128kB (U) 2*256kB (UM) 4*512kB (UMC) 2*1024kB (MC) 2*2048kB (MC) 27*4096kB (M) = 120248kB
[ 1418.952095] Node 0 Normal: 1027*4kB (UME) 400*8kB (UME) 133*16kB (UE) 35*32kB (UE) 19*64kB (UE) 5*128kB (ME) 6*256kB (UME) 3*512kB (UM) 1*1024kB (M) 0*2048kB 0*4096kB = 16508kB
[ 1418.968030] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 1418.976807] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB
[ 1418.985410] Node 0 hugepages_total=13340 hugepages_free=13340 hugepages_surp=0 hugepages_size=2048kB
[ 1418.994621] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB
[ 1419.002964] 1693 total pagecache pages
[ 1419.006790] 0 pages in swap cache
[ 1419.010182] Swap cache stats: add 0, delete 0, find 0/0
[ 1419.015484] Free swap  = 0kB
[ 1419.018441] Total swap = 0kB
[ 1419.021397] 8240640 pages RAM
[ 1419.024435] 0 pages HighMem/MovableOnly
[ 1419.028269] 179272 pages reserved
[ 1419.031581] 4096 pages cma reserved
[ 1419.035147] Tasks state (memory values in pages):
[ 1419.039929] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[ 1419.048647] [   1821]    81  1821      307       12    49152        0             0 ubusd
[ 1419.056987] [   1832]     0  1832      327       15    40960        0             0 ash
[ 1419.064998] [   1914]     0  1914      242       11    32768        0             0 urngd
[ 1419.073258] [   2837]   514  2837      301       96    36864        0             0 logd
[ 1419.081434] [   2933]     0  2933      538       49    45056        0             0 rpcd
[ 1419.089608] [   3044]     0  3044      365       10    36864        0             0 mdadm
[ 1419.097870] [   3142]     0  3142      261       10    36864        0             0 dropbear
[ 1419.106311] [   3189]     0  3189      608       14    40960        0             0 hostapd
[ 1419.114748] [   3192]   101  3192     1378      113    53248        0             0 hostapd
[ 1419.123183] [   3250]     0  3250      456       54    40960        0             0 netifd
[ 1419.131531] [   3335]     0  3335      326       13    36864        0             0 crond
[ 1419.139792] [   4116]     0  4116      392       23    40960        0             0 uhttpd
[ 1419.148141] [   4176]     0  4176      435       32    40960        0             0 dbus-daemon
[ 1419.156922] [   4335]     0  4335      361       24    40960        0             0 blockd
[ 1419.165270] [   4528]     0  4528     1731      209    45056        0             0 nginx
[ 1419.173531] [   4558]     0  4558     1786      259    45056        0             0 nginx
[ 1419.181791] [   4559]     0  4559     1786      259    45056        0             0 nginx
[ 1419.190052] [   4560]     0  4560     1786      259    45056        0             0 nginx
[ 1419.198314] [   4561]     0  4561     1786      259    45056        0             0 nginx
[ 1419.206575] [   4562]     0  4562     1786      259    45056        0             0 nginx
[ 1419.214837] [   4563]     0  4563     1786      259    45056        0             0 nginx
[ 1419.223099] [   4565]     0  4565     1786      258    45056        0             0 nginx
[ 1419.231279] [   4566]     0  4566     1786      259    45056        0             0 nginx
[ 1419.239540] [   5117]     0  5117      563       46    40960        0             0 starter
[ 1419.247974] [   5154]     0  5154     1953      149    49152        0             0 charon
[ 1419.256323] [   5576]     0  5576      608       21    36864        0             0 muvirt-console
[ 1419.265287] [   5579]     0  5579      325      138    32768        0             0 sh
[ 1419.273290] [   5581]     0  5581      325      138    32768        0             0 sleep
[ 1419.281552] [   5643]     0  5643      608       25    40960        0             0 ntpd
[ 1419.289727] [   5647]   123  5647      325       14    36864        0             0 ntpd
[ 1419.297902] [   5792]     0  5792     1408       96    49152        0             0 ttyd
[ 1419.306081] [   6229]     0  6229     4212      399    69632        0             0 ModemManager
[ 1419.314951] [   6692]     0  6692      251       16    36864        0             0 odhcp6c
[ 1419.323390] [   8706]     0  8706  7909338  1166141  9445376        0             0 qemu-system-aar
[ 1419.332437] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=qemu-system-aar,pid=8706,uid=0
[ 1419.345861] Out of memory: Killed process 8706 (qemu-system-aar) total-vm:31637352kB, anon-rss:4664564kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:9224kB oom_score_adj:0
[ 1419.373706] oom_reaper: reaped process 8706 (qemu-system-aar), now anon-rss:0kB, file-rss:32kB, shmem-rss:0kB
[ 1419.772608] br-lan: port 2(tap0) entered disabled state
[ 1419.778167] device tap0 left promiscuous mode
[ 1419.782531] br-lan: port 2(tap0) entered disabled state

Since QEMU was killed, I went back to muvirt/files/muvirt.init in traversetech/muvirt-feed on GitLab, and on line 256 I see that the script runs modprobe vfio-pci.

On the latest supplied uVirt build, I ran:

root@muvirt:/# cat /etc/openwrt_release 
DISTRIB_ID='muvirt'
DISTRIB_RELEASE='21.02-SNAPSHOT'
DISTRIB_REVISION='r0+16214-8e81d977ac'
DISTRIB_TARGET='arm64/efi'
DISTRIB_ARCH='aarch64_generic'
DISTRIB_DESCRIPTION='muvirt 21.02-SNAPSHOT r0+16214-8e81d977ac'
DISTRIB_TAINTS='no-all busybox'

root@muvirt:/# modprobe vfio-pci
root@muvirt:/# lsmod | grep vfio-pci
root@muvirt:/# 

At first I thought the driver was missing, but my grep simply did not match: lsmod lists the module with an underscore (vfio_pci), and it is loaded:

root@muvirt:/# lsmod | grep vfio
vfio                   28672  3 vfio_pci,vfio_iommu_type1,vfio_fsl_mc
vfio_fsl_mc            20480  0 
vfio_iommu_type1       36864  0 
vfio_pci               57344  0 
vfio_virqfd            16384  1 vfio_pci
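A small helper avoids this kind of mismatch. The kernel turns hyphens in module names into underscores, so lsmod and /proc/modules show vfio_pci even though the module is requested as vfio-pci; the sketch below normalises the name before matching. It also reads the sysfs driver link of a PCI device, which shows whether vfio-pci has actually claimed 0002:01:00.0. The helper names module_loaded and pci_driver are hypothetical, and the optional second argument only exists so the functions can be pointed at a test directory.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of muvirt.

# module_loaded NAME [MODULES_FILE]
# The kernel lists module names with '_' (vfio_pci) even if you typed '-'
# (vfio-pci), so normalise the name before matching /proc/modules.
module_loaded() {
	name=$(printf '%s\n' "$1" | sed 's/-/_/g')
	grep -q "^$name " "${2:-/proc/modules}"
}

# pci_driver ADDR [SYSFS_ROOT]
# Print the driver a PCI device is bound to, or "(none)" if it is unbound.
pci_driver() {
	link="${2:-/sys}/bus/pci/devices/$1/driver"
	if [ -L "$link" ]; then
		basename "$(readlink "$link")"
	else
		echo "(none)"
	fi
}
```

On the host, module_loaded vfio-pci && echo loaded should print loaded, and pci_driver 0002:01:00.0 prints whichever driver currently owns the SATA controller (for example ahci while the host still has it, or (none) if it is unbound).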

Now I am puzzled why passing through the SATA controller does not work.

Separately: how can I simply pass a device directly through to the VM?


This syntax is correct for making a raw block device available to the VM.
Is it possible that something (like LVM) is holding an open file handle to /dev/sda?
That would prevent QEMU from opening it. The system log (e.g. logread -l 200) might show the error QEMU is encountering.
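To follow up on the open-handle question, here is a sketch that checks the two usual suspects: kernel-level holders such as LVM/device-mapper or md RAID (listed under /sys/block/DEV/holders and under each partition's holders directory), and user-space processes that have the device open (visible as /proc/PID/fd symlinks). The process list in the OOM report shows mdadm running on this host, so md membership is worth ruling out too. block_holders and open_by are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: look for anything that might keep QEMU from opening a raw disk.
# Function names are hypothetical helpers, not muvirt commands.

# block_holders DEV [SYSFS_ROOT]
# Print kernel-level holders (device-mapper/LVM, md RAID) of DEV and of its
# partitions, one per line, e.g. "sda1 -> md0".
block_holders() {
	dev=$1
	root=${2:-/sys}
	for h in "$root/block/$dev/holders"/* "$root/block/$dev/$dev"*/holders/*; do
		[ -e "$h" ] || [ -L "$h" ] || continue	# glob matched nothing
		owner=$(basename "$(dirname "$(dirname "$h")")")
		echo "$owner -> $(basename "$h")"
	done
}

# open_by PATH
# Print the /proc/PID directory of every process with PATH open
# (a PID may repeat if it holds several descriptors).
open_by() {
	for fd in /proc/[0-9]*/fd/*; do
		if [ "$(readlink "$fd" 2>/dev/null)" = "$1" ]; then
			echo "${fd%/fd/*}"
		fi
	done
}
```

On the host, block_holders sda and open_by /dev/sda should both print nothing if the disk is free. If QEMU still fails, logread -l 200 | grep -i qemu may show its error message.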

That looks very weird. What does your /proc/meminfo look like before you start the VM?

Do you have enough hugepages (HugePages_Free × 2 MiB) for the VM? VFIO can behave strangely when the VM's memory is not backed by hugepages.
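The OOM report in the first post already hints at this. It shows hugepages_total=13340 hugepages_free=13340 for 2048 kB pages, i.e. 13340 × 2 MiB = 26680 MiB reserved (matching option hugetlb '26680') with none of it in use, while qemu-system-aar was killed at about 4.4 GiB of ordinary anonymous memory (anon-rss:4664564kB). With 8240640 pages × 4 KiB ≈ 31.4 GiB of RAM, only about 5.4 GiB lies outside the hugepage pool, so it looks as though the guest memory pinned by VFIO (pin_user_pages_remote in the call trace) landed on normal pages instead. That is a reading of the log, not a confirmed diagnosis. A minimal check, assuming the default hugepage size is the one muvirt uses (/proc/meminfo only reports the default size); the function names are hypothetical:

```shell
#!/bin/sh
# Sketch: does the hugepage pool have room for a VM of a given size?
# Function names are hypothetical; pass the VM's 'option memory' value (MiB).

# hugepages_free_mib [MEMINFO]
# Free hugepage memory in MiB, for the default hugepage size only.
hugepages_free_mib() {
	awk '/^HugePages_Free:/ { free = $2 }
	     /^Hugepagesize:/   { size_kb = $2 }
	     END { print int(free * size_kb / 1024) }' "${1:-/proc/meminfo}"
}

# vm_fits_hugepages VM_MEM_MIB [MEMINFO]
# Succeeds if the free hugepage memory covers the VM's RAM.
vm_fits_hugepages() {
	have=$(hugepages_free_mib "$2")
	if [ "$have" -ge "$1" ]; then
		echo "ok: ${have} MiB of hugepages free, VM needs $1 MiB"
	else
		echo "short: only ${have} MiB of hugepages free, VM needs $1 MiB" >&2
		return 1
	fi
}
```

For the config in this thread, vm_fits_hugepages 2048 run just before starting the VM would show whether the 2048 MiB guest can be backed by the reserved pool.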

I have noted that SATA controllers are known to have stability issues under passthrough, but that should not prevent you from booting a VM with the SATA controller passed through to it.

I tried again today to pass through the whole PCIe device, and I get:

root@muvirt:/# /etc/init.d/muvirt start vm
Starting VM vm
Disk 0: /vm/vm.qcow2
Network 0: lan
        MAC: 52:54:00:93:25:f9
root@muvirt:/# [   52.223541] br-lan: port 3(tap0) entered blocking state
[   52.228810] br-lan: port 3(tap0) entered disabled state
[   52.234217] device tap0 entered promiscuous mode
[   52.239190] br-lan: port 3(tap0) entered blocking state
[   52.244439] br-lan: port 3(tap0) entered listening state
[   53.090135] vfio-pci 0002:01:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[   53.099635] vfio-pci 0002:01:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[   53.109204] vfio-pci 0002:01:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[   54.276816] br-lan: port 3(tap0) entered learning state
[   54.366208] br-lan: port 3(tap0) entered disabled state
[   54.371700] device tap0 left promiscuous mode
[   54.376138] br-lan: port 3(tap0) entered disabled state

The config looks as follows:

root@muvirt:/# cat /etc/config/virt 

config muvirt 'system'
        option scratch '/mnt/scratch/'
        option defaultnet 'lan'
        option hugetlb '26680'

config vm 'vm'
        option memory '2048'
        option numprocs '7'
        list disks '/vm/vm.qcow2'
        option pcidevice '0002:01:00.0' 
        list network 'lan'
        option mac '52:54:00:93:25:f9'
        option enable '1'
        option provisioned '1'

Grepping for a running qemu shows that it has simply died:

root@muvirt:/# ps -w | grep qemu
 8232 root      1308 S    grep qemu

When I try /dev/sda instead, I see:

root@muvirt:/# /etc/init.d/muvirt start vm
Starting VM vm
Disk 0: /vm/vm.qcow2
Disk 1: /dev/sda
Network 0: lan
        MAC: 52:54:00:93:25:f9
root@muvirt:/# [   58.963354] br-lan: port 2(tap0) entered blocking state
[   58.968611] br-lan: port 2(tap0) entered disabled state
[   58.974013] device tap0 entered promiscuous mode
[   58.978995] br-lan: port 2(tap0) entered blocking state
[   58.984246] br-lan: port 2(tap0) entered listening state
[   60.994039] br-lan: port 2(tap0) entered learning state
[   63.000245] br-lan: port 2(tap0) entered forwarding state
[   63.005651] br-lan: topology change detected, propagating

with this config:

root@muvirt:/# cat /etc/config/virt 

config muvirt 'system'
        option scratch '/mnt/scratch/'
        option defaultnet 'lan'
        option hugetlb '26680'

config vm 'vm'
        option memory '2048'
        option numprocs '7'
        list disks '/vm/vm.qcow2 /dev/sda'
#       option pcidevice '0002:01:00.0' 
        list network 'lan'
        option mac '52:54:00:93:25:f9'
        option enable '1'
        option provisioned '1'

When I grep for running qemu, I see:

root@muvirt:/# ps -w | grep qemu
 7577 root     2345m S    qemu-system-aarch64 --enable-kvm -m 2048 -cpu host -M virt,gic-version=3 -smp 7 -mem-path /tmp/hugetlbfs

So it does run, but somehow the VM does not respond.

Another question: is it possible to pass devices by UUID or PARTUUID as well? Passing /dev/sda is not ideal, because the name is not stable; it could change from one boot to the next.

EDIT: adding /dev/sda does indeed work. My VM had obtained a new IP address, which is why I couldn't find it. However, passing through the PCI device, i.e. the SATA controller, still does not work. So I would like to either pass each disk by UUID or pass through the whole PCI device. Any idea how to fix this?
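Because the VM's MAC address is fixed in the config (option mac '52:54:00:93:25:f9'), one way to find the VM again after it picks up a new address is to look that MAC up in the host's ARP table. A minimal sketch, assuming the muvirt host has recently exchanged traffic with the VM over br-lan (otherwise there is no entry yet); vm_ip_by_mac is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: map a VM's MAC address to its IPv4 address via the ARP table.
# vm_ip_by_mac is a hypothetical helper, not a muvirt command.

# vm_ip_by_mac MAC [ARP_FILE]
# /proc/net/arp columns: IP address, HW type, Flags, HW address, Mask, Device
# (the first line is a header). MAC comparison is case-insensitive.
vm_ip_by_mac() {
	awk -v mac="$1" 'NR > 1 && tolower($4) == tolower(mac) { print $1 }' \
		"${2:-/proc/net/arp}"
}
```

For this VM that would be vm_ip_by_mac 52:54:00:93:25:f9; an empty result only means the host has not seen the VM's traffic recently.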

EDIT2: @mcbridematt I have opened a small MR that allows specifying UUID=**** as a disk in /etc/config/virt, which uVirt then resolves: merge request !8 in traversetech/muvirt-feed on GitLab ("Allow uVirt to pass UUID as device, such as:").
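The MR's own code is not reproduced here. As an independent sketch of the same idea, this helper resolves UUID=… or PARTUUID=… to a device node by parsing blkid-style lines such as /dev/sda1: UUID="…" TYPE="…". The name resolve_id is hypothetical, and whether PARTUUID appears at all depends on which blkid is installed. Note that a UUID identifies a filesystem (and a PARTUUID a GPT partition), so a bare whole disk with nothing on it may not have either.

```shell
#!/bin/sh
# Sketch, not the code from MR !8: resolve UUID=... or PARTUUID=... to a
# device node by scanning blkid-style lines on stdin, e.g.
#   /dev/sda1: UUID="1234-ABCD" TYPE="vfat"
# resolve_id is a hypothetical helper name.

# resolve_id KEY VALUE   (KEY is UUID or PARTUUID)
# Prints the first matching device and succeeds, or fails if none match.
resolve_id() {
	key=$1
	want=$2
	while IFS= read -r line; do
		case "$line" in
			# The leading space stops " UUID=" from matching inside "PARTUUID=".
			*" $key=\"$want\""*)
				echo "${line%%:*}"
				return 0
				;;
		esac
	done
	return 1
}
```

For a disk entry such as UUID=1234-ABCD (a made-up value), blkid | resolve_id UUID 1234-ABCD would print the device path to hand to QEMU.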

Still, I wonder why passing through the entire PCI device fails.

Ah ha! Some cards need power state management to be disabled.
Can you try booting muvirt with this on the kernel command line:

vfio-pci.disable_idle_d3=1

You can add this to /boot/grub/grub.cfg to make it permanent. (Edit: GRUB currently defaults to a built-in grub.cfg; fixing that is on my TODO list.)

This is supposed to be a default setting, but I forgot to add it back in the last time I updated OpenWrt in muvirt.
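Since a GRUB edit may not reach the kernel, it is worth checking after a reboot whether the option actually took effect. A sketch, assuming the standard procfs/sysfs layout: /proc/cmdline shows what the kernel was booted with, and because vfio_pci is a loadable module here (it appears in lsmod above), its live value should be readable from /sys/module/vfio_pci/parameters/disable_idle_d3 as Y or N. Whether a module loaded later picks up a vfio-pci. option from the kernel command line depends on the module loader, so the sysfs value is the one to trust. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: confirm that vfio-pci's disable_idle_d3 setting is active.
# Helper names are hypothetical; the optional argument is only for testing.

# cmdline_has_d3_opt [CMDLINE_FILE]
# The kernel treats '-' and '_' in parameter names alike, so accept both.
cmdline_has_d3_opt() {
	grep -Eq 'vfio[-_]pci\.disable_idle_d3=(1|[Yy])' "${1:-/proc/cmdline}"
}

# d3_idle_disabled [SYSFS_ROOT]
# Bool module parameters read back as "Y" or "N".
d3_idle_disabled() {
	p="${1:-/sys}/module/vfio_pci/parameters/disable_idle_d3"
	[ -r "$p" ] && [ "$(cat "$p")" = "Y" ]
}
```

If cmdline_has_d3_opt succeeds but d3_idle_disabled fails, the option is on the command line but never reached the module.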

Good idea!

Yeah, I tried it today, but indeed changing /boot/grub/grub.cfg in muvirt has no effect.

@mcbridematt will you accept my patch? It would be nice if the next muvirt release included the UUID feature 🙂

@psiegl @mcbridematt hey, I've found that this is still required on newer kernels to make the SATA controller available again:

vfio-pci.disable_idle_d3=1