uVirt: how to pass raw disks or PCI devices?

I have tried several approaches now, but I am stuck:

In /etc/config/virt I added:

        list disks '/vm/MYIMG.qcow2 /dev/sda'

That did not work, so I tried:

        list disks '/vm/MYIMG.qcow2'
        list disks '/dev/sda'

That did not work either.

Then I double-checked muvirt/files/muvirt.init (master branch of traversetech / muvirt-feed on GitLab) and noticed that there is a pcidevice option for passing through a PCI device. My idea: pass through the entire SATA controller:

root@muvirt:/# lspci
0000:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev 10)
0001:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev 10)
0001:01:00.0 PCI bridge: Pericom Semiconductor Device b304 (rev 01)
0001:02:01.0 PCI bridge: Pericom Semiconductor Device b304 (rev 01)
0001:02:02.0 PCI bridge: Pericom Semiconductor Device b304 (rev 01)
0001:04:00.0 Unclassified device [0002]: MEDIATEK Corp. MT7915E 802.11ax PCI Express Wireless Network Adapter
0002:00:00.0 PCI bridge: Freescale Semiconductor Inc Device 80c0 (rev 10)
0002:01:00.0 SATA controller: JMicron Technology Corp. JMB58x AHCI SATA controller (rev ff)

Hence, I added this to /etc/config/virt:

        option pcidevice '0002:01:00.0'
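
Before handing a device to a VM, it can help to see which host driver currently owns it and which other devices share its IOMMU group. The following is a minimal sketch, assuming the standard Linux sysfs layout; the function names and the SYSFS_PCI variable are made up for illustration and are not part of muvirt:

```sh
#!/bin/sh
# Sketch: inspect a PCI device before passing it through.
# SYSFS_PCI is overridable only so the functions can be tested (assumption).
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

pci_driver() {
    # Print the name of the bound driver, or "none" if the device is unbound.
    dev="$SYSFS_PCI/$1"
    if [ -L "$dev/driver" ]; then
        basename "$(readlink "$dev/driver")"
    else
        echo none
    fi
}

pci_group_members() {
    # List every device in the same IOMMU group, one address per line.
    ls "$SYSFS_PCI/$1/iommu_group/devices" 2>/dev/null
}

# Example: pci_driver 0002:01:00.0; pci_group_members 0002:01:00.0
```

If the group contains more than the SATA controller itself, VFIO generally expects every device in that group to be detached from its host driver as well.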

When starting the VM again, it looks as if the kernel crashes:

root@muvirt:/# /etc/init.d/muvirt start vm
Starting VM vm
Disk 0: /vm/vm.qcow2
Network 0: lan
        MAC: 52:54:00:93:25:f9
root@muvirt:/# [ 1416.789560] br-lan: port 2(tap0) entered blocking state
[ 1416.794798] br-lan: port 2(tap0) entered disabled state
[ 1416.800238] device tap0 entered promiscuous mode
[ 1416.805263] br-lan: port 2(tap0) entered blocking state
[ 1416.810504] br-lan: port 2(tap0) entered listening state
[ 1418.684658] qemu-system-aar invoked oom-killer: gfp_mask=0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
[ 1418.695887] CPU: 1 PID: 8706 Comm: qemu-system-aar Not tainted 5.10.46 #0
[ 1418.702669] Hardware name: traverse ten64/ten64, BIOS 2020.07-rc1-gb47b96d4 07/07/2021
[ 1418.710581] Call trace:
[ 1418.713024]  dump_backtrace+0x0/0x1b0
[ 1418.716681]  show_stack+0x18/0x30
[ 1418.719990]  dump_stack+0xdc/0x11c
[ 1418.723383]  dump_header+0x44/0x184
[ 1418.726867]  oom_kill_process+0x1d0/0x1d8
[ 1418.730869]  out_of_memory+0x1bc/0x560
[ 1418.734611]  __alloc_pages_slowpath.constprop.126+0x71c/0xa10
[ 1418.740350]  __alloc_pages_nodemask+0x25c/0x2a0
[ 1418.744876]  alloc_pages_vma+0x8c/0x220
[ 1418.748706]  handle_mm_fault+0x7b0/0x1048
[ 1418.752709]  __get_user_pages+0x1e4/0x388
[ 1418.756711]  __gup_longterm_locked+0x8c/0x4a8
[ 1418.761061]  __get_user_pages_remote+0x48/0x2b0
[ 1418.765584]  pin_user_pages_remote+0x18/0x30
[ 1418.769851]  0xffff800008cb9f50
[ 1418.772984]  0xffff800008cba238
[ 1418.776118]  0xffff800008cbd2d0
[ 1418.779254]  vfio_group_set_kvm+0xcc/0x590 [vfio]
[ 1418.783954]  __arm64_sys_ioctl+0x1cc/0x1120
[ 1418.788131]  do_el0_svc+0x80/0xe0
[ 1418.791440]  el0_svc+0x18/0x28
[ 1418.794486]  el0_sync_handler+0x90/0xc8
[ 1418.798316]  el0_sync+0x164/0x180
[ 1418.801759] Mem-Info:
[ 1418.804031] active_anon:3097 inactive_anon:1166263 isolated_anon:0
[ 1418.804031]  active_file:95 inactive_file:207 isolated_file:0
[ 1418.804031]  unevictable:0 dirty:0 writeback:0
[ 1418.804031]  slab_reclaimable:1524 slab_unreclaimable:9242
[ 1418.804031]  mapped:491 shmem:970 pagetables:2526 bounce:0
[ 1418.804031]  free:34204 free_pcp:577 free_cma:1000
[ 1418.821353] br-lan: port 2(tap0) entered learning state
[ 1418.836103] Node 0 active_anon:12388kB inactive_anon:4665052kB active_file:380kB inactive_file:1120kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2256kB dirty:0kB writeback:0kB shmem:3880kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 4562944kB writeback_tmp:0kB kernel_stack:3056kB all_unreclaimable? yes
[ 1418.869820] Node 0 DMA free:120028kB min:996kB low:2956kB high:4916kB reserved_highatomic:0KB active_anon:0kB inactive_anon:1825512kB active_file:112kB inactive_file:2304kB unevictable:0kB writepending:0kB present:2029568kB managed:1963964kB mlocked:0kB pagetables:1952kB bounce:0kB free_pcp:796kB local_pcp:244kB free_cma:4000kB
[ 1418.898853] lowmem_reserve[]: 0 0 29570 29570
[ 1418.903379] Node 0 Normal free:15276kB min:15384kB low:45664kB high:75944kB reserved_highatomic:0KB active_anon:12300kB inactive_anon:2839156kB active_file:924kB inactive_file:260kB unevictable:0kB writepending:0kB present:30932992kB managed:30281508kB mlocked:0kB pagetables:8152kB bounce:0kB free_pcp:1612kB local_pcp:164kB free_cma:0kB
[ 1418.933192] lowmem_reserve[]: 0 0 0 0
[ 1418.937017] Node 0 DMA: 0*4kB 3*8kB (UC) 2*16kB (UC) 4*32kB (C) 6*64kB (UMC) 3*128kB (U) 2*256kB (UM) 4*512kB (UMC) 2*1024kB (MC) 2*2048kB (MC) 27*4096kB (M) = 120248kB
[ 1418.952095] Node 0 Normal: 1027*4kB (UME) 400*8kB (UME) 133*16kB (UE) 35*32kB (UE) 19*64kB (UE) 5*128kB (ME) 6*256kB (UME) 3*512kB (UM) 1*1024kB (M) 0*2048kB 0*4096kB = 16508kB
[ 1418.968030] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 1418.976807] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB
[ 1418.985410] Node 0 hugepages_total=13340 hugepages_free=13340 hugepages_surp=0 hugepages_size=2048kB
[ 1418.994621] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB
[ 1419.002964] 1693 total pagecache pages
[ 1419.006790] 0 pages in swap cache
[ 1419.010182] Swap cache stats: add 0, delete 0, find 0/0
[ 1419.015484] Free swap  = 0kB
[ 1419.018441] Total swap = 0kB
[ 1419.021397] 8240640 pages RAM
[ 1419.024435] 0 pages HighMem/MovableOnly
[ 1419.028269] 179272 pages reserved
[ 1419.031581] 4096 pages cma reserved
[ 1419.035147] Tasks state (memory values in pages):
[ 1419.039929] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[ 1419.048647] [   1821]    81  1821      307       12    49152        0             0 ubusd
[ 1419.056987] [   1832]     0  1832      327       15    40960        0             0 ash
[ 1419.064998] [   1914]     0  1914      242       11    32768        0             0 urngd
[ 1419.073258] [   2837]   514  2837      301       96    36864        0             0 logd
[ 1419.081434] [   2933]     0  2933      538       49    45056        0             0 rpcd
[ 1419.089608] [   3044]     0  3044      365       10    36864        0             0 mdadm
[ 1419.097870] [   3142]     0  3142      261       10    36864        0             0 dropbear
[ 1419.106311] [   3189]     0  3189      608       14    40960        0             0 hostapd
[ 1419.114748] [   3192]   101  3192     1378      113    53248        0             0 hostapd
[ 1419.123183] [   3250]     0  3250      456       54    40960        0             0 netifd
[ 1419.131531] [   3335]     0  3335      326       13    36864        0             0 crond
[ 1419.139792] [   4116]     0  4116      392       23    40960        0             0 uhttpd
[ 1419.148141] [   4176]     0  4176      435       32    40960        0             0 dbus-daemon
[ 1419.156922] [   4335]     0  4335      361       24    40960        0             0 blockd
[ 1419.165270] [   4528]     0  4528     1731      209    45056        0             0 nginx
[ 1419.173531] [   4558]     0  4558     1786      259    45056        0             0 nginx
[ 1419.181791] [   4559]     0  4559     1786      259    45056        0             0 nginx
[ 1419.190052] [   4560]     0  4560     1786      259    45056        0             0 nginx
[ 1419.198314] [   4561]     0  4561     1786      259    45056        0             0 nginx
[ 1419.206575] [   4562]     0  4562     1786      259    45056        0             0 nginx
[ 1419.214837] [   4563]     0  4563     1786      259    45056        0             0 nginx
[ 1419.223099] [   4565]     0  4565     1786      258    45056        0             0 nginx
[ 1419.231279] [   4566]     0  4566     1786      259    45056        0             0 nginx
[ 1419.239540] [   5117]     0  5117      563       46    40960        0             0 starter
[ 1419.247974] [   5154]     0  5154     1953      149    49152        0             0 charon
[ 1419.256323] [   5576]     0  5576      608       21    36864        0             0 muvirt-console
[ 1419.265287] [   5579]     0  5579      325      138    32768        0             0 sh
[ 1419.273290] [   5581]     0  5581      325      138    32768        0             0 sleep
[ 1419.281552] [   5643]     0  5643      608       25    40960        0             0 ntpd
[ 1419.289727] [   5647]   123  5647      325       14    36864        0             0 ntpd
[ 1419.297902] [   5792]     0  5792     1408       96    49152        0             0 ttyd
[ 1419.306081] [   6229]     0  6229     4212      399    69632        0             0 ModemManager
[ 1419.314951] [   6692]     0  6692      251       16    36864        0             0 odhcp6c
[ 1419.323390] [   8706]     0  8706  7909338  1166141  9445376        0             0 qemu-system-aar
[ 1419.332437] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=qemu-system-aar,pid=8706,uid=0
[ 1419.345861] Out of memory: Killed process 8706 (qemu-system-aar) total-vm:31637352kB, anon-rss:4664564kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:9224kB oom_score_adj:0
[ 1419.373706] oom_reaper: reaped process 8706 (qemu-system-aar), now anon-rss:0kB, file-rss:32kB, shmem-rss:0kB
[ 1419.772608] br-lan: port 2(tap0) entered disabled state
[ 1419.778167] device tap0 left promiscuous mode
[ 1419.782531] br-lan: port 2(tap0) entered disabled state

So, since QEMU gets killed (the log above shows the OOM killer at work rather than a segfault), I went back to muvirt/files/muvirt.init (master branch of traversetech / muvirt-feed on GitLab), and on line 256 I see that the script runs a modprobe vfio-pci.

On the latest uVirt release, I ran:

root@muvirt:/# cat /etc/openwrt_release 
DISTRIB_ID='muvirt'
DISTRIB_RELEASE='21.02-SNAPSHOT'
DISTRIB_REVISION='r0+16214-8e81d977ac'
DISTRIB_TARGET='arm64/efi'
DISTRIB_ARCH='aarch64_generic'
DISTRIB_DESCRIPTION='muvirt 21.02-SNAPSHOT r0+16214-8e81d977ac'
DISTRIB_TAINTS='no-all busybox'

root@muvirt:/# modprobe vfio-pci
root@muvirt:/# lsmod | grep vfio-pci
root@muvirt:/# 

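At first I thought the driver was missing, but it is only a naming matter: modprobe treats hyphens and underscores in module names as equivalent, while lsmod lists the module with an underscore, so my grep for vfio-pci simply did not match: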

root@muvirt:/# lsmod | grep vfio
vfio                   28672  3 vfio_pci,vfio_iommu_type1,vfio_fsl_mc
vfio_fsl_mc            20480  0 
vfio_iommu_type1       36864  0 
vfio_pci               57344  0 
vfio_virqfd            16384  1 vfio_pci
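
A check that accepts either spelling avoids this confusion. The following is a small sketch; module_loaded and MODULES_FILE are hypothetical names, and the only assumption is the usual /proc/modules format in which each line starts with the module name followed by a space:

```sh
#!/bin/sh
# Sketch: check whether a module is loaded, accepting "-" or "_" in the name.
# MODULES_FILE is overridable only so the function can be tested (assumption).
MODULES_FILE="${MODULES_FILE:-/proc/modules}"

module_loaded() {
    # The kernel lists modules with underscores, so normalise the name first.
    name=$(printf '%s' "$1" | sed 's/-/_/g')
    grep -q "^${name} " "$MODULES_FILE"
}

# Example: module_loaded vfio-pci && echo "vfio-pci is loaded"
```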

Now I am puzzled as to why passing through the SATA controller does not work.

At the same time: how can I just directly pass through a device?

This syntax is correct for making a raw block device available to the VM.
Is it possible that something (like LVM) is holding an open file handle to /dev/sda?
That would prevent qemu from opening it. The system log (e.g. logread -l 200) might show the error QEMU is encountering.
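
One way to look for such a handle without lsof or fuser is to walk /proc directly. This is only a sketch (holders_of is a made-up name), and it finds userspace processes only:

```sh
#!/bin/sh
# Sketch: list PIDs that have the given path open, using only /proc.
holders_of() {
    target=$(readlink -f "$1") || return 1
    for fd in /proc/[0-9]*/fd/*; do
        # Each fd entry is a symlink to the open file; compare it to the target.
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            pid=${fd#/proc/}
            echo "${pid%%/*}"
        fi
    done | sort -un
}

# Example: holders_of /dev/sda
```

Kernel-level users such as md arrays or device-mapper volumes do not show up this way; those are listed under /sys/block/sda/holders, and mounted partitions appear in /proc/mounts. The process list in the OOM report above includes mdadm, so the holders directory may be worth a look.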

That looks very weird. What does your /proc/meminfo look like before you start the VM?

Do you have enough hugepages (HugePages_Free * 2MiB) for the VM? VFIO can act funny when the VM doesn’t have hugepages behind it.
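
The arithmetic can be scripted. The following sketch assumes that "option memory" in /etc/config/virt is in MiB (it is passed to QEMU's -m, which defaults to MiB); the function names and the MEMINFO variable are made up for illustration:

```sh
#!/bin/sh
# Sketch: compare free hugepage memory with the memory a VM is configured for.
# MEMINFO is overridable only so the functions can be tested (assumption).
MEMINFO="${MEMINFO:-/proc/meminfo}"

free_hugepage_mib() {
    # HugePages_Free is a page count; Hugepagesize is given in kB.
    awk '/^HugePages_Free:/ {n = $2}
         /^Hugepagesize:/   {kb = $2}
         END {print int(n * kb / 1024)}' "$MEMINFO"
}

enough_hugepages() {
    # $1: VM memory in MiB, as in "option memory" (assumed unit).
    [ "$(free_hugepage_mib)" -ge "$1" ]
}

# Example: enough_hugepages 2048 || echo "not enough free hugepages"
```

With the 13340 free 2048 kB pages reported in the OOM log above, this works out to 26680 MiB.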

I have noted that SATA controllers are known to have stability issues under passthrough but that should not prevent you booting a VM with the SATA controller passed through to it.

I tried again today to pass through the whole PCI device, and I get:

root@muvirt:/# /etc/init.d/muvirt start vm
Starting VM vm
Disk 0: /vm/vm.qcow2
Network 0: lan
        MAC: 52:54:00:93:25:f9
root@muvirt:/# [   52.223541] br-lan: port 3(tap0) entered blocking state
[   52.228810] br-lan: port 3(tap0) entered disabled state
[   52.234217] device tap0 entered promiscuous mode
[   52.239190] br-lan: port 3(tap0) entered blocking state
[   52.244439] br-lan: port 3(tap0) entered listening state
[   53.090135] vfio-pci 0002:01:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[   53.099635] vfio-pci 0002:01:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[   53.109204] vfio-pci 0002:01:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[   54.276816] br-lan: port 3(tap0) entered learning state
[   54.366208] br-lan: port 3(tap0) entered disabled state
[   54.371700] device tap0 left promiscuous mode
[   54.376138] br-lan: port 3(tap0) entered disabled state

The config looks as follows:

root@muvirt:/# cat /etc/config/virt 

config muvirt 'system'
        option scratch '/mnt/scratch/'
        option defaultnet 'lan'
        option hugetlb '26680'

config vm 'vm'
        option memory '2048'
        option numprocs '7'
        list disks '/vm/vm.qcow2'
        option pcidevice '0002:01:00.0' 
        list network 'lan'
        option mac '52:54:00:93:25:f9'
        option enable '1'
        option provisioned '1'

Grepping for a running qemu shows that it just died:

root@muvirt:/# ps -w | grep qemu
 8232 root      1308 S    grep qemu

When trying out /dev/sda, I see:

root@muvirt:/# /etc/init.d/muvirt start vm
Starting VM vm
Disk 0: /vm/vm.qcow2
Disk 1: /dev/sda
Network 0: lan
        MAC: 52:54:00:93:25:f9
root@muvirt:/# [   58.963354] br-lan: port 2(tap0) entered blocking state
[   58.968611] br-lan: port 2(tap0) entered disabled state
[   58.974013] device tap0 entered promiscuous mode
[   58.978995] br-lan: port 2(tap0) entered blocking state
[   58.984246] br-lan: port 2(tap0) entered listening state
[   60.994039] br-lan: port 2(tap0) entered learning state
[   63.000245] br-lan: port 2(tap0) entered forwarding state
[   63.005651] br-lan: topology change detected, propagating

with this config:

root@muvirt:/# cat /etc/config/virt 

config muvirt 'system'
        option scratch '/mnt/scratch/'
        option defaultnet 'lan'
        option hugetlb '26680'

config vm 'vm'
        option memory '2048'
        option numprocs '7'
        list disks '/vm/vm.qcow2 /dev/sda'
#       option pcidevice '0002:01:00.0' 
        list network 'lan'
        option mac '52:54:00:93:25:f9'
        option enable '1'
        option provisioned '1'

When I grep for running qemu, I see:

root@muvirt:/# ps -w | grep qemu
 7577 root     2345m S    qemu-system-aarch64 --enable-kvm -m 2048 -cpu host -M virt,gic-version=3 -smp 7 -mem-path /tmp/hugetlbfs

So it does run, but somehow it does not respond.

Another question: is it possible to pass UUIDs or PARTUUIDs of the devices as well? Passing /dev/sda is not ideal, because the name is not stable and could change from one boot to the next.

EDIT: adding /dev/sda does indeed work. Somehow my VM obtained a new IP address, so I couldn't find it anymore. However, passing the pcidevice, i.e. the SATA controller, still does not work. Hence, I would like either to pass a UUID for each device or to pass the whole PCI device. Any idea how to fix this?

EDIT2: @mcbridematt I have submitted a small MR that allows specifying UUID=**** as a disk in /etc/config/virt, which uVirt then resolves: Allow uVirt to pass UUID as device, such as: (!8) · Merge requests · traversetech / muvirt-feed · GitLab
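
To illustrate the idea, here is a rough sketch of such a lookup. It is not the code from the merge request; resolve_disk is a hypothetical name, and the sketch assumes blkid prints lines of the form `/dev/sda1: UUID="..." TYPE="..."`:

```sh
#!/bin/sh
# Sketch: turn a "UUID=..." disk entry into a device path by scanning
# blkid-style output. Not the code from the MR; names are hypothetical.
resolve_disk() {
    # $1: disk entry from the config; blkid-style lines are read from stdin.
    case "$1" in
        UUID=*)
            uuid=${1#UUID=}
            # Match the whole field so PARTUUID="..." is not mistaken for UUID.
            awk -v want="UUID=\"$uuid\"" '
                { for (i = 2; i <= NF; i++)
                      if ($i == want) { sub(/:$/, "", $1); print $1; exit } }'
            ;;
        *)
            # Plain paths pass through unchanged.
            echo "$1"
            ;;
    esac
}

# Example: blkid | resolve_disk "UUID=<uuid of the filesystem>"
```

A filesystem UUID identifies a partition or filesystem rather than a whole disk, so an unpartitioned, unformatted disk would still need some other stable identifier.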

Still, I wonder why passing through the whole PCI device fails.

Ah ha! Some cards need power state management to be disabled.
Can you try booting muvirt with this on the kernel command line:

vfio-pci.disable_idle_d3=1

You can add this to /boot/grub/grub.cfg to make it permanent. (edit: GRUB is defaulting to a built-in grub.cfg - on my TODO list to fix)

This is supposed to be a default setting but I forgot to add it back in last time I updated OpenWrt in muvirt.
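
To confirm after a reboot that the option actually reached the kernel, something like the following can be used. It is a sketch only: has_kernel_param is a made-up helper that does a literal, whole-word match on the command line.

```sh
#!/bin/sh
# Sketch: check whether a given parameter appears on a kernel command line.
has_kernel_param() {
    # $1: parameter (e.g. vfio-pci.disable_idle_d3=1), $2: command line string.
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example: has_kernel_param vfio-pci.disable_idle_d3=1 "$(cat /proc/cmdline)"
# The value the loaded module is using can be read from sysfs:
#   cat /sys/module/vfio_pci/parameters/disable_idle_d3
```

The sysfs file should read Y when the option is in effect. Because the match is literal, a command line that spells the module name with an underscore would need to be checked with that spelling.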

Good idea!

Yeah, I tried it today, but changing /boot/grub/grub.cfg in muvirt indeed has no effect.

@mcbridematt will you accept my patch? It would be nice if the next muvirt release contained the UUID feature :)