KubeVirt (VM Workloads) - Control Plane

Overview

The KubeVirt add-on lets an mk8s cluster run VM workloads (spec.type: vm). It installs the KubeVirt and CDI (Containerized Data Importer) operators and configures them for Control Plane: VM components are scoped to a dedicated VM node pool, the feature gates the platform needs are enabled, and boot-disk imports are wired through CDI. Once the add-on is enabled and at least one VM-capable node is present, the location reports vm in its capabilities and VM workloads can be scheduled there.

Supported providers

Any provider whose nodes can expose hardware virtualization (/dev/kvm) — that means bare-metal nodes or instances with nested virtualization enabled. On AWS, nested virtualization is available on 8th-generation Intel instance types (c8i, m8i, r8i and variants); bare-metal (.metal) instances also work. On generic / BYOK clusters, use bare-metal hosts or VMs that pass through KVM.
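A quick way to confirm that a candidate host exposes hardware virtualization is to check for the KVM device and the CPU virtualization flags on the node itself. The function below is a minimal sketch, assuming an x86 Linux host; `check_kvm` is a hypothetical helper name, and its two optional arguments exist only so the check can be tried against sample files.

```shell
# Report whether a host looks KVM-capable (x86 Linux assumed).
# Usage: check_kvm [kvm-device-path] [cpuinfo-path]
# Defaults are the real Linux paths; override them only for experiments.
check_kvm() {
  kvm_dev="${1:-/dev/kvm}"
  cpuinfo="${2:-/proc/cpuinfo}"

  # vmx = Intel VT-x, svm = AMD-V. Without either flag the CPU (or the
  # hypervisor above it) is not exposing virtualization to this host.
  if ! grep -Eqw 'vmx|svm' "$cpuinfo" 2>/dev/null; then
    echo "no-cpu-virt"
    return 1
  fi

  # The KVM device node must exist for VMs to start on this host.
  if [ ! -e "$kvm_dev" ]; then
    echo "no-kvm-device"
    return 1
  fi

  echo "kvm-ok"
}
```

Run `check_kvm` with no arguments on the node, for example over SSH. A result of `no-cpu-virt` on a cloud instance usually means nested virtualization is not enabled for that instance.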

Requirements

Setting up a cluster for VMs has four parts:

A node pool with hardware virtualization

VM nodes must be able to run KVM. On AWS, enable nested virtualization on a supported instance type via cpuOptions.nestedVirtualization; elsewhere use bare-metal / KVM-capable hosts.

Label (and ideally taint) the VM node pool

KubeVirt’s components and the VMs themselves are scoped to nodes labelled cpln.io/nodeType: vm. Add that label to the node pool. A matching taint cpln.io/nodeType=vm:NoSchedule is recommended to keep ordinary pods off the VM nodes — KubeVirt’s VM placement tolerates it.

Enable the kubevirt add-on

Add kubevirt to addOns. This installs the KubeVirt + CDI operators and their custom resources.

Provide storage for boot disks

VM boot disks are imported by CDI into a PVC (backed by a volume set), so the cluster needs a working CSI / default StorageClass. If your default StorageClass is block-mode, set scratchSpaceStorageClass to a filesystem-mode class so CDI has scratch space for imports.

The dedicated cpln.io/nodeType: vm node pool is what makes a node VM-capable. Without a labelled node, the KubeVirt components have nowhere to run and no VM workload can schedule.

How to enable

Cluster manifest (AWS example)

A dedicated VM node pool with nested virtualization, the required label and taint, plus the add-on:

YAML

spec:
  provider:
    aws:
      # ...existing provider config...
      nodePools:
        - name: vm
          instanceTypes:
            - c8i.xlarge          # 8th-gen Intel — supports nested virtualization
          cpuOptions:
            nestedVirtualization: true
          labels:
            cpln.io/nodeType: vm
          taints:
            - key: cpln.io/nodeType
              value: vm
              effect: NoSchedule
          minSize: 1
          maxSize: 4
          bootDiskSize: 100
          subnetIds:
            - ${SUBNET_1}
  addOns:
    kubevirt: {}
    nodeLocalDns: {}            # recommended (see DNS note below)

On generic / BYOK clusters, label and taint the node pool the same way and ensure the hosts expose /dev/kvm; nested-virtualization flags are provider-specific.
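Once the node pool is up, the label and taint can be checked from the Kubernetes side. The following is a sketch, assuming kubectl access to the cluster; `vm_nodes_taint_report` is a hypothetical helper name. It lists every node carrying the cpln.io/nodeType=vm label and notes whether the recommended NoSchedule taint is present.

```shell
# List labelled VM nodes and flag any that lack the recommended taint.
# Output: one line per node, "<node-name><TAB>tainted|untainted".
vm_nodes_taint_report() {
  kubectl get nodes -l cpln.io/nodeType=vm \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.taints[*]}{.key}={.value}:{.effect}{","}{end}{"\n"}{end}' |
  awk -F'\t' '
    NF == 0 { next }   # skip blank lines
    {
      # Field 2 holds the node taints as key=value:effect, comma-separated.
      status = (index($2, "cpln.io/nodeType=vm:NoSchedule") > 0) ? "tainted" : "untainted"
      print $1 "\t" status
    }'
}
```

An empty result means no node carries the label yet. An `untainted` node still runs VMs, but ordinary pods may also be scheduled onto it.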

Console

When creating or editing the cluster, open Add-ons, toggle on KubeVirt, and make sure you have a node pool labelled cpln.io/nodeType: vm on VM-capable instances.

Configuration

Option	Description
`scratchSpaceStorageClass`	Filesystem-mode `StorageClass` CDI uses for import scratch space. Required when the cluster default StorageClass is block-mode; otherwise optional.

YAML

addOns:
  kubevirt:
    scratchSpaceStorageClass: my-filesystem-sc

The add-on also configures the KubeVirt CR automatically — the DataVolumes, BlockVolume, and Sidecar feature gates, and the VM node placement — so you don’t set these yourself.

DNS

VM guests reach in-cluster services through a platform DNS forwarder. The nodeLocalDns add-on (a per-node CoreDNS cache) is recommended alongside KubeVirt for reliable VM DNS. It is not strictly required — VM cloud-init falls back to TCP DNS (options use-vc) — but enabling it avoids edge cases.

Verify

After the add-on reconciles and a VM node is ready:

The cluster’s location reports vm under its workload-type capabilities.
A VM workload targeting this location schedules, imports its boot disk (watch Importing boot disk: NN% in the deployment status), and boots.
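On a cluster you operate, the boot step can also be watched from kubectl by polling the phase of the VirtualMachineInstance. This is a minimal sketch, assuming kubectl access and that you already know the instance name and GVC namespace; `wait_for_vmi` is a hypothetical helper name, and the attempt count and delay are arbitrary defaults.

```shell
# Poll a VirtualMachineInstance until it reports phase Running.
# Usage: wait_for_vmi <vmi-name> <gvc-namespace> [attempts] [sleep-seconds]
# Returns 0 once Running, 1 on Failed or when attempts run out.
wait_for_vmi() {
  vmi="$1"; ns="$2"; attempts="${3:-60}"; delay="${4:-5}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # An empty phase means the object does not exist yet or the call failed.
    phase=$(kubectl get vmi "$vmi" -n "$ns" -o jsonpath='{.status.phase}' 2>/dev/null) || phase=""
    echo "attempt $((i + 1)): phase=${phase:-unknown}"
    case "$phase" in
      Running) return 0 ;;
      Failed)  return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_vmi <vmi-name> <gvc-namespace> 120 5` waits up to roughly ten minutes, which leaves room for a boot-disk import.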

Console access

For day-to-day access, use the platform paths documented on the VM workload page — they work the same on your own cluster and require no kubectl:

SSH (Linux): cpln workload connect / cpln workload exec.
RDP (Windows) or any TCP service: cpln port-forward to the guest port, then connect to localhost.

When a guest will not boot far enough to accept SSH or RDP — a kernel panic, a UEFI failure, or a stalled first boot — you need the raw graphical console. On a cluster you operate (mk8s or BYOK) you have direct access, so use KubeVirt’s virtctl against the cluster:

Find the VM instance and its namespace

VM instances are labelled with their workload name and run in the GVC namespace:

kubectl get vmi -A -l cpln/workload=<workload-name>

Open the VNC console

Point virtctl at the instance (use the NAME and NAMESPACE from the previous step). virtctl launches your local VNC viewer:

virtctl vnc <vmi-name> -n <gvc-namespace> --kubeconfig <cluster-kubeconfig>

If you have no local VNC client, run with --proxy-only; virtctl prints a local 127.0.0.1:<port> address you can point any VNC client at.

virtctl must match the cluster’s KubeVirt version, and your kubeconfig user needs access to the virtualmachineinstances/vnc subresource. The serial console (boot log, login banner) is also available as virtctl console <vmi-name> -n <gvc-namespace>.
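The two steps above can be combined so that the lookup produces the console command directly. This is a sketch, assuming kubectl access; `vnc_commands` is a hypothetical helper name. It only prints the virtctl command for each matching instance and does not run it.

```shell
# Print a ready-to-run "virtctl vnc" command for each VMI of a workload.
# Usage: vnc_commands <workload-name>
vnc_commands() {
  kubectl get vmi -A -l "cpln/workload=$1" \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
  while read -r ns name; do
    # Skip blank or incomplete lines.
    if [ -n "$name" ]; then
      echo "virtctl vnc $name -n $ns"
    fi
  done
}
```

Swap `vnc` for `console` in the printed command to open the serial console instead, and append `--kubeconfig <cluster-kubeconfig>` if the cluster is not your current context.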

Troubleshooting

A VM workload never starts and no instance appears

Symptom: The workload deploys but stays not-ready, and kubectl get vmi -A -l cpln/workload=<workload-name> returns nothing, or the VM instance is stuck Pending/Scheduling. The location may not report vm in its capabilities.

Cause: No VM-capable node, or KubeVirt’s components have nowhere to run. VMs and the KubeVirt control plane are scoped to nodes labelled cpln.io/nodeType: vm.

Fix: Confirm at least one ready, labelled node exists:

kubectl get nodes -l cpln.io/nodeType=vm

If the list is empty, add the label (and the matching taint) to a node pool on KVM-capable instances as shown in How to enable, and verify the kubevirt add-on has reconciled.
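A labelled node can still be unusable if KVM is not actually available on it. In upstream KubeVirt, the node agent advertises the KVM device as the allocatable resource `devices.kubevirt.io/kvm`, so the sketch below reports readiness and that resource for each labelled node. It assumes kubectl access and that the add-on keeps the upstream resource name; `vm_nodes_kvm_report` is a hypothetical helper name.

```shell
# Report readiness and advertised KVM capacity for each labelled VM node.
# Output: "<node-name><TAB>ready=<True|False|Unknown><TAB>kvm=<n>|no-kvm".
vm_nodes_kvm_report() {
  kubectl get nodes -l cpln.io/nodeType=vm \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\t"}{.status.allocatable.devices\.kubevirt\.io/kvm}{"\n"}{end}' |
  awk -F'\t' '
    NF == 0 { next }   # skip blank lines
    {
      # An empty or zero value means the node is not offering KVM to VMs.
      kvm = ($3 == "" || $3 == "0") ? "no-kvm" : "kvm=" $3
      print $1 "\t" "ready=" $2 "\t" kvm
    }'
}
```

A node that shows `no-kvm` is labelled but is not offering /dev/kvm, which usually points back to the instance type or the nested-virtualization setting.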

Boot disk import is stuck

Symptom: The deployment status sits at Importing boot disk: NN% and never reaches running.

Cause: CDI imports the boot disk into a PVC before the VM can boot. Imports stall when there is no working default StorageClass, or when the default StorageClass is block-mode and CDI has no filesystem-mode scratch space.

Fix: Ensure a default StorageClass exists. If it is block-mode, set scratchSpaceStorageClass to a filesystem-mode class:

YAML

addOns:
  kubevirt:
    scratchSpaceStorageClass: my-filesystem-sc

Inspect the importer pod for the underlying error:

kubectl logs -n <gvc-namespace> -l app=containerized-data-importer
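To confirm which StorageClass is the cluster default, look for the standard `storageclass.kubernetes.io/is-default-class` annotation. The function below is a sketch, assuming kubectl access; `default_storage_class` is a hypothetical helper name. It prints the default class name and returns non-zero when none is marked.

```shell
# Print the name of the default StorageClass, if one is marked.
# Returns 1 when no class carries the default annotation.
default_storage_class() {
  kubectl get storageclass \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}{"\n"}{end}' |
  awk -F'\t' '
    $2 == "true" { print $1; found = 1 }
    END { exit found ? 0 : 1 }'
}
```

If this prints more than one name, the cluster has several classes marked as default, which is worth resolving before retrying the import.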

Windows VM fails to boot (UEFI)

Symptom: The VM never reaches the OS; the VNC/serial console shows a UEFI error such as failed to load Boot0001 ... from PciRoot(0x0).

Cause: Control Plane runs these VMs with non-persistent EFI NVRAM, so the firmware boots with an empty variable store. The image must place the Windows boot manager at the UEFI fallback path (\EFI\BOOT\BOOTX64.EFI); without it the firmware has nothing to boot. Enabling Secure Boot causes the same class of failure because it requires persistent NVRAM.

Fix: Rebuild the Windows image with the fallback bootloader (see Publishing & converting VM images). Do not set firmware.secureBoot; it is not yet available for this reason.

Raw TCP to another workload over the mesh resets (HTTP works)

Symptom: A VM (often Windows) reaches another workload’s internal endpoint fine over HTTP, but a plain TCP protocol (SQL, RDP, custom TCP) connects only intermittently and otherwise resets.

Cause: Istio’s DNS proxy auto-allocates a virtual IP per ServiceEntry that can resolve to a non-serving endpoint; HTTP re-routes by Host header, raw TCP does not.

Fix: Enable the MESH_DISABLE_IP_AUTOALLOCATE actuator setting for that location, then reconcile the workload and re-establish the connection.

Can't connect to the VM from my machine

Symptom: SSH or RDP to the VM times out from outside the cluster.

Cause: A VM has no public endpoint by default; its ports only govern in-cluster service-mesh traffic.

Fix: Reach the guest with cpln port-forward, which tunnels directly to a replica; the target port does not need to be listed in the workload’s ports:

cpln port-forward <workload-name> 13389:3389 --gvc <gvc> --location <location>

Then connect your client to localhost:13389. For public HTTP access, attach a domain instead.

VM can't resolve cpln.local or reach other workloads

Symptom: Inside the guest, *.cpln.local names or cross-location peers fail to resolve.

Cause: VM guests resolve in-cluster names through a platform DNS forwarder; large answers can be truncated over UDP on some guests.

Fix: Enable the nodeLocalDns add-on (recommended alongside KubeVirt). On Linux guests you can also force TCP DNS by adding options use-vc to /etc/resolv.conf from cloud-init.
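On a Linux guest, the use-vc option can be added idempotently with a small shell function, for example from a cloud-init runcmd entry. This is a sketch, assuming a glibc-based guest whose /etc/resolv.conf is a plain file; `ensure_use_vc` is a hypothetical helper name. On guests where another service regenerates resolv.conf, the line may be overwritten and has to be set through that service instead.

```shell
# Append "options use-vc" to resolv.conf unless an options line already has it.
# Usage: ensure_use_vc [path-to-resolv.conf]
ensure_use_vc() {
  conf="${1:-/etc/resolv.conf}"

  # Already present: nothing to do, so repeated runs stay harmless.
  if grep -Eq '^[[:space:]]*options[[:space:]].*use-vc' "$conf" 2>/dev/null; then
    return 0
  fi

  # use-vc makes the glibc resolver send queries over TCP instead of UDP.
  echo "options use-vc" >> "$conf"
}
```

Running the function twice leaves a single `options use-vc` line in the file.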

Next steps

VM Workloads

Configure and run a VM

Publishing VM Images

Build boot images

Volume Sets

Storage for boot and data disks

mk8s Overview

Managed Kubernetes basics
