@PhilippMatthes PhilippMatthes commented Dec 29, 2025

In cobaltcore-dev/cortex#441 we implemented a cortex filtering pipeline for KVM. This pipeline uses the hypervisor CRD as the single source of truth for deciding which hypervisors a VM can be scheduled on. To complete this implementation, we extended the hypervisor CRD in cobaltcore-dev/openstack-hypervisor-operator#217. That pull request added new fields and removed outdated ones; the new fields need to be autodiscovered by the KVM node agent. The following fields are now populated:

Support filtering based on hypervisor type and other capabilities:

  • Export the hypervisor type, architecture, supported devices, supported CPU modes, and supported features

Capacity filtering:

  • Aggregate the allocated and total available capacity and populate the corresponding fields

(Bonus)

  • Add NUMA cell capacity and allocation information so we can implement NUMA-sensitive initial placement
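
As a rough illustration of what NUMA-sensitive initial placement could look like once per-cell capacity and allocation are available, the following Go sketch picks the first cell whose free CPU and memory cover a request. The `Cell` type, its field names, and `pickCell` are hypothetical and not taken from the node agent or cortex; the numbers are borrowed from the sample status posted later in this thread.

```go
package main

import "fmt"

// Cell is a hypothetical Go view of one entry in the status' cells list.
type Cell struct {
	ID              int
	CapacityCPU     int64
	AllocatedCPU    int64
	CapacityMemKiB  int64
	AllocatedMemKiB int64
}

// fits reports whether the cell's free CPU and memory cover the request.
func (c Cell) fits(cpu, memKiB int64) bool {
	return c.CapacityCPU-c.AllocatedCPU >= cpu &&
		c.CapacityMemKiB-c.AllocatedMemKiB >= memKiB
}

// pickCell returns the ID of the first cell that fits the request,
// or false if no single cell can hold it.
func pickCell(cells []Cell, cpu, memKiB int64) (int, bool) {
	for _, c := range cells {
		if c.fits(cpu, memKiB) {
			return c.ID, true
		}
	}
	return 0, false
}

func main() {
	cells := []Cell{
		{ID: 0, CapacityCPU: 64, AllocatedCPU: 3, CapacityMemKiB: 528110060, AllocatedMemKiB: 4064 * 1024},
		{ID: 1, CapacityCPU: 64, AllocatedCPU: 0, CapacityMemKiB: 528456396, AllocatedMemKiB: 0},
	}
	// A 62-vCPU request does not fit cell 0 (61 free) but fits cell 1.
	id, ok := pickCell(cells, 62, 1024*1024)
	fmt.Println(id, ok)
}
```

First-fit is only one possible policy; a real placement could just as well prefer the emptiest or the fullest cell.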

When done:

  • Test with ssh-forwarded libvirt socket

Note

The scope of this PR is to establish a minimum viable scheduling pipeline in cortex, with the fewest changes possible. Refactorings of the hypervisor CRD spec can follow if needed.

@PhilippMatthes PhilippMatthes marked this pull request as ready for review January 2, 2026 12:57
PhilippMatthes added a commit to cobaltcore-dev/cortex that referenced this pull request Jan 5, 2026
## Background

For virtual machines spawned on the KVM hypervisor, we no longer want to
use nova and placement as the source of truth. Instead, filters should
use the hypervisor CRD exposed by the [hypervisor
operator](https://github.com/cobaltcore-dev/openstack-hypervisor-operator)
and populated by the [node
agent](https://github.com/cobaltcore-dev/kvm-node-agent). This
contribution replaces the implementation of all filters that were
originally ported from nova accordingly. Afterward, we can disable
filters in nova one by one, moving the compute placement logic over to
cortex.
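
To make the idea of a filter pipeline over hypervisor resources concrete, here is a minimal Go sketch. It is not the cortex implementation: the `Hypervisor`, `Request`, and `Filter` types, the field names, and the host names are all hypothetical. It only shows the general pattern of running a list of filters and keeping the hosts that pass every one.

```go
package main

import "fmt"

// Hypervisor is a hypothetical, trimmed-down view of the hypervisor CRD.
type Hypervisor struct {
	Name        string
	Maintenance bool
	FreeCPU     int64
}

// Request is a hypothetical description of what the VM needs.
type Request struct {
	CPU int64
}

// Filter decides whether a hypervisor stays in the candidate set.
type Filter func(h Hypervisor, r Request) bool

// runPipeline applies every filter in order and keeps the hosts that pass all of them.
func runPipeline(hosts []Hypervisor, r Request, filters ...Filter) []Hypervisor {
	var kept []Hypervisor
	for _, h := range hosts {
		ok := true
		for _, f := range filters {
			if !f(h, r) {
				ok = false
				break
			}
		}
		if ok {
			kept = append(kept, h)
		}
	}
	return kept
}

func main() {
	notInMaintenance := func(h Hypervisor, _ Request) bool { return !h.Maintenance }
	hasCPU := func(h Hypervisor, r Request) bool { return h.FreeCPU >= r.CPU }

	hosts := []Hypervisor{
		{Name: "node-a", FreeCPU: 125},
		{Name: "node-b", Maintenance: true, FreeCPU: 128},
		{Name: "node-c", FreeCPU: 2},
	}
	// Only node-a is out of maintenance and has at least 4 free CPUs.
	for _, h := range runPipeline(hosts, Request{CPU: 4}, notInMaintenance, hasCPU) {
		fmt.Println(h.Name)
	}
}
```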

> [!TIP]
> You can use the newly added [mirror
tool](93fdcc0)
to mirror hypervisor resources from our compute cluster over to the
local cluster.

## Completion

- [x] ~internal/scheduling/decisions/nova/plugins/filters/filter_compute_capabilities.go~ (REMOVED)
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_capabilities.go (NEW)
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_correct_az.go
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_external_customer.go
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_has_accelerators.go
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_has_enough_capacity.go
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_has_requested_traits.go
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_host_instructions.go
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_maintenance.go (NEW)
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_packed_virtqueue.go
- [x] ~internal/scheduling/decisions/nova/plugins/filters/filter_project_aggregates.go~ (REMOVED)
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_allowed_projects.go (NEW)
- [x] ~internal/scheduling/decisions/nova/plugins/filters/filter_disabled.go~ (REMOVED)
- [x] internal/scheduling/decisions/nova/plugins/filters/filter_status_conditions.go (NEW)

## Dependencies

> [!NOTE]
> The scope of this PR is to establish a minimum viable scheduling
pipeline with the current state. Extensive refactorings, for example of
the filter for requested traits, are out of scope.

Hypervisor operator PR: cobaltcore-dev/openstack-hypervisor-operator#217
KVM node agent PR: cobaltcore-dev/kvm-node-agent#40

@PhilippMatthes

Refactored and tested:

apiVersion: kvm.cloud.sap/v1
kind: Hypervisor
status:
  allocation:
    cpu: "3"
    memory: 4064Mi
  capabilities:
    cpuArch: x86_64
    # ...
  capacity:
    cpu: "128"
    memory: 1056566456Ki
  cells:
  - allocation:
      cpu: "3"
      memory: 4064Mi
    capacity:
      cpu: "64"
      memory: 528110060Ki
    cellID: 0
  - allocation:
      cpu: "0"
      memory: "0"
    capacity:
      cpu: "64"
      memory: 528456396Ki
    cellID: 1
  # ...
  domainCapabilities:
    arch: x86_64
    hypervisorType: ch
    supportedCpuModes:
    - mode/host-passthrough
    supportedDevices:
    - video
    - video/none
    supportedFeatures: []
  libVirtVersion: 48.0.0 # logic was simplified
  numInstances: 2 # logic was simplified
  # ...
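
The sample values above are consistent with the top-level `capacity` and `allocation` being the sum of the per-cell values. The Go sketch below reproduces that arithmetic. Whether the node agent actually computes the totals this way is an assumption, and the `Resources` type and `sum` helper are hypothetical (the CRD itself carries Kubernetes quantity strings such as `528110060Ki`).

```go
package main

import "fmt"

// Resources holds a CPU count and memory in KiB. Hypothetical type.
type Resources struct {
	CPU    int64
	MemKiB int64
}

// sum adds up a list of per-cell resources.
func sum(cells []Resources) Resources {
	var total Resources
	for _, c := range cells {
		total.CPU += c.CPU
		total.MemKiB += c.MemKiB
	}
	return total
}

func main() {
	capacity := sum([]Resources{
		{CPU: 64, MemKiB: 528110060}, // cellID 0
		{CPU: 64, MemKiB: 528456396}, // cellID 1
	})
	allocation := sum([]Resources{
		{CPU: 3, MemKiB: 4064 * 1024}, // cellID 0, 4064Mi
		{CPU: 0, MemKiB: 0},           // cellID 1
	})
	fmt.Printf("capacity: cpu=%d memory=%dKi\n", capacity.CPU, capacity.MemKiB)
	fmt.Printf("allocation: cpu=%d memory=%dMi\n", allocation.CPU, allocation.MemKiB/1024)
	// prints:
	// capacity: cpu=128 memory=1056566456Ki
	// allocation: cpu=3 memory=4064Mi
}
```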

@PhilippMatthes PhilippMatthes requested a review from notandy January 6, 2026 09:54
PhilippMatthes added a commit to cobaltcore-dev/openstack-hypervisor-operator that referenced this pull request Jan 7, 2026
## Background

In [this pull request](cobaltcore-dev/cortex#441) we implemented a cortex filtering pipeline for KVM. This pipeline uses the hypervisor CRD as the single source of truth for deciding which hypervisors a VM can be scheduled on. To complete this implementation, we need to extend the hypervisor CRD.

## Tasks

Support filtering based on hypervisor type and other capabilities:
- [x] The capabilities struct should be extended to support the hypervisor type. Later, we will probably need to extend this struct further.
- [x] Add fields for supported devices (e.g. video device), CPU modes, and features (for migration filtering)
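
A filter built on these capability fields might look like the following Go sketch. The `DomainCapabilities` and `Requirements` types are hypothetical stand-ins, not the CRD's generated Go types. The values `ch`, `video`, `video/none`, and `mode/host-passthrough` come from the sample status posted in the node agent PR, while `qemu` is only an illustrative counterexample.

```go
package main

import "fmt"

// DomainCapabilities is a hypothetical Go view of status.domainCapabilities.
type DomainCapabilities struct {
	Arch              string
	HypervisorType    string
	SupportedCPUModes []string
	SupportedDevices  []string
}

// Requirements lists what a VM needs; empty fields mean "no constraint".
type Requirements struct {
	HypervisorType string
	Devices        []string
}

// contains reports whether list includes item.
func contains(list []string, item string) bool {
	for _, v := range list {
		if v == item {
			return true
		}
	}
	return false
}

// supports reports whether the capabilities satisfy every requirement.
func supports(c DomainCapabilities, r Requirements) bool {
	if r.HypervisorType != "" && r.HypervisorType != c.HypervisorType {
		return false
	}
	for _, d := range r.Devices {
		if !contains(c.SupportedDevices, d) {
			return false
		}
	}
	return true
}

func main() {
	caps := DomainCapabilities{
		Arch:              "x86_64",
		HypervisorType:    "ch",
		SupportedCPUModes: []string{"mode/host-passthrough"},
		SupportedDevices:  []string{"video", "video/none"},
	}
	fmt.Println(supports(caps, Requirements{HypervisorType: "ch", Devices: []string{"video"}}))
	fmt.Println(supports(caps, Requirements{HypervisorType: "qemu"}))
}
```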

Capacity filtering:
- [x] We need a spec and status not only for the size of the host but also for its currently used capacity. Cortex will use this to filter out hosts without the required capacity. This scheduling logic can be made more intelligent in the future by mapping out individual NUMA cells and including reserved capacity.
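
The capacity check described here boils down to comparing a request against capacity minus allocation. A minimal Go sketch under that assumption follows; the `Resources` and `Host` types, the host names, and the request size are hypothetical, and reserved capacity is deliberately left out.

```go
package main

import "fmt"

// Resources is a hypothetical pair of CPU count and memory in KiB.
type Resources struct {
	CPU    int64
	MemKiB int64
}

// Host is a hypothetical view of one hypervisor's capacity and allocation.
type Host struct {
	Name       string
	Capacity   Resources
	Allocation Resources
}

// hasEnoughCapacity reports whether capacity minus allocation covers the request.
func hasEnoughCapacity(h Host, req Resources) bool {
	freeCPU := h.Capacity.CPU - h.Allocation.CPU
	freeMem := h.Capacity.MemKiB - h.Allocation.MemKiB
	return freeCPU >= req.CPU && freeMem >= req.MemKiB
}

func main() {
	hosts := []Host{
		{
			Name:       "node-a",
			Capacity:   Resources{CPU: 128, MemKiB: 1056566456},
			Allocation: Resources{CPU: 3, MemKiB: 4064 * 1024},
		},
		{
			Name:       "node-b",
			Capacity:   Resources{CPU: 128, MemKiB: 1056566456},
			Allocation: Resources{CPU: 126, MemKiB: 4064 * 1024},
		},
	}
	req := Resources{CPU: 4, MemKiB: 8 * 1024 * 1024} // 4 vCPUs, 8Gi
	for _, h := range hosts {
		fmt.Println(h.Name, hasEnoughCapacity(h, req))
	}
}
```

Here `node-a` passes with 125 free CPUs, while `node-b` is filtered out because only 2 CPUs remain.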

Pinned projects:
- [x] Provide a spec to declare pinned projects on this hypervisor.
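
How pinned projects feed into filtering is not spelled out here, so the Go sketch below rests on an assumed rule: a hypervisor that declares no pinned projects accepts any project, and one that declares some accepts only those. The function name, the rule, and the project IDs are hypothetical.

```go
package main

import "fmt"

// allowsProject reports whether a VM of the given project may land on a
// hypervisor with the given pinned projects.
// Assumption: an empty list means the hypervisor is not pinned at all.
func allowsProject(pinned []string, projectID string) bool {
	if len(pinned) == 0 {
		return true
	}
	for _, p := range pinned {
		if p == projectID {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(allowsProject(nil, "project-a"))                   // unpinned host
	fmt.Println(allowsProject([]string{"project-a"}, "project-a")) // pinned, matching
	fmt.Println(allowsProject([]string{"project-a"}, "project-b")) // pinned, other project
}
```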

(Bonus) 
- [x] Add NUMA cell capacity and allocation information so we can implement NUMA-sensitive initial placement

When finished:
- [x] Ensure backwards compatibility so we can roll this out without any issues

 
## Dependencies

> [!NOTE]
> The scope of this PR is to establish a minimum viable scheduling pipeline in cortex, with the least amount of changes possible. Refactorings of the hypervisor crd spec can follow if needed.

KVM node agent PR: cobaltcore-dev/kvm-node-agent#40
@PhilippMatthes

^ Rebased on current main

github-actions bot commented Jan 7, 2026

Merging this branch changes the coverage (2 decrease, 3 increase)

| Impacted Packages | Coverage Δ | 🤖 |
|---|---|---|
| github.com/cobaltcore-dev/kvm-node-agent/internal/controller | 30.82% (-1.39%) | 👎 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/emulator | 0.00% (ø) | |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt | 34.78% (+26.28%) | 🌟 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/capabilities | 31.25% (-36.25%) | 💀 💀 💀 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/domcapabilities | 31.25% (+31.25%) | 🌟 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/dominfo | 21.74% (+21.74%) | 🌟 |

Coverage by file

Changed files (no unit tests)

| Changed File | Coverage Δ | Total | Covered | Missed | 🤖 |
|---|---|---|---|---|---|
| github.com/cobaltcore-dev/kvm-node-agent/internal/controller/hypervisor_controller.go | 47.87% (-1.61%) | 94 (-3) | 45 (-3) | 49 | 👎 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/emulator/libvirt.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/capabilities/client.go | 31.25% (-30.04%) | 16 (-15) | 5 (-14) | 11 (-1) | 💀 💀 💀 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/capabilities/schema.go | 0.00% (-88.89%) | 0 (-9) | 0 (-8) | 0 (-1) | 💀 💀 💀 💀 💀 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/domcapabilities/client.go | 31.25% (+31.25%) | 16 (+16) | 5 (+5) | 11 (+11) | 🌟 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/domcapabilities/example.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/domcapabilities/schema.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/dominfo/client.go | 21.74% (+21.74%) | 23 (+23) | 5 (+5) | 18 (+18) | 🌟 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/dominfo/example.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/dominfo/schema.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/interface.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/interface_mock.go | 33.33% (+2.08%) | 36 (-60) | 12 (-18) | 24 (-42) | 👍 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/libvirt.go | 76.55% (+76.55%) | 145 (+112) | 111 (+111) | 34 (+1) | 🌟 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/utils.go | 72.41% (+72.41%) | 29 (+9) | 21 (+21) | 8 (-12) | 🌟 |

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/cobaltcore-dev/kvm-node-agent/internal/controller/hypervisor_controller_test.go
  • github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/capabilities/client_test.go
  • github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/domcapabilities/client_test.go
  • github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/domcapabilities/schema_test.go
  • github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/dominfo/client_test.go
  • github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/dominfo/schema_test.go
  • github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/libvirt_test.go
  • github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/utils_test.go

@PhilippMatthes PhilippMatthes merged commit dbdf44c into main Jan 7, 2026
6 checks passed
@PhilippMatthes PhilippMatthes deleted the cortex-filtering branch January 7, 2026 08:19