interrupt: Initial interrupt manager traits #1

sameo · 2019-12-17T17:06:39Z

Signed-off-by: Samuel Ortiz sameo@linux.intel.com

sameo · 2019-12-17T17:08:46Z

andreeaflorescu

Can I add a commit on top of your commit with some other fixes?

andreeaflorescu · 2019-12-19T09:18:34Z

src/interrupt/mod.rs

+/// Data type to store an interrupt source identifier.
+pub type InterruptIndex = u32;
+
+/// Data type to store an interrupt source type.


It is going to be tricky to handle the interrupt type. We might want to explain here why we want it to be a u32 instead of an enum and how do we expect this to be used.

andreeaflorescu · 2019-12-19T09:20:11Z

src/interrupt/mod.rs

+/// Data type to store an interrupt event.
+pub type InterruptEvent = u32;
+
+pub const IRQ_TRIGGERED: InterruptEvent = 0;


What were these 2?

Leftovers, they need to be removed.

andreeaflorescu · 2019-12-19T09:21:50Z

src/interrupt/mod.rs

+    /// # Arguments
+    /// * index: sub-index into the group.
+    /// * config: configuration data for the interrupt source.
+    fn modify(&self, index: InterruptIndex, config: &MsiIrqSourceConfig) -> Result<()>;


nit: rename to update?

andreeaflorescu · 2019-12-19T09:22:05Z

src/interrupt/mod.rs

+    fn enable(&self) -> Result<()>;
+
+    /// Disable the interrupt sources in the group to generate interrupts.
+    fn disable(&self) -> Result<()>;


Can you pls remind me why were these needed?

Because the guest driver can decide to disable and re-enable interrupts.

The guest kernel may choose different interrupt mode. For PCI device, guest kernel may trigger following transitions:

interrupt disabled -> interrupt enabled

legacy interrupt -> PCI MSI interrupt

PCI MSI interrupt -> legacy interrupt

legacy interrupt -> PCI MSIx interrupt

PCI MSIx interrupt -> legacy interrupt.
On mode transitation, disable() is called on the old interrupt group object and enable() is called on the new interrupt group.

src/interrupt/mod.rs

andreeaflorescu · 2019-12-19T09:51:51Z

src/interrupt/mod.rs

+    /// An interrupt notifier allows for external components and processes
+    /// to inject interrupts into a guest, by writing to the file returned
+    /// by this method.
+    fn notifier(&self, index: InterruptIndex) -> Option<File>;


Not sure if File is the best option here. Maybe we can have something that implement the traits needed (write, read?). I don't know which traits are needed actually. I'll experiment with this a bit.

We should also have a default implementation that returns none as this should not be implemented by non vhost-user backends.

Not sure if File is the best option here. Maybe we can have something that implement the traits needed (write, read?). I don't know which traits are needed actually. I'll experiment with this a bit.

It's not really about traits that need to be implemented but about the fact that it can be made into a raw fd, to pass it down to the vhost-user backend process.
A File is something that would work (That's the inner type for Eventfd...) and is the least common denominator I found between Windows and Linux.

jiangliu · 2020-01-09T06:07:04Z

Thanks for putting things together, I will compare this PR with my original implementation.
1)InterruptSourceGroup::{interrupt_type,len,base} are needed because we need to reconstruct an interrupt group object on the destination side for live-upgrading.
2) trigger()/ack() are standard interrupt handling flow for level triggered interrupt. So ack() should be kept .
3) It will make things complex to split out InterruptSourceGroupMsi. Eventually we will need an interrupt controller to manage interrupt mode for devices supporting multiple interrupt modes.

pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
    intr_groups: Vec<Arc<Box<dyn InterruptSourceGroup>>>,
    msi_config: Vec<InterruptSourceConfig>,
}

andreeaflorescu · 2020-01-09T13:10:29Z

Thanks for putting things together, I will compare this PR with my original implementation.
1)InterruptSourceGroup::{interrupt_type,len,base} are needed because we need to reconstruct an interrupt group object on the destination side for live-upgrading.

Is the expectation to use InterrruptSourceGroup directly for live update? I was considering this to be the common interrupt trait for all devices, which should be extended for other device specific implementations.

trigger()/ack() are standard interrupt handling flow for level triggered interrupt. So ack() should be kept .

I looked at the device implementations in crosvm and firecracker and it looks like ack is actually used only in the virtio implementation. So I was thinking that ack should be a function of a trait in vm-virtio as the other devices don't really need it.

Basically, you would have a VirtioInterrupt in vm-virtio something like:

pub enum InterruptError {
    SingalUsedQueueFailed(io::Error),
    SignalConfigurationChangedFailed(io::Error),
}

pub trait VirtioInterrupt {
    // The `vector` parameter is only used with MSIx
    fn signal_used_queue(&self, vector: u16) -> InterruptError;
    fn signal_configuration_change(&self, vector: u16) -> InterruptError;
    fn ack(&self, bit_mask: usize) {
        // With MSIx this is not used so the default here is no-op.
    }
}

It will make things complex to split out InterruptSourceGroupMsi. Eventually we will need an interrupt controller to manage interrupt mode for devices supporting multiple interrupt modes.
pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
    intr_groups: Vec<Arc<Box<dyn InterruptSourceGroup>>>,
    msi_config: Vec<InterruptSourceConfig>,
}

I think the interrupt controller should not be defined in vm-device and should be separate because we have different use cases and I don't think we can merge them into something that works/is good for both Firecracker and Cloud Hypervisor/Dragonball.

jiangliu · 2020-01-09T15:11:35Z

When design the interrupt subsystem, we need to think about both the abstraction and the way to use the interrupt abstraction.
The design goal is simple the device backend drivers. Device backend driver doesn't need to manage any interrupt implementation details, it only needs a way to trigger interrupt for events as below:

self.config
       .trigger(0, VIRTIO_INTR_VRING)
       .map_err(|err| DeviceError::VirtIoFailure(VirtIoError::IOError(err)))

instead of

    fn signal_used_queue(&self) -> result::Result<(), DeviceError> {
        self.interrupt_status
            .fetch_or(VIRTIO_MMIO_INT_VRING as usize, Ordering::SeqCst);
        self.interrupt_evt.write(1).map_err(|e| {
            error!("Failed to signal used queue: {:?}", e);
            METRICS.net.event_fails.inc();
            DeviceError::FailedSignalingUsedQueue(e)
        })
    }

For a device supporting both multi-queues and MSI, the code is almost the same:

self.config
       .trigger(queue_idx, VIRTIO_INTR_VRING)
       .map_err(|e| VmmEpollError::IOError(e))?;

For PCI and MMIO device with MSI, we need a DeviceInterruptManager to manage the backend device's interrupt mode. Current the DeviceInterruptManager is defined as:

#[derive(Copy, Clone, PartialEq, Serialize, Deserialize)]
pub enum DeviceInterruptMode {
    Disabled = 0,
    LegacyIrq = 1,
    GenericMsiIrq = 2,
    PciMsiIrq = 3,
    PciMsixIrq = 4,
}

/// A struct to manage interrupts and interrupt modes for a device.
///
/// The interrupt manager may support multiple working mode. For example, an interrupt manager
/// for a PCI device may work in legacy mode, PCI MSI mode or PCI MSIx mode. Under certain
/// conditions, the interrupt manager may switch between interrupt working modes. To simplify
/// implementation, switching working mode is only supported at configuration stage and will be
/// disabled at runtime stage. The DeviceInterruptManager::enable() switches the interrupt manager
/// from configuration stage into runtime stage. And DeviceInterruptManager::disable() switches
/// from runtime stage back to configuration stage.
#[vmm_serde::export_as_pub()]
pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
    intr_groups: Vec<Arc<Box<dyn InterruptSourceGroup>>>,
    msi_config: Vec<InterruptSourceConfig>,
}

And take firecracker/device/src/virtio/net.rs as an example, we have simplified it as below in dragonball:

pub struct Net {
    tap: Option<Tap>,
    device_info: VirtioDeviceInfo<EpollSlotAllocator>,
    queue_sizes: Arc<Vec<u16>>,
    rx_rate_limiter: Option<RateLimiter>,
    tx_rate_limiter: Option<RateLimiter>,
}       

struct NetEpollHandler {
    rx: RxVirtio,
    tap: Tap,   
    tx: TxVirtio,
    config: VirtioDeviceConfig,
    epoll_slots: Option<Box<dyn EpollSlotGroupT>>,
    // TODO(smbarber): http://crbug.com/753630
    // Remove once MRG_RXBUF is supported and this variable is actually used.
    #[allow(dead_code)]
    acked_features: u64,
    #[cfg(test)]
    test_mutators: tests::TestMutators,
}

jiangliu · 2020-01-09T15:35:07Z

Thanks for putting things together, I will compare this PR with my original implementation.
1)InterruptSourceGroup::{interrupt_type,len,base} are needed because we need to reconstruct an interrupt group object on the destination side for live-upgrading.

Is the expectation to use InterrruptSourceGroup directly for live update? I was considering this to be the common interrupt trait for all devices, which should be extended for other device specific implementations.

It's a common trait for all devices, but live-upgrading has dependency on InterruptSourceGroup::{interrupt_type,len,base}. Actually all of InterruptSourceGroup::{interrupt_type,len,base} are simple accessors.

trigger()/ack() are standard interrupt handling flow for level triggered interrupt. So ack() should be kept .

I looked at the device implementations in crosvm and firecracker and it looks like ack is actually used only in the virtio implementation. So I was thinking that ack should be a function of a trait in vm-virtio as the other devices don't really need it.

Basically, you would have a VirtioInterrupt in vm-virtio something like:
pub enum InterruptError {
    SingalUsedQueueFailed(io::Error),
    SignalConfigurationChangedFailed(io::Error),
}

pub trait VirtioInterrupt {
    // The `vector` parameter is only used with MSIx
    fn signal_used_queue(&self, vector: u16) -> InterruptError;
    fn signal_configuration_change(&self, vector: u16) -> InterruptError;
    fn ack(&self, bit_mask: usize) {
        // With MSIx this is not used so the default here is no-op.
    }
}

It's a common design pattern for a hardware device to have a interrupt status register. When the device want to signal an interrupt, it does:

set a bit corresponding to the event in the status register
trigger interrupt
the OS device driver receives the interrupt and read the status register
the OS device driver writes the value read from the status register back to the status register to clear the bits set.
Virtio-mmio devices work in this way, many other devices (such as PCI devices) also works in this way.
So ack() is not only needed by virtio subsystem.

It will make things complex to split out InterruptSourceGroupMsi. Eventually we will need an interrupt controller to manage interrupt mode for devices supporting multiple interrupt modes.
pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
    intr_groups: Vec<Arc<Box<dyn InterruptSourceGroup>>>,
    msi_config: Vec<InterruptSourceConfig>,
}
I think the interrupt controller should not be defined in vm-device and should be separate because we have different use cases and I don't think we can merge them into something that works/is good for both Firecracker and Cloud Hypervisor/Dragonball.

Yes , we have originally put DeviceInterruptManager in file dragonball/device/src/interrupt_manager.rs, instead of put it into the vm-device trait.
It's just used to show the way an interrupt group object may be used.

sameo · 2020-01-10T23:24:36Z

Thanks for putting things together, I will compare this PR with my original implementation.
1)InterruptSourceGroup::{interrupt_type,len,base} are needed because we need to reconstruct an interrupt group object on the destination side for live-upgrading.

So your live upgrade workflow, for a given device, would be to get a serialized {interrupt_type, len, base} upgrade payload on the receiving side, and then call create_group() with the deserialized data to create an identical interrupt source group?
The sending side is the device you're trying to move to the destination side, it's the same device that created the interrupt source group initially. So having those gettters is essentially a helper for the device implementation in the VMM not to cache that information (That it gets from the guest OS or firmware). Is that a correct description of the problem?

trigger()/ack() are standard interrupt handling flow for level triggered interrupt. So ack() should be kept .

Do you agree ack() would only be needed for legacy interrupts and be a noop for MSI ones? Same thing for the flag parameter that we'd have to pass to both trigger() and ack()`, it would be unused for MSI interrupts.
With Andreea, we wondered which devices use legacy interrupts and need a status register, and the only ones we could see are the virtio ones, when they're not MSI based. The legacy ones we know about are not falling in that bucket and the logic does not apply to VFIO. So the point here is about not overloading an API for a use case that's actually fairly marginal. And to follow up on your example, with the proposed changes, delivering an interrupt for most devices would look like:

self.config
       .trigger(0)
       .map_err(|err| DeviceError::IOError(err))

while for a virtio devices, using legacy interrupt it would be slightly more complex:

self.interrupt_status
            .fetch_or(VIRTIO_MMIO_INT_VRING as usize, Ordering::SeqCst);
self.config
       .trigger(0)
       .map_err(|err| DeviceError::IOError(err))

The way the interrupt is actually delivered is abstracted from the device implementation in both cases. I agree you could argue that setting the interrupt_status is part of the delivery, but what I'm saying is that it's specific to virtio+legacy afaik.

It will make things complex to split out InterruptSourceGroupMsi.

Looking back at it a few weeks after our original discussion, I now tend to agree with that although the update operations is very much an MSI specific one.

jiangliu · 2020-01-11T02:59:39Z

Thanks for putting things together, I will compare this PR with my original implementation.
1)InterruptSourceGroup::{interrupt_type,len,base} are needed because we need to reconstruct an interrupt group object on the destination side for live-upgrading.

So your live upgrade workflow, for a given device, would be to get a serialized {interrupt_type, len, base} upgrade payload on the receiving side, and then call create_group() with the deserialized data to create an identical interrupt source group?
The sending side is the device you're trying to move to the destination side, it's the same device that created the interrupt source group initially. So having those gettters is essentially a helper for the device implementation in the VMM not to cache that information (That it gets from the guest OS or firmware). Is that a correct description of the problem?

Yes, these three methods are getters. But information is not gotten from firmware or guest OS, it's defined by the device interrupt working mode and resource allocation.

trigger()/ack() are standard interrupt handling flow for level triggered interrupt. So ack() should be kept .

Do you agree ack() would only be needed for legacy interrupts and be a noop for MSI ones? Same thing for the flag parameter that we'd have to pass to both trigger() and ack()`, it would be unused for MSI interrupts.
With Andreea, we wondered which devices use legacy interrupts and need a status register, and the only ones we could see are the virtio ones, when they're not MSI based. The legacy ones we know about are not falling in that bucket and the logic does not apply to VFIO. So the point here is about not overloading an API for a use case that's actually fairly marginal. And to follow up on your example, with the proposed changes, delivering an interrupt for most devices would look like:
self.config
       .trigger(0)
       .map_err(|err| DeviceError::IOError(err))
while for a virtio devices, using legacy interrupt it would be slightly more complex:
self.interrupt_status
            .fetch_or(VIRTIO_MMIO_INT_VRING as usize, Ordering::SeqCst);
self.config
       .trigger(0)
       .map_err(|err| DeviceError::IOError(err))
The way the interrupt is actually delivered is abstracted from the device implementation in both cases. I agree you could argue that setting the interrupt_status is part of the delivery, but what I'm saying is that it's specific to virtio+legacy afaik.

It's true that ack() and flags are only used by legacy irqs. But ack() and flags are not limited to virtio + legacy(firecracker works in this way), it's almost for all legacy device. Let's take some examples:

firecracker/devices/src/legacy/serial.rs: Serial::interrupt_identification is the interrupt status register,
and Serial::iir_reset() is the ack().
For intel e1000 NIC, ICR is the interrupt status register. (Most PCI devices follow into this case).

And for firecracker project,

/dragonball.git $ grep interrupt_status firecracker.git/devices/src/ -r | wc -l
      26

So abstracting away the interrupt status does help to reduce duplicated code.

It will make things complex to split out InterruptSourceGroupMsi.

Looking back at it a few weeks after our original discussion, I now tend to agree with that although the update operations is very much an MSI specific one.

That's the really key difference.
You and @andreeaflorescu want a set up traits to describe different interrupt types.
We propose a superset trait to cover all interrupt types.
The reason we are proposing a superset trait is that it ease the way to use the InterruptGroup objects.
Take PCI device as an example, we prefer the way:

pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
    intr_groups: Vec<Arc<Box<dyn InterruptSourceGroup>>>,
    msi_config: Vec<InterruptSourceConfig>,
}

instead of

pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
   intr_legacy: Arc<Box<dyn InterruptSourceGroup>>,
   intr_msi: Arc<Box<dyn InterruptSourceGroupMsi>>
   intr_msix: Arc<Box<dyn InterruptSourceGroupMsi>>
    msi_config: Vec<InterruptSourceConfig>,
}

Use u64 for guest memory address type since this can make vm-device independent from on vm-memory. Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>

Change DeviceIo interface parameters to base and offset, so that devices with several IO ranges can use it to locate right range. Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>

As suggested from rust-vmm#18 (comment) Suggested-by: Andreea Florescu <fandree@amazon.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>

In order to get a real multiple threads handling to enhance performance, the DeviceIo trait need adopt interior mutability pattern. Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>

Based on resources definition, this adds device IO manager to manage all devices IO ranges. Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>

IO manager is responsible for handling IO operation when VMExit. It works out the specific device according to the address range and hand over to DeviceIo trait. Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>

Unit tests for IO manager. Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>

Append missing tests for resources and fix some typo. Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>

sameo · 2020-01-22T10:46:31Z

Thanks for putting things together, I will compare this PR with my original implementation.
1)InterruptSourceGroup::{interrupt_type,len,base} are needed because we need to reconstruct an interrupt group object on the destination side for live-upgrading.

So your live upgrade workflow, for a given device, would be to get a serialized {interrupt_type, len, base} upgrade payload on the receiving side, and then call create_group() with the deserialized data to create an identical interrupt source group?
The sending side is the device you're trying to move to the destination side, it's the same device that created the interrupt source group initially. So having those gettters is essentially a helper for the device implementation in the VMM not to cache that information (That it gets from the guest OS or firmware). Is that a correct description of the problem?

Yes, these three methods are getters. But information is not gotten from firmware or guest OS, it's defined by the device interrupt working mode and resource allocation.

You're right, they're basically defined by the VMM or a device. What I'm trying to say here is that the entity (VMM or device) that creates the interrupt source group on the source VMM is responsible to recreate it on the destination VMM, and since it was able to create it on the source side it should pass the information needed for creation on the target side.

trigger()/ack() are standard interrupt handling flow for level triggered interrupt. So ack() should be kept .

Do you agree ack() would only be needed for legacy interrupts and be a noop for MSI ones? Same thing for the flag parameter that we'd have to pass to both trigger() and ack()`, it would be unused for MSI interrupts.
With Andreea, we wondered which devices use legacy interrupts and need a status register, and the only ones we could see are the virtio ones, when they're not MSI based. The legacy ones we know about are not falling in that bucket and the logic does not apply to VFIO. So the point here is about not overloading an API for a use case that's actually fairly marginal. And to follow up on your example, with the proposed changes, delivering an interrupt for most devices would look like:
self.config
       .trigger(0)
       .map_err(|err| DeviceError::IOError(err))
while for a virtio devices, using legacy interrupt it would be slightly more complex:
self.interrupt_status
            .fetch_or(VIRTIO_MMIO_INT_VRING as usize, Ordering::SeqCst);
self.config
       .trigger(0)
       .map_err(|err| DeviceError::IOError(err))
The way the interrupt is actually delivered is abstracted from the device implementation in both cases. I agree you could argue that setting the interrupt_status is part of the delivery, but what I'm saying is that it's specific to virtio+legacy afaik.
It's true that ack() and flags are only used by legacy irqs. But ack() and flags are not limited to virtio + legacy(firecracker works in this way), it's almost for all legacy device. Let's take some examples:

firecracker/devices/src/legacy/serial.rs: Serial::interrupt_identification is the interrupt status register,
and Serial::iir_reset() is the ack().

For intel e1000 NIC, ICR is the interrupt status register. (Most PCI devices follow into this case).

And for firecracker project,
/dragonball.git $ grep interrupt_status firecracker.git/devices/src/ -r | wc -l
      26
So abstracting away the interrupt status does help to reduce duplicated code.

It will make things complex to split out InterruptSourceGroupMsi.

Looking back at it a few weeks after our original discussion, I now tend to agree with that although the update operations is very much an MSI specific one.

That's the really key difference.
You and @andreeaflorescu want a set up traits to describe different interrupt types.
We propose a superset trait to cover all interrupt types.
The reason we are proposing a superset trait is that it ease the way to use the InterruptGroup objects.
Take PCI device as an example, we prefer the way:
pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
    intr_groups: Vec<Arc<Box<dyn InterruptSourceGroup>>>,
    msi_config: Vec<InterruptSourceConfig>,
}
instead of
pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
   intr_legacy: Arc<Box<dyn InterruptSourceGroup>>,
   intr_msi: Arc<Box<dyn InterruptSourceGroupMsi>>
   intr_msix: Arc<Box<dyn InterruptSourceGroupMsi>>
    msi_config: Vec<InterruptSourceConfig>,
}

Yes, I think I agree with that. As a matter of fact, in Cloud Hypervisor we have transitioned to a InterruptSourceGroup and InterruptManager, see cloud-hypervisor/cloud-hypervisor#623 and cloud-hypervisor/cloud-hypervisor#637

The InterruptSourceGroup trait that we're using is defined at https://github.com/cloud-hypervisor/cloud-hypervisor/blob/master/vm-device/src/interrupt/mod.rs#L137.
Essentially, compared to this PR, the changes are:

We have one InterruptSourceGroup trait for both legacy and MSI
- update() takes a generic InterruptSourceConfig parameter, which is an enum. We're back to the discussion we had with @andreeaflorescu about having traits depend on enums and how that could potentially break crate users when we extend the enum. In practice I think that it's reasonable to say that supporting a new kind of interrupts (unknown to me at the moment) would deserve a new major version for that crate (or a new minor if we're still in 0.x).
We added mask/unmask operations, which are fairly typical interrupt related semantics

Overall I feel this trait is a reasonable compromise between our 2 proposals. In practice, this simplifies a lot our code base and how we manage interrupts. It's a significant win in terms of abstractions and I think we could all benefit from it.

@jiangliu @andreeaflorescu Please let me know what you think.

jiangliu · 2020-01-22T14:02:03Z

Thanks for putting things together, I will compare this PR with my original implementation.
1)InterruptSourceGroup::{interrupt_type,len,base} are needed because we need to reconstruct an interrupt group object on the destination side for live-upgrading.

So your live upgrade workflow, for a given device, would be to get a serialized {interrupt_type, len, base} upgrade payload on the receiving side, and then call create_group() with the deserialized data to create an identical interrupt source group?
The sending side is the device you're trying to move to the destination side, it's the same device that created the interrupt source group initially. So having those gettters is essentially a helper for the device implementation in the VMM not to cache that information (That it gets from the guest OS or firmware). Is that a correct description of the problem?

Yes, these three methods are getters. But information is not gotten from firmware or guest OS, it's defined by the device interrupt working mode and resource allocation.

You're right, they're basically defined by the VMM or a device. What I'm trying to say here is that the entity (VMM or device) that creates the interrupt source group on the source VMM is responsible to recreate it on the destination VMM, and since it was able to create it on the source side it should pass the information needed for creation on the target side.
trigger()/ack() are standard interrupt handling flow for level triggered interrupt. So ack() should be kept .

Do you agree ack() would only be needed for legacy interrupts and be a noop for MSI ones? Same thing for the flag parameter that we'd have to pass to both trigger() and ack()`, it would be unused for MSI interrupts.
With Andreea, we wondered which devices use legacy interrupts and need a status register, and the only ones we could see are the virtio ones, when they're not MSI based. The legacy ones we know about are not falling in that bucket and the logic does not apply to VFIO. So the point here is about not overloading an API for a use case that's actually fairly marginal. And to follow up on your example, with the proposed changes, delivering an interrupt for most devices would look like:
self.config
       .trigger(0)
       .map_err(|err| DeviceError::IOError(err))
while for a virtio devices, using legacy interrupt it would be slightly more complex:
self.interrupt_status
            .fetch_or(VIRTIO_MMIO_INT_VRING as usize, Ordering::SeqCst);
self.config
       .trigger(0)
       .map_err(|err| DeviceError::IOError(err))
The way the interrupt is actually delivered is abstracted from the device implementation in both cases. I agree you could argue that setting the interrupt_status is part of the delivery, but what I'm saying is that it's specific to virtio+legacy afaik.
It's true that ack() and flags are only used by legacy irqs. But ack() and flags are not limited to virtio + legacy(firecracker works in this way), it's almost for all legacy device. Let's take some examples:

firecracker/devices/src/legacy/serial.rs: Serial::interrupt_identification is the interrupt status register,
and Serial::iir_reset() is the ack().

For intel e1000 NIC, ICR is the interrupt status register. (Most PCI devices follow into this case).

And for firecracker project,
/dragonball.git $ grep interrupt_status firecracker.git/devices/src/ -r | wc -l
      26
So abstracting away the interrupt status does help to reduce duplicated code.

It will make things complex to split out InterruptSourceGroupMsi.

Looking back at it a few weeks after our original discussion, I now tend to agree with that although the update operations is very much an MSI specific one.

That's the really key difference.
You and @andreeaflorescu want a set up traits to describe different interrupt types.
We propose a superset trait to cover all interrupt types.
The reason we are proposing a superset trait is that it ease the way to use the InterruptGroup objects.
Take PCI device as an example, we prefer the way:
pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
    intr_groups: Vec<Arc<Box<dyn InterruptSourceGroup>>>,
    msi_config: Vec<InterruptSourceConfig>,
}
instead of
pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
   intr_legacy: Arc<Box<dyn InterruptSourceGroup>>,
   intr_msi: Arc<Box<dyn InterruptSourceGroupMsi>>
   intr_msix: Arc<Box<dyn InterruptSourceGroupMsi>>
    msi_config: Vec<InterruptSourceConfig>,
}
Yes, I think I agree with that. As a matter of fact, in Cloud Hypervisor we have transitioned to a InterruptSourceGroup and InterruptManager, see cloud-hypervisor/cloud-hypervisor#623 and cloud-hypervisor/cloud-hypervisor#637

The InterruptSourceGroup trait that we're using is defined at https://github.com/cloud-hypervisor/cloud-hypervisor/blob/master/vm-device/src/interrupt/mod.rs#L137.
Essentially, compared to this PR, the changes are:

We have one InterruptSourceGroup trait for both legacy and MSI

update() takes a generic InterruptSourceConfig parameter, which is an enum. We're back to the discussion we had with @andreeaflorescu about having traits depend on enums and how that could potentially break crate users when we extend the enum. In practice I think that it's reasonable to say that supporting a new kind of interrupts (unknown to me at the moment) would deserve a new major version for that crate (or a new minor if we're still in 0.x).

We added mask/unmask operations, which are fairly typical interrupt related semantics

Overall I feel this trait is a reasonable compromise between our 2 proposals. In practice, this simplifies a lot our code base and how we manage interrupts. It's a significant win in terms of abstractions and I think we could all benefit from it.

@jiangliu @andreeaflorescu Please let me know what you think.

I'm OK with the proposal, and I could add another layer over interrupt source group to solve the legacy interrupt flags issue. If we are all agree with this, I will update the PR accordingly.

But I still haven't gotten the point why we can't support base()/len()/type() getters on the interrupt source group trait:(

sameo · 2020-01-22T17:24:23Z

I'm OK with the proposal, and I could add another layer over interrupt source group to solve the legacy interrupt flags issue. If we are all agree with this, I will update the PR accordingly.

Cool. Let's see what @andreeaflorescu thinks.

But I still haven't gotten the point why we can't support base()/len()/type() getters on the interrupt source group trait:(

I'm not sure we absolutely need it in this form. If what we're looking for is a way to serialize an interrupt source group, then we may want to have a specific method for that.

And since I think an interrupt source group creator should also be the one serializing it for migrating/upgrading, it should have all the information it needs to do so (because it created it).

I guess we'll hit that issue with Cloud hypervisor soon, so by then we can propose to extend the trait accordingly. For now I'd leave those getters out of it.

andreeaflorescu · 2020-01-23T08:19:17Z

Hey, I am a bit busy on another thread now. I can take a look at this tomorrow or on Monday. Is that okay with you?

sameo · 2020-01-23T08:35:20Z

Hey, I am a bit busy on another thread now. I can take a look at this tomorrow or on Monday. Is that okay with you?

Of course it's fine.

andreeaflorescu · 2020-01-28T12:27:13Z

  * `update()` takes a generic `InterruptSourceConfig` parameter, which is an enum. We're back to the discussion we had with @andreeaflorescu about having traits depend on enums and how that could potentially break crate users when we extend the enum. In practice I think that it's reasonable to say that supporting a new kind of interrupts (unknown to me at the moment) would deserve a new major version for that crate (or a new minor if we're still in 0.x).

My concern with having an enum was not necessarily related to breaking compatibility. I was more thinking about a situation in which you have a custom, non-rust-vmm implementation of InterruptSourceGroup. Say that that implementation is actually of type LegacyInterrupt, but the implementation is custom and cannot be pushed to rust-vmm.
The second problem I have with the enum is strips off the genericness of the interface because people can use this always as:

match type {
    Legacy => do_something(),
    MSI => do_something_else(),
}

Not having a type would force you to properly use generics and not actual types.

If we can't have another workaround for it, we should just pay extra attention into how this interface is used in rust-vmm crates.

Should we submit a PR to vm-device with the proposed changes?

sameo · 2020-01-28T14:43:01Z

Should we submit a PR to vm-device with the proposed changes?

Let me update that PR here, and then we could morph @jiangliu 's pending PR into it? I find this would be less confusing.

andreeaflorescu · 2020-01-28T15:04:29Z

What does morph mean? :)))
I would open a new PR because the initial PR has a lot of comments and it might get super confusing while reviewing, not sure if that's what you mean.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

Wrapped lines at 100 chars. Added more details about the InterruptType. Signed-off-by: Andreea Florescu <fandree@amazon.com>

In the InterruptSourceGroup trait the only method that is going to be implemented for every interrupt type is the trigger method. We can make it easier for simple VMMs (like the ones using only legacy IRQs) to implement this trait by provided no-op implementation for all the other trait methods. Signed-off-by: Andreea Florescu <fandree@amazon.com>

Most methods in the traits need to mutate the state. The inital idea was that save mutability can be achieved by wrapping the trait implementers in a Mutex. I find this to be misleading when reading the code. The proposal is to use mut where needed unless there is a good reason not to. Signed-off-by: Andreea Florescu <fandree@amazon.com>

We extend the trait to support the mask/unmask operations. We also merge the MSI interrupt source group trait into the main one, which leads to using interrupt source type and config enums instead of type wrappers. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

sameo · 2020-01-28T15:47:11Z

What does morph mean? :)))

Transform :-)

I would open a new PR because the initial PR has a lot of comments and it might get super confusing while reviewing, not sure if that's what you mean.

Fine with me, but then we should close @jiangliu PR to avoid having 2 PRs, from mostly the same people, about the same topic.

sameo · 2020-01-28T15:47:30Z

I updated this current PR with the discussed/agreed upon modifications.

andreeaflorescu · 2020-01-28T16:35:07Z

@jiangliu are you okay with that? The problem is that PR already has 100 comments on it and they are not referring to the code we want to merge any more.

jiangliu · 2020-01-29T02:21:56Z

@jiangliu are you okay with that? The problem is that PR already has 100 comments on it and they are not referring to the code we want to merge any more.

Sure:)
Let me summarize it, the way to move forward is:

close PR Add interrupt traits and a KVM based implementation rust-vmm/vm-device#11
open another PR for clean review history
get the new PR merged
create another PR for changes from cloud hypervisor
By this way we could get a clean history and move forward:)
Is that correct?

sameo · 2020-01-29T08:25:55Z

@jiangliu are you okay with that? The problem is that PR already has 100 comments on it and they are not referring to the code we want to merge any more.

Sure:)
Let me summarize it, the way to move forward is:

close PR rust-vmm#11

open another PR for clean review history

get the new PR merged

create another PR for changes from cloud hypervisor

This current PR (#1) already contains the changes from Cloud Hypervisor, this step would not be needed.

By this way we could get a clean history and move forward:)

We should squash this one and make it a single commit PR, if that's what you mean.

Is that correct?

That sounds fine to me, yes.

andreeaflorescu · 2020-01-29T08:32:15Z

Cool. Let's close the existing PR and open a new one with these changes. I'll try to get some people from my team to review it as well.

jiangliu · 2020-01-31T18:59:02Z

  * `update()` takes a generic `InterruptSourceConfig` parameter, which is an enum. We're back to the discussion we had with @andreeaflorescu about having traits depend on enums and how that could potentially break crate users when we extend the enum. In practice I think that it's reasonable to say that supporting a new kind of interrupts (unknown to me at the moment) would deserve a new major version for that crate (or a new minor if we're still in 0.x).
My concern with having an enum was not necessarily related to breaking compatibility. I was more thinking about a situation in which you have a custom, non-rust-vmm implementation of InterruptSourceGroup. Say that that implementation is actually of type LegacyInterrupt, but the implementation is custom and cannot be pushed to rust-vmm.
The second problem I have with the enum is strips off the genericness of the interface because people can use this always as:
match type {
    Legacy => do_something(),
    MSI => do_something_else(),
}
Not having a type would force you to properly use generics and not actual types.

If we can't have another workaround for it, we should just pay extra attention into how this interface is used in rust-vmm crates.

Should we submit a PR to vm-device with the proposed changes?

When supporting VFIO later, we may have a code snippet as

/// Type of interrupt source.
#[derive(Clone)]
pub enum InterruptSourceType {
    #[cfg(feature = "legacy_irq")]
    /// Legacy Pin-based Interrupt.
    /// On x86 platforms, legacy interrupts are routed through 8259 PICs and/or IOAPICs.
    LegacyIrq,
    #[cfg(feature = "pci_msi_irq")]
    /// Message Signaled Interrupt (PCI MSI/PCI MSIx).
    /// Some non-PCI devices (like HPET on x86) make use of generic MSI in platform specific ways.
    PciMsiIrq,
    #[cfg(feature = "vfio_msi_irq")]
    /// Message Signalled Interrupt for PCI MSI/PCI MSIx based VFIO devices.
    VfioMsiIrq(Arc<VfioDevice>, u32),
}

So it would be better to keep it as an enum so we could attach more data to Irq type.

sameo · 2020-01-31T21:36:59Z

@jiangliu

So it would be better to keep it as an enum so we could attach more data to Irq type.

That's actually what we included in the last version of the proposal.
I squashed all our changes into one commit and opened PR #3 to make things clearer. See
https://github.com/sameo/vm-device/pull/3/files#diff-3d79bc3554f81f73e7f779e96b131729R69-R78

The idea, since I believe we all seem to agree with this last proposal, would be to replace rust-vmm#21 with the equivalent of PR #3. We should merge the interface first (based, again, on PR #3) and then discuss about about a KVM implementation through a follow up PR.

sameo · 2020-02-06T18:27:25Z

PR #3 supersedes that one.

andreeaflorescu reviewed Dec 19, 2019

View reviewed changes

sameo force-pushed the topic/interrupt branch from 233af25 to 05a5f7c Compare December 19, 2019 13:19

Jing Liu and others added 9 commits January 14, 2020 22:03

Remove vm-memory dependency

c1e1452

Use u64 for guest memory address type since this can make vm-device independent from on vm-memory. Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>

Improve DeviceIo interface

06b9e00

Change DeviceIo interface parameters to base and offset, so that devices with several IO ranges can use it to locate right range. Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>

lib: Improve DeviceIo methods documentation

d4664f8

As suggested from rust-vmm#18 (comment) Suggested-by: Andreea Florescu <fandree@amazon.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

Update rust-vmm-ci

3d51c9e

Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>

Make DeviceIo internal mutability

6aae202

In order to get a real multiple threads handling to enhance performance, the DeviceIo trait need adopt interior mutability pattern. Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>

Add IO manager support

ef21cf1

Based on resources definition, this adds device IO manager to manage all devices IO ranges. Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>

Add read and write operations handling

372bedf

IO manager is responsible for handling IO operation when VMExit. It works out the specific device according to the address range and hand over to DeviceIo trait. Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>

Add unit tests

66166a3

Unit tests for IO manager. Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>

Append missing tests for resources

3daea68

Append missing tests for resources and fix some typo. Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>

Samuel Ortiz and others added 3 commits January 28, 2020 16:43

interrupt: Initial interrupt manager traits

8d7f8b3

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>

interrupts: documentation improvements

a649808

Wrapped lines at 100 chars. Added more details about the InterruptType. Signed-off-by: Andreea Florescu <fandree@amazon.com>

andreeaflorescu and others added 2 commits January 28, 2020 16:43

sameo force-pushed the topic/interrupt branch from fbb2d7c to f886e49 Compare January 28, 2020 15:44

sameo closed this Feb 6, 2020

interrupt: Initial interrupt manager traits #1

interrupt: Initial interrupt manager traits #1

Uh oh!

Conversation

sameo commented Dec 17, 2019

Uh oh!

sameo commented Dec 17, 2019

Uh oh!

andreeaflorescu left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jiangliu commented Jan 9, 2020

Uh oh!

andreeaflorescu commented Jan 9, 2020

Uh oh!

jiangliu commented Jan 9, 2020

Uh oh!

jiangliu commented Jan 9, 2020 • edited by sameo Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sameo commented Jan 10, 2020

Uh oh!

jiangliu commented Jan 11, 2020 • edited by sameo Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sameo commented Jan 22, 2020

Uh oh!

jiangliu commented Jan 22, 2020

Uh oh!

sameo commented Jan 22, 2020

Uh oh!

andreeaflorescu commented Jan 23, 2020

Uh oh!

sameo commented Jan 23, 2020

Uh oh!

andreeaflorescu commented Jan 28, 2020

Uh oh!

sameo commented Jan 28, 2020

Uh oh!

andreeaflorescu commented Jan 28, 2020

Uh oh!

sameo commented Jan 28, 2020

Uh oh!

sameo commented Jan 28, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

andreeaflorescu commented Jan 28, 2020

Uh oh!

jiangliu commented Jan 29, 2020

Uh oh!

sameo commented Jan 29, 2020

Uh oh!

andreeaflorescu commented Jan 29, 2020

Uh oh!

jiangliu commented Jan 31, 2020

Uh oh!

sameo commented Jan 31, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jiangliu commented Jan 9, 2020 •

edited by sameo

Loading

jiangliu commented Jan 11, 2020 •

edited by sameo

Loading

sameo commented Jan 28, 2020 •

edited

Loading

sameo commented Jan 31, 2020 •

edited

Loading