Skip to content

Conversation

@sameo
Copy link
Owner

@sameo sameo commented Dec 17, 2019

Signed-off-by: Samuel Ortiz sameo@linux.intel.com

@sameo
Copy link
Owner Author

sameo commented Dec 17, 2019

cc @andreeaflorescu

Copy link

@andreeaflorescu andreeaflorescu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I add a commit on top of your commit with some other fixes?

/// Data type to store an interrupt source identifier.
pub type InterruptIndex = u32;

/// Data type to store an interrupt source type.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is going to be tricky to handle the interrupt type. We might want to explain here why we want it to be a u32 instead of an enum and how do we expect this to be used.

Comment on lines 72 to 75
/// Data type to store an interrupt event.
pub type InterruptEvent = u32;

pub const IRQ_TRIGGERED: InterruptEvent = 0;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What were these 2?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftovers, they need to be removed.

/// # Arguments
/// * index: sub-index into the group.
/// * config: configuration data for the interrupt source.
fn modify(&self, index: InterruptIndex, config: &MsiIrqSourceConfig) -> Result<()>;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename to update?

Comment on lines 120 to 121
fn enable(&self) -> Result<()>;

/// Disable the interrupt sources in the group to generate interrupts.
fn disable(&self) -> Result<()>;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you pls remind me why were these needed?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the guest driver can decide to disable and re-enable interrupts.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The guest kernel may choose different interrupt mode. For PCI device, guest kernel may trigger following transitions:

  1. interrupt disabled -> interrupt enabled
  2. legacy interrupt -> PCI MSI interrupt
  3. PCI MSI interrupt -> legacy interrupt
  4. legacy interrupt -> PCI MSIx interrupt
  5. PCI MSIx interrupt -> legacy interrupt.
    On mode transitation, disable() is called on the old interrupt group object and enable() is called on the new interrupt group.

/// An interrupt notifier allows for external components and processes
/// to inject interrupts into a guest, by writing to the file returned
/// by this method.
fn notifier(&self, index: InterruptIndex) -> Option<File>;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if File is the best option here. Maybe we can have something that implement the traits needed (write, read?). I don't know which traits are needed actually. I'll experiment with this a bit.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also have a default implementation that returns none as this should not be implemented by non vhost-user backends.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if File is the best option here. Maybe we can have something that implement the traits needed (write, read?). I don't know which traits are needed actually. I'll experiment with this a bit.

It's not really about traits that need to be implemented but about the fact that it can be made into a raw fd, to pass it down to the vhost-user backend process.
A File is something that would work (That's the inner type for Eventfd...) and is the least common denominator I found between Windows and Linux.

@jiangliu
Copy link

jiangliu commented Jan 9, 2020

Thanks for putting things together, I will compare this PR with my original implementation.
1)InterruptSourceGroup::{interrupt_type,len,base} are needed because we need to reconstruct an interrupt group object on the destination side for live-upgrading.
2) trigger()/ack() are standard interrupt handling flow for level triggered interrupt. So ack() should be kept .
3) It will make things complex to split out InterruptSourceGroupMsi. Eventually we will need an interrupt controller to manage interrupt mode for devices supporting multiple interrupt modes.

pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
    intr_groups: Vec<Arc<Box<dyn InterruptSourceGroup>>>,
    msi_config: Vec<InterruptSourceConfig>,
}

@andreeaflorescu
Copy link

Thanks for putting things together, I will compare this PR with my original implementation.
1)InterruptSourceGroup::{interrupt_type,len,base} are needed because we need to reconstruct an interrupt group object on the destination side for live-upgrading.

Is the expectation to use InterrruptSourceGroup directly for live update? I was considering this to be the common interrupt trait for all devices, which should be extended for other device specific implementations.

  1. trigger()/ack() are standard interrupt handling flow for level triggered interrupt. So ack() should be kept .

I looked at the device implementations in crosvm and firecracker and it looks like ack is actually used only in the virtio implementation. So I was thinking that ack should be a function of a trait in vm-virtio as the other devices don't really need it.

Basically, you would have a VirtioInterrupt in vm-virtio something like:

pub enum InterruptError {
    SingalUsedQueueFailed(io::Error),
    SignalConfigurationChangedFailed(io::Error),
}

pub trait VirtioInterrupt {
    // The `vector` parameter is only used with MSIx
    fn signal_used_queue(&self, vector: u16) -> InterruptError;
    fn signal_configuration_change(&self, vector: u16) -> InterruptError;
    fn ack(&self, bit_mask: usize) {
        // With MSIx this is not used so the default here is no-op.
    }
}
  1. It will make things complex to split out InterruptSourceGroupMsi. Eventually we will need an interrupt controller to manage interrupt mode for devices supporting multiple interrupt modes.
pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
    intr_groups: Vec<Arc<Box<dyn InterruptSourceGroup>>>,
    msi_config: Vec<InterruptSourceConfig>,
}

I think the interrupt controller should not be defined in vm-device and should be separate because we have different use cases and I don't think we can merge them into something that works/is good for both Firecracker and Cloud Hypervisor/Dragonball.

@jiangliu
Copy link

jiangliu commented Jan 9, 2020

When design the interrupt subsystem, we need to think about both the abstraction and the way to use the interrupt abstraction.
The design goal is simple the device backend drivers. Device backend driver doesn't need to manage any interrupt implementation details, it only needs a way to trigger interrupt for events as below:

self.config
       .trigger(0, VIRTIO_INTR_VRING)
       .map_err(|err| DeviceError::VirtIoFailure(VirtIoError::IOError(err)))

instead of

    fn signal_used_queue(&self) -> result::Result<(), DeviceError> {
        self.interrupt_status
            .fetch_or(VIRTIO_MMIO_INT_VRING as usize, Ordering::SeqCst);
        self.interrupt_evt.write(1).map_err(|e| {
            error!("Failed to signal used queue: {:?}", e);
            METRICS.net.event_fails.inc();
            DeviceError::FailedSignalingUsedQueue(e)
        })
    }

For a device supporting both multi-queues and MSI, the code is almost the same:

self.config
       .trigger(queue_idx, VIRTIO_INTR_VRING)
       .map_err(|e| VmmEpollError::IOError(e))?;

For PCI and MMIO device with MSI, we need a DeviceInterruptManager to manage the backend device's interrupt mode. Current the DeviceInterruptManager is defined as:

#[derive(Copy, Clone, PartialEq, Serialize, Deserialize)]
pub enum DeviceInterruptMode {
    Disabled = 0,
    LegacyIrq = 1,
    GenericMsiIrq = 2,
    PciMsiIrq = 3,
    PciMsixIrq = 4,
}

/// A struct to manage interrupts and interrupt modes for a device.
///
/// The interrupt manager may support multiple working mode. For example, an interrupt manager
/// for a PCI device may work in legacy mode, PCI MSI mode or PCI MSIx mode. Under certain
/// conditions, the interrupt manager may switch between interrupt working modes. To simplify
/// implementation, switching working mode is only supported at configuration stage and will be
/// disabled at runtime stage. The DeviceInterruptManager::enable() switches the interrupt manager
/// from configuration stage into runtime stage. And DeviceInterruptManager::disable() switches
/// from runtime stage back to configuration stage.
#[vmm_serde::export_as_pub()]
pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
    intr_groups: Vec<Arc<Box<dyn InterruptSourceGroup>>>,
    msi_config: Vec<InterruptSourceConfig>,
}

And take firecracker/device/src/virtio/net.rs as an example, we have simplified it as below in dragonball:

pub struct Net {
    tap: Option<Tap>,
    device_info: VirtioDeviceInfo<EpollSlotAllocator>,
    queue_sizes: Arc<Vec<u16>>,
    rx_rate_limiter: Option<RateLimiter>,
    tx_rate_limiter: Option<RateLimiter>,
}       

struct NetEpollHandler {
    rx: RxVirtio,
    tap: Tap,   
    tx: TxVirtio,
    config: VirtioDeviceConfig,
    epoll_slots: Option<Box<dyn EpollSlotGroupT>>,
    // TODO(smbarber): http://crbug.com/753630
    // Remove once MRG_RXBUF is supported and this variable is actually used.
    #[allow(dead_code)]
    acked_features: u64,
    #[cfg(test)]
    test_mutators: tests::TestMutators,
}

@jiangliu
Copy link

jiangliu commented Jan 9, 2020

Thanks for putting things together, I will compare this PR with my original implementation.
1)InterruptSourceGroup::{interrupt_type,len,base} are needed because we need to reconstruct an interrupt group object on the destination side for live-upgrading.

Is the expectation to use InterrruptSourceGroup directly for live update? I was considering this to be the common interrupt trait for all devices, which should be extended for other device specific implementations.

It's a common trait for all devices, but live-upgrading has dependency on InterruptSourceGroup::{interrupt_type,len,base}. Actually all of InterruptSourceGroup::{interrupt_type,len,base} are simple accessors.

  1. trigger()/ack() are standard interrupt handling flow for level triggered interrupt. So ack() should be kept .

I looked at the device implementations in crosvm and firecracker and it looks like ack is actually used only in the virtio implementation. So I was thinking that ack should be a function of a trait in vm-virtio as the other devices don't really need it.

Basically, you would have a VirtioInterrupt in vm-virtio something like:

pub enum InterruptError {
    SingalUsedQueueFailed(io::Error),
    SignalConfigurationChangedFailed(io::Error),
}

pub trait VirtioInterrupt {
    // The `vector` parameter is only used with MSIx
    fn signal_used_queue(&self, vector: u16) -> InterruptError;
    fn signal_configuration_change(&self, vector: u16) -> InterruptError;
    fn ack(&self, bit_mask: usize) {
        // With MSIx this is not used so the default here is no-op.
    }
}

It's a common design pattern for a hardware device to have a interrupt status register. When the device want to signal an interrupt, it does:

  1. set a bit corresponding to the event in the status register
  2. trigger interrupt
  3. the OS device driver receives the interrupt and read the status register
  4. the OS device driver writes the value read from the status register back to the status register to clear the bits set.
    Virtio-mmio devices work in this way, many other devices (such as PCI devices) also works in this way.
    So ack() is not only needed by virtio subsystem.
  1. It will make things complex to split out InterruptSourceGroupMsi. Eventually we will need an interrupt controller to manage interrupt mode for devices supporting multiple interrupt modes.
pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
    intr_groups: Vec<Arc<Box<dyn InterruptSourceGroup>>>,
    msi_config: Vec<InterruptSourceConfig>,
}

I think the interrupt controller should not be defined in vm-device and should be separate because we have different use cases and I don't think we can merge them into something that works/is good for both Firecracker and Cloud Hypervisor/Dragonball.

Yes , we have originally put DeviceInterruptManager in file dragonball/device/src/interrupt_manager.rs, instead of put it into the vm-device trait.
It's just used to show the way an interrupt group object may be used.

@sameo
Copy link
Owner Author

sameo commented Jan 10, 2020

Thanks for putting things together, I will compare this PR with my original implementation.
1)InterruptSourceGroup::{interrupt_type,len,base} are needed because we need to reconstruct an interrupt group object on the destination side for live-upgrading.

So your live upgrade workflow, for a given device, would be to get a serialized {interrupt_type, len, base} upgrade payload on the receiving side, and then call create_group() with the deserialized data to create an identical interrupt source group?
The sending side is the device you're trying to move to the destination side, it's the same device that created the interrupt source group initially. So having those gettters is essentially a helper for the device implementation in the VMM not to cache that information (That it gets from the guest OS or firmware). Is that a correct description of the problem?

  1. trigger()/ack() are standard interrupt handling flow for level triggered interrupt. So ack() should be kept .

Do you agree ack() would only be needed for legacy interrupts and be a noop for MSI ones? Same thing for the flag parameter that we'd have to pass to both trigger() and ack()`, it would be unused for MSI interrupts.
With Andreea, we wondered which devices use legacy interrupts and need a status register, and the only ones we could see are the virtio ones, when they're not MSI based. The legacy ones we know about are not falling in that bucket and the logic does not apply to VFIO. So the point here is about not overloading an API for a use case that's actually fairly marginal. And to follow up on your example, with the proposed changes, delivering an interrupt for most devices would look like:

self.config
       .trigger(0)
       .map_err(|err| DeviceError::IOError(err))

while for a virtio devices, using legacy interrupt it would be slightly more complex:

self.interrupt_status
            .fetch_or(VIRTIO_MMIO_INT_VRING as usize, Ordering::SeqCst);
self.config
       .trigger(0)
       .map_err(|err| DeviceError::IOError(err))

The way the interrupt is actually delivered is abstracted from the device implementation in both cases. I agree you could argue that setting the interrupt_status is part of the delivery, but what I'm saying is that it's specific to virtio+legacy afaik.

  1. It will make things complex to split out InterruptSourceGroupMsi.

Looking back at it a few weeks after our original discussion, I now tend to agree with that although the update operations is very much an MSI specific one.

@jiangliu
Copy link

jiangliu commented Jan 11, 2020

Thanks for putting things together, I will compare this PR with my original implementation.
1)InterruptSourceGroup::{interrupt_type,len,base} are needed because we need to reconstruct an interrupt group object on the destination side for live-upgrading.

So your live upgrade workflow, for a given device, would be to get a serialized {interrupt_type, len, base} upgrade payload on the receiving side, and then call create_group() with the deserialized data to create an identical interrupt source group?
The sending side is the device you're trying to move to the destination side, it's the same device that created the interrupt source group initially. So having those gettters is essentially a helper for the device implementation in the VMM not to cache that information (That it gets from the guest OS or firmware). Is that a correct description of the problem?

Yes, these three methods are getters. But information is not gotten from firmware or guest OS, it's defined by the device interrupt working mode and resource allocation.

  1. trigger()/ack() are standard interrupt handling flow for level triggered interrupt. So ack() should be kept .

Do you agree ack() would only be needed for legacy interrupts and be a noop for MSI ones? Same thing for the flag parameter that we'd have to pass to both trigger() and ack()`, it would be unused for MSI interrupts.
With Andreea, we wondered which devices use legacy interrupts and need a status register, and the only ones we could see are the virtio ones, when they're not MSI based. The legacy ones we know about are not falling in that bucket and the logic does not apply to VFIO. So the point here is about not overloading an API for a use case that's actually fairly marginal. And to follow up on your example, with the proposed changes, delivering an interrupt for most devices would look like:

self.config
       .trigger(0)
       .map_err(|err| DeviceError::IOError(err))

while for a virtio devices, using legacy interrupt it would be slightly more complex:

self.interrupt_status
            .fetch_or(VIRTIO_MMIO_INT_VRING as usize, Ordering::SeqCst);
self.config
       .trigger(0)
       .map_err(|err| DeviceError::IOError(err))

The way the interrupt is actually delivered is abstracted from the device implementation in both cases. I agree you could argue that setting the interrupt_status is part of the delivery, but what I'm saying is that it's specific to virtio+legacy afaik.

It's true that ack() and flags are only used by legacy irqs. But ack() and flags are not limited to virtio + legacy(firecracker works in this way), it's almost for all legacy device. Let's take some examples:

  1. firecracker/devices/src/legacy/serial.rs: Serial::interrupt_identification is the interrupt status register,
    and Serial::iir_reset() is the ack().
  2. For intel e1000 NIC, ICR is the interrupt status register. (Most PCI devices follow into this case).

And for firecracker project,

/dragonball.git $ grep interrupt_status firecracker.git/devices/src/ -r | wc -l
      26

So abstracting away the interrupt status does help to reduce duplicated code.

  1. It will make things complex to split out InterruptSourceGroupMsi.

Looking back at it a few weeks after our original discussion, I now tend to agree with that although the update operations is very much an MSI specific one.

That's the really key difference.
You and @andreeaflorescu want a set up traits to describe different interrupt types.
We propose a superset trait to cover all interrupt types.
The reason we are proposing a superset trait is that it ease the way to use the InterruptGroup objects.
Take PCI device as an example, we prefer the way:

pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
    intr_groups: Vec<Arc<Box<dyn InterruptSourceGroup>>>,
    msi_config: Vec<InterruptSourceConfig>,
}

instead of

pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
   intr_legacy: Arc<Box<dyn InterruptSourceGroup>>,
   intr_msi: Arc<Box<dyn InterruptSourceGroupMsi>>
   intr_msix: Arc<Box<dyn InterruptSourceGroupMsi>>
    msi_config: Vec<InterruptSourceConfig>,
}

Jing Liu and others added 9 commits January 14, 2020 22:03
Use u64 for guest memory address type since this can make
vm-device independent from on vm-memory.

Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
Change DeviceIo interface parameters to base and offset, so that
devices with several IO ranges can use it to locate right range.

Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
As suggested from
rust-vmm#18 (comment)

Suggested-by: Andreea Florescu <fandree@amazon.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
In order to get a real multiple threads handling to enhance
performance, the DeviceIo trait need adopt interior mutability
pattern.

Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
Based on resources definition, this adds device IO manager to manage
all devices IO ranges.

Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
IO manager is responsible for handling IO operation when VMExit.
It works out the specific device according to the address range and
hand over to DeviceIo trait.

Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
Unit tests for IO manager.

Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
Append missing tests for resources and fix some typo.

Signed-off-by: Jing Liu <jing2.liu@linux.intel.com>
@sameo
Copy link
Owner Author

sameo commented Jan 22, 2020

Thanks for putting things together, I will compare this PR with my original implementation.
1)InterruptSourceGroup::{interrupt_type,len,base} are needed because we need to reconstruct an interrupt group object on the destination side for live-upgrading.

So your live upgrade workflow, for a given device, would be to get a serialized {interrupt_type, len, base} upgrade payload on the receiving side, and then call create_group() with the deserialized data to create an identical interrupt source group?
The sending side is the device you're trying to move to the destination side, it's the same device that created the interrupt source group initially. So having those gettters is essentially a helper for the device implementation in the VMM not to cache that information (That it gets from the guest OS or firmware). Is that a correct description of the problem?

Yes, these three methods are getters. But information is not gotten from firmware or guest OS, it's defined by the device interrupt working mode and resource allocation.

You're right, they're basically defined by the VMM or a device. What I'm trying to say here is that the entity (VMM or device) that creates the interrupt source group on the source VMM is responsible to recreate it on the destination VMM, and since it was able to create it on the source side it should pass the information needed for creation on the target side.

  1. trigger()/ack() are standard interrupt handling flow for level triggered interrupt. So ack() should be kept .

Do you agree ack() would only be needed for legacy interrupts and be a noop for MSI ones? Same thing for the flag parameter that we'd have to pass to both trigger() and ack()`, it would be unused for MSI interrupts.
With Andreea, we wondered which devices use legacy interrupts and need a status register, and the only ones we could see are the virtio ones, when they're not MSI based. The legacy ones we know about are not falling in that bucket and the logic does not apply to VFIO. So the point here is about not overloading an API for a use case that's actually fairly marginal. And to follow up on your example, with the proposed changes, delivering an interrupt for most devices would look like:

self.config
       .trigger(0)
       .map_err(|err| DeviceError::IOError(err))

while for a virtio devices, using legacy interrupt it would be slightly more complex:

self.interrupt_status
            .fetch_or(VIRTIO_MMIO_INT_VRING as usize, Ordering::SeqCst);
self.config
       .trigger(0)
       .map_err(|err| DeviceError::IOError(err))

The way the interrupt is actually delivered is abstracted from the device implementation in both cases. I agree you could argue that setting the interrupt_status is part of the delivery, but what I'm saying is that it's specific to virtio+legacy afaik.

It's true that ack() and flags are only used by legacy irqs. But ack() and flags are not limited to virtio + legacy(firecracker works in this way), it's almost for all legacy device. Let's take some examples:

  1. firecracker/devices/src/legacy/serial.rs: Serial::interrupt_identification is the interrupt status register,
    and Serial::iir_reset() is the ack().
  2. For intel e1000 NIC, ICR is the interrupt status register. (Most PCI devices follow into this case).

And for firecracker project,

/dragonball.git $ grep interrupt_status firecracker.git/devices/src/ -r | wc -l
      26

So abstracting away the interrupt status does help to reduce duplicated code.

  1. It will make things complex to split out InterruptSourceGroupMsi.

Looking back at it a few weeks after our original discussion, I now tend to agree with that although the update operations is very much an MSI specific one.

That's the really key difference.
You and @andreeaflorescu want a set up traits to describe different interrupt types.
We propose a superset trait to cover all interrupt types.
The reason we are proposing a superset trait is that it ease the way to use the InterruptGroup objects.
Take PCI device as an example, we prefer the way:

pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
    intr_groups: Vec<Arc<Box<dyn InterruptSourceGroup>>>,
    msi_config: Vec<InterruptSourceConfig>,
}

instead of

pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
   intr_legacy: Arc<Box<dyn InterruptSourceGroup>>,
   intr_msi: Arc<Box<dyn InterruptSourceGroupMsi>>
   intr_msix: Arc<Box<dyn InterruptSourceGroupMsi>>
    msi_config: Vec<InterruptSourceConfig>,
}

Yes, I think I agree with that. As a matter of fact, in Cloud Hypervisor we have transitioned to a InterruptSourceGroup and InterruptManager, see cloud-hypervisor/cloud-hypervisor#623 and cloud-hypervisor/cloud-hypervisor#637

The InterruptSourceGroup trait that we're using is defined at https://github.com/cloud-hypervisor/cloud-hypervisor/blob/master/vm-device/src/interrupt/mod.rs#L137.
Essentially, compared to this PR, the changes are:

  • We have one InterruptSourceGroup trait for both legacy and MSI
    • update() takes a generic InterruptSourceConfig parameter, which is an enum. We're back to the discussion we had with @andreeaflorescu about having traits depend on enums and how that could potentially break crate users when we extend the enum. In practice I think that it's reasonable to say that supporting a new kind of interrupts (unknown to me at the moment) would deserve a new major version for that crate (or a new minor if we're still in 0.x).
  • We added mask/unmask operations, which are fairly typical interrupt related semantics

Overall I feel this trait is a reasonable compromise between our 2 proposals. In practice, this simplifies a lot our code base and how we manage interrupts. It's a significant win in terms of abstractions and I think we could all benefit from it.

@jiangliu @andreeaflorescu Please let me know what you think.

@jiangliu
Copy link

Thanks for putting things together, I will compare this PR with my original implementation.
1)InterruptSourceGroup::{interrupt_type,len,base} are needed because we need to reconstruct an interrupt group object on the destination side for live-upgrading.

So your live upgrade workflow, for a given device, would be to get a serialized {interrupt_type, len, base} upgrade payload on the receiving side, and then call create_group() with the deserialized data to create an identical interrupt source group?
The sending side is the device you're trying to move to the destination side, it's the same device that created the interrupt source group initially. So having those gettters is essentially a helper for the device implementation in the VMM not to cache that information (That it gets from the guest OS or firmware). Is that a correct description of the problem?

Yes, these three methods are getters. But information is not gotten from firmware or guest OS, it's defined by the device interrupt working mode and resource allocation.

You're right, they're basically defined by the VMM or a device. What I'm trying to say here is that the entity (VMM or device) that creates the interrupt source group on the source VMM is responsible to recreate it on the destination VMM, and since it was able to create it on the source side it should pass the information needed for creation on the target side.

  1. trigger()/ack() are standard interrupt handling flow for level triggered interrupt. So ack() should be kept .

Do you agree ack() would only be needed for legacy interrupts and be a noop for MSI ones? Same thing for the flag parameter that we'd have to pass to both trigger() and ack()`, it would be unused for MSI interrupts.
With Andreea, we wondered which devices use legacy interrupts and need a status register, and the only ones we could see are the virtio ones, when they're not MSI based. The legacy ones we know about are not falling in that bucket and the logic does not apply to VFIO. So the point here is about not overloading an API for a use case that's actually fairly marginal. And to follow up on your example, with the proposed changes, delivering an interrupt for most devices would look like:

self.config
       .trigger(0)
       .map_err(|err| DeviceError::IOError(err))

while for a virtio devices, using legacy interrupt it would be slightly more complex:

self.interrupt_status
            .fetch_or(VIRTIO_MMIO_INT_VRING as usize, Ordering::SeqCst);
self.config
       .trigger(0)
       .map_err(|err| DeviceError::IOError(err))

The way the interrupt is actually delivered is abstracted from the device implementation in both cases. I agree you could argue that setting the interrupt_status is part of the delivery, but what I'm saying is that it's specific to virtio+legacy afaik.

It's true that ack() and flags are only used by legacy irqs. But ack() and flags are not limited to virtio + legacy(firecracker works in this way), it's almost for all legacy device. Let's take some examples:

  1. firecracker/devices/src/legacy/serial.rs: Serial::interrupt_identification is the interrupt status register,
    and Serial::iir_reset() is the ack().
  2. For intel e1000 NIC, ICR is the interrupt status register. (Most PCI devices follow into this case).

And for firecracker project,

/dragonball.git $ grep interrupt_status firecracker.git/devices/src/ -r | wc -l
      26

So abstracting away the interrupt status does help to reduce duplicated code.

  1. It will make things complex to split out InterruptSourceGroupMsi.

Looking back at it a few weeks after our original discussion, I now tend to agree with that although the update operations is very much an MSI specific one.

That's the really key difference.
You and @andreeaflorescu want a set up traits to describe different interrupt types.
We propose a superset trait to cover all interrupt types.
The reason we are proposing a superset trait is that it ease the way to use the InterruptGroup objects.
Take PCI device as an example, we prefer the way:

pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
    intr_groups: Vec<Arc<Box<dyn InterruptSourceGroup>>>,
    msi_config: Vec<InterruptSourceConfig>,
}

instead of

pub struct DeviceInterruptManager {
    irq_manager: Arc<KvmIrqManager>,
    intr_mode: DeviceInterruptMode,
    intr_activated: bool,
    current_idx: usize,
    intr_targets: [usize; 5],
   intr_legacy: Arc<Box<dyn InterruptSourceGroup>>,
   intr_msi: Arc<Box<dyn InterruptSourceGroupMsi>>
   intr_msix: Arc<Box<dyn InterruptSourceGroupMsi>>
    msi_config: Vec<InterruptSourceConfig>,
}

Yes, I think I agree with that. As a matter of fact, in Cloud Hypervisor we have transitioned to a InterruptSourceGroup and InterruptManager, see cloud-hypervisor/cloud-hypervisor#623 and cloud-hypervisor/cloud-hypervisor#637

The InterruptSourceGroup trait that we're using is defined at https://github.com/cloud-hypervisor/cloud-hypervisor/blob/master/vm-device/src/interrupt/mod.rs#L137.
Essentially, compared to this PR, the changes are:

  • We have one InterruptSourceGroup trait for both legacy and MSI

    • update() takes a generic InterruptSourceConfig parameter, which is an enum. We're back to the discussion we had with @andreeaflorescu about having traits depend on enums and how that could potentially break crate users when we extend the enum. In practice I think that it's reasonable to say that supporting a new kind of interrupts (unknown to me at the moment) would deserve a new major version for that crate (or a new minor if we're still in 0.x).
  • We added mask/unmask operations, which are fairly typical interrupt related semantics

Overall I feel this trait is a reasonable compromise between our 2 proposals. In practice, this simplifies a lot our code base and how we manage interrupts. It's a significant win in terms of abstractions and I think we could all benefit from it.

@jiangliu @andreeaflorescu Please let me know what you think.

I'm OK with the proposal, and I could add another layer over interrupt source group to solve the legacy interrupt flags issue. If we are all agree with this, I will update the PR accordingly.

But I still haven't gotten the point why we can't support base()/len()/type() getters on the interrupt source group trait:(

@sameo
Copy link
Owner Author

sameo commented Jan 22, 2020

I'm OK with the proposal, and I could add another layer over interrupt source group to solve the legacy interrupt flags issue. If we are all agree with this, I will update the PR accordingly.

Cool. Let's see what @andreeaflorescu thinks.

But I still haven't gotten the point why we can't support base()/len()/type() getters on the interrupt source group trait:(

I'm not sure we absolutely need it in this form. If what we're looking for is a way to serialize an interrupt source group, then we may want to have a specific method for that.

And since I think an interrupt source group creator should also be the one serializing it for migrating/upgrading, it should have all the information it needs to do so (because it created it).

I guess we'll hit that issue with Cloud hypervisor soon, so by then we can propose to extend the trait accordingly. For now I'd leave those getters out of it.

@andreeaflorescu
Copy link

Hey, I am a bit busy on another thread now. I can take a look at this tomorrow or on Monday. Is that okay with you?

@sameo
Copy link
Owner Author

sameo commented Jan 23, 2020

Hey, I am a bit busy on another thread now. I can take a look at this tomorrow or on Monday. Is that okay with you?

Of course it's fine.

@andreeaflorescu
Copy link

  * `update()` takes a generic `InterruptSourceConfig` parameter, which is an enum. We're back to the discussion we had with @andreeaflorescu about having traits depend on enums and how that could potentially break crate users when we extend the enum. In practice I think that it's reasonable to say that supporting a new kind of interrupts (unknown to me at the moment) would deserve a new major version for that crate (or a new minor if we're still in 0.x).

My concern with having an enum was not necessarily related to breaking compatibility. I was more thinking about a situation in which you have a custom, non-rust-vmm implementation of InterruptSourceGroup. Say that that implementation is actually of type LegacyInterrupt, but the implementation is custom and cannot be pushed to rust-vmm.
The second problem I have with the enum is strips off the genericness of the interface because people can use this always as:

match type {
    Legacy => do_something(),
    MSI => do_something_else(),
}

Not having a type would force you to properly use generics and not actual types.

If we can't have another workaround for it, we should just pay extra attention into how this interface is used in rust-vmm crates.

Should we submit a PR to vm-device with the proposed changes?

@sameo
Copy link
Owner Author

sameo commented Jan 28, 2020

Should we submit a PR to vm-device with the proposed changes?

Let me update that PR here, and then we could morph @jiangliu 's pending PR into it? I find this would be less confusing.

@andreeaflorescu
Copy link

What does morph mean? :)))
I would open a new PR because the initial PR has a lot of comments and it might get super confusing while reviewing, not sure if that's what you mean.

Samuel Ortiz and others added 3 commits January 28, 2020 16:43
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Wrapped lines at 100 chars.
Added more details about the InterruptType.

Signed-off-by: Andreea Florescu <fandree@amazon.com>
In the InterruptSourceGroup trait the only method that is going to
be implemented for every interrupt type is the trigger method. We
can make it easier for simple VMMs (like the ones using only legacy
IRQs) to implement this trait by provided no-op implementation for
all the other trait methods.

Signed-off-by: Andreea Florescu <fandree@amazon.com>
andreeaflorescu and others added 2 commits January 28, 2020 16:43
Most methods in the traits need to mutate the state. The inital idea
was that save mutability can be achieved by wrapping the trait
implementers in a Mutex. I find this to be misleading when reading
the code. The proposal is to use mut where needed unless there is
a good reason not to.

Signed-off-by: Andreea Florescu <fandree@amazon.com>
We extend the trait to support the mask/unmask operations.
We also merge the MSI interrupt source group trait into the
main one, which leads to using interrupt source type and config
enums instead of type wrappers.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
@sameo
Copy link
Owner Author

sameo commented Jan 28, 2020

What does morph mean? :)))

Transform :-)

I would open a new PR because the initial PR has a lot of comments and it might get super confusing while reviewing, not sure if that's what you mean.

Fine with me, but then we should close @jiangliu PR to avoid having 2 PRs, from mostly the same people, about the same topic.

@sameo
Copy link
Owner Author

sameo commented Jan 28, 2020

I updated this current PR with the discussed/agreed upon modifications.

@andreeaflorescu
Copy link

@jiangliu are you okay with that? The problem is that PR already has 100 comments on it and they are not referring to the code we want to merge any more.

@jiangliu
Copy link

@jiangliu are you okay with that? The problem is that PR already has 100 comments on it and they are not referring to the code we want to merge any more.

Sure:)
Let me summarize it, the way to move forward is:

  1. close PR Add interrupt traits and a KVM based implementation rust-vmm/vm-device#11
  2. open another PR for clean review history
  3. get the new PR merged
  4. create another PR for changes from cloud hypervisor
    By this way we could get a clean history and move forward:)
    Is that correct?

@sameo
Copy link
Owner Author

sameo commented Jan 29, 2020

@jiangliu are you okay with that? The problem is that PR already has 100 comments on it and they are not referring to the code we want to merge any more.

Sure:)
Let me summarize it, the way to move forward is:

  1. close PR rust-vmm#11
  2. open another PR for clean review history
  3. get the new PR merged
  4. create another PR for changes from cloud hypervisor

This current PR (#1) already contains the changes from Cloud Hypervisor, this step would not be needed.

By this way we could get a clean history and move forward:)

We should squash this one and make it a single commit PR, if that's what you mean.

Is that correct?

That sounds fine to me, yes.

@andreeaflorescu
Copy link

Cool. Let's close the existing PR and open a new one with these changes. I'll try to get some people from my team to review it as well.

@jiangliu
Copy link

  * `update()` takes a generic `InterruptSourceConfig` parameter, which is an enum. We're back to the discussion we had with @andreeaflorescu about having traits depend on enums and how that could potentially break crate users when we extend the enum. In practice I think that it's reasonable to say that supporting a new kind of interrupts (unknown to me at the moment) would deserve a new major version for that crate (or a new minor if we're still in 0.x).

My concern with having an enum was not necessarily related to breaking compatibility. I was more thinking about a situation in which you have a custom, non-rust-vmm implementation of InterruptSourceGroup. Say that that implementation is actually of type LegacyInterrupt, but the implementation is custom and cannot be pushed to rust-vmm.
The second problem I have with the enum is strips off the genericness of the interface because people can use this always as:

match type {
    Legacy => do_something(),
    MSI => do_something_else(),
}

Not having a type would force you to properly use generics and not actual types.

If we can't have another workaround for it, we should just pay extra attention into how this interface is used in rust-vmm crates.

Should we submit a PR to vm-device with the proposed changes?

When supporting VFIO later, we may have a code snippet as

/// Type of interrupt source.
#[derive(Clone)]
pub enum InterruptSourceType {
    #[cfg(feature = "legacy_irq")]
    /// Legacy Pin-based Interrupt.
    /// On x86 platforms, legacy interrupts are routed through 8259 PICs and/or IOAPICs.
    LegacyIrq,
    #[cfg(feature = "pci_msi_irq")]
    /// Message Signaled Interrupt (PCI MSI/PCI MSIx).
    /// Some non-PCI devices (like HPET on x86) make use of generic MSI in platform specific ways.
    PciMsiIrq,
    #[cfg(feature = "vfio_msi_irq")]
    /// Message Signalled Interrupt for PCI MSI/PCI MSIx based VFIO devices.
    VfioMsiIrq(Arc<VfioDevice>, u32),
}

So it would be better to keep it as an enum so we could attach more data to Irq type.

@sameo
Copy link
Owner Author

sameo commented Jan 31, 2020

@jiangliu

So it would be better to keep it as an enum so we could attach more data to Irq type.

That's actually what we included in the last version of the proposal.
I squashed all our changes into one commit and opened PR #3 to make things clearer. See
https://github.com/sameo/vm-device/pull/3/files#diff-3d79bc3554f81f73e7f779e96b131729R69-R78

The idea, since I believe we all seem to agree with this last proposal, would be to replace rust-vmm#21 with the equivalent of PR #3. We should merge the interface first (based, again, on PR #3) and then discuss about about a KVM implementation through a follow up PR.

@sameo
Copy link
Owner Author

sameo commented Feb 6, 2020

PR #3 supersedes that one.

@sameo sameo closed this Feb 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants