
Conversation

@leszko (Collaborator) commented on Jan 22, 2026

Summary

After reviewing recent PRs (#368 and #312), it’s clear the current Pipeline interface no longer matches real usage. This PR updates the interface to support existing and emerging pipeline patterns.

Issues

  1. Pipeline.__call__() accepts an input parameter that no pipeline actually uses.
  2. Pipeline.__call__() returns a single torch.Tensor, which does not support:
    • Multi-output pipelines (e.g. separate mask outputs)
    • Pipelines that also return audio (e.g. LTX-2)
  3. Pipeline requirements cannot adapt to frame counts produced by upstream pipelines.

Changes

  • Remove the unused input argument from Pipeline.__call__()
  • Change pipeline outputs to return a dict
    • The main video output is available under the "video" key
  • Add auto_input_size to pipeline requirements to support dynamic frame counts (see the sketch below)
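
A minimal sketch of what the interface could look like after this change. The Requirements dataclass, the requirements() method, and the field layout are illustrative assumptions, not the repo's actual definitions; only __call__, the "video" key, and auto_input_size come from this PR:

from abc import ABC, abstractmethod
from dataclasses import dataclass

import torch


@dataclass
class Requirements:
    # Hypothetical requirements container; the real class in the repo may differ.
    # When True, the number of required input frames adapts to whatever the
    # upstream pipeline produces instead of being fixed.
    auto_input_size: bool = False


class Pipeline(ABC):
    @abstractmethod
    def requirements(self) -> Requirements:
        ...

    @abstractmethod
    def __call__(self, **kwargs) -> dict[str, torch.Tensor]:
        # No positional `input` argument anymore; callers pass named parameters.
        # The returned dict holds the main video under the "video" key and may
        # carry extra outputs (masks, audio, ...) under additional keys.
        ...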

Compatibility

Plugins are not yet published, so this change is intentionally not backward-compatible. We could make it backward-compatible, but I don't think it's worth the effort; it's simpler to update all existing pipelines directly.

@leszko leszko force-pushed the rafal/update-pipeline-interface branch 2 times, most recently from 1b7dc4e to 3fb8d72 on January 22, 2026 at 12:04
call_params["video"] = video_input

output = self.pipeline(**call_params)
output_dict = self.pipeline(**call_params)
Collaborator commented:

should we extract other keys and forward them to handle processors that produce auxiliary outputs?

@leszko (Collaborator, Author) replied:

Yes, I think we should actually pass the whole dict into the next processor.

But... I'd prefer to do it in a separate PR to keep this PR dedicated to just the interface change.

One thing to think about is how to pass data between pipeline processors.

  • For video, we currently extract individual frames and add them to the output_queue
  • If we want to pass all outputs, the question is how to pass them:
    • One option is what you did here: treat "video" the same as before and set all other keys as parameters for the next processor (rough sketch below)
    • A more correct, long-term option would be to queue the whole dict, but then we would need to rethink how queuing and waiting work, because each pipeline can produce / consume a different number of frames
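
To make the first option concrete, here is a rough sketch. forward_outputs is a hypothetical helper; output_queue is the queue mentioned above, while next_call_params and iterating the video tensor frame-by-frame are assumptions about the surrounding processor code:

import queue

import torch


def forward_outputs(output_dict: dict[str, torch.Tensor],
                    output_queue: "queue.Queue[torch.Tensor]",
                    next_call_params: dict) -> None:
    # Queue the main video output frame-by-frame, exactly as today.
    for frame in output_dict["video"]:
        output_queue.put(frame)
    # Hypothetical: forward auxiliary outputs (masks, audio, ...) to the next
    # processor as plain parameters instead of queuing them per frame.
    for key, value in output_dict.items():
        if key != "video":
            next_call_params[key] = value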

@yondonfu (Contributor) left a comment:

LGTM

I think this is a step in the right direction. A proper mapping of inputs <> outputs in a pipeline chain would likely benefit from some sort of binding (e.g. which input binds to which output) and a typing system. But we don't need to solve that right now; we can let any requirements for it emerge with use cases.

Signed-off-by: Rafal Leszko <rafal@livepeer.org>
@leszko leszko force-pushed the rafal/update-pipeline-interface branch from 3fb8d72 to 71cd5bf on January 26, 2026 at 09:18
@leszko leszko merged commit 25bae95 into main Jan 26, 2026
5 checks passed
@yondonfu yondonfu deleted the rafal/update-pipeline-interface branch January 26, 2026 15:00