@pzhan9 pzhan9 commented Dec 16, 2025

Differential Revision: D89195085

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Dec 16, 2025
meta-codesync bot commented Dec 16, 2025

@pzhan9 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D89195085.

Summary:


In the Lightning integration, we need to explicitly configure the client and worker hosts to:

1. use a public IP address and an explicit port as the `Host`'s frontend address;
2. bind the port to INADDR_ANY, since AWS does not allow binding a port to the public IP address.

To enable this configuration, this diff adds a new variant, `Alias`, to channel. The variant has two fields:

1. `dial_to`, which Lightning can set to the public IP address;
2. `bind_to`, which specifies the address the port is bound to.

It is Lightning's responsibility to configure the network so that packets sent to `dial_to` are routed to `bind_to`.
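
The split between `dial_to` and `bind_to` can be illustrated with plain sockets. This is only a sketch, not the code in this diff: the `Alias` dataclass, `serve`, and `round_trip` are hypothetical names, loopback stands in for the public IP, and the port is OS-chosen for the demo rather than explicit.

```python
import socket
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Alias:
    """Hypothetical stand-in for the new variant; not the real type."""
    dial_to: Tuple[str, int]  # address peers connect to (the public IP in production)
    bind_to: Tuple[str, int]  # address the listening socket is bound to


def serve(dial_host: str, bind_host: str) -> Tuple[socket.socket, Alias]:
    """Listen on bind_host and return the alias that peers should dial."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((bind_host, 0))  # port 0: let the OS pick a free port for the demo
    listener.listen(1)
    listener.settimeout(5)
    port = listener.getsockname()[1]
    return listener, Alias(dial_to=(dial_host, port), bind_to=(bind_host, port))


def round_trip(payload: bytes) -> bytes:
    """Send payload to dial_to and read it back from the socket bound to bind_to."""
    # Loopback stands in for the public IP; 0.0.0.0 is INADDR_ANY.
    listener, alias = serve(dial_host="127.0.0.1", bind_host="0.0.0.0")
    try:
        with socket.create_connection(alias.dial_to, timeout=5) as client:
            conn, _ = listener.accept()
            with conn:
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)  # signal end of payload
                chunks = []
                while True:
                    chunk = conn.recv(4096)
                    if not chunk:
                        break
                    chunks.append(chunk)
                return b"".join(chunks)
    finally:
        listener.close()
```

The listener never binds the address it advertises; it only relies on traffic for `dial_to` reaching a socket bound to `bind_to`, which is the routing guarantee Lightning is expected to provide.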

Reviewed By: mariusae

Differential Revision: D89190085
Summary:


As explained in D89190085, Lightning wants to explicitly set the frontend address of both the client and worker hosts. Since the client's frontend address is controlled through `enable_transport`, this diff adds a new enum, `BindSpec`, and plumbs it through to `enable_transport`, so that the user can pass an explicit address string.
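
A rough sketch of what such a spec might look like from the caller's side follows. Everything here is an assumption for illustration: the variant names `AnyAddress` and `ExplicitAddress`, the `host:port` string format, and `parse_bind_spec` are hypothetical and are not taken from this diff or from the real `enable_transport` signature.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class AnyAddress:
    """Hypothetical variant: let the runtime choose the frontend address."""


@dataclass(frozen=True)
class ExplicitAddress:
    """Hypothetical variant: the user supplies the frontend address."""
    host: str
    port: int


# Modeled as a tagged union; the real enum's shape may differ.
BindSpec = Union[AnyAddress, ExplicitAddress]


def parse_bind_spec(addr: Optional[str]) -> BindSpec:
    """Turn an optional "host:port" string (assumed format) into a bind spec."""
    if addr is None:
        return AnyAddress()
    host, sep, port = addr.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {addr!r}")
    port_number = int(port)
    if not 0 < port_number < 65536:
        raise ValueError(f"port out of range: {port_number}")
    return ExplicitAddress(host, port_number)
```

Keeping the "no preference" case as its own variant, rather than an empty string, makes the default behavior explicit at the call site.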

Differential Revision: D89190087
Summary:

This builds on the previous change to implement a *real* in-process host and local proc mesh. Thus, nothing is fake anymore in the v1 path: this_proc() behaves the same everywhere, and is managed in precisely the same way.

This also means that we can do things like hand out references to the local host (or proc), and have remote actors spawn (procs, actors) on it!

It will also simplify integration: since the client proc is now a host, it has a single front-end address that multiplexes all its procs (including the local one).
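
As a toy model of that multiplexing, one frontend address can front a table of procs keyed by name. The `Host` class, its methods, and the address string below are hypothetical and do not mirror the real implementation.

```python
from typing import Callable, Dict


class Host:
    """Toy model: one frontend address routing messages to many procs."""

    def __init__(self, frontend_addr: str) -> None:
        self.frontend_addr = frontend_addr
        self._procs: Dict[str, Callable[[str], str]] = {}

    def spawn(self, name: str, handler: Callable[[str], str]) -> None:
        """Register a proc under this host; names must be unique."""
        if name in self._procs:
            raise ValueError(f"proc {name!r} already exists")
        self._procs[name] = handler

    def deliver(self, proc_name: str, message: str) -> str:
        """Route a message that arrived at the frontend to the named proc."""
        try:
            handler = self._procs[proc_name]
        except KeyError:
            raise LookupError(f"no proc {proc_name!r} on {self.frontend_addr}") from None
        return handler(message)


# The local (client) proc and a spawned proc share the same frontend address.
host = Host("tcp://203.0.113.7:26600")  # placeholder address
host.spawn("local", lambda msg: f"local got {msg}")
host.spawn("worker_0", lambda msg: f"worker_0 got {msg}")
```

A peer only needs the host's frontend address plus a proc name to reach any proc, including the local one.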

We keep around the old codepaths for the benefit of v0; this can be significantly simplified again once we drop v0 support.

Differential Revision: D89196041
Summary: Pull Request resolved: meta-pytorch#2155

Differential Revision: D89195085

Labels

CLA Signed This label is managed by the Meta Open Source bot. fb-exported meta-exported
