
Commit 4fbae16

shayne-fletcher authored and meta-codesync[bot] committed
config: docs (#2143)
Summary:
Pull Request resolved: #2143

This diff primarily adds first-class documentation for Monarch's runtime configuration by introducing a documented `monarch.config` API and Sphinx reference material. To support documentation, type hints, and per-key docstrings, `configure()` is now wrapped at the Python layer instead of being re-exported directly from the Rust binding. The wrapper preserves the original behavior by forwarding only caller-supplied keys to the Rust `_configure(**kwargs)` function, using `None` as a sentinel to avoid implicitly re-setting defaults. Runtime semantics are unchanged; this is a documentation-driven refactor that makes configuration discoverable and understandable without altering behavior.

Reviewed By: mariusae
Differential Revision: D89195352
fbshipit-source-id: 92d173a23b9f384854bd4c0267eb31079202440c
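The forwarding rule in the summary can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the wrapper added by this commit: `_configure` below is a hypothetical stand-in that only records its arguments, and just three of the documented keys are shown.

```python
from typing import Any, Dict, Optional

# Hypothetical stand-in for the Rust `_configure(**kwargs)` binding:
# it only records the keyword arguments it was called with.
received: Dict[str, Any] = {}


def _configure(**kwargs: Any) -> None:
    received.clear()
    received.update(kwargs)


def configure(
    *,
    enable_log_forwarding: Optional[bool] = None,
    tail_log_lines: Optional[int] = None,
    message_delivery_timeout: Optional[str] = None,
) -> None:
    """Forward only the keys the caller actually supplied.

    `None` is a sentinel meaning "not passed", so omitted keys are never
    sent to `_configure` and existing values are not re-set to defaults.
    """
    candidates = {
        "enable_log_forwarding": enable_log_forwarding,
        "tail_log_lines": tail_log_lines,
        "message_delivery_timeout": message_delivery_timeout,
    }
    _configure(**{k: v for k, v in candidates.items() if v is not None})
```

Because the check is `is not None` rather than truthiness, `configure(enable_log_forwarding=False)` still forwards `False`. The trade-off of a `None` sentinel is that the wrapper cannot pass `None` through as a value, which is harmless as long as no key accepts `None`.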
1 parent c06bfb0 commit 4fbae16


3 files changed (+431, -4 lines)


docs/source/api/index.rst

Lines changed: 1 addition & 0 deletions
@@ -12,5 +12,6 @@ The :doc:`monarch.rdma` package provides RDMA support for high-performance netwo
    :maxdepth: 2

    monarch.actor
+   monarch.config
    monarch
    monarch.rdma

docs/source/api/monarch.config.rst

Lines changed: 341 additions & 0 deletions
@@ -0,0 +1,341 @@
monarch.config
==============

.. currentmodule:: monarch.config

The ``monarch.config`` module provides utilities for managing Monarch's
runtime configuration.

Configuration values can be set programmatically via :func:`configure`
or :func:`configured`, or through environment variables
(``HYPERACTOR_*``, ``MONARCH_*``). Programmatic configuration takes
precedence over environment variables and defaults.
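One way to picture this precedence is a lookup chain that consults the runtime layer before the environment and the defaults. The sketch below models the layers as plain dicts combined with ``collections.ChainMap``; the layer contents are made up for illustration, and this is not Monarch's implementation.

```python
from collections import ChainMap

# Illustrative layers (values made up for the example).
defaults = {"tail_log_lines": 0, "message_delivery_timeout": "30s"}
environment = {"message_delivery_timeout": "5m"}  # e.g. from HYPERACTOR_* vars
runtime = {}  # what configure() would write to

# ChainMap returns the value from the first mapping containing the key,
# so runtime beats environment, which beats defaults.
merged = ChainMap(runtime, environment, defaults)

print(merged["message_delivery_timeout"])  # 5m (environment beats default)
runtime["message_delivery_timeout"] = "1m"
print(merged["message_delivery_timeout"])  # 1m (runtime beats environment)
runtime.clear()
print(merged["message_delivery_timeout"])  # 5m again
```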

Configuration API
=================

``monarch.config`` exposes a small, process-wide API. All helpers talk to
the same layered configuration store, so changes are immediately visible to
every thread in the process.

``configure``
   Apply overrides to the Runtime layer. Values are validated eagerly: a
   ``ValueError`` is raised for unknown keys and a ``TypeError`` for values
   of the wrong type. ``configure`` is additive, so overrides from earlier
   calls stay in effect; in long-running processes you typically pair it
   with :func:`clear_runtime_config`.

``configured``
   A context manager that snapshots the current Runtime layer, applies the
   overrides, yields the merged config, and then restores the snapshot.
   Because the Runtime layer is global, the overrides apply to every thread
   until the context exits. This makes ``configured`` ideal for tests or
   short-lived blocks where you can guarantee single-threaded execution.

``get_global_config``
   Return the fully merged configuration (defaults + environment + file +
   runtime). Useful for introspection or for passing a frozen view to other
   components.

``get_runtime_config``
   Return only the currently active Runtime layer. This is what
   ``configure`` manipulates and what ``configured`` snapshots.

``clear_runtime_config``
   Reset the Runtime layer to an empty mapping. Environment and file values
   remain untouched.
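The semantics above (additive ``configure``, snapshot-and-restore ``configured``) can be modeled with one module-level dict standing in for the Runtime layer. This is a hypothetical sketch: validation and the defaults, environment and file layers are omitted, and this ``configured`` yields only the Runtime layer rather than the merged config.

```python
import contextlib
from typing import Any, Dict, Iterator

# Hypothetical process-wide Runtime layer (shared by every thread).
_runtime: Dict[str, Any] = {}


def configure(**overrides: Any) -> None:
    # Additive: earlier overrides stay unless the same key is set again.
    _runtime.update(overrides)


def get_runtime_config() -> Dict[str, Any]:
    return dict(_runtime)


def clear_runtime_config() -> None:
    _runtime.clear()


@contextlib.contextmanager
def configured(**overrides: Any) -> Iterator[Dict[str, Any]]:
    snapshot = dict(_runtime)  # remember the Runtime layer as it was
    _runtime.update(overrides)
    try:
        yield dict(_runtime)
    finally:
        # Restore the snapshot even if the block raises.
        _runtime.clear()
        _runtime.update(snapshot)
```

Because each ``configured`` call restores its own snapshot, nesting works naturally: an inner block's overrides disappear when it exits, and the outer block's overrides disappear when that one exits.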

.. autofunction:: configure

.. autofunction:: configured

.. autofunction:: get_global_config

.. autofunction:: get_runtime_config

.. autofunction:: clear_runtime_config


Configuration Keys
==================

The following configuration keys are available for use with
:func:`configure` and :func:`configured`:

Performance and Transport
--------------------------

``codec_max_frame_length``
   Maximum frame length, in bytes, for the message codec.

   - **Type**: ``int``
   - **Default**: ``10 * 1024 * 1024 * 1024`` (10 GiB)
   - **Environment**: ``HYPERACTOR_CODEC_MAX_FRAME_LENGTH``

   Controls the maximum size of serialized messages. Exceeding this limit
   will cause supervision errors.

   .. code-block:: python

      from monarch.config import configured

      # Allow larger messages for bulk data transfer
      one_hundred_gib = 100 * 1024 * 1024 * 1024
      with configured(codec_max_frame_length=one_hundred_gib):
          # `actor` and `large_data` are placeholders defined elsewhere
          result = actor.process_chunks.call_one(large_data).get()
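Since ``codec_max_frame_length`` is a plain byte count, sizes can be sanity-checked with simple arithmetic. The ``fits_in_frame`` helper below is hypothetical and not part of ``monarch.config``; it ignores any overhead the codec adds and only reproduces the numbers used on this page (the 10 GiB default and the 100 GiB override).

```python
GIB = 1024 ** 3  # bytes in one gibibyte

DEFAULT_MAX_FRAME = 10 * GIB  # documented default: 10 GiB


def fits_in_frame(payload_bytes: int, max_frame: int = DEFAULT_MAX_FRAME) -> bool:
    """Hypothetical check: would a serialized payload fit under the limit?

    Any framing or serialization overhead is ignored in this sketch.
    """
    return payload_bytes <= max_frame


one_hundred_gib = 100 * GIB
print(one_hundred_gib)  # 107374182400, the value used in the env example
print(fits_in_frame(12 * GIB))  # False under the 10 GiB default
print(fits_in_frame(12 * GIB, one_hundred_gib))  # True with the override
```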

``default_transport``
   Default channel transport mechanism for inter-actor communication.

   - **Type**: ``ChannelTransport`` enum
   - **Default**: ``ChannelTransport.Unix``
   - **Environment**: ``HYPERACTOR_DEFAULT_TRANSPORT``

   Available transports:

   - ``ChannelTransport.Unix`` - Unix domain sockets (local only)
   - ``ChannelTransport.TcpWithLocalhost`` - TCP over localhost
   - ``ChannelTransport.TcpWithHostname`` - TCP with hostname resolution
   - ``ChannelTransport.MetaTlsWithHostname`` - Meta TLS (Meta internal only)

   .. code-block:: python

      from monarch._rust_bindings.monarch_hyperactor.channel import (
          ChannelTransport,
      )
      from monarch.actor import this_host
      from monarch.config import configured

      with configured(default_transport=ChannelTransport.TcpWithLocalhost):
          # Actors will communicate via TCP
          mesh = this_host().spawn_procs(per_host={"workers": 4})


Timeouts
--------

``message_delivery_timeout``
   Maximum time to wait for message delivery before timing out.

   - **Type**: ``str`` (duration format, e.g., ``"30s"``, ``"5m"``)
   - **Default**: ``"30s"``
   - **Environment**: ``HYPERACTOR_MESSAGE_DELIVERY_TIMEOUT``

   Uses `humantime <https://docs.rs/humantime/latest/humantime/>`_ format.
   Examples: ``"30s"``, ``"5m"``, ``"1h 30m"``.

   .. code-block:: python

      from monarch.config import configured

      # Increase timeout for slow operations
      # (`slow_actor` is a placeholder defined elsewhere)
      with configured(message_delivery_timeout="5m"):
          result = slow_actor.heavy_computation.call_one().get()
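To see how these strings translate into seconds, here is a simplified parser for the subset of humantime syntax used on this page. It is an illustration only: it accepts just the ``d``, ``h``, ``m`` and ``s`` suffixes (so ``"500ms"`` is rejected), and it raises ``ValueError``, whereas Monarch reports invalid durations as ``TypeError``.

```python
import re

# Seconds per unit for a small subset of humantime suffixes.
_UNITS = {"d": 86400, "h": 3600, "m": 60, "s": 1}
# One "<number><unit>" component, optionally surrounded by whitespace.
_COMPONENT = re.compile(r"\s*(\d+)([dhms])\s*")


def parse_duration(text: str) -> int:
    """Parse strings like "90s" or "1h 30m" into whole seconds."""
    pos = 0
    total = 0
    while pos < len(text):
        match = _COMPONENT.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration {text!r} at offset {pos}")
        value, unit = match.groups()
        total += int(value) * _UNITS[unit]
        pos = match.end()
    if pos == 0:
        raise ValueError("empty duration")
    return total
```

For example, ``parse_duration("1h 30m")`` returns ``5400``.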

``host_spawn_ready_timeout``
   Maximum time to wait for spawned hosts to become ready.

   - **Type**: ``str`` (duration format)
   - **Default**: ``"30s"``
   - **Environment**: ``HYPERACTOR_HOST_SPAWN_READY_TIMEOUT``

   .. code-block:: python

      from monarch.config import configured

      # Allow more time for remote host allocation
      with configured(host_spawn_ready_timeout="5m"):
          hosts = HostMesh.allocate(...)

``mesh_proc_spawn_max_idle``
   Maximum idle time between status updates while spawning processes in a
   mesh.

   - **Type**: ``str`` (duration format)
   - **Default**: ``"30s"``
   - **Environment**: ``HYPERACTOR_MESH_PROC_SPAWN_MAX_IDLE``

   During proc mesh spawning, each process being created sends status
   updates to the controller. If no update arrives within this timeout, the
   spawn operation fails. This keeps a hung process spawn from blocking
   indefinitely.
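As described, the limit applies to the gap between consecutive status updates, so a spawn can run longer than ``mesh_proc_spawn_max_idle`` in total as long as updates keep arriving. The hypothetical function below models that gap check over update timestamps (seconds since the spawn started); it is not Monarch's controller logic, and it assumes the idle clock starts when the spawn begins.

```python
from typing import Optional, Sequence


def spawn_failure_time(
    update_times: Sequence[float], max_idle: float
) -> Optional[float]:
    """Return when an idle-timeout failure would fire, or None if none does.

    Hypothetical model: the idle clock starts at t=0 and restarts on every
    status update; the last entry in `update_times` is the final "ready"
    report that ends the spawn.
    """
    last = 0.0
    for t in sorted(update_times):
        if t - last > max_idle:
            # The deadline expired before update `t` arrived.
            return last + max_idle
        last = t
    return None
```

With a 30 second limit, updates at 10, 35, 60 and 85 seconds succeed even though the whole spawn takes 85 seconds, while updates at 10 and 50 seconds fail at 40 seconds.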


Logging
-------

``enable_log_forwarding``
   Enable forwarding child process stdout/stderr over the mesh log channel.

   - **Type**: ``bool``
   - **Default**: ``False``
   - **Environment**: ``HYPERACTOR_MESH_ENABLE_LOG_FORWARDING``

   When ``True``, child process output is forwarded to ``LogForwardActor``
   for centralized logging. When ``False``, child processes inherit the
   parent's stdio.

   .. code-block:: python

      from monarch.actor import this_host
      from monarch.config import configured

      with configured(enable_log_forwarding=True):
          # Child process logs will be forwarded
          mesh = this_host().spawn_procs(per_host={"workers": 4})

``enable_file_capture``
   Enable capturing child process output to log files on disk.

   - **Type**: ``bool``
   - **Default**: ``False``
   - **Environment**: ``HYPERACTOR_MESH_ENABLE_FILE_CAPTURE``

   When ``True``, child process output is written to host-scoped log
   files. It can be combined with ``enable_log_forwarding`` to get both
   streaming and persistent logs.

``tail_log_lines``
   Number of recent log lines to retain in memory per process.

   - **Type**: ``int``
   - **Default**: ``0``
   - **Environment**: ``HYPERACTOR_MESH_TAIL_LOG_LINES``

   Maintains a rotating in-memory buffer of the most recent log lines for
   debugging. This buffer is independent of file capture.

   .. code-block:: python

      from monarch.actor import this_host
      from monarch.config import configured

      # Keep last 100 lines for debugging
      with configured(tail_log_lines=100):
          mesh = this_host().spawn_procs(per_host={"workers": 4})
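A rotating buffer of the most recent N lines maps naturally onto ``collections.deque`` with ``maxlen``. The ``LogTail`` class below is a hypothetical illustration of the behavior described for ``tail_log_lines``, not Monarch's implementation; treating ``0`` as "retain nothing" is an assumption of this sketch.

```python
from collections import deque
from typing import Deque, List


class LogTail:
    """Keep only the most recent `max_lines` lines (hypothetical helper)."""

    def __init__(self, max_lines: int) -> None:
        # deque(maxlen=n) drops the oldest entry once n entries are held;
        # maxlen=0 retains nothing, which models a setting of 0 here.
        self._lines: Deque[str] = deque(maxlen=max_lines)

    def append(self, line: str) -> None:
        self._lines.append(line)

    def lines(self) -> List[str]:
        return list(self._lines)
```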

Validation and Error Handling
-----------------------------

``configure`` and ``configured`` validate input immediately:

* Unknown keys raise ``ValueError``.
* Type mismatches raise ``TypeError`` (for example, passing a string instead
  of ``ChannelTransport`` for ``default_transport``, or a non-bool value for
  a logging flag).
* Duration strings must follow
  `humantime <https://docs.rs/humantime/latest/humantime/>`_ syntax;
  invalid strings or non-string values trigger ``TypeError`` with a message
  that highlights the bad value.
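The first two rules can be mirrored by a small table-driven checker. In the sketch below, the schema covers a few documented keys, while the ``Transport`` enum (a stand-in for ``ChannelTransport``), the ``validate`` function and its error messages are hypothetical; duration syntax checking is left out.

```python
import enum
from typing import Any, Dict


class Transport(enum.Enum):
    """Hypothetical stand-in for ChannelTransport."""

    Unix = "unix"
    TcpWithLocalhost = "tcp_localhost"


# Expected Python type for a subset of the documented keys.
_SCHEMA: Dict[str, type] = {
    "codec_max_frame_length": int,
    "default_transport": Transport,
    "enable_log_forwarding": bool,
    "tail_log_lines": int,
    "message_delivery_timeout": str,
}


def validate(**overrides: Any) -> None:
    """Raise ValueError for unknown keys and TypeError for wrong types."""
    for key, value in overrides.items():
        if key not in _SCHEMA:
            raise ValueError(f"unknown configuration key: {key!r}")
        expected = _SCHEMA[key]
        # bool subclasses int in Python, so this sketch rejects True/False
        # for int-valued keys explicitly.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            raise TypeError(
                f"{key} expects {expected.__name__}, "
                f"got {type(value).__name__}: {value!r}"
            )
```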

Normalization
~~~~~~~~~~~~~

Duration values are normalized when read from :func:`get_global_config`. For
instance, setting ``host_spawn_ready_timeout="300s"`` yields ``"5m"`` when you
read it back. This matches the behavior exercised in
``monarch/python/tests/test_config.py`` and helps keep logs and telemetry
consistent.
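Normalization can be pictured as re-rendering a parsed duration in its largest whole units, so 300 seconds becomes ``"5m"``. The formatter below is a simplified sketch that follows humantime's style of space-separated components; it covers hours, minutes and seconds only, so durations of a day or more would come out differently from humantime, which also uses days and larger units.

```python
def format_duration(total_seconds: int) -> str:
    """Render whole seconds humantime-style, e.g. 300 -> "5m".

    Simplified sketch: hours, minutes and seconds only; zero-valued
    components are omitted, and zero itself renders as "0s".
    """
    if total_seconds < 0:
        raise ValueError("durations cannot be negative")
    if total_seconds == 0:
        return "0s"
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    parts = [
        f"{amount}{unit}"
        for amount, unit in ((hours, "h"), (minutes, "m"), (seconds, "s"))
        if amount
    ]
    return " ".join(parts)
```

For example, ``format_duration(90)`` gives ``"1m 30s"`` and ``format_duration(5400)`` gives ``"1h 30m"``.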


Examples
========

Basic Configuration
-------------------

.. code-block:: python

   from monarch.config import configure, get_global_config

   # Set configuration values
   configure(enable_log_forwarding=True, tail_log_lines=100)

   # Read current configuration
   config = get_global_config()
   print(config["enable_log_forwarding"])  # True
   print(config["tail_log_lines"])  # 100


Temporary Configuration (Testing)
----------------------------------

.. code-block:: python

   from monarch.config import configured

   def test_with_custom_config():
       # Configuration is scoped to this context
       with configured(
           enable_log_forwarding=True,
           message_delivery_timeout="1m",
       ) as config:
           # Config is active here
           assert config["enable_log_forwarding"] is True

       # Config is automatically restored after the context


Nested Overrides
----------------

.. code-block:: python

   from monarch._rust_bindings.monarch_hyperactor.channel import (
       ChannelTransport,
   )
   from monarch.config import configured

   with configured(default_transport=ChannelTransport.TcpWithLocalhost):
       # The inner context overrides only the logging knobs;
       # default_transport keeps its outer value.
       with configured(
           enable_log_forwarding=True,
           tail_log_lines=50,
       ) as config:
           assert (
               config["default_transport"]
               == ChannelTransport.TcpWithLocalhost
           )
           assert config["enable_log_forwarding"]

   # After both contexts exit, the process is back to its previous settings.


Duration Formats
----------------

.. code-block:: python

   from monarch.config import configured

   # Various duration formats are supported
   with configured(
       message_delivery_timeout="90s",  # 1m 30s
       host_spawn_ready_timeout="5m",  # 5 minutes
       mesh_proc_spawn_max_idle="1h 30m",  # 1 hour 30 minutes
   ):
       # Timeouts are active
       pass


Environment Variable Override
------------------------------

Configuration can also be set via environment variables:

.. code-block:: bash

   # Set codec max frame length to 100 GiB
   export HYPERACTOR_CODEC_MAX_FRAME_LENGTH=107374182400

   # Enable log forwarding
   export HYPERACTOR_MESH_ENABLE_LOG_FORWARDING=true

   # Set message delivery timeout to 5 minutes
   export HYPERACTOR_MESSAGE_DELIVERY_TIMEOUT=5m

Environment variables are read during initialization and can be overridden
programmatically.


See Also
========

- :doc:`../generated/examples/getting_started` - Getting started guide
- :doc:`monarch.actor` - Actor API documentation
