|
| 1 | +monarch.config |
| 2 | +============== |
| 3 | + |
| 4 | +.. currentmodule:: monarch.config |
| 5 | + |
| 6 | +The ``monarch.config`` module provides utilities for managing Monarch's |
| 7 | +runtime configuration. |
| 8 | + |
| 9 | +Configuration values can be set programmatically via :func:`configure` |
| 10 | +or :func:`configured`, or through environment variables |
| 11 | +(``HYPERACTOR_*``, ``MONARCH_*``). Programmatic configuration takes |
| 12 | +precedence over environment variables and defaults. |
| 13 | + |
| 14 | +Configuration API |
| 15 | +================= |
| 16 | + |
| 17 | +``monarch.config`` exposes a small, process-wide API. All helpers talk to |
| 18 | +the same layered configuration store, so changes are immediately visible to |
| 19 | +every thread in the process. |
| 20 | + |
| 21 | +``configure`` |
| 22 | + Apply overrides to the Runtime layer. Values are validated eagerly; a |
| 23 | + ``ValueError`` is raised for unknown keys and ``TypeError`` for wrong |
| 24 | + types. ``configure`` is additive, so you typically pair it with |
| 25 | + :func:`clear_runtime_config` in long-running processes. |
| 26 | + |
| 27 | +``configured`` |
| 28 | + Context manager sugar that snapshots the current Runtime layer, |
| 29 | + applies overrides, yields the merged config, then restores the snapshot. |
| 30 | + Because the Runtime layer is global, the overrides apply to every thread |
| 31 | + until the context exits, so ``configured`` is best suited to tests or |
| 32 | + short-lived blocks where you can guarantee single-threaded execution. |
| 33 | + |
| 34 | +``get_global_config`` |
| 35 | + Return the fully merged configuration (defaults + environment + file + |
| 36 | + runtime). Useful for introspection or for passing a frozen view to other |
| 37 | + components. |
| 38 | + |
| 39 | +``get_runtime_config`` |
| 40 | + Return only the currently active Runtime layer. This is what ``configure`` |
| 41 | + manipulates and what ``configured`` snapshots. |
| 42 | + |
| 43 | +``clear_runtime_config`` |
| 44 | + Reset the Runtime layer to an empty mapping. Environment and file values |
| 45 | + remain untouched. |
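To make the layering concrete, here is a minimal pure-Python sketch of the semantics described above. It does not import Monarch: ``DEFAULTS``, ``ENV``, ``RUNTIME`` and the ``toy_*`` functions are hypothetical stand-ins, the file layer is left out, and the real store is implemented differently.

```python
from collections import ChainMap
from contextlib import contextmanager

# Hypothetical layers. Lookups try RUNTIME first, then ENV, then DEFAULTS.
DEFAULTS = {"tail_log_lines": 0, "enable_log_forwarding": False}
ENV = {"tail_log_lines": 10}  # pretend this came from an environment variable
RUNTIME = {}                  # the layer a configure()-style call writes to


def toy_global():
    """Return the merged view, like get_global_config() in spirit."""
    return dict(ChainMap(RUNTIME, ENV, DEFAULTS))


def toy_configure(**overrides):
    """Additive: earlier runtime overrides stay until cleared."""
    RUNTIME.update(overrides)


def toy_clear_runtime():
    """Empty the runtime layer; ENV and DEFAULTS are untouched."""
    RUNTIME.clear()


@contextmanager
def toy_configured(**overrides):
    """Snapshot the runtime layer, apply overrides, restore on exit."""
    snapshot = dict(RUNTIME)
    RUNTIME.update(overrides)
    try:
        yield toy_global()
    finally:
        RUNTIME.clear()
        RUNTIME.update(snapshot)


toy_configure(enable_log_forwarding=True)
with toy_configured(tail_log_lines=100) as cfg:
    inside = cfg          # runtime override wins over ENV
after = toy_global()      # tail_log_lines falls back to the ENV value
toy_clear_runtime()
cleared = toy_global()    # only ENV and DEFAULTS remain
```

In this model the earlier ``toy_configure`` call survives the ``with`` block, which is why long-running processes may want an explicit clear step.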
| 46 | + |
| 47 | +.. autofunction:: configure |
| 48 | + |
| 49 | +.. autofunction:: configured |
| 50 | + |
| 51 | +.. autofunction:: get_global_config |
| 52 | + |
| 53 | +.. autofunction:: get_runtime_config |
| 54 | + |
| 55 | +.. autofunction:: clear_runtime_config |
| 56 | + |
| 57 | + |
| 58 | +Configuration Keys |
| 59 | +================== |
| 60 | + |
| 61 | +The following configuration keys are available for use with |
| 62 | +:func:`configure` and :func:`configured`: |
| 63 | + |
| 64 | +Performance and Transport |
| 65 | +-------------------------- |
| 66 | + |
| 67 | +``codec_max_frame_length`` |
| 68 | + Maximum frame length for message codec (in bytes). |
| 69 | + |
| 70 | + - **Type**: ``int`` |
| 71 | + - **Default**: ``10 * 1024 * 1024 * 1024`` (10 GiB) |
| 72 | + - **Environment**: ``HYPERACTOR_CODEC_MAX_FRAME_LENGTH`` |
| 73 | + |
| 74 | + Controls the maximum size of serialized messages. Exceeding this limit |
| 75 | + will cause supervision errors. |
| 76 | + |
| 77 | + .. code-block:: python |
| 78 | +
|
| 79 | + from monarch.config import configured |
| 80 | +
|
| 81 | + # Allow larger messages for bulk data transfer |
| 82 | + one_hundred_gib = 100 * 1024 * 1024 * 1024 |
| 83 | + with configured(codec_max_frame_length=one_hundred_gib): |
| 84 | +     # Send large chunks |
| 85 | +     result = actor.process_chunks.call_one(large_data).get() |
| 86 | +
|
| 87 | +``default_transport`` |
| 88 | + Default channel transport mechanism for inter-actor communication. |
| 89 | + |
| 90 | + - **Type**: ``ChannelTransport`` enum |
| 91 | + - **Default**: ``ChannelTransport.Unix`` |
| 92 | + - **Environment**: ``HYPERACTOR_DEFAULT_TRANSPORT`` |
| 93 | + |
| 94 | + Available transports: |
| 95 | + |
| 96 | + - ``ChannelTransport.Unix`` - Unix domain sockets (local only) |
| 97 | + - ``ChannelTransport.TcpWithLocalhost`` - TCP over localhost |
| 98 | + - ``ChannelTransport.TcpWithHostname`` - TCP with hostname resolution |
| 99 | + - ``ChannelTransport.MetaTlsWithHostname`` - Meta TLS (Meta internal only) |
| 100 | + |
| 101 | + .. code-block:: python |
| 102 | +
|
| 103 | + from monarch._rust_bindings.monarch_hyperactor.channel import ( |
| 104 | +     ChannelTransport, |
| 105 | + ) |
| 106 | + from monarch.config import configured |
| 107 | +
|
| 108 | + with configured(default_transport=ChannelTransport.TcpWithLocalhost): |
| 109 | +     # Actors will communicate via TCP |
| 110 | +     mesh = this_host().spawn_procs(per_host={"workers": 4}) |
| 111 | +
|
| 112 | +
|
| 113 | +Timeouts |
| 114 | +-------- |
| 115 | + |
| 116 | +``message_delivery_timeout`` |
| 117 | + Maximum time to wait for message delivery before timing out. |
| 118 | + |
| 119 | + - **Type**: ``str`` (duration format, e.g., ``"30s"``, ``"5m"``) |
| 120 | + - **Default**: ``"30s"`` |
| 121 | + - **Environment**: ``HYPERACTOR_MESSAGE_DELIVERY_TIMEOUT`` |
| 122 | + |
| 123 | + Uses `humantime <https://docs.rs/humantime/latest/humantime/>`_ format. |
| 124 | + Examples: ``"30s"``, ``"5m"``, ``"1h 30m"``. |
| 125 | + |
| 126 | + .. code-block:: python |
| 127 | +
|
| 128 | + from monarch.config import configured |
| 129 | +
|
| 130 | + # Increase timeout for slow operations |
| 131 | + with configured(message_delivery_timeout="5m"): |
| 132 | +     result = slow_actor.heavy_computation.call_one().get() |
| 133 | +
|
| 134 | +``host_spawn_ready_timeout`` |
| 135 | + Maximum time to wait for spawned hosts to become ready. |
| 136 | + |
| 137 | + - **Type**: ``str`` (duration format) |
| 138 | + - **Default**: ``"30s"`` |
| 139 | + - **Environment**: ``HYPERACTOR_HOST_SPAWN_READY_TIMEOUT`` |
| 140 | + |
| 141 | + .. code-block:: python |
| 142 | +
|
| 143 | + from monarch.config import configured |
| 144 | +
|
| 145 | + # Allow more time for remote host allocation |
| 146 | + with configured(host_spawn_ready_timeout="5m"): |
| 147 | +     hosts = HostMesh.allocate(...) |
| 148 | +
|
| 149 | +``mesh_proc_spawn_max_idle`` |
| 150 | + Maximum idle time between status updates while spawning processes in a |
| 151 | + mesh. |
| 152 | + |
| 153 | + - **Type**: ``str`` (duration format) |
| 154 | + - **Default**: ``"30s"`` |
| 155 | + - **Environment**: ``HYPERACTOR_MESH_PROC_SPAWN_MAX_IDLE`` |
| 156 | + |
| 157 | + During proc mesh spawning, each process being created sends status |
| 158 | + updates to the controller. If no update arrives within this timeout, the |
| 159 | + spawn operation fails. This prevents hung or stuck process creation from |
| 160 | + waiting indefinitely. |
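The idle timeout bounds the gap between updates, not the total spawn time. The sketch below models that rule on a list of timestamps; ``first_idle_failure`` is a hypothetical helper, and measuring the first gap from the start of the spawn is an assumption of this sketch, not a statement about the controller's implementation.

```python
def first_idle_failure(update_times, max_idle, start=0.0):
    """Return when the spawn would be declared failed, or None if it never is.

    update_times: seconds at which status updates arrive.
    max_idle: largest tolerated gap between consecutive updates, in seconds.
    """
    last = start
    for t in sorted(update_times):
        if t - last > max_idle:
            # The idle timer expires before this update arrives.
            return last + max_idle
        last = t
    return None


# Gaps of 10, 25, 25 and 25 seconds: total time is 85s, yet no gap exceeds 30s.
steady = first_idle_failure([10, 35, 60, 85], max_idle=30)

# A 45-second silence after t=25 trips the timer at t=55.
stalled = first_idle_failure([10, 25, 70], max_idle=30)
```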
| 161 | + |
| 162 | + |
| 163 | +Logging |
| 164 | +------- |
| 165 | + |
| 166 | +``enable_log_forwarding`` |
| 167 | + Enable forwarding child process stdout/stderr over the mesh log channel. |
| 168 | + |
| 169 | + - **Type**: ``bool`` |
| 170 | + - **Default**: ``False`` |
| 171 | + - **Environment**: ``HYPERACTOR_MESH_ENABLE_LOG_FORWARDING`` |
| 172 | + |
| 173 | + When ``True``, child process output is forwarded to ``LogForwardActor`` |
| 174 | + for centralized logging. |
| 175 | + When ``False``, child processes inherit parent stdio. |
| 176 | + |
| 177 | + .. code-block:: python |
| 178 | +
|
| 179 | + from monarch.config import configured |
| 180 | +
|
| 181 | + with configured(enable_log_forwarding=True): |
| 182 | +     # Child process logs will be forwarded |
| 183 | +     mesh = this_host().spawn_procs(per_host={"workers": 4}) |
| 184 | +
|
| 185 | +``enable_file_capture`` |
| 186 | + Enable capturing child process output to log files on disk. |
| 187 | + |
| 188 | + - **Type**: ``bool`` |
| 189 | + - **Default**: ``False`` |
| 190 | + - **Environment**: ``HYPERACTOR_MESH_ENABLE_FILE_CAPTURE`` |
| 191 | + |
| 192 | + When ``True``, child process output is written to host-scoped log |
| 193 | + files. Can be combined with ``enable_log_forwarding`` for both |
| 194 | + streaming and persistent logs. |
| 195 | + |
| 196 | +``tail_log_lines`` |
| 197 | + Number of recent log lines to retain in memory per process. |
| 198 | + |
| 199 | + - **Type**: ``int`` |
| 200 | + - **Default**: ``0`` |
| 201 | + - **Environment**: ``HYPERACTOR_MESH_TAIL_LOG_LINES`` |
| 202 | + |
| 203 | + Maintains a rotating in-memory buffer of the most recent log lines for |
| 204 | + debugging. |
| 205 | + Independent of file capture. |
| 206 | + |
| 207 | + .. code-block:: python |
| 208 | +
|
| 209 | + from monarch.config import configured |
| 210 | +
|
| 211 | + # Keep last 100 lines for debugging |
| 212 | + with configured(tail_log_lines=100): |
| 213 | +     mesh = this_host().spawn_procs(per_host={"workers": 4}) |
| 214 | +
|
| 215 | +Validation and Error Handling |
| 216 | +----------------------------- |
| 217 | + |
| 218 | +``configure`` and ``configured`` validate input immediately: |
| 219 | + |
| 220 | +* Unknown keys raise ``ValueError``. |
| 221 | +* Type mismatches raise ``TypeError`` (for example, passing a string instead |
| 222 | + of ``ChannelTransport`` for ``default_transport`` or a non-bool to logging |
| 223 | + flags). |
| 224 | +* Duration strings must follow |
| 225 | + `humantime <https://docs.rs/humantime/latest/humantime/>`_ syntax; |
| 226 | + invalid strings or non-string values trigger ``TypeError`` with a message |
| 227 | + that highlights the bad value. |
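The first two rules can be sketched as a small eager validator. This is an illustration only: ``SCHEMA``, ``toy_validate`` and ``outcome`` are hypothetical names, the schema covers three of the documented keys, and the exact-type check (which also rejects a ``bool`` for an ``int`` key) is a choice made for this sketch.

```python
# Hypothetical schema: documented key name -> expected Python type.
SCHEMA = {
    "tail_log_lines": int,
    "enable_log_forwarding": bool,
    "message_delivery_timeout": str,
}


def toy_validate(**overrides):
    """Raise before anything is applied, mirroring eager validation."""
    for key, value in overrides.items():
        if key not in SCHEMA:
            raise ValueError(f"unknown configuration key: {key!r}")
        expected = SCHEMA[key]
        # Exact type match; bool is a subclass of int, so isinstance is too loose.
        if type(value) is not expected:
            raise TypeError(
                f"{key} expects {expected.__name__}, got {value!r}"
            )
    return overrides


def outcome(**overrides):
    """Report which exception, if any, a set of overrides would raise."""
    try:
        toy_validate(**overrides)
        return "ok"
    except (ValueError, TypeError) as exc:
        return type(exc).__name__


results = [
    outcome(tail_log_lines=100),          # valid
    outcome(tail_log_line=100),           # misspelled key
    outcome(enable_log_forwarding="yes"), # string instead of bool
]
```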
| 228 | + |
| 229 | +Normalization |
| 230 | +~~~~~~~~~~~~~ |
| 231 | + |
| 232 | +Duration values are normalized when read from :func:`get_global_config`. For |
| 233 | +instance, setting ``host_spawn_ready_timeout="300s"`` yields ``"5m"`` when you |
| 234 | +read it back. This matches the behavior exercised in |
| 235 | +``monarch/python/tests/test_config.py`` and helps keep logs and telemetry |
| 236 | +consistent. |
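A rough model of this round trip, assuming only the ``h``, ``m`` and ``s`` units and durations under a day: ``parse_duration``, ``format_duration`` and ``normalize`` are hypothetical helpers written for this sketch, and humantime itself accepts many more units.

```python
import re

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)\s*([hms])")


def parse_duration(text):
    """Parse strings such as "300s" or "1h 30m" into whole seconds."""
    text = text.strip()
    if not text:
        raise ValueError("empty duration")
    pos, total = 0, 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
        # Skip whitespace between components.
        while pos < len(text) and text[pos].isspace():
            pos += 1
    return total


def format_duration(seconds):
    """Render seconds using the largest units first, omitting zero parts."""
    if seconds == 0:
        return "0s"
    parts = []
    for unit, size in (("h", 3600), ("m", 60), ("s", 1)):
        count, seconds = divmod(seconds, size)
        if count:
            parts.append(f"{count}{unit}")
    return " ".join(parts)


def normalize(text):
    return format_duration(parse_duration(text))


five_minutes = normalize("300s")
ninety_seconds = normalize("90s")
```

Here ``five_minutes`` is ``"5m"`` and ``ninety_seconds`` is ``"1m 30s"``, so two spellings of the same duration compare equal after normalization.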
| 237 | + |
| 238 | + |
| 239 | +Examples |
| 240 | +======== |
| 241 | + |
| 242 | +Basic Configuration |
| 243 | +------------------- |
| 244 | + |
| 245 | +.. code-block:: python |
| 246 | +
|
| 247 | + from monarch.config import configure, get_global_config |
| 248 | +
|
| 249 | + # Set configuration values |
| 250 | + configure(enable_log_forwarding=True, tail_log_lines=100) |
| 251 | +
|
| 252 | + # Read current configuration |
| 253 | + config = get_global_config() |
| 254 | + print(config["enable_log_forwarding"]) # True |
| 255 | + print(config["tail_log_lines"]) # 100 |
| 256 | +
|
| 257 | +
|
| 258 | +Temporary Configuration (Testing) |
| 259 | +---------------------------------- |
| 260 | + |
| 261 | +.. code-block:: python |
| 262 | +
|
| 263 | + from monarch.config import configured |
| 264 | +
|
| 265 | + def test_with_custom_config(): |
| 266 | +     # Configuration is scoped to this context |
| 267 | +     with configured( |
| 268 | +         enable_log_forwarding=True, |
| 269 | +         message_delivery_timeout="1m" |
| 270 | +     ) as config: |
| 271 | +         # Config is active here |
| 272 | +         assert config["enable_log_forwarding"] is True |
| 273 | +
|
| 274 | +     # Config is automatically restored after the context |
| 275 | +
|
| 276 | +
|
| 277 | +Nested Overrides |
| 278 | +---------------- |
| 279 | + |
| 280 | +.. code-block:: python |
| 281 | +
|
| 282 | + from monarch.config import configured |
| 283 | +
|
| 284 | + with configured(default_transport=ChannelTransport.TcpWithLocalhost): |
| 285 | +     # Inner config overrides logging knobs only; default_transport |
| 286 | +     # stays put. |
| 287 | +     with configured( |
| 288 | +         enable_log_forwarding=True, |
| 289 | +         tail_log_lines=50, |
| 290 | +     ) as config: |
| 291 | +         assert ( |
| 292 | +             config["default_transport"] |
| 293 | +             == ChannelTransport.TcpWithLocalhost |
| 294 | +         ) |
| 295 | +         assert config["enable_log_forwarding"] |
| 296 | +
|
| 297 | + # After both contexts exit, the process is back to the previous settings. |
| 298 | +
|
| 299 | +
|
| 300 | +Duration Formats |
| 301 | +---------------- |
| 302 | + |
| 303 | +.. code-block:: python |
| 304 | +
|
| 305 | + from monarch.config import configured |
| 306 | +
|
| 307 | + # Various duration formats are supported |
| 308 | + with configured( |
| 309 | +     message_delivery_timeout="90s", # 1m 30s |
| 310 | +     host_spawn_ready_timeout="5m", # 5 minutes |
| 311 | +     mesh_proc_spawn_max_idle="1h 30m", # 1 hour 30 minutes |
| 312 | + ): |
| 313 | +     # Timeouts are active |
| 314 | +     pass |
| 315 | +
|
| 316 | +
|
| 317 | +Environment Variable Override |
| 318 | +------------------------------ |
| 319 | + |
| 320 | +Configuration can also be set via environment variables: |
| 321 | + |
| 322 | +.. code-block:: bash |
| 323 | +
|
| 324 | + # Set codec max frame length to 100 GiB |
| 325 | + export HYPERACTOR_CODEC_MAX_FRAME_LENGTH=107374182400 |
| 326 | +
|
| 327 | + # Enable log forwarding |
| 328 | + export HYPERACTOR_MESH_ENABLE_LOG_FORWARDING=true |
| 329 | +
|
| 330 | + # Set message delivery timeout to 5 minutes |
| 331 | + export HYPERACTOR_MESSAGE_DELIVERY_TIMEOUT=5m |
| 332 | +
|
| 333 | +Environment variables are read during initialization and can be overridden |
| 334 | +programmatically. |
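Because the variables are read during initialization, they need to be in place before the Monarch process starts. One way to do that from Python is to build the environment for a child process; the sketch below uses only the standard library, and ``gib`` and ``child_env`` are hypothetical names.

```python
import os


def gib(n):
    """Convert a count of GiB to bytes."""
    return n * 1024 ** 3


# Copy the current environment and add the overrides as strings.
child_env = dict(os.environ)
child_env.update(
    {
        "HYPERACTOR_CODEC_MAX_FRAME_LENGTH": str(gib(100)),
        "HYPERACTOR_MESH_ENABLE_LOG_FORWARDING": "true",
        "HYPERACTOR_MESSAGE_DELIVERY_TIMEOUT": "5m",
    }
)
```

``child_env`` can then be passed as the ``env`` argument of ``subprocess.run`` when launching the program that imports Monarch.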
| 335 | + |
| 336 | + |
| 337 | +See Also |
| 338 | +======== |
| 339 | + |
| 340 | +- :doc:`../generated/examples/getting_started` - Getting started guide |
| 341 | +- :doc:`monarch.actor` - Actor API documentation |