Skip to content

Conversation

@quinntaylormitchell
Copy link
Collaborator

@quinntaylormitchell quinntaylormitchell commented Nov 25, 2025

Description

This PR adds a system that assigns a port to each component of the clp package. The intent is to avoid port conflicts on package startup.

--clp-base-port is an optional CLI flag which has a default value of 55000 if it is not specified. A port range is constructed starting at that base port and stretching to accommodate the maximum number of ports needed by the CLP package described in a given ClpConfig object. The system checks whether or not each port within this range is available, and if they are all available, the system assigns ports to the components appropriately.

PR dependencies

This PR depends on the following being merged:

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Ran the following command from clp/integration-tests:

uv run pyetst -m 'package'

Both the clp-json and clp-text tests passed.

Summary by CodeRabbit

  • New Features

    • Added a CLI option --clp-base-port (default: 55000) for integration tests.
    • Automatic assignment and recording of service ports derived from the chosen base port.
    • New port-assignment utilities to allocate and verify consecutive port ranges.
  • Bug Fixes

    • Clearer startup error messages indicating port conflicts and how to override the base port.
    • Validation and descriptive errors for invalid or unavailable base-port ranges.

✏️ Tip: You can customize this high-level summary in your review settings.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
integration-tests/tests/utils/port_utils.py (1)

7-7: Off-by-one: privileged ports include 1023.

Privileged (well-known) ports are 0–1023, so the first non-privileged port is 1024. With the current value of 1023 and the < comparison on line 67, a base_port of 1023 would incorrectly be accepted.

-MIN_NON_PRIVILEGED_PORT = 1023
+MIN_NON_PRIVILEGED_PORT = 1024
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8fe9dea and c452624.

📒 Files selected for processing (5)
  • integration-tests/tests/conftest.py (1 hunks)
  • integration-tests/tests/fixtures/package_config.py (2 hunks)
  • integration-tests/tests/fixtures/package_instance.py (1 hunks)
  • integration-tests/tests/utils/config.py (1 hunks)
  • integration-tests/tests/utils/port_utils.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.

Applied to files:

  • integration-tests/tests/fixtures/package_instance.py
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
🧬 Code graph analysis (3)
integration-tests/tests/fixtures/package_config.py (4)
integration-tests/tests/utils/port_utils.py (1)
  • assign_ports_from_base (11-57)
integration-tests/tests/utils/clp_mode_utils.py (1)
  • get_clp_config_from_mode (29-40)
integration-tests/tests/utils/config.py (1)
  • PackageConfig (111-144)
integration-tests/tests/fixtures/path_configs.py (1)
  • fixt_package_path_config (28-35)
integration-tests/tests/fixtures/package_instance.py (1)
integration-tests/tests/fixtures/package_config.py (1)
  • fixt_package_config (13-49)
integration-tests/tests/utils/port_utils.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • ClpConfig (713-963)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: package-image
  • GitHub Check: build (macos-15)
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (5)
integration-tests/tests/conftest.py (1)

14-25: LGTM!

The pytest_addoption hook is correctly implemented to register the --clp-base-port CLI option. The default value of "55000" falls within the dynamic/private port range (49152–65535), which is appropriate for avoiding conflicts with well-known and registered ports.

integration-tests/tests/utils/port_utils.py (2)

22-57: LGTM!

The port assignment logic correctly reserves fixed offsets for all services by incrementing current_port unconditionally, ensuring deterministic port mappings regardless of which optional services are enabled.


60-75: LGTM!

The validation logic and error messages are clear and provide actionable feedback when the port range is invalid.

integration-tests/tests/utils/config.py (1)

123-124: LGTM!

The new base_port field is appropriately added to the PackageConfig dataclass with a clear docstring comment.

integration-tests/tests/fixtures/package_config.py (1)

29-43: LGTM!

The base port handling is well-implemented with proper error handling for invalid CLI input. The ValueError propagation from assign_ports_from_base (when the port range is invalid) is also correctly documented in the docstring.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c452624 and 35d1033.

📒 Files selected for processing (2)
  • integration-tests/tests/fixtures/package_instance.py (1 hunks)
  • integration-tests/tests/utils/port_utils.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.

Applied to files:

  • integration-tests/tests/fixtures/package_instance.py
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
🧬 Code graph analysis (2)
integration-tests/tests/fixtures/package_instance.py (1)
integration-tests/tests/fixtures/package_config.py (1)
  • fixt_package_config (13-49)
integration-tests/tests/utils/port_utils.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • ClpConfig (713-963)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (2)
integration-tests/tests/fixtures/package_instance.py (1)

32-37: Preserve original RuntimeError details in the failure message

Right now the RuntimeError is swallowed and only the hint about base_port is shown, which can make non‑port issues harder to diagnose. Consider capturing the exception and including its message alongside your helpful guidance.

[ suggest_recommended_refactor ]
[ request_verification ]

-    except RuntimeError:
-        base_port = fixt_package_config.base_port
-        pytest.fail(
-            f"Failed to start the {mode_name} package. This could mean that one of the ports"
-            f" derived from base_port={base_port} was unavailable. You can specify a new value for"
-            " base_port with the '--clp-base-port' flag."
-        )
+    except RuntimeError as err:
+        base_port = fixt_package_config.base_port
+        pytest.fail(
+            f"Failed to start the {mode_name} package. This could mean that one of the ports"
+            f" derived from base_port={base_port} was unavailable. You can specify a new value for"
+            " base_port with the '--clp-base-port' flag. "
+            f"Original error: {err}"
+        )

Please verify this still produces readable pytest output for typical failure cases under pytest 8.4.1.

integration-tests/tests/utils/port_utils.py (1)

60-75: Port-range validation logic and messaging look solid.

The checks against MIN_NON_PRIVILEGED_PORT and MAX_PORT correctly enforce the valid range for the entire block of MAX_REQUIRED_PORTS ports, and the error messages are clear and actionable.

Comment on lines 5 to 6
# MAX_REQUIRED_PORTS is equal to the max number of CLP package components that need a port.
MAX_REQUIRED_PORTS = 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# MAX_REQUIRED_PORTS is equal to the max number of CLP package components that need a port.
MAX_REQUIRED_PORTS = 10
CLP_PACKAGE_PORTS: list[tuple[str, str]] = [
("database", "port"),
("queue", "port"),
("redis", "port"),
("reducer", "base_port"),
("results_cache", "port"),
("query_scheduler", "port"),
("webui", "port"),
# Optional services
("api_server", "port"),
("mcp_server", "port"),
("presto", "port"),
]
# Maximum number of ports that may be consumed by the CLP package.
MAX_REQUIRED_PORTS = len(CLP_PACKAGE_PORTS)

Since Junhao had comments about how this number is derived, I think this is a good opportunity to change the current brute force way of iterating over ports.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of avoiding brute force; If we wait for #1658 to be merged, we could just operate on the list of required components for any given package_config, rather than making (and then having to maintain) a list like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works too, though I don't know where we can account for the path to the port attribute.

Copy link
Collaborator Author

@quinntaylormitchell quinntaylormitchell Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can figure out how to do that; if it ends up being to complicated, I'll list them like you did in your suggestion

Comment on lines 23 to 57

clp_config.database.port = current_port
current_port += 1

clp_config.queue.port = current_port
current_port += 1

clp_config.redis.port = current_port
current_port += 1

# Reducer binds to a range of ports starting from a `base_port`.
clp_config.reducer.base_port = current_port
current_port += 1

clp_config.results_cache.port = current_port
current_port += 1

clp_config.query_scheduler.port = current_port
current_port += 1

clp_config.webui.port = current_port
current_port += 1

# Optional services
if clp_config.api_server is not None:
clp_config.api_server.port = current_port
current_port += 1

if clp_config.mcp_server is not None:
clp_config.mcp_server.port = current_port
current_port += 1

if clp_config.presto is not None:
clp_config.presto.port = current_port
current_port += 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
clp_config.database.port = current_port
current_port += 1
clp_config.queue.port = current_port
current_port += 1
clp_config.redis.port = current_port
current_port += 1
# Reducer binds to a range of ports starting from a `base_port`.
clp_config.reducer.base_port = current_port
current_port += 1
clp_config.results_cache.port = current_port
current_port += 1
clp_config.query_scheduler.port = current_port
current_port += 1
clp_config.webui.port = current_port
current_port += 1
# Optional services
if clp_config.api_server is not None:
clp_config.api_server.port = current_port
current_port += 1
if clp_config.mcp_server is not None:
clp_config.mcp_server.port = current_port
current_port += 1
if clp_config.presto is not None:
clp_config.presto.port = current_port
current_port += 1
for component_name, port_attr in CLP_PACKAGE_PORTS:
component_config = getattr(clp_config, component_name)
if component_config is not None:
setattr(component_config, port_attr, current_port)
current_port += 1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed in concept; see above comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 35d1033 and b9e75ec.

📒 Files selected for processing (1)
  • integration-tests/tests/utils/port_utils.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
🔇 Additional comments (4)
integration-tests/tests/utils/port_utils.py (4)

21-67: Well-implemented dynamic port assignment.

The function correctly:

  • Discovers components with port-like attributes dynamically, avoiding hardcoded component names.
  • Allocates a block of 128 consecutive ports for the reducer component.
  • Validates the port range and availability before assignment.
  • Assigns ports sequentially without gaps or overlaps.

The dynamic discovery approach is flexible and aligns with the design discussion in past comments about waiting for #1658 to operate on required components generically.


69-86: Correct port range validation with clear error messaging.

The validation logic correctly checks both bounds and the error message helpfully displays inclusive endpoints for clarity.


89-105: Port availability validation provides helpful early failure detection.

The function correctly checks each port for availability and provides a clear error message directing users to choose a different range. While there's an inherent TOCTOU limitation (port could become unavailable between check and use), this early detection is valuable for catching obvious conflicts during test setup.


108-121: Correct implementation of port availability check.

The function properly uses a context manager for socket cleanup and follows the standard pattern for testing port availability. The implementation matches the approach suggested in past review discussions.

@quinntaylormitchell quinntaylormitchell requested review from Bill-hbrhbr and junhaoliao and removed request for Bill-hbrhbr and junhaoliao December 4, 2025 18:38
:param request:
:return: An iterator that yields the PackageConfig object for the specified mode.
:raise ValueError: if the CLP base port is invalid.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:raise ValueError: if the CLP base port is invalid.
:raise ValueError: if the CLP base port's value is invalid.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.

:param clp_config:
:raise ValueError: if the base port is out of range or if any required port is in use.
"""
# Discover which components in ClpConfig have a port-like data member.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Discover which components in ClpConfig have a port-like data member.
# Discover which components in ClpConfig have a port attribute.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.

Comment on lines 23 to 24
Assign ports for all components described in `clp_config` that require a port. Ports are
assigned relative to `base_port`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Assign ports for all components described in `clp_config` that require a port. Ports are
assigned relative to `base_port`.
Assign ports for all active components described in `clp_config` that require a port. Ports
are assigned relative to `base_port`.
Assume ports appear directly under each component with no further nesting, and that
each component defines exactly one port attribute.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might reword this, as not every component of CLP requires a port. Will change to:

Suggested change
Assign ports for all components described in `clp_config` that require a port. Ports are
assigned relative to `base_port`.
Assign ports for all active components described in `clp_config` that require a port. Ports are
assigned relative to `base_port`. Assume that port attributes appear directly under each
component with no further nesting, and that each component that requires a port defines only one
port attribute using one of the names in `PORT_LIKE_ATTR_NAMES`.

break

if port_attr_name is None:
# This component doesn't have any port-like attributes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# This component doesn't have any port-like attributes.

The code is self-explanatory enough that we don't really need this comment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.


ports_required_for_component = 1
if attr_name == "reducer":
# The reducer needs a block of ports starting at its base_port.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# The reducer needs a block of ports starting at its base_port.
# The reducer cluster needs a block of ports starting at its `base_port`.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.

Comment on lines 76 to 80
if port_range.start < VALID_PORT_RANGE.start or port_range.stop > VALID_PORT_RANGE.stop:
required_start_port = port_range.start
required_end_port = port_range.stop - 1
min_valid_port = VALID_PORT_RANGE.start
max_valid_port = VALID_PORT_RANGE.stop - 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if port_range.start < VALID_PORT_RANGE.start or port_range.stop > VALID_PORT_RANGE.stop:
required_start_port = port_range.start
required_end_port = port_range.stop - 1
min_valid_port = VALID_PORT_RANGE.start
max_valid_port = VALID_PORT_RANGE.stop - 1
required_start_port = port_range.start
required_end_port = port_range.stop - 1
min_valid_port = VALID_PORT_RANGE.start
max_valid_port = VALID_PORT_RANGE.stop - 1
if required_start_port < min_valid_port or required_end_port > max_valid_port:

The entire range thing seems like overkill though. Is there a specific reason why it's introduced?
we can also write something like

  if port_range.start not in VALID_PORT_RANGE or port_range.stop not in VALID_PORT_RANGE:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used a range so that I could pass in the required range to _validate_ports_available_in_range, it made it pretty simple. I agree with your code suggestion though, will apply.

Comment on lines 82 to 84
f"The port range derived from --clp-base-port ('{required_start_port}' to"
f" '{required_end_port}' inclusive) must fall within the range of valid ports"
f" ('{min_valid_port}' to '{max_valid_port}' inclusive)."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could benefit from a range pretty print helper.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.

start_port = port_range.start
end_port = port_range.stop - 1
err_msg = (
f"Port '{port}' in the desired range '{start_port}' to '{end_port}' is already in"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could benefit from a range pretty print helper.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.

Copy link
Collaborator Author

@quinntaylormitchell quinntaylormitchell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review Bill.

re. the replies that say "see top-level comment":

The first issue with using the required_components list in assign_ports_from_base is that the component names on the list are all in their Docker container name forms (with - characters and no _), whereas the names of the ClpConfig class attributes are all in _ form.

In addition, it seems to me like it's more future-proof to iterate through the attributes of the ClpConfig class, as this code is currently doing. That way, the only thing we have to keep track of as we maintain the code in the future is the list of components that each operating mode needs (which is human-level anyway).

:param request:
:return: An iterator that yields the PackageConfig object for the specified mode.
:raise ValueError: if the CLP base port is invalid.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.

Comment on lines 23 to 24
Assign ports for all components described in `clp_config` that require a port. Ports are
assigned relative to `base_port`.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might reword this, as not every component of CLP requires a port. Will change to:

Suggested change
Assign ports for all components described in `clp_config` that require a port. Ports are
assigned relative to `base_port`.
Assign ports for all active components described in `clp_config` that require a port. Ports are
assigned relative to `base_port`. Assume that port attributes appear directly under each
component with no further nesting, and that each component that requires a port defines only one
port attribute using one of the names in `PORT_LIKE_ATTR_NAMES`.

:param clp_config:
:raise ValueError: if the base port is out of range or if any required port is in use.
"""
# Discover which components in ClpConfig have a port-like data member.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.

break

if port_attr_name is None:
# This component doesn't have any port-like attributes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.


ports_required_for_component = 1
if attr_name == "reducer":
# The reducer needs a block of ports starting at its base_port.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.

Comment on lines 76 to 80
if port_range.start < VALID_PORT_RANGE.start or port_range.stop > VALID_PORT_RANGE.stop:
required_start_port = port_range.start
required_end_port = port_range.stop - 1
min_valid_port = VALID_PORT_RANGE.start
max_valid_port = VALID_PORT_RANGE.stop - 1
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used a range so that I could pass in the required range to _validate_ports_available_in_range, it made it pretty simple. I agree with your code suggestion though, will apply.

Comment on lines 82 to 84
f"The port range derived from --clp-base-port ('{required_start_port}' to"
f" '{required_end_port}' inclusive) must fall within the range of valid ports"
f" ('{min_valid_port}' to '{max_valid_port}' inclusive)."
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.

start_port = port_range.start
end_port = port_range.stop - 1
err_msg = (
f"Port '{port}' in the desired range '{start_port}' to '{end_port}' is already in"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
integration-tests/tests/utils/port_utils.py (1)

13-14: Document the origin of REDUCER_MAX_PORTS = 128.

Add a comment explaining where this value comes from (e.g., reducer configuration or implementation constraint). This will help maintainers understand why 128 ports are reserved and keep the constant in sync with any upstream changes.

 # CLP constants.
+# Maximum number of consecutive ports the reducer component requires.
+# This value should match the reducer's implementation that binds to a port range.
 REDUCER_MAX_PORTS = 128
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a37994 and 0afc57b.

📒 Files selected for processing (2)
  • integration-tests/tests/fixtures/package_config.py (3 hunks)
  • integration-tests/tests/utils/port_utils.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
🧬 Code graph analysis (2)
integration-tests/tests/fixtures/package_config.py (2)
integration-tests/tests/utils/port_utils.py (1)
  • assign_ports_from_base (21-67)
integration-tests/tests/utils/clp_mode_utils.py (1)
  • get_clp_config_from_mode (77-88)
integration-tests/tests/utils/port_utils.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • ClpConfig (705-955)
🔇 Additional comments (5)
integration-tests/tests/fixtures/package_config.py (2)

31-38: LGTM!

The base port parsing with clear error handling and the subsequent port assignment call are well-implemented. The error message is descriptive and will help users identify configuration issues.


43-49: LGTM!

The base_port is correctly passed to the PackageConfig constructor, enabling downstream components to access the configured base port.

integration-tests/tests/utils/port_utils.py (3)

32-52: Component discovery approach is reasonable.

The dynamic discovery via vars(clp_config) provides flexibility and avoids maintaining a separate list of components. Since Python 3.7+, vars() returns attributes in definition order, making the port assignment deterministic for a given ClpConfig structure.


54-67: LGTM!

The validation-before-assignment pattern ensures that either all ports are available and assigned, or an error is raised before any changes are made. This atomic approach prevents partial configuration states.


98-134: LGTM!

The validation functions provide clear, actionable error messages that include both the problematic port and the expected range. The helper function _format_inclusive_port_range keeps the output consistent and readable.

Comment on lines +82 to +95
def _is_port_free(port: int, host: str) -> bool:
"""
Check whether a TCP port is available for binding.
:param port:
:param host:
:return: True if the port can be bound, otherwise False.
"""
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
try:
sock.bind((host, port))
except OSError:
return False
return True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider setting SO_REUSEADDR for more reliable port checks.

In rapid test scenarios, sockets may remain in TIME_WAIT state, causing false negatives. Setting SO_REUSEADDR before binding can help:

 def _is_port_free(port: int, host: str) -> bool:
     with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
         try:
+            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
             sock.bind((host, port))
         except OSError:
             return False
         return True

This is optional—it depends on how frequently tests run in succession.

🤖 Prompt for AI Agents
In integration-tests/tests/utils/port_utils.py around lines 82 to 95, the
port-availability check can yield false negatives when previous sockets are in
TIME_WAIT; set the SO_REUSEADDR socket option before binding to allow rebinding
in rapid test runs. Modify the function to call
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) on the newly created
socket (before sock.bind), keeping the same try/except and context manager
behavior so the function still returns True if bind succeeds and False on
OSError.

Copy link
Contributor

@Bill-hbrhbr Bill-hbrhbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your statement on inspecting the whole CLPConfig instead of the required component list.
Have some final comments on assign_ports_from_base, the main function of this PR.

"""
# Discover which components in ClpConfig have a port attribute.
component_port_targets: list[tuple[Any, str, int]] = []
for attr_name, attr_value in vars(clp_config).items():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's replace:

  • attr_name with component_name
  • attr_value with component_config

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that here, we should keep the attr_name and attr_value variable names, because we're specifically iterating through the attributes of the ClpConfig object. Not all attributes of that object are CLP components.

That being said, in the for loop at the very end of assign_ports_from_base, I use attr_value, and I do think it should be component_config in that context, so I'll change it there.

# Discover which components in ClpConfig have a port attribute.
component_port_targets: list[tuple[Any, str, int]] = []
for attr_name, attr_value in vars(clp_config).items():
if attr_name.startswith("_") or attr_value is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does the component name start with a _?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as per above comment, we're not going through the components, we're going through the attributes in ClpConfig; some of them start with a _, and we can safely exclude those.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0afc57b and bbc4780.

📒 Files selected for processing (1)
  • integration-tests/tests/utils/port_utils.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.

Applied to files:

  • integration-tests/tests/utils/port_utils.py
🧬 Code graph analysis (1)
integration-tests/tests/utils/port_utils.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
  • ClpConfig (779-1060)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: package-image
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: build (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (3)
integration-tests/tests/utils/port_utils.py (3)

8-20: Port constants and reducer block sizing look reasonable for the test context

The valid port range and REDUCER_MAX_PORTS constant read clearly and give a sensible safety margin for reducer instances in tests. No functional concerns here.


69-78: Range pretty-printer is correct and keeps error messages readable

The helper correctly interprets Python’s half-open range as inclusive on both ends for display, and the output format reads well in your error messages. Looks good as-is.


115-133: Port-range validation is correct and produces helpful diagnostics

The inclusive bound check against VALID_PORT_RANGE is logically correct, and the error message constructed with _format_inclusive_port_range gives the user both the derived and valid ranges, which should make misconfigured --clp-base-port issues easy to spot. No issues here.

Copy link
Contributor

@Bill-hbrhbr Bill-hbrhbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passing on to @junhaoliao

@junhaoliao
Copy link
Member

Hey @quinntaylormitchell, while reviewing your PR, I'd suggest the following code changes:

👉 i tried refactoring this file a bit

You can also review and apply these suggestions locally on your machine.

Learn more about GitKraken Code Suggest

Code Suggest liberates your code reviews from GitHub's restrictive, comment-only feedback style. As simple as suggesting changes in a Google-doc, provide real code suggestions from where you code, e.g. your IDE, and on anything in your project — not just on the lines of code changed in the PR.

Join your team on GitKraken to speed up PR review.

Copy link
Member

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the title, how about:

feat(integration-tests): Add port range assignment to avoid package spin-up failures based on a configurable base port.

Comment on lines +1 to +133
"""Functions for facilitating the port connections for the CLP package."""

import socket
from typing import Any

from clp_py_utils.clp_config import ClpConfig

# Port constants.
MIN_NON_PRIVILEGED_PORT = 1024
MAX_PORT = 65535
VALID_PORT_RANGE = range(MIN_NON_PRIVILEGED_PORT, MAX_PORT + 1)

# Practical maximum number of reducer instances.
REDUCER_MAX_PORTS = 128

PORT_LIKE_ATTR_NAMES = [
"port",
"base_port",
]


def assign_ports_from_base(base_port: int, clp_config: ClpConfig) -> None:
"""
Assign ports for all active components described in `clp_config` that require a port. Ports are
assigned relative to `base_port`. Assume that port attributes appear directly under each
component with no further nesting, and that each component that requires a port defines only one
port attribute using one of the names in `PORT_LIKE_ATTR_NAMES`.
:param base_port:
:param clp_config:
:raise ValueError: if the base port is out of range or if any required port is in use.
"""
# Discover which components in ClpConfig have a port attribute.
component_port_targets: list[tuple[Any, str, int]] = []
total_ports_required = 0
for attr_name, attr_value in vars(clp_config).items():
if attr_name.startswith("_") or attr_value is None:
continue

port_attr_name: str | None = None
for port_like_name in PORT_LIKE_ATTR_NAMES:
if hasattr(attr_value, port_like_name):
port_attr_name = port_like_name
break

if port_attr_name is None:
continue

ports_required_for_component = 1
if attr_name == "reducer":
# The reducer cluster needs a block of ports starting at its `base_port`.
ports_required_for_component = REDUCER_MAX_PORTS

component_port_targets.append((attr_value, port_attr_name, ports_required_for_component))
total_ports_required += ports_required_for_component

# Ensure desired port range is valid and that all ports in the range are available.
desired_port_range = range(base_port, base_port + total_ports_required)
_validate_port_range(port_range=desired_port_range)
_validate_ports_available_in_range(host="127.0.0.1", port_range=desired_port_range)

# Assign ports to the components.
current_port = base_port
for component_config, port_attr_name, ports_required_for_component in component_port_targets:
setattr(component_config, port_attr_name, current_port)
current_port += ports_required_for_component


def _format_inclusive_port_range(port_range: range) -> str:
"""
Return a prettified string describing an inclusive range.
:param port_range:
:return: range description of the form "'start' to "'end' inclusive".
"""
start_port = port_range.start
end_port = port_range.stop - 1
return f"'{start_port}' to '{end_port}' inclusive"


def _is_port_free(port: int, host: str) -> bool:
"""
Check whether a TCP port is available for binding.
:param port:
:param host:
:return: True if the port can be bound, otherwise False.
"""
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
try:
sock.bind((host, port))
except OSError:
return False
return True


def _validate_ports_available_in_range(host: str, port_range: range) -> None:
"""
Validate that each port in `port_range` is available on `host`.
:param host:
:param port_range:
:raise ValueError: if any port in the range cannot be bound.
"""
for port in port_range:
if not _is_port_free(port=port, host=host):
desired_range_str = _format_inclusive_port_range(port_range)
err_msg = (
f"Port '{port}' in the desired range ({desired_range_str}) is already in use."
" Choose a different port range for the test environment."
)
raise ValueError(err_msg)


def _validate_port_range(port_range: range) -> None:
"""
Validate that `port_range` falls completely within `VALID_PORT_RANGE`.
:param port_range:
:raise ValueError: if any part of `port_range` falls outside `VALID_PORT_RANGE`.
"""
required_start_port = port_range.start
required_end_port = port_range.stop - 1
min_valid_port = VALID_PORT_RANGE.start
max_valid_port = VALID_PORT_RANGE.stop - 1
if required_start_port < min_valid_port or required_end_port > max_valid_port:
required_range_str = _format_inclusive_port_range(port_range)
valid_range_str = _format_inclusive_port_range(VALID_PORT_RANGE)
err_msg = (
f"The port range derived from --clp-base-port ({required_range_str}) must fall within"
f" the range of valid ports ({valid_range_str})."
)
raise ValueError(err_msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i tried cleaning up this source file a bit. changes include:

  1. rename constants / method names to make them more intuitive / self-explanatory
  2. extract a helper to discover port assignment requirements
  3. modified some docstrings
Suggested change
"""Functions for facilitating the port connections for the CLP package."""
import socket
from typing import Any
from clp_py_utils.clp_config import ClpConfig
# Port constants.
MIN_NON_PRIVILEGED_PORT = 1024
MAX_PORT = 65535
VALID_PORT_RANGE = range(MIN_NON_PRIVILEGED_PORT, MAX_PORT + 1)
# Practical maximum number of reducer instances.
REDUCER_MAX_PORTS = 128
PORT_LIKE_ATTR_NAMES = [
"port",
"base_port",
]
def assign_ports_from_base(base_port: int, clp_config: ClpConfig) -> None:
"""
Assign ports for all active components described in `clp_config` that require a port. Ports are
assigned relative to `base_port`. Assume that port attributes appear directly under each
component with no further nesting, and that each component that requires a port defines only one
port attribute using one of the names in `PORT_LIKE_ATTR_NAMES`.
:param base_port:
:param clp_config:
:raise ValueError: if the base port is out of range or if any required port is in use.
"""
# Discover which components in ClpConfig have a port attribute.
component_port_targets: list[tuple[Any, str, int]] = []
total_ports_required = 0
for attr_name, attr_value in vars(clp_config).items():
if attr_name.startswith("_") or attr_value is None:
continue
port_attr_name: str | None = None
for port_like_name in PORT_LIKE_ATTR_NAMES:
if hasattr(attr_value, port_like_name):
port_attr_name = port_like_name
break
if port_attr_name is None:
continue
ports_required_for_component = 1
if attr_name == "reducer":
# The reducer cluster needs a block of ports starting at its `base_port`.
ports_required_for_component = REDUCER_MAX_PORTS
component_port_targets.append((attr_value, port_attr_name, ports_required_for_component))
total_ports_required += ports_required_for_component
# Ensure desired port range is valid and that all ports in the range are available.
desired_port_range = range(base_port, base_port + total_ports_required)
_validate_port_range(port_range=desired_port_range)
_validate_ports_available_in_range(host="127.0.0.1", port_range=desired_port_range)
# Assign ports to the components.
current_port = base_port
for component_config, port_attr_name, ports_required_for_component in component_port_targets:
setattr(component_config, port_attr_name, current_port)
current_port += ports_required_for_component
def _format_inclusive_port_range(port_range: range) -> str:
"""
Return a prettified string describing an inclusive range.
:param port_range:
:return: range description of the form "'start' to "'end' inclusive".
"""
start_port = port_range.start
end_port = port_range.stop - 1
return f"'{start_port}' to '{end_port}' inclusive"
def _is_port_free(port: int, host: str) -> bool:
"""
Check whether a TCP port is available for binding.
:param port:
:param host:
:return: True if the port can be bound, otherwise False.
"""
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
try:
sock.bind((host, port))
except OSError:
return False
return True
def _validate_ports_available_in_range(host: str, port_range: range) -> None:
"""
Validate that each port in `port_range` is available on `host`.
:param host:
:param port_range:
:raise ValueError: if any port in the range cannot be bound.
"""
for port in port_range:
if not _is_port_free(port=port, host=host):
desired_range_str = _format_inclusive_port_range(port_range)
err_msg = (
f"Port '{port}' in the desired range ({desired_range_str}) is already in use."
" Choose a different port range for the test environment."
)
raise ValueError(err_msg)
def _validate_port_range(port_range: range) -> None:
"""
Validate that `port_range` falls completely within `VALID_PORT_RANGE`.
:param port_range:
:raise ValueError: if any part of `port_range` falls outside `VALID_PORT_RANGE`.
"""
required_start_port = port_range.start
required_end_port = port_range.stop - 1
min_valid_port = VALID_PORT_RANGE.start
max_valid_port = VALID_PORT_RANGE.stop - 1
if required_start_port < min_valid_port or required_end_port > max_valid_port:
required_range_str = _format_inclusive_port_range(port_range)
valid_range_str = _format_inclusive_port_range(VALID_PORT_RANGE)
err_msg = (
f"The port range derived from --clp-base-port ({required_range_str}) must fall within"
f" the range of valid ports ({valid_range_str})."
)
raise ValueError(err_msg)
"""Functions for facilitating the port connections for the CLP package."""
import socket
from dataclasses import dataclass
from typing import Any
from clp_py_utils.clp_config import ClpConfig
# Port constants.
MIN_NON_PRIVILEGED_PORT = 1024
MAX_PORT = 65535
VALID_PORT_RANGE = range(MIN_NON_PRIVILEGED_PORT, MAX_PORT + 1)
# Practical maximum number of reducer instances.
REDUCER_MAX_PORTS = 128
# Attribute names used to discover port configuration in component objects.
PORT_ATTR_NAMES = ["port", "base_port"]
@dataclass
class PortAssignment:
"""
Represents a port assignment for a CLP component.
:param component: The component configuration object that needs port assignment.
:param attr_name: The name of the port attribute on the component.
:param port_count: The number of consecutive ports needed by this component.
"""
component: Any
attr_name: str
port_count: int
def assign_ports_from_base(base_port: int, clp_config: ClpConfig) -> None:
"""
Assign ports to all components in `clp_config` that require them, starting from `base_port`.
Ports are assigned sequentially, with each component receiving the number of ports it requires.
:param base_port:
:param clp_config:
:raise ValueError: If the base port is out of range, or if any required port is in use.
"""
# Discover which components need port assignments.
port_assignments = _discover_port_assignments(clp_config)
total_ports_needed = sum(assignment.port_count for assignment in port_assignments)
# Validate that all required ports are valid and available.
port_range = range(base_port, base_port + total_ports_needed)
_validate_port_range_bounds(port_range)
_check_ports_available(host="127.0.0.1", port_range=port_range)
# Assign ports to the components.
current_port = base_port
for assignment in port_assignments:
setattr(assignment.component, assignment.attr_name, current_port)
current_port += assignment.port_count
def _check_ports_available(host: str, port_range: range) -> None:
"""
Check that all ports in the given range are available for binding.
:param host:
:param port_range:
:raise ValueError: If any port in the range is already in use.
"""
for port in port_range:
if not _is_port_free(port=port, host=host):
range_str = _format_port_range(port_range)
err_msg = (
f"Port '{port}' in the desired range ({range_str}) is already in use. "
"Choose a different port range for the test environment."
)
raise ValueError(err_msg)
def _discover_port_assignments(clp_config: ClpConfig) -> list[PortAssignment]:
"""
Discover which components in `clp_config` require port assignments.
:param clp_config:
:return: A list of PortAssignment objects describing what ports each component needs.
"""
port_assignments: list[PortAssignment] = []
for component_name, component_config in vars(clp_config).items():
# Skip private attributes and None values.
if component_name.startswith("_") or component_config is None:
continue
# Check if this component has a port attribute.
port_attr_name = None
for attr_name in PORT_ATTR_NAMES:
if hasattr(component_config, attr_name):
port_attr_name = attr_name
break
if port_attr_name is None:
continue
# Determine how many ports this component needs.
port_count = REDUCER_MAX_PORTS if component_name == "reducer" else 1
port_assignments.append(
PortAssignment(
component=component_config,
attr_name=port_attr_name,
port_count=port_count,
)
)
return port_assignments
def _format_port_range(port_range: range) -> str:
"""
Format a port range as a human-readable string.
:param port_range:
:return: A string like "'1024' to '65535' inclusive".
"""
start_port = port_range.start
end_port = port_range.stop - 1
return f"'{start_port}' to '{end_port}' inclusive"
def _is_port_free(port: int, host: str) -> bool:
"""
Check whether a TCP port is available for binding.
:param port:
:param host:
:return: True if the port can be bound, otherwise False.
"""
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
try:
sock.bind((host, port))
except OSError:
return False
return True
def _validate_port_range_bounds(port_range: range) -> None:
"""
Validates that the given port range falls within the valid port range.
:param port_range:
:raise ValueError: If any part of the range falls outside valid port numbers.
"""
start_port = port_range.start
end_port = port_range.stop - 1
min_valid_port = VALID_PORT_RANGE.start
max_valid_port = VALID_PORT_RANGE.stop - 1
if start_port < min_valid_port or end_port > max_valid_port:
required_range_str = _format_port_range(port_range)
valid_range_str = _format_port_range(VALID_PORT_RANGE)
err_msg = (
f"The port range derived from --clp-base-port ({required_range_str}) must fall within"
f" the range of valid ports ({valid_range_str})."
)
raise ValueError(err_msg)

:param parser:
"""
parser.addoption(
"--clp-base-port",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is an option to the CLP integration test script (rather than an env var that others might read), the clp- prefix isn't adding much value. How about just --base-port?

@quinntaylormitchell quinntaylormitchell changed the title feat(integration-tests): Assign ports based on BASE_PORT to avoid package spin-up failure. feat(integration-tests): Add port range assignment based on a configurable base port to avoid package spin-up failures. Dec 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants