-
Notifications
You must be signed in to change notification settings - Fork 84
feat(integration-tests): Add port range assignment based on a configurable base port to avoid package spin-up failures. #1662
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ig file; address comments.
…er-level clp-config.yml
…E_CONFIGS other than clp-text and clp-json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
integration-tests/tests/utils/port_utils.py (1)
7-7: Off-by-one: privileged ports include 1023.Privileged (well-known) ports are 0–1023, so the first non-privileged port is 1024. With the current value of 1023 and the
<comparison on line 67, abase_portof 1023 would incorrectly be accepted.-MIN_NON_PRIVILEGED_PORT = 1023 +MIN_NON_PRIVILEGED_PORT = 1024
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (5)
integration-tests/tests/conftest.py(1 hunks)integration-tests/tests/fixtures/package_config.py(2 hunks)integration-tests/tests/fixtures/package_instance.py(1 hunks)integration-tests/tests/utils/config.py(1 hunks)integration-tests/tests/utils/port_utils.py(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Applied to files:
integration-tests/tests/fixtures/package_instance.py
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Applied to files:
integration-tests/tests/utils/port_utils.py
🧬 Code graph analysis (3)
integration-tests/tests/fixtures/package_config.py (4)
integration-tests/tests/utils/port_utils.py (1)
assign_ports_from_base(11-57)integration-tests/tests/utils/clp_mode_utils.py (1)
get_clp_config_from_mode(29-40)integration-tests/tests/utils/config.py (1)
PackageConfig(111-144)integration-tests/tests/fixtures/path_configs.py (1)
fixt_package_path_config(28-35)
integration-tests/tests/fixtures/package_instance.py (1)
integration-tests/tests/fixtures/package_config.py (1)
fixt_package_config(13-49)
integration-tests/tests/utils/port_utils.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
ClpConfig(713-963)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: package-image
- GitHub Check: build (macos-15)
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (5)
integration-tests/tests/conftest.py (1)
14-25: LGTM!The
pytest_addoptionhook is correctly implemented to register the--clp-base-portCLI option. The default value of"55000"falls within the dynamic/private port range (49152–65535), which is appropriate for avoiding conflicts with well-known and registered ports.integration-tests/tests/utils/port_utils.py (2)
22-57: LGTM!The port assignment logic correctly reserves fixed offsets for all services by incrementing
current_portunconditionally, ensuring deterministic port mappings regardless of which optional services are enabled.
60-75: LGTM!The validation logic and error messages are clear and provide actionable feedback when the port range is invalid.
integration-tests/tests/utils/config.py (1)
123-124: LGTM!The new
base_portfield is appropriately added to thePackageConfigdataclass with a clear docstring comment.integration-tests/tests/fixtures/package_config.py (1)
29-43: LGTM!The base port handling is well-implemented with proper error handling for invalid CLI input. The
ValueErrorpropagation fromassign_ports_from_base(when the port range is invalid) is also correctly documented in the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
integration-tests/tests/fixtures/package_instance.py(1 hunks)integration-tests/tests/utils/port_utils.py(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
📚 Learning: 2025-08-17T16:10:38.722Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Applied to files:
integration-tests/tests/fixtures/package_instance.py
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Applied to files:
integration-tests/tests/utils/port_utils.py
🧬 Code graph analysis (2)
integration-tests/tests/fixtures/package_instance.py (1)
integration-tests/tests/fixtures/package_config.py (1)
fixt_package_config(13-49)
integration-tests/tests/utils/port_utils.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
ClpConfig(713-963)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (2)
integration-tests/tests/fixtures/package_instance.py (1)
32-37: Preserve originalRuntimeErrordetails in the failure messageRight now the
RuntimeErroris swallowed and only the hint aboutbase_portis shown, which can make non‑port issues harder to diagnose. Consider capturing the exception and including its message alongside your helpful guidance.[ suggest_recommended_refactor ]
[ request_verification ]- except RuntimeError: - base_port = fixt_package_config.base_port - pytest.fail( - f"Failed to start the {mode_name} package. This could mean that one of the ports" - f" derived from base_port={base_port} was unavailable. You can specify a new value for" - " base_port with the '--clp-base-port' flag." - ) + except RuntimeError as err: + base_port = fixt_package_config.base_port + pytest.fail( + f"Failed to start the {mode_name} package. This could mean that one of the ports" + f" derived from base_port={base_port} was unavailable. You can specify a new value for" + " base_port with the '--clp-base-port' flag. " + f"Original error: {err}" + )Please verify this still produces readable pytest output for typical failure cases under
pytest8.4.1.integration-tests/tests/utils/port_utils.py (1)
60-75: Port-range validation logic and messaging look solid.The checks against
MIN_NON_PRIVILEGED_PORTandMAX_PORTcorrectly enforce the valid range for the entire block ofMAX_REQUIRED_PORTSports, and the error messages are clear and actionable.
| # MAX_REQUIRED_PORTS is equal to the max number of CLP package components that need a port. | ||
| MAX_REQUIRED_PORTS = 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # MAX_REQUIRED_PORTS is equal to the max number of CLP package components that need a port. | |
| MAX_REQUIRED_PORTS = 10 | |
| CLP_PACKAGE_PORTS: list[tuple[str, str]] = [ | |
| ("database", "port"), | |
| ("queue", "port"), | |
| ("redis", "port"), | |
| ("reducer", "base_port"), | |
| ("results_cache", "port"), | |
| ("query_scheduler", "port"), | |
| ("webui", "port"), | |
| # Optional services | |
| ("api_server", "port"), | |
| ("mcp_server", "port"), | |
| ("presto", "port"), | |
| ] | |
| # Maximum number of ports that may be consumed by the CLP package. | |
| MAX_REQUIRED_PORTS = len(CLP_PACKAGE_PORTS) |
Since Junhao had comments about how this number is derived, I think this is a good opportunity to change the current brute force way of iterating over ports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of avoiding brute force; If we wait for #1658 to be merged, we could just operate on the list of required components for any given package_config, rather than making (and then having to maintain) a list like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works too, though I don't know where we can account for the path to the port attribute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can figure out how to do that; if it ends up being to complicated, I'll list them like you did in your suggestion
|
|
||
| clp_config.database.port = current_port | ||
| current_port += 1 | ||
|
|
||
| clp_config.queue.port = current_port | ||
| current_port += 1 | ||
|
|
||
| clp_config.redis.port = current_port | ||
| current_port += 1 | ||
|
|
||
| # Reducer binds to a range of ports starting from a `base_port`. | ||
| clp_config.reducer.base_port = current_port | ||
| current_port += 1 | ||
|
|
||
| clp_config.results_cache.port = current_port | ||
| current_port += 1 | ||
|
|
||
| clp_config.query_scheduler.port = current_port | ||
| current_port += 1 | ||
|
|
||
| clp_config.webui.port = current_port | ||
| current_port += 1 | ||
|
|
||
| # Optional services | ||
| if clp_config.api_server is not None: | ||
| clp_config.api_server.port = current_port | ||
| current_port += 1 | ||
|
|
||
| if clp_config.mcp_server is not None: | ||
| clp_config.mcp_server.port = current_port | ||
| current_port += 1 | ||
|
|
||
| if clp_config.presto is not None: | ||
| clp_config.presto.port = current_port | ||
| current_port += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| clp_config.database.port = current_port | |
| current_port += 1 | |
| clp_config.queue.port = current_port | |
| current_port += 1 | |
| clp_config.redis.port = current_port | |
| current_port += 1 | |
| # Reducer binds to a range of ports starting from a `base_port`. | |
| clp_config.reducer.base_port = current_port | |
| current_port += 1 | |
| clp_config.results_cache.port = current_port | |
| current_port += 1 | |
| clp_config.query_scheduler.port = current_port | |
| current_port += 1 | |
| clp_config.webui.port = current_port | |
| current_port += 1 | |
| # Optional services | |
| if clp_config.api_server is not None: | |
| clp_config.api_server.port = current_port | |
| current_port += 1 | |
| if clp_config.mcp_server is not None: | |
| clp_config.mcp_server.port = current_port | |
| current_port += 1 | |
| if clp_config.presto is not None: | |
| clp_config.presto.port = current_port | |
| current_port += 1 | |
| for component_name, port_attr in CLP_PACKAGE_PORTS: | |
| component_config = getattr(clp_config, component_name) | |
| if component_config is not None: | |
| setattr(component_config, port_attr, current_port) | |
| current_port += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed in concept; see above comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
integration-tests/tests/utils/port_utils.py(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Applied to files:
integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Applied to files:
integration-tests/tests/utils/port_utils.py
🔇 Additional comments (4)
integration-tests/tests/utils/port_utils.py (4)
21-67: Well-implemented dynamic port assignment.The function correctly:
- Discovers components with port-like attributes dynamically, avoiding hardcoded component names.
- Allocates a block of 128 consecutive ports for the reducer component.
- Validates the port range and availability before assignment.
- Assigns ports sequentially without gaps or overlaps.
The dynamic discovery approach is flexible and aligns with the design discussion in past comments about waiting for #1658 to operate on required components generically.
69-86: Correct port range validation with clear error messaging.The validation logic correctly checks both bounds and the error message helpfully displays inclusive endpoints for clarity.
89-105: Port availability validation provides helpful early failure detection.The function correctly checks each port for availability and provides a clear error message directing users to choose a different range. While there's an inherent TOCTOU limitation (port could become unavailable between check and use), this early detection is valuable for catching obvious conflicts during test setup.
108-121: Correct implementation of port availability check.The function properly uses a context manager for socket cleanup and follows the standard pattern for testing port availability. The implementation matches the approach suggested in past review discussions.
| :param request: | ||
| :return: An iterator that yields the PackageConfig object for the specified mode. | ||
| :raise ValueError: if the CLP base port is invalid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| :raise ValueError: if the CLP base port is invalid. | |
| :raise ValueError: if the CLP base port's value is invalid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed.
| :param clp_config: | ||
| :raise ValueError: if the base port is out of range or if any required port is in use. | ||
| """ | ||
| # Discover which components in ClpConfig have a port-like data member. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Discover which components in ClpConfig have a port-like data member. | |
| # Discover which components in ClpConfig have a port attribute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed.
| Assign ports for all components described in `clp_config` that require a port. Ports are | ||
| assigned relative to `base_port`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Assign ports for all components described in `clp_config` that require a port. Ports are | |
| assigned relative to `base_port`. | |
| Assign ports for all active components described in `clp_config` that require a port. Ports | |
| are assigned relative to `base_port`. | |
| Assume ports appear directly under each component with no further nesting, and that | |
| each component defines exactly one port attribute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might reword this, as not every component of CLP requires a port. Will change to:
| Assign ports for all components described in `clp_config` that require a port. Ports are | |
| assigned relative to `base_port`. | |
| Assign ports for all active components described in `clp_config` that require a port. Ports are | |
| assigned relative to `base_port`. Assume that port attributes appear directly under each | |
| component with no further nesting, and that each component that requires a port defines only one | |
| port attribute using one of the names in `PORT_LIKE_ATTR_NAMES`. |
| break | ||
|
|
||
| if port_attr_name is None: | ||
| # This component doesn't have any port-like attributes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # This component doesn't have any port-like attributes. |
The code is self-explanatory enough that we don't really need this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed.
|
|
||
| ports_required_for_component = 1 | ||
| if attr_name == "reducer": | ||
| # The reducer needs a block of ports starting at its base_port. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # The reducer needs a block of ports starting at its base_port. | |
| # The reducer cluster needs a block of ports starting at its `base_port`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed.
| if port_range.start < VALID_PORT_RANGE.start or port_range.stop > VALID_PORT_RANGE.stop: | ||
| required_start_port = port_range.start | ||
| required_end_port = port_range.stop - 1 | ||
| min_valid_port = VALID_PORT_RANGE.start | ||
| max_valid_port = VALID_PORT_RANGE.stop - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if port_range.start < VALID_PORT_RANGE.start or port_range.stop > VALID_PORT_RANGE.stop: | |
| required_start_port = port_range.start | |
| required_end_port = port_range.stop - 1 | |
| min_valid_port = VALID_PORT_RANGE.start | |
| max_valid_port = VALID_PORT_RANGE.stop - 1 | |
| required_start_port = port_range.start | |
| required_end_port = port_range.stop - 1 | |
| min_valid_port = VALID_PORT_RANGE.start | |
| max_valid_port = VALID_PORT_RANGE.stop - 1 | |
| if required_start_port < min_valid_port or required_end_port > max_valid_port: |
The entire range thing seems like overkill though. Is there a specific reason why it's introduced?
we can also write something like
if port_range.start not in VALID_PORT_RANGE or port_range.stop not in VALID_PORT_RANGE:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used a range so that I could pass in the required range to _validate_ports_available_in_range, it made it pretty simple. I agree with your code suggestion though, will apply.
| f"The port range derived from --clp-base-port ('{required_start_port}' to" | ||
| f" '{required_end_port}' inclusive) must fall within the range of valid ports" | ||
| f" ('{min_valid_port}' to '{max_valid_port}' inclusive)." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could benefit from a range pretty print helper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed.
| start_port = port_range.start | ||
| end_port = port_range.stop - 1 | ||
| err_msg = ( | ||
| f"Port '{port}' in the desired range '{start_port}' to '{end_port}' is already in" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could benefit from a range pretty print helper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed.
quinntaylormitchell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review Bill.
re. the replies that say "see top-level comment":
The first issue with using the required_components list in assign_ports_from_base is that the component names on the list are all in their Docker container name forms (with - characters and no _), whereas the names of the ClpConfig class attributes are all in _ form.
In addition, it seems to me like it's more future-proof to iterate through the attributes of the ClpConfig class, as this code is currently doing. That way, the only thing we have to keep track of as we maintain the code in the future is the list of components that each operating mode needs (which is human-level anyway).
| :param request: | ||
| :return: An iterator that yields the PackageConfig object for the specified mode. | ||
| :raise ValueError: if the CLP base port is invalid. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed.
| Assign ports for all components described in `clp_config` that require a port. Ports are | ||
| assigned relative to `base_port`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might reword this, as not every component of CLP requires a port. Will change to:
| Assign ports for all components described in `clp_config` that require a port. Ports are | |
| assigned relative to `base_port`. | |
| Assign ports for all active components described in `clp_config` that require a port. Ports are | |
| assigned relative to `base_port`. Assume that port attributes appear directly under each | |
| component with no further nesting, and that each component that requires a port defines only one | |
| port attribute using one of the names in `PORT_LIKE_ATTR_NAMES`. |
| :param clp_config: | ||
| :raise ValueError: if the base port is out of range or if any required port is in use. | ||
| """ | ||
| # Discover which components in ClpConfig have a port-like data member. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed.
| break | ||
|
|
||
| if port_attr_name is None: | ||
| # This component doesn't have any port-like attributes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed.
|
|
||
| ports_required_for_component = 1 | ||
| if attr_name == "reducer": | ||
| # The reducer needs a block of ports starting at its base_port. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed.
| if port_range.start < VALID_PORT_RANGE.start or port_range.stop > VALID_PORT_RANGE.stop: | ||
| required_start_port = port_range.start | ||
| required_end_port = port_range.stop - 1 | ||
| min_valid_port = VALID_PORT_RANGE.start | ||
| max_valid_port = VALID_PORT_RANGE.stop - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used a range so that I could pass in the required range to _validate_ports_available_in_range, it made it pretty simple. I agree with your code suggestion though, will apply.
| f"The port range derived from --clp-base-port ('{required_start_port}' to" | ||
| f" '{required_end_port}' inclusive) must fall within the range of valid ports" | ||
| f" ('{min_valid_port}' to '{max_valid_port}' inclusive)." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed.
| start_port = port_range.start | ||
| end_port = port_range.stop - 1 | ||
| err_msg = ( | ||
| f"Port '{port}' in the desired range '{start_port}' to '{end_port}' is already in" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
integration-tests/tests/utils/port_utils.py (1)
13-14: Document the origin ofREDUCER_MAX_PORTS = 128.Add a comment explaining where this value comes from (e.g., reducer configuration or implementation constraint). This will help maintainers understand why 128 ports are reserved and keep the constant in sync with any upstream changes.
# CLP constants. +# Maximum number of consecutive ports the reducer component requires. +# This value should match the reducer's implementation that binds to a port range. REDUCER_MAX_PORTS = 128
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
integration-tests/tests/fixtures/package_config.py(3 hunks)integration-tests/tests/utils/port_utils.py(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Applied to files:
integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Applied to files:
integration-tests/tests/utils/port_utils.py
🧬 Code graph analysis (2)
integration-tests/tests/fixtures/package_config.py (2)
integration-tests/tests/utils/port_utils.py (1)
assign_ports_from_base(21-67)integration-tests/tests/utils/clp_mode_utils.py (1)
get_clp_config_from_mode(77-88)
integration-tests/tests/utils/port_utils.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
ClpConfig(705-955)
🔇 Additional comments (5)
integration-tests/tests/fixtures/package_config.py (2)
31-38: LGTM!The base port parsing with clear error handling and the subsequent port assignment call are well-implemented. The error message is descriptive and will help users identify configuration issues.
43-49: LGTM!The
base_portis correctly passed to thePackageConfigconstructor, enabling downstream components to access the configured base port.integration-tests/tests/utils/port_utils.py (3)
32-52: Component discovery approach is reasonable.The dynamic discovery via
vars(clp_config)provides flexibility and avoids maintaining a separate list of components. Since Python 3.7+,vars()returns attributes in definition order, making the port assignment deterministic for a givenClpConfigstructure.
54-67: LGTM!The validation-before-assignment pattern ensures that either all ports are available and assigned, or an error is raised before any changes are made. This atomic approach prevents partial configuration states.
98-134: LGTM!The validation functions provide clear, actionable error messages that include both the problematic port and the expected range. The helper function
_format_inclusive_port_rangekeeps the output consistent and readable.
| def _is_port_free(port: int, host: str) -> bool: | ||
| """ | ||
| Check whether a TCP port is available for binding. | ||
| :param port: | ||
| :param host: | ||
| :return: True if the port can be bound, otherwise False. | ||
| """ | ||
| with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock: | ||
| try: | ||
| sock.bind((host, port)) | ||
| except OSError: | ||
| return False | ||
| return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick | 🔵 Trivial
Consider setting SO_REUSEADDR for more reliable port checks.
In rapid test scenarios, sockets may remain in TIME_WAIT state, causing false negatives. Setting SO_REUSEADDR before binding can help:
def _is_port_free(port: int, host: str) -> bool:
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
try:
+ sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind((host, port))
except OSError:
return False
return TrueThis is optional—it depends on how frequently tests run in succession.
🤖 Prompt for AI Agents
In integration-tests/tests/utils/port_utils.py around lines 82 to 95, the
port-availability check can yield false negatives when previous sockets are in
TIME_WAIT; set the SO_REUSEADDR socket option before binding to allow rebinding
in rapid test runs. Modify the function to call
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) on the newly created
socket (before sock.bind), keeping the same try/except and context manager
behavior so the function still returns True if bind succeeds and False on
OSError.
Bill-hbrhbr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your statement on inspecting the whole CLPConfig instead of the required component list.
Have some final comments on assign_ports_from_base, the main function of this PR.
| """ | ||
| # Discover which components in ClpConfig have a port attribute. | ||
| component_port_targets: list[tuple[Any, str, int]] = [] | ||
| for attr_name, attr_value in vars(clp_config).items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's replace:
attr_namewithcomponent_nameattr_valuewithcomponent_config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that here, we should keep the attr_name and attr_value variable names, because we're specifically iterating through the attributes of the ClpConfig object. Not all attributes of that object are CLP components.
That being said, in the for loop at the very end of assign_ports_from_base, I use attr_value, and I do think it should be component_config in that context, so I'll change it there.
| # Discover which components in ClpConfig have a port attribute. | ||
| component_port_targets: list[tuple[Any, str, int]] = [] | ||
| for attr_name, attr_value in vars(clp_config).items(): | ||
| if attr_name.startswith("_") or attr_value is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When does the component name start with a _?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as per above comment, we're not going through the components, we're going through the attributes in ClpConfig; some of them start with a _, and we can safely exclude those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
integration-tests/tests/utils/port_utils.py(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1405
File: components/clp-package-utils/pyproject.toml:5-15
Timestamp: 2025-10-13T03:24:35.074Z
Learning: In the y-scope/clp repository, the Python 3.9 to 3.10 version requirement change was intentionally deferred to a separate PR (after PR #1405) to reduce review effort, as decided in an offline discussion between junhaoliao and kirkrodrigues.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1100
File: integration-tests/tests/fixtures/integration_test_logs.py:54-56
Timestamp: 2025-08-17T16:10:38.722Z
Learning: For PR #1100 (feat(integration-tests): Add CLP package integration tests boilerplate), do not raise cache weakness problems related to the pytest cache implementation in the integration test logs fixtures.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Applied to files:
integration-tests/tests/utils/port_utils.py
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Applied to files:
integration-tests/tests/utils/port_utils.py
🧬 Code graph analysis (1)
integration-tests/tests/utils/port_utils.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
ClpConfig(779-1060)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: package-image
- GitHub Check: build (ubuntu-24.04)
- GitHub Check: build (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (3)
integration-tests/tests/utils/port_utils.py (3)
8-20: Port constants and reducer block sizing look reasonable for the test contextThe valid port range and
REDUCER_MAX_PORTSconstant read clearly and give a sensible safety margin for reducer instances in tests. No functional concerns here.
69-78: Range pretty-printer is correct and keeps error messages readableThe helper correctly interprets Python’s half-open
rangeas inclusive on both ends for display, and the output format reads well in your error messages. Looks good as-is.
115-133: Port-range validation is correct and produces helpful diagnosticsThe inclusive bound check against
VALID_PORT_RANGEis logically correct, and the error message constructed with_format_inclusive_port_rangegives the user both the derived and valid ranges, which should make misconfigured--clp-base-portissues easy to spot. No issues here.
Bill-hbrhbr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
passing on to @junhaoliao
|
Hey @quinntaylormitchell, while reviewing your PR, I'd suggest the following code changes: 👉 i tried refactoring this file a bit You can also review and apply these suggestions locally on your machine. Learn more about GitKraken Code Suggest
Join your team on GitKraken to speed up PR review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the title, how about:
feat(integration-tests): Add port range assignment to avoid package spin-up failures based on a configurable base port.
| """Functions for facilitating the port connections for the CLP package.""" | ||
|
|
||
| import socket | ||
| from typing import Any | ||
|
|
||
| from clp_py_utils.clp_config import ClpConfig | ||
|
|
||
| # Port constants. | ||
| MIN_NON_PRIVILEGED_PORT = 1024 | ||
| MAX_PORT = 65535 | ||
| VALID_PORT_RANGE = range(MIN_NON_PRIVILEGED_PORT, MAX_PORT + 1) | ||
|
|
||
| # Practical maximum number of reducer instances. | ||
| REDUCER_MAX_PORTS = 128 | ||
|
|
||
| PORT_LIKE_ATTR_NAMES = [ | ||
| "port", | ||
| "base_port", | ||
| ] | ||
|
|
||
|
|
||
| def assign_ports_from_base(base_port: int, clp_config: ClpConfig) -> None: | ||
| """ | ||
| Assign ports for all active components described in `clp_config` that require a port. Ports are | ||
| assigned relative to `base_port`. Assume that port attributes appear directly under each | ||
| component with no further nesting, and that each component that requires a port defines only one | ||
| port attribute using one of the names in `PORT_LIKE_ATTR_NAMES`. | ||
| :param base_port: | ||
| :param clp_config: | ||
| :raise ValueError: if the base port is out of range or if any required port is in use. | ||
| """ | ||
| # Discover which components in ClpConfig have a port attribute. | ||
| component_port_targets: list[tuple[Any, str, int]] = [] | ||
| total_ports_required = 0 | ||
| for attr_name, attr_value in vars(clp_config).items(): | ||
| if attr_name.startswith("_") or attr_value is None: | ||
| continue | ||
|
|
||
| port_attr_name: str | None = None | ||
| for port_like_name in PORT_LIKE_ATTR_NAMES: | ||
| if hasattr(attr_value, port_like_name): | ||
| port_attr_name = port_like_name | ||
| break | ||
|
|
||
| if port_attr_name is None: | ||
| continue | ||
|
|
||
| ports_required_for_component = 1 | ||
| if attr_name == "reducer": | ||
| # The reducer cluster needs a block of ports starting at its `base_port`. | ||
| ports_required_for_component = REDUCER_MAX_PORTS | ||
|
|
||
| component_port_targets.append((attr_value, port_attr_name, ports_required_for_component)) | ||
| total_ports_required += ports_required_for_component | ||
|
|
||
| # Ensure desired port range is valid and that all ports in the range are available. | ||
| desired_port_range = range(base_port, base_port + total_ports_required) | ||
| _validate_port_range(port_range=desired_port_range) | ||
| _validate_ports_available_in_range(host="127.0.0.1", port_range=desired_port_range) | ||
|
|
||
| # Assign ports to the components. | ||
| current_port = base_port | ||
| for component_config, port_attr_name, ports_required_for_component in component_port_targets: | ||
| setattr(component_config, port_attr_name, current_port) | ||
| current_port += ports_required_for_component | ||
|
|
||
|
|
||
| def _format_inclusive_port_range(port_range: range) -> str: | ||
| """ | ||
| Return a prettified string describing an inclusive range. | ||
| :param port_range: | ||
| :return: range description of the form "'start' to "'end' inclusive". | ||
| """ | ||
| start_port = port_range.start | ||
| end_port = port_range.stop - 1 | ||
| return f"'{start_port}' to '{end_port}' inclusive" | ||
|
|
||
|
|
||
| def _is_port_free(port: int, host: str) -> bool: | ||
| """ | ||
| Check whether a TCP port is available for binding. | ||
| :param port: | ||
| :param host: | ||
| :return: True if the port can be bound, otherwise False. | ||
| """ | ||
| with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock: | ||
| try: | ||
| sock.bind((host, port)) | ||
| except OSError: | ||
| return False | ||
| return True | ||
|
|
||
|
|
||
| def _validate_ports_available_in_range(host: str, port_range: range) -> None: | ||
| """ | ||
| Validate that each port in `port_range` is available on `host`. | ||
| :param host: | ||
| :param port_range: | ||
| :raise ValueError: if any port in the range cannot be bound. | ||
| """ | ||
| for port in port_range: | ||
| if not _is_port_free(port=port, host=host): | ||
| desired_range_str = _format_inclusive_port_range(port_range) | ||
| err_msg = ( | ||
| f"Port '{port}' in the desired range ({desired_range_str}) is already in use." | ||
| " Choose a different port range for the test environment." | ||
| ) | ||
| raise ValueError(err_msg) | ||
|
|
||
|
|
||
| def _validate_port_range(port_range: range) -> None: | ||
| """ | ||
| Validate that `port_range` falls completely within `VALID_PORT_RANGE`. | ||
| :param port_range: | ||
| :raise ValueError: if any part of `port_range` falls outside `VALID_PORT_RANGE`. | ||
| """ | ||
| required_start_port = port_range.start | ||
| required_end_port = port_range.stop - 1 | ||
| min_valid_port = VALID_PORT_RANGE.start | ||
| max_valid_port = VALID_PORT_RANGE.stop - 1 | ||
| if required_start_port < min_valid_port or required_end_port > max_valid_port: | ||
| required_range_str = _format_inclusive_port_range(port_range) | ||
| valid_range_str = _format_inclusive_port_range(VALID_PORT_RANGE) | ||
| err_msg = ( | ||
| f"The port range derived from --clp-base-port ({required_range_str}) must fall within" | ||
| f" the range of valid ports ({valid_range_str})." | ||
| ) | ||
| raise ValueError(err_msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i tried cleaning up this source file a bit. changes include:
- rename constants / method names to make them more intuitive / self-explanatory
- extract a helper to discover port assignment requirements
- modified some docstrings
| """Functions for facilitating the port connections for the CLP package.""" | |
| import socket | |
| from typing import Any | |
| from clp_py_utils.clp_config import ClpConfig | |
| # Port constants. | |
| MIN_NON_PRIVILEGED_PORT = 1024 | |
| MAX_PORT = 65535 | |
| VALID_PORT_RANGE = range(MIN_NON_PRIVILEGED_PORT, MAX_PORT + 1) | |
| # Practical maximum number of reducer instances. | |
| REDUCER_MAX_PORTS = 128 | |
| PORT_LIKE_ATTR_NAMES = [ | |
| "port", | |
| "base_port", | |
| ] | |
| def assign_ports_from_base(base_port: int, clp_config: ClpConfig) -> None: | |
| """ | |
| Assign ports for all active components described in `clp_config` that require a port. Ports are | |
| assigned relative to `base_port`. Assume that port attributes appear directly under each | |
| component with no further nesting, and that each component that requires a port defines only one | |
| port attribute using one of the names in `PORT_LIKE_ATTR_NAMES`. | |
| :param base_port: | |
| :param clp_config: | |
| :raise ValueError: if the base port is out of range or if any required port is in use. | |
| """ | |
| # Discover which components in ClpConfig have a port attribute. | |
| component_port_targets: list[tuple[Any, str, int]] = [] | |
| total_ports_required = 0 | |
| for attr_name, attr_value in vars(clp_config).items(): | |
| if attr_name.startswith("_") or attr_value is None: | |
| continue | |
| port_attr_name: str | None = None | |
| for port_like_name in PORT_LIKE_ATTR_NAMES: | |
| if hasattr(attr_value, port_like_name): | |
| port_attr_name = port_like_name | |
| break | |
| if port_attr_name is None: | |
| continue | |
| ports_required_for_component = 1 | |
| if attr_name == "reducer": | |
| # The reducer cluster needs a block of ports starting at its `base_port`. | |
| ports_required_for_component = REDUCER_MAX_PORTS | |
| component_port_targets.append((attr_value, port_attr_name, ports_required_for_component)) | |
| total_ports_required += ports_required_for_component | |
| # Ensure desired port range is valid and that all ports in the range are available. | |
| desired_port_range = range(base_port, base_port + total_ports_required) | |
| _validate_port_range(port_range=desired_port_range) | |
| _validate_ports_available_in_range(host="127.0.0.1", port_range=desired_port_range) | |
| # Assign ports to the components. | |
| current_port = base_port | |
| for component_config, port_attr_name, ports_required_for_component in component_port_targets: | |
| setattr(component_config, port_attr_name, current_port) | |
| current_port += ports_required_for_component | |
| def _format_inclusive_port_range(port_range: range) -> str: | |
| """ | |
| Return a prettified string describing an inclusive range. | |
| :param port_range: | |
| :return: range description of the form "'start' to "'end' inclusive". | |
| """ | |
| start_port = port_range.start | |
| end_port = port_range.stop - 1 | |
| return f"'{start_port}' to '{end_port}' inclusive" | |
| def _is_port_free(port: int, host: str) -> bool: | |
| """ | |
| Check whether a TCP port is available for binding. | |
| :param port: | |
| :param host: | |
| :return: True if the port can be bound, otherwise False. | |
| """ | |
| with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock: | |
| try: | |
| sock.bind((host, port)) | |
| except OSError: | |
| return False | |
| return True | |
| def _validate_ports_available_in_range(host: str, port_range: range) -> None: | |
| """ | |
| Validate that each port in `port_range` is available on `host`. | |
| :param host: | |
| :param port_range: | |
| :raise ValueError: if any port in the range cannot be bound. | |
| """ | |
| for port in port_range: | |
| if not _is_port_free(port=port, host=host): | |
| desired_range_str = _format_inclusive_port_range(port_range) | |
| err_msg = ( | |
| f"Port '{port}' in the desired range ({desired_range_str}) is already in use." | |
| " Choose a different port range for the test environment." | |
| ) | |
| raise ValueError(err_msg) | |
| def _validate_port_range(port_range: range) -> None: | |
| """ | |
| Validate that `port_range` falls completely within `VALID_PORT_RANGE`. | |
| :param port_range: | |
| :raise ValueError: if any part of `port_range` falls outside `VALID_PORT_RANGE`. | |
| """ | |
| required_start_port = port_range.start | |
| required_end_port = port_range.stop - 1 | |
| min_valid_port = VALID_PORT_RANGE.start | |
| max_valid_port = VALID_PORT_RANGE.stop - 1 | |
| if required_start_port < min_valid_port or required_end_port > max_valid_port: | |
| required_range_str = _format_inclusive_port_range(port_range) | |
| valid_range_str = _format_inclusive_port_range(VALID_PORT_RANGE) | |
| err_msg = ( | |
| f"The port range derived from --clp-base-port ({required_range_str}) must fall within" | |
| f" the range of valid ports ({valid_range_str})." | |
| ) | |
| raise ValueError(err_msg) | |
| """Functions for facilitating the port connections for the CLP package.""" | |
| import socket | |
| from dataclasses import dataclass | |
| from typing import Any | |
| from clp_py_utils.clp_config import ClpConfig | |
| # Port constants. | |
| MIN_NON_PRIVILEGED_PORT = 1024 | |
| MAX_PORT = 65535 | |
| VALID_PORT_RANGE = range(MIN_NON_PRIVILEGED_PORT, MAX_PORT + 1) | |
| # Practical maximum number of reducer instances. | |
| REDUCER_MAX_PORTS = 128 | |
| # Attribute names used to discover port configuration in component objects. | |
| PORT_ATTR_NAMES = ["port", "base_port"] | |
| @dataclass | |
| class PortAssignment: | |
| """ | |
| Represents a port assignment for a CLP component. | |
| :param component: The component configuration object that needs port assignment. | |
| :param attr_name: The name of the port attribute on the component. | |
| :param port_count: The number of consecutive ports needed by this component. | |
| """ | |
| component: Any | |
| attr_name: str | |
| port_count: int | |
| def assign_ports_from_base(base_port: int, clp_config: ClpConfig) -> None: | |
| """ | |
| Assign ports to all components in `clp_config` that require them, starting from `base_port`. | |
| Ports are assigned sequentially, with each component receiving the number of ports it requires. | |
| :param base_port: | |
| :param clp_config: | |
| :raise ValueError: If the base port is out of range, or if any required port is in use. | |
| """ | |
| # Discover which components need port assignments. | |
| port_assignments = _discover_port_assignments(clp_config) | |
| total_ports_needed = sum(assignment.port_count for assignment in port_assignments) | |
| # Validate that all required ports are valid and available. | |
| port_range = range(base_port, base_port + total_ports_needed) | |
| _validate_port_range_bounds(port_range) | |
| _check_ports_available(host="127.0.0.1", port_range=port_range) | |
| # Assign ports to the components. | |
| current_port = base_port | |
| for assignment in port_assignments: | |
| setattr(assignment.component, assignment.attr_name, current_port) | |
| current_port += assignment.port_count | |
| def _check_ports_available(host: str, port_range: range) -> None: | |
| """ | |
| Check that all ports in the given range are available for binding. | |
| :param host: | |
| :param port_range: | |
| :raise ValueError: If any port in the range is already in use. | |
| """ | |
| for port in port_range: | |
| if not _is_port_free(port=port, host=host): | |
| range_str = _format_port_range(port_range) | |
| err_msg = ( | |
| f"Port '{port}' in the desired range ({range_str}) is already in use. " | |
| "Choose a different port range for the test environment." | |
| ) | |
| raise ValueError(err_msg) | |
| def _discover_port_assignments(clp_config: ClpConfig) -> list[PortAssignment]: | |
| """ | |
| Discover which components in `clp_config` require port assignments. | |
| :param clp_config: | |
| :return: A list of PortAssignment objects describing what ports each component needs. | |
| """ | |
| port_assignments: list[PortAssignment] = [] | |
| for component_name, component_config in vars(clp_config).items(): | |
| # Skip private attributes and None values. | |
| if component_name.startswith("_") or component_config is None: | |
| continue | |
| # Check if this component has a port attribute. | |
| port_attr_name = None | |
| for attr_name in PORT_ATTR_NAMES: | |
| if hasattr(component_config, attr_name): | |
| port_attr_name = attr_name | |
| break | |
| if port_attr_name is None: | |
| continue | |
| # Determine how many ports this component needs. | |
| port_count = REDUCER_MAX_PORTS if component_name == "reducer" else 1 | |
| port_assignments.append( | |
| PortAssignment( | |
| component=component_config, | |
| attr_name=port_attr_name, | |
| port_count=port_count, | |
| ) | |
| ) | |
| return port_assignments | |
| def _format_port_range(port_range: range) -> str: | |
| """ | |
| Format a port range as a human-readable string. | |
| :param port_range: | |
| :return: A string like "'1024' to '65535' inclusive". | |
| """ | |
| start_port = port_range.start | |
| end_port = port_range.stop - 1 | |
| return f"'{start_port}' to '{end_port}' inclusive" | |
| def _is_port_free(port: int, host: str) -> bool: | |
| """ | |
| Check whether a TCP port is available for binding. | |
| :param port: | |
| :param host: | |
| :return: True if the port can be bound, otherwise False. | |
| """ | |
| with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock: | |
| try: | |
| sock.bind((host, port)) | |
| except OSError: | |
| return False | |
| return True | |
| def _validate_port_range_bounds(port_range: range) -> None: | |
| """ | |
| Validates that the given port range falls within the valid port range. | |
| :param port_range: | |
| :raise ValueError: If any part of the range falls outside valid port numbers. | |
| """ | |
| start_port = port_range.start | |
| end_port = port_range.stop - 1 | |
| min_valid_port = VALID_PORT_RANGE.start | |
| max_valid_port = VALID_PORT_RANGE.stop - 1 | |
| if start_port < min_valid_port or end_port > max_valid_port: | |
| required_range_str = _format_port_range(port_range) | |
| valid_range_str = _format_port_range(VALID_PORT_RANGE) | |
| err_msg = ( | |
| f"The port range derived from --clp-base-port ({required_range_str}) must fall within" | |
| f" the range of valid ports ({valid_range_str})." | |
| ) | |
| raise ValueError(err_msg) | |
| :param parser: | ||
| """ | ||
| parser.addoption( | ||
| "--clp-base-port", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is an option to the CLP integration test script (rather than an env var that others might read), the clp- prefix isn't adding much value. How about just --base-port?
BASE_PORT to avoid package spin-up failure.
Description
This PR adds a system that assigns a port to each component of the clp package. The intent is to avoid port conflicts on package startup.
--clp-base-portis an optional CLI flag which has a default value of55000if it is not specified. A port range is constructed starting at that base port and stretching to accommodate the maximum number of ports needed by the CLP package described in a givenClpConfigobject. The system checks whether or not each port within this range is available, and if they are all available, the system assigns ports to the components appropriately.PR dependencies
This PR depends on the following being merged:
Checklist
breaking change.
Validation performed
Ran the following command from
clp/integration-tests:uv run pyetst -m 'package'Both the
clp-jsonandclp-texttests passed.Summary by CodeRabbit
New Features
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.