
Feat/furnace sql #113

Open
nicornk wants to merge 5 commits into main from feat/furnace-sql

Conversation

@nicornk
Contributor

@nicornk nicornk commented Nov 27, 2025

Summary

Checklist

  • You agree with our CLA
  • Included tests (or is not applicable).
  • Updated documentation (or is not applicable).
  • Used pre-commit hooks to format and lint the code.

Contributor

Copilot AI left a comment

Pull request overview

Adds support for a new “V2” Foundry SQL Server query flow (execute → poll status → stream results) and wires it into the context plus error handling, with integration tests covering the new client behavior.

Changes:

  • Introduce FoundrySqlServerClientV2 with Arrow-stream result retrieval.
  • Add a FurnaceSql:SqlParseError → FurnaceSqlSqlParseError mapping and a new SQL parse error class.
  • Add integration tests for the V2 client and expose it via FoundryContext.
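
The execute → poll → stream flow can be sketched as a generic polling loop (illustrative only: `poll_until_success` and the status payload shapes are assumptions for this sketch, not the PR's actual implementation):

```python
import time


def poll_until_success(fetch_status, interval=0.2, timeout=600.0):
    """Poll a status callable until the query reaches a terminal state.

    fetch_status: callable returning a dict like {"type": "running" | "success" | "failed", ...}
    Raises TimeoutError if no terminal state is reached within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("type") == "success":
            return status
        if status.get("type") == "failed":
            raise RuntimeError(f"query failed: {status!r}")
        if time.monotonic() > deadline:
            raise TimeoutError("query did not reach a terminal state in time")
        time.sleep(interval)


# Simulated status responses: two in-flight polls, then success.
_states = iter([{"type": "running"}, {"type": "running"}, {"type": "success"}])
result = poll_until_success(lambda: next(_states), interval=0.0)
print(result["type"])  # success
```

The deadline check is the part the review comments below focus on: without it, a query that never reaches a terminal state polls forever.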

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file

  • tests/integration/clients/test_foundry_sql_server.py: Adds V2 integration tests (smoke, return types, parse error, compression flag).
  • libs/foundry-dev-tools/src/foundry_dev_tools/errors/sql.py: Adds a new Furnace SQL parse exception type.
  • libs/foundry-dev-tools/src/foundry_dev_tools/errors/handling.py: Maps the Furnace SQL parse errorName to the new exception.
  • libs/foundry-dev-tools/src/foundry_dev_tools/config/context.py: Exposes foundry_sql_server_v2 as a cached context property.
  • libs/foundry-dev-tools/src/foundry_dev_tools/clients/foundry_sql_server.py: Implements the V2 SQL client (execute/status/stream).


Comment on lines 399 to 403
# Poll for completion
while response_json.get("type") != "success":
    time.sleep(0.2)
    response = self.api_status(query_identifier)
    response_json = response.json()
Copilot AI Feb 17, 2026

The polling loop has no timeout/escape condition and no way to pass a request timeout to api_status, so a query that never reaches type=success can hang indefinitely (default read timeout is None). Implement a timeout (like V1 does) and raise FoundrySqlQueryClientTimedOutError when exceeded.

Copilot uses AI. Check for mistakes.
Comment on lines 406 to 407
raise FoundrySqlQueryFailedError(response)

Copilot AI Feb 17, 2026

On V2 failures you raise FoundrySqlQueryFailedError(response), but that exception extracts the message from the V1 response shape (status.failed.errorMessage). If the V2 failed payload differs, this will likely produce empty/incorrect error details. Consider adding a V2-specific error type (or updating FoundrySqlQueryFailedError to handle both shapes) so users get actionable messages.

Suggested change
raise FoundrySqlQueryFailedError(response)
# Adapt V2 failure payload to the V1 error shape expected by FoundrySqlQueryFailedError.
# Try to extract a meaningful message from the V2 response JSON.
message: str = (
    response_json.get("message")
    or response_json.get("error", {}).get("message")
    or response_json.get("failed", {}).get("errorMessage")
    or "SQL query failed"
)

class _V2ErrorResponseAdapter:
    """Minimal adapter to provide a V1-shaped JSON payload for error handling."""

    def __init__(self, msg: str) -> None:
        self.status_code = getattr(response, "status_code", 400)
        self.text = msg
        self._msg = msg

    def json(self) -> dict[str, Any]:
        # Shape compatible with V1: status.failed.errorMessage
        return {
            "status": {
                "failed": {
                    "errorMessage": self._msg,
                }
            }
        }

raise FoundrySqlQueryFailedError(_V2ErrorResponseAdapter(message))

Comment on lines 123 to 127
schema, rows = TEST_SINGLETON.ctx.foundry_sql_server_v2.query_foundry_sql(
    query=f"SELECT sepal_width, sepal_length FROM `{TEST_SINGLETON.iris_new.rid}` LIMIT 3",
    application_id=TEST_SINGLETON.iris_new.rid,
    return_type="raw",
)
Copilot AI Feb 17, 2026

In this test, schema and rows are assigned but never used. Ruff will flag this as unused local variables (F841) even though the call is expected to raise. Consider calling the method without unpacking, or assigning to _ placeholders to avoid lint failures.

Comment on lines +23 to +28
class FurnaceSqlSqlParseError(FoundryAPIError):
    """Exception is thrown when SQL Query is not valid."""

    message = "Foundry SQL Query Parsing Failed."


Copilot AI Feb 17, 2026

The exception class name FurnaceSqlSqlParseError contains a duplicated "Sql" segment, which makes the public API a bit confusing and inconsistent with other error names in this module. Consider renaming it to something like FurnaceSqlParseError / FurnaceSqlQueryParseError and updating the error mapping + tests accordingly.

Suggested change
class FurnaceSqlSqlParseError(FoundryAPIError):
    """Exception is thrown when SQL Query is not valid."""

    message = "Foundry SQL Query Parsing Failed."
class FurnaceSqlParseError(FoundryAPIError):
    """Exception is thrown when SQL Query is not valid."""

    message = "Foundry SQL Query Parsing Failed."


# Backwards compatibility: keep the old name as an alias.
FurnaceSqlSqlParseError = FurnaceSqlParseError

Comment on lines 355 to 360
Copilot AI Feb 17, 2026

query_foundry_sql overloads advertise a timeout parameter and require application_id, but the concrete implementation signature does not accept timeout and allows application_id=None. This mismatch breaks type-checking and also prevents callers from setting a client-side timeout for V2 queries.
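
One way to keep the overloads and the concrete signature in sync is sketched below (`SqlClientSketch` is a hypothetical stand-in for the V2 client; return types are simplified to `object`):

```python
from typing import Literal, overload


class SqlClientSketch:
    """Illustrative only: parameter names mirror the PR, types are simplified."""

    @overload
    def query_foundry_sql(
        self, query: str, application_id: str,
        return_type: Literal["pandas"] = ..., timeout: int = ...,
    ) -> object: ...

    @overload
    def query_foundry_sql(
        self, query: str, application_id: str,
        return_type: Literal["arrow"] = ..., timeout: int = ...,
    ) -> object: ...

    def query_foundry_sql(self, query, application_id, return_type="pandas", timeout=600):
        # The concrete signature accepts everything the overloads advertise:
        # application_id is required, and timeout is actually part of the API.
        return (query, application_id, return_type, timeout)


client = SqlClientSketch()
print(client.query_foundry_sql("SELECT 1", "my-app")[3])  # 600
```

The key point is that type checkers verify overloads against callers, not against the implementation body, so a drifting concrete signature silently breaks the advertised contract.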

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.



Comment on lines 315 to 363
@overload
def query_foundry_sql(
    self,
    query: str,
    application_id: str,
    return_type: Literal["pandas"],
    disable_arrow_compression: bool = ...,
    timeout: int = ...,
) -> pd.core.frame.DataFrame: ...

@overload
def query_foundry_sql(
    self,
    query: str,
    return_type: Literal["polars"],
    branch: Ref = ...,
    sql_dialect: SqlDialect = ...,
    timeout: int = ...,
) -> pl.DataFrame: ...

@overload
def query_foundry_sql(
    self,
    query: str,
    application_id: str,
    return_type: Literal["spark"],
    disable_arrow_compression: bool = ...,
    timeout: int = ...,
) -> pyspark.sql.DataFrame: ...

@overload
def query_foundry_sql(
    self,
    query: str,
    application_id: str,
    return_type: Literal["arrow"],
    disable_arrow_compression: bool = ...,
    timeout: int = ...,
) -> pa.Table: ...

@overload
def query_foundry_sql(
    self,
    query: str,
    application_id: str,
    return_type: SQLReturnType = ...,
    disable_arrow_compression: bool = ...,
    timeout: int = ...,
) -> tuple[dict, list[list]] | pd.core.frame.DataFrame | pl.DataFrame | pa.Table | pyspark.sql.DataFrame: ...
Copilot AI Feb 17, 2026

The timeout parameter is declared in all overload signatures (lines 322, 332, 342, 352, 362) but is never accepted in the actual function implementation (line 365-371) nor is it used anywhere in the method body. Either implement timeout handling (similar to how V1 client does it in lines 149-150 of the same file) or remove the parameter from the overload signatures. This creates a misleading API contract where callers think they can pass a timeout, but it will be silently ignored.

Comment on lines 315 to 370
Copilot AI Feb 17, 2026

The overload signatures declare application_id as a required positional parameter (line 319, 339, 349, 359), but the actual implementation (line 370) declares it as an optional parameter with default None. This creates a type checking inconsistency. If application_id should be optional (as the implementation suggests), it should be marked as optional in the overload signatures with application_id: str | None = .... Alternatively, if it should be required, the implementation should not have a default value and should raise an error if not provided, rather than falling back to User-Agent.

Comment on lines 459 to 463
Copilot AI Feb 17, 2026

The method directly accesses response_json[response_json["type"]]["query"] without checking if the keys exist. If the response structure is unexpected or if response_json["type"] doesn't match a key in the response, this will raise a KeyError. Consider adding defensive checks or using .get() with appropriate error handling to provide clearer error messages when the API response doesn't match the expected format.

Suggested change
if response_json["type"] == "triggered" and "plan" in response_json["triggered"]:
    plan = response_json["triggered"]["plan"]
    LOGGER.debug("plan %s", plan)
return response_json[response_json["type"]]["query"]
response_type = response_json.get("type")
if not isinstance(response_type, str):
    msg = f"Unexpected execute response: missing or invalid 'type' field in {response_json!r}"
    raise ValueError(msg)
# Log plan details for triggered queries, if present
if response_type == "triggered":
    triggered_section = response_json.get("triggered") or {}
    plan = triggered_section.get("plan")
    if plan is not None:
        LOGGER.debug("plan %s", plan)
section = response_json.get(response_type)
if not isinstance(section, dict):
    msg = (
        f"Unexpected execute response: missing or invalid '{response_type}' section "
        f"in {response_json!r}"
    )
    raise ValueError(msg)
query_identifier = section.get("query")
if not isinstance(query_identifier, dict):
    msg = (
        f"Unexpected execute response: missing or invalid 'query' field in "
        f"'{response_type}' section: {section!r}"
    )
    raise ValueError(msg)
return query_identifier

Comment on lines +429 to +431
# throws an ImportError when trying to access attributes of the module.
# This ImportError is caught below to fall back to query_foundry_sql_legacy
# which will again raise an ImportError when polars is not installed.
Copilot AI Feb 17, 2026

The comment mentions falling back to query_foundry_sql_legacy when polars is not installed, but this fallback logic is not actually implemented in the V2 client. The V1 client has this fallback (lines 156-197), but the V2 implementation will just raise an ImportError directly if polars is not installed. Either remove this misleading comment or implement the fallback behavior if it's intended.

Suggested change
# throws an ImportError when trying to access attributes of the module.
# This ImportError is caught below to fall back to query_foundry_sql_legacy
# which will again raise an ImportError when polars is not installed.
# throws an ImportError when trying to access attributes of the module
# if polars is not installed. In that case, this ImportError will
# propagate to the caller and no automatic fallback is performed.

Comment on lines 329 to 332
return_type: Literal["polars"],
branch: Ref = ...,
sql_dialect: SqlDialect = ...,
timeout: int = ...,
Copilot AI Feb 17, 2026

The overload signature for the "polars" return type is inconsistent with other overloads and the actual implementation. This overload specifies branch and sql_dialect parameters, but the actual implementation (line 365-371) does not accept these parameters. Instead, it accepts application_id and disable_arrow_compression like the other overloads. The parameters should match: application_id: str, disable_arrow_compression: bool = ..., and timeout should be removed if not implemented (see other issue). The V2 API uses a different flow than V1 and does not use branch/sql_dialect parameters.

Suggested change
return_type: Literal["polars"],
branch: Ref = ...,
sql_dialect: SqlDialect = ...,
timeout: int = ...,
application_id: str,
return_type: Literal["polars"],
disable_arrow_compression: bool = ...,


def test_v2_polars_return_type():
    polars_df = TEST_SINGLETON.ctx.foundry_sql_server_v2.query_foundry_sql(
        f"SELECT sepal_length FROM `{TEST_SINGLETON.iris_new.rid}` LIMIT 2",
Copilot AI Feb 17, 2026

The test is missing the application_id parameter. While the implementation defaults to using the User-Agent header when application_id is None (line 400), all other V2 tests explicitly provide this parameter. For consistency and to properly test the API contract, this test should also pass application_id=TEST_SINGLETON.iris_new.rid like the other tests do.

Suggested change
f"SELECT sepal_length FROM `{TEST_SINGLETON.iris_new.rid}` LIMIT 2",
query=f"SELECT sepal_length FROM `{TEST_SINGLETON.iris_new.rid}` LIMIT 2",
application_id=TEST_SINGLETON.iris_new.rid,

Comment on lines 410 to 417
Copilot AI Feb 17, 2026

The polling loop lacks timeout protection and could run indefinitely if the query never completes or the server stops responding with "success" or "failed" status. Unlike the V1 client which implements timeout checking (see lines 149-150), this implementation has no safeguard against infinite loops. Consider adding timeout tracking similar to V1, or at minimum add a maximum iteration count to prevent runaway polling. The docstring claims to raise FoundrySqlQueryClientTimedOutError on timeout, but this is never actually raised.

Suggested change
while response_json.get("type") != "success":
    time.sleep(0.2)
    response = self.api_status(query_identifier)
    response_json = response.json()
if response_json.get("type") == "failed":
    raise FoundrySqlQueryFailedError(response)
start_time = time.monotonic()
while response_json.get("type") not in ("success", "failed"):
    if timeout is not None and time.monotonic() - start_time > timeout:
        raise FoundrySqlQueryClientTimedOutError(
            f"Query timed out after {timeout} seconds."
        )
    time.sleep(0.2)
    response = self.api_status(query_identifier)
    response_json = response.json()
if response_json.get("type") == "failed":
    raise FoundrySqlQueryFailedError(response)


Raises:
    FoundrySqlQueryFailedError: If the query fails
    FoundrySqlQueryClientTimedOutError: If the query times out
Copilot AI Feb 17, 2026

The docstring claims this method raises FoundrySqlQueryClientTimedOutError on timeout, but timeout handling is not implemented in the V2 client. This is inconsistent with the actual behavior. Either implement timeout handling or remove this from the Raises section of the docstring.

Suggested change
FoundrySqlQueryClientTimedOutError: If the query times out

Copilot AI Feb 17, 2026

The method directly accesses deeply nested keys response_json["success"]["result"]["interactive"]["chunks"] without defensive checks. If any intermediate key is missing, this will raise a KeyError with an unclear error message. Consider adding proper error handling or using .get() with default values to provide more helpful error messages when the API response structure is unexpected.

Suggested change
chunks = response_json["success"]["result"]["interactive"]["chunks"]
try:
    chunks = response_json["success"]["result"]["interactive"]["chunks"]
except (KeyError, TypeError) as exc:
    msg = (
        "Unexpected structure in success response: expected "
        "'success.result.interactive.chunks' to be present."
    )
    raise ValueError(msg) from exc

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.



def test_v2_return_type_raw_not_supported():
    """Test V2 client with raw return type."""
    with pytest.raises(ValueError, match="The following return_type is not supported: .+"):
        schema, rows = TEST_SINGLETON.ctx.foundry_sql_server_v2.query_foundry_sql(
Copilot AI Feb 18, 2026

Inside this pytest.raises block, assigning to schema, rows is unnecessary and will be flagged by Ruff as an unused assignment. Just call query_foundry_sql(...) without binding the result.

Suggested change
schema, rows = TEST_SINGLETON.ctx.foundry_sql_server_v2.query_foundry_sql(
TEST_SINGLETON.ctx.foundry_sql_server_v2.query_foundry_sql(

import pyspark
import requests

LOGGER = logging.getLogger(__name__)
Copilot AI Feb 18, 2026

LOGGER is defined but never used in this module. This will be flagged by linting (unused assignment) and should either be removed or used for actual logging (e.g., debug/info around query execution/polling).

Suggested change
LOGGER = logging.getLogger(__name__)

Comment on lines +359 to +378
@overload
def query_foundry_sql(
    self,
    query: str,
    return_type: SQLReturnType = ...,
    branch: Ref = ...,
    sql_dialect: SqlDialect = ...,
    arrow_compression_codec: ArrowCompressionCodec = ...,
    timeout: int = ...,
) -> tuple[dict, list[list]] | pd.core.frame.DataFrame | pl.DataFrame | pa.Table | pyspark.sql.DataFrame: ...

def query_foundry_sql(
    self,
    query: str,
    return_type: SQLReturnType = "pandas",
    branch: Ref = "master",
    sql_dialect: SqlDialect = "SPARK",
    arrow_compression_codec: ArrowCompressionCodec = "NONE",
    timeout: int = 600,
) -> tuple[dict, list[list]] | pd.core.frame.DataFrame | pl.DataFrame | pa.Table | pyspark.sql.DataFrame:
Copilot AI Feb 18, 2026

query_foundry_sql in the V2 client is typed as accepting SQLReturnType and advertises a possible tuple[dict, list[list]] return, but the implementation never returns a raw (schema, rows) tuple and instead raises ValueError for unsupported types (including raw). Either implement raw support or narrow the accepted/overload types and return annotation to only the supported return types.
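
Narrowing could look like the following sketch (`SupportedReturnType` and `validate_return_type` are hypothetical names for illustration, not part of the PR):

```python
from typing import Literal, get_args

# Hypothetical narrowed alias: only the return types the V2 client actually supports.
SupportedReturnType = Literal["pandas", "polars", "spark", "arrow"]


def validate_return_type(return_type: str) -> str:
    """Reject anything outside the supported literals with a clear message."""
    if return_type not in get_args(SupportedReturnType):
        msg = f"The following return_type is not supported: {return_type}"
        raise ValueError(msg)
    return return_type


print(validate_return_type("arrow"))  # arrow
```

With the alias in both the overloads and the return annotation, a caller passing `"raw"` gets a type-checker error instead of a runtime `ValueError`.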

Comment on lines +507 to +545
def api_query(
    self,
    query: str,
    dialect: SqlDialect,
    branch: Ref,
    arrow_compression_codec: ArrowCompressionCodec = "NONE",
    **kwargs,
) -> requests.Response:
    """Execute a SQL query via the V2 API.

    Args:
        query: The SQL query string
        dialect: The SQL dialect to use
        branch: The dataset branch to query
        arrow_compression_codec: Arrow compression codec (NONE, LZ4, ZSTD)
        **kwargs: gets passed to :py:meth:`APIClient.api_request`

    Returns:
        Response with query handle and initial status

    """
    return self.api_request(
        "POST",
        "sql-endpoint/v1/queries/query",
        json={
            "querySpec": {
                "query": query,
                "tableProviders": {},
                "dialect": dialect,
                "options": {"options": [{"option": "arrowCompressionCodec", "value": arrow_compression_codec}]},
            },
            "executionParams": {
                "defaultBranchIds": [{"type": "datasetBranch", "datasetBranch": branch}],
                "resultFormat": "ARROW",
                "resultMode": "AUTO",
            },
        },
        **kwargs,
    )
Copilot AI Feb 18, 2026

api_query (V2) doesn't pass an ErrorHandlingConfig and doesn't validate dialect/arrow_compression_codec (unlike V1 which calls assert_in_literal and passes contextual kwargs via ErrorHandlingConfig). Consider adding assert_in_literal checks and passing error_handling=ErrorHandlingConfig(branch=..., dialect=..., timeout=...) so raised API errors include the same context as V1.
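
A generic version of such a validation helper might look like this (a sketch only: the real `assert_in_literal` in foundry-dev-tools may have a different signature, and the literal values shown are assumptions):

```python
from typing import Literal, get_args

# Illustrative literal type; the actual ArrowCompressionCodec lives in foundry-dev-tools.
ArrowCompressionCodec = Literal["NONE", "LZ4", "ZSTD"]


def assert_in_literal(value: str, literal_type: object, name: str) -> None:
    """Generic stand-in for a literal-validation helper: fail fast on bad input."""
    allowed = get_args(literal_type)
    if value not in allowed:
        msg = f"{name} must be one of {allowed}, got {value!r}"
        raise ValueError(msg)


assert_in_literal("ZSTD", ArrowCompressionCodec, "arrow_compression_codec")  # passes
```

Validating before the request means a typo'd codec fails locally with a readable message instead of surfacing as an opaque server-side API error.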
