Skip to content

Conversation

@ashrithb
Copy link
Contributor

@ashrithb ashrithb commented Dec 7, 2025

What changes were proposed in this pull request?

In ANSI mode, avg(timestamp) and sum(timestamp) now fail with a proper
type error instead of silently casting to double.

Why are the changes needed?

Per SQL standards, AVG is only defined for numeric types. The current behavior
(inherited from Hive compatibility) silently casts timestamp to epoch seconds
as a double, which:

  1. Returns confusing results (e.g., -979200.0 for 1969-12-21)
  2. Violates ANSI SQL semantics
  3. Differs from other databases like PostgreSQL which reject this

Does this PR introduce any user-facing change?

Yes. In ANSI mode (spark.sql.ansi.enabled=true), queries like
SELECT avg(timestamp_col) FROM table will now fail with: DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE: The 1st parameter requires the NUMERIC or ANSI INTERVAL type, however timestamp_col has the type TIMESTAMP.

Non-ANSI mode behavior is unchanged (backward compatible).

How was this patch tested?

  • Added unit tests in AnsiTypeCoercionSuite
  • Added unit tests in TypeCoercionSuite
  • All existing tests pass

Was this patch authored or co-authored using generative AI tooling?

No, except for PR description.

@github-actions github-actions bot added the SQL label Dec 7, 2025
@ashrithb ashrithb force-pushed the SPARK-54372-avg-timestamp-ansi-check branch from 1a3009c to 856efec Compare December 7, 2025 06:30
@ashrithb ashrithb changed the title [SPARK-54372][SQL] ANSI mode should reject avg/sum on timestamp types [WIP][SPARK-54372][SQL] ANSI mode should reject avg/sum on timestamp types Dec 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant