-
Notifications
You must be signed in to change notification settings - Fork 99
Updated String Expression Factory Type to BigQuery SQL #1575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated String Expression Factory Type to BigQuery SQL #1575
Conversation
28e5247 to
0551988
Compare
|
|
||
| // If the BigQuery capability is required, ensure the SQL engine supports this capability. | ||
| return requiresBigQueryCapability ? | ||
| ctx.getEngine().getExpressionFactory(StringExpressionFactoryType.SQL, StandardSQLCapabilities.BIGQUERY) : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you don't need to check for SQL as I suppose BigQuery would imply SQL.
core-plugins/src/main/java/io/cdap/plugin/batch/aggregator/DedupAggregatorUtils.java
Show resolved
Hide resolved
| put(GroupByConfig.Function.COUNT, "COUNT(%s)"); | ||
| put(GroupByConfig.Function.COUNTNULLS, "SUM(CASE WHEN %s IS NULL THEN 1 END)"); | ||
| put(GroupByConfig.Function.COUNTDISTINCT, | ||
| "COUNT(DISTINCT %s) + IF(SUM(CASE WHEN %s IS NULL THEN 1 END) > 0, 1, 0)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to use case instead of if to be ansi compatible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
core-plugins/src/main/java/io/cdap/plugin/batch/aggregator/GroupByAggregator.java
Show resolved
Hide resolved
0551988 to
10bdde8
Compare
core-plugins/src/main/java/io/cdap/plugin/batch/aggregator/DedupAggregatorUtils.java
Show resolved
Hide resolved
d2b0686 to
80367a4
Compare
| put(GroupByConfig.Function.COUNT, "COUNT(%s)"); | ||
| put(GroupByConfig.Function.COUNTNULLS, "SUM(CASE WHEN %s IS NULL THEN 1 ELSE 0 END)"); | ||
| put(GroupByConfig.Function.COUNTDISTINCT, | ||
| "COUNT(DISTINCT %s) + CASE WHEN SUM(CASE WHEN %>s IS NULL THEN 1 ELSE 0 END) > 0 THEN 1 ELSE 0 END"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can this be COUNT(DISTINCT %s) + MAX(CASE WHEN %>s IS NULL THEN 1 ELSE 0 END) or it would produce a wrong result for empty set? Two nested CASE are hard to read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This expression can be made safe by adding a Coalesce:
COUNT(DISTINCT %s) + COALESCE(MAX(CASE WHEN %>s IS NULL THEN 1 ELSE 0 END), 0)
ca123bd to
8df3615
Compare
56215bf to
cdcdfa3
Compare
cdcdfa3 to
e12c313
Compare
Depends on cdapio/cdap#14178