Skip to content

Conversation

@fernst
Copy link
Member

@fernst fernst commented Apr 11, 2022

Depends on cdapio/cdap#14178

@fernst fernst force-pushed the add-bigquery-sql-string-expression-factory-type branch from 28e5247 to 0551988 Compare April 11, 2022 23:39

// If the BigQuery capability is required, ensure the SQL engine supports this capability.
return requiresBigQueryCapability ?
ctx.getEngine().getExpressionFactory(StringExpressionFactoryType.SQL, StandardSQLCapabilities.BIGQUERY) :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you don't need to check for SQL as I suppose BigQuery would imply SQL.

put(GroupByConfig.Function.COUNT, "COUNT(%s)");
put(GroupByConfig.Function.COUNTNULLS, "SUM(CASE WHEN %s IS NULL THEN 1 END)");
put(GroupByConfig.Function.COUNTDISTINCT,
"COUNT(DISTINCT %s) + IF(SUM(CASE WHEN %s IS NULL THEN 1 END) > 0, 1, 0)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to use case instead of if to be ansi compatible

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@fernst fernst force-pushed the add-bigquery-sql-string-expression-factory-type branch from 0551988 to 10bdde8 Compare April 12, 2022 03:46
@fernst fernst requested a review from tivv April 12, 2022 16:47
@fernst fernst force-pushed the add-bigquery-sql-string-expression-factory-type branch 3 times, most recently from d2b0686 to 80367a4 Compare April 13, 2022 16:38
@fernst fernst requested a review from tivv April 13, 2022 17:23
put(GroupByConfig.Function.COUNT, "COUNT(%s)");
put(GroupByConfig.Function.COUNTNULLS, "SUM(CASE WHEN %s IS NULL THEN 1 ELSE 0 END)");
put(GroupByConfig.Function.COUNTDISTINCT,
"COUNT(DISTINCT %s) + CASE WHEN SUM(CASE WHEN %>s IS NULL THEN 1 ELSE 0 END) > 0 THEN 1 ELSE 0 END");
Copy link
Contributor

@tivv tivv Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can this be COUNT(DISTINCT %s) + MAX(CASE WHEN %>s IS NULL THEN 1 ELSE 0 END) or it would produce a wrong result for empty set? Two nested CASE are hard to read

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This expression can be made safe by adding a Coalesce:

COUNT(DISTINCT %s) + COALESCE(MAX(CASE WHEN %>s IS NULL THEN 1 ELSE 0 END), 0)

@sanjanasandeep sanjanasandeep force-pushed the add-bigquery-sql-string-expression-factory-type branch 3 times, most recently from ca123bd to 8df3615 Compare April 13, 2022 21:20
@fernst fernst added the build Trigger unit test build label Apr 15, 2022
@fernst fernst force-pushed the add-bigquery-sql-string-expression-factory-type branch 2 times, most recently from 56215bf to cdcdfa3 Compare April 15, 2022 21:20
@fernst fernst force-pushed the add-bigquery-sql-string-expression-factory-type branch from cdcdfa3 to e12c313 Compare April 15, 2022 21:30
@fernst fernst merged commit 7efd94a into develop Apr 15, 2022
@fernst fernst deleted the add-bigquery-sql-string-expression-factory-type branch April 15, 2022 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

build Trigger unit test build

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants