This repository was archived by the owner on Jun 14, 2024. It is now read-only.
Merged
31 commits
208ea5e
gold standard initial commit
apoorvedave1 Feb 19, 2021
cc14991
fix q32
apoorvedave1 Feb 19, 2021
7c8ee78
Merge branch 'master' of github.com:apoorvedave1/hyperspace-1 into gs
apoorvedave1 Feb 19, 2021
fa9bd4a
Merge branch 'master' of github.com:apoorvedave1/hyperspace-1 into gs
apoorvedave1 Mar 10, 2021
4b007a5
keep only tpcds v1.4 and remove others
apoorvedave1 Mar 10, 2021
59207f6
Merge remote-tracking branch 'upstream/master' into gs_initial
apoorvedave1 Mar 10, 2021
5c0cee9
Trigger Build
apoorvedave1 Mar 10, 2021
900539d
build error: test with sequential run
apoorvedave1 Mar 10, 2021
530dfa7
revert previous commit
apoorvedave1 Mar 10, 2021
add01f1
update plans
apoorvedave1 Mar 10, 2021
8b58b6b
update plan
apoorvedave1 Mar 10, 2021
97e8441
added sorting for fixing build pipeline
apoorvedave1 Mar 11, 2021
411450f
updated plans with sorting
apoorvedave1 Mar 11, 2021
b70f00b
fix q49
apoorvedave1 Mar 11, 2021
7880f40
updated instructions on how to run tests
apoorvedave1 Mar 11, 2021
3dcd2d5
test with updated plans for q47, q49
apoorvedave1 Mar 12, 2021
a7a1149
update q47, 49 plans
apoorvedave1 Mar 12, 2021
dd295a9
remove rogue query
apoorvedave1 Mar 12, 2021
f47fbd3
remove q49
apoorvedave1 Mar 12, 2021
d15a4e8
fix scalastyle
apoorvedave1 Mar 12, 2021
4c511fb
add query files for q49.sql
apoorvedave1 Mar 12, 2021
e48c713
restructuring
apoorvedave1 Mar 12, 2021
6c5a2f7
cleanup before rebase
apoorvedave1 Mar 13, 2021
25cb361
Merge remote-tracking branch 'upstream/master' into gs_initial
apoorvedave1 Mar 13, 2021
0b40853
updated plans based on the plan stability suite
apoorvedave1 Mar 13, 2021
32f5899
normalize location: fix
apoorvedave1 Mar 15, 2021
5d7eebd
commit with code and q1 plans
apoorvedave1 Mar 15, 2021
d44ecc0
simplify TPCDSBase
apoorvedave1 Mar 16, 2021
ff1a08a
fix a review comment
apoorvedave1 Mar 16, 2021
e1b9e79
Trigger Build
apoorvedave1 Mar 16, 2021
6976a4b
refactor spark configs
apoorvedave1 Mar 16, 2021
296 changes: 42 additions & 254 deletions src/test/resources/tpcds/spark-2.4/approved-plans-v1_4/q1/explain.txt
@@ -1,255 +1,43 @@
== Physical Plan ==
TakeOrderedAndProject (44)
+- * Project (43)
+- * BroadcastHashJoin Inner BuildRight (42)
:- * Project (37)
: +- * BroadcastHashJoin Inner BuildRight (36)
: :- * Project (30)
: : +- * BroadcastHashJoin Inner BuildRight (29)
: : :- * Filter (14)
: : : +- * HashAggregate (13)
: : : +- Exchange (12)
: : : +- * HashAggregate (11)
: : : +- * Project (10)
: : : +- * BroadcastHashJoin Inner BuildRight (9)
: : : :- * Filter (3)
: : : : +- * ColumnarToRow (2)
: : : : +- Scan parquet default.store_returns (1)
: : : +- BroadcastExchange (8)
: : : +- * Project (7)
: : : +- * Filter (6)
: : : +- * ColumnarToRow (5)
: : : +- Scan parquet default.date_dim (4)
: : +- BroadcastExchange (28)
: : +- * Filter (27)
: : +- * HashAggregate (26)
: : +- Exchange (25)
: : +- * HashAggregate (24)
: : +- * HashAggregate (23)
: : +- Exchange (22)
: : +- * HashAggregate (21)
: : +- * Project (20)
: : +- * BroadcastHashJoin Inner BuildRight (19)
: : :- * Filter (17)
: : : +- * ColumnarToRow (16)
: : : +- Scan parquet default.store_returns (15)
: : +- ReusedExchange (18)
: +- BroadcastExchange (35)
: +- * Project (34)
: +- * Filter (33)
: +- * ColumnarToRow (32)
: +- Scan parquet default.store (31)
+- BroadcastExchange (41)
+- * Filter (40)
+- * ColumnarToRow (39)
+- Scan parquet default.customer (38)


(1) Scan parquet default.store_returns
Output [4]: [sr_returned_date_sk#1, sr_customer_sk#2, sr_store_sk#3, sr_return_amt#4]
Batched: true
Location [not included in comparison]/{warehouse_dir}/store_returns]
PushedFilters: [IsNotNull(sr_returned_date_sk), IsNotNull(sr_store_sk), IsNotNull(sr_customer_sk)]
ReadSchema: struct<sr_returned_date_sk:bigint,sr_customer_sk:bigint,sr_store_sk:bigint,sr_return_amt:decimal(7,2)>

(2) ColumnarToRow [codegen id : 2]
Input [4]: [sr_returned_date_sk#1, sr_customer_sk#2, sr_store_sk#3, sr_return_amt#4]

(3) Filter [codegen id : 2]
Input [4]: [sr_returned_date_sk#1, sr_customer_sk#2, sr_store_sk#3, sr_return_amt#4]
Condition : ((isnotnull(sr_returned_date_sk#1) AND isnotnull(sr_store_sk#3)) AND isnotnull(sr_customer_sk#2))

(4) Scan parquet default.date_dim
Output [2]: [d_date_sk#5, d_year#6]
Batched: true
Location [not included in comparison]/{warehouse_dir}/date_dim]
PushedFilters: [IsNotNull(d_year), EqualTo(d_year,2000), IsNotNull(d_date_sk)]
ReadSchema: struct<d_date_sk:int,d_year:int>

(5) ColumnarToRow [codegen id : 1]
Input [2]: [d_date_sk#5, d_year#6]

(6) Filter [codegen id : 1]
Input [2]: [d_date_sk#5, d_year#6]
Condition : ((isnotnull(d_year#6) AND (d_year#6 = 2000)) AND isnotnull(d_date_sk#5))

(7) Project [codegen id : 1]
Output [1]: [d_date_sk#5]
Input [2]: [d_date_sk#5, d_year#6]

(8) BroadcastExchange
Input [1]: [d_date_sk#5]
Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#7]

(9) BroadcastHashJoin [codegen id : 2]
Left keys [1]: [sr_returned_date_sk#1]
Right keys [1]: [cast(d_date_sk#5 as bigint)]
Join condition: None

(10) Project [codegen id : 2]
Output [3]: [sr_customer_sk#2, sr_store_sk#3, sr_return_amt#4]
Input [5]: [sr_returned_date_sk#1, sr_customer_sk#2, sr_store_sk#3, sr_return_amt#4, d_date_sk#5]

(11) HashAggregate [codegen id : 2]
Input [3]: [sr_customer_sk#2, sr_store_sk#3, sr_return_amt#4]
Keys [2]: [sr_customer_sk#2, sr_store_sk#3]
Functions [1]: [partial_sum(UnscaledValue(sr_return_amt#4))]
Aggregate Attributes [1]: [sum#8]
Results [3]: [sr_customer_sk#2, sr_store_sk#3, sum#9]

(12) Exchange
Input [3]: [sr_customer_sk#2, sr_store_sk#3, sum#9]
Arguments: hashpartitioning(sr_customer_sk#2, sr_store_sk#3, 5), true, [id=#10]

(13) HashAggregate [codegen id : 9]
Input [3]: [sr_customer_sk#2, sr_store_sk#3, sum#9]
Keys [2]: [sr_customer_sk#2, sr_store_sk#3]
Functions [1]: [sum(UnscaledValue(sr_return_amt#4))]
Aggregate Attributes [1]: [sum(UnscaledValue(sr_return_amt#4))#11]
Results [3]: [sr_customer_sk#2 AS ctr_customer_sk#12, sr_store_sk#3 AS ctr_store_sk#13, MakeDecimal(sum(UnscaledValue(sr_return_amt#4))#11,17,2) AS ctr_total_return#14]

(14) Filter [codegen id : 9]
Input [3]: [ctr_customer_sk#12, ctr_store_sk#13, ctr_total_return#14]
Condition : isnotnull(ctr_total_return#14)

(15) Scan parquet default.store_returns
Output [4]: [sr_returned_date_sk#1, sr_customer_sk#2, sr_store_sk#3, sr_return_amt#4]
Batched: true
Location [not included in comparison]/{warehouse_dir}/store_returns]
PushedFilters: [IsNotNull(sr_returned_date_sk), IsNotNull(sr_store_sk)]
ReadSchema: struct<sr_returned_date_sk:bigint,sr_customer_sk:bigint,sr_store_sk:bigint,sr_return_amt:decimal(7,2)>

(16) ColumnarToRow [codegen id : 4]
Input [4]: [sr_returned_date_sk#1, sr_customer_sk#2, sr_store_sk#3, sr_return_amt#4]

(17) Filter [codegen id : 4]
Input [4]: [sr_returned_date_sk#1, sr_customer_sk#2, sr_store_sk#3, sr_return_amt#4]
Condition : (isnotnull(sr_returned_date_sk#1) AND isnotnull(sr_store_sk#3))

(18) ReusedExchange [Reuses operator id: 8]
Output [1]: [d_date_sk#5]

(19) BroadcastHashJoin [codegen id : 4]
Left keys [1]: [sr_returned_date_sk#1]
Right keys [1]: [cast(d_date_sk#5 as bigint)]
Join condition: None

(20) Project [codegen id : 4]
Output [3]: [sr_customer_sk#2, sr_store_sk#3, sr_return_amt#4]
Input [5]: [sr_returned_date_sk#1, sr_customer_sk#2, sr_store_sk#3, sr_return_amt#4, d_date_sk#5]

(21) HashAggregate [codegen id : 4]
Input [3]: [sr_customer_sk#2, sr_store_sk#3, sr_return_amt#4]
Keys [2]: [sr_customer_sk#2, sr_store_sk#3]
Functions [1]: [partial_sum(UnscaledValue(sr_return_amt#4))]
Aggregate Attributes [1]: [sum#15]
Results [3]: [sr_customer_sk#2, sr_store_sk#3, sum#16]

(22) Exchange
Input [3]: [sr_customer_sk#2, sr_store_sk#3, sum#16]
Arguments: hashpartitioning(sr_customer_sk#2, sr_store_sk#3, 5), true, [id=#17]

(23) HashAggregate [codegen id : 5]
Input [3]: [sr_customer_sk#2, sr_store_sk#3, sum#16]
Keys [2]: [sr_customer_sk#2, sr_store_sk#3]
Functions [1]: [sum(UnscaledValue(sr_return_amt#4))]
Aggregate Attributes [1]: [sum(UnscaledValue(sr_return_amt#4))#18]
Results [2]: [sr_store_sk#3 AS ctr_store_sk#13, MakeDecimal(sum(UnscaledValue(sr_return_amt#4))#18,17,2) AS ctr_total_return#14]

(24) HashAggregate [codegen id : 5]
Input [2]: [ctr_store_sk#13, ctr_total_return#14]
Keys [1]: [ctr_store_sk#13]
Functions [1]: [partial_avg(ctr_total_return#14)]
Aggregate Attributes [2]: [sum#19, count#20]
Results [3]: [ctr_store_sk#13, sum#21, count#22]

(25) Exchange
Input [3]: [ctr_store_sk#13, sum#21, count#22]
Arguments: hashpartitioning(ctr_store_sk#13, 5), true, [id=#23]

(26) HashAggregate [codegen id : 6]
Input [3]: [ctr_store_sk#13, sum#21, count#22]
Keys [1]: [ctr_store_sk#13]
Functions [1]: [avg(ctr_total_return#14)]
Aggregate Attributes [1]: [avg(ctr_total_return#14)#24]
Results [2]: [CheckOverflow((promote_precision(avg(ctr_total_return#14)#24) * 1.200000), DecimalType(24,7), true) AS (CAST(avg(ctr_total_return) AS DECIMAL(21,6)) * CAST(1.2 AS DECIMAL(21,6)))#25, ctr_store_sk#13 AS ctr_store_sk#13#26]

(27) Filter [codegen id : 6]
Input [2]: [(CAST(avg(ctr_total_return) AS DECIMAL(21,6)) * CAST(1.2 AS DECIMAL(21,6)))#25, ctr_store_sk#13#26]
Condition : isnotnull((CAST(avg(ctr_total_return) AS DECIMAL(21,6)) * CAST(1.2 AS DECIMAL(21,6)))#25)

(28) BroadcastExchange
Input [2]: [(CAST(avg(ctr_total_return) AS DECIMAL(21,6)) * CAST(1.2 AS DECIMAL(21,6)))#25, ctr_store_sk#13#26]
Arguments: HashedRelationBroadcastMode(List(input[1, bigint, true]),false), [id=#27]

(29) BroadcastHashJoin [codegen id : 9]
Left keys [1]: [ctr_store_sk#13]
Right keys [1]: [ctr_store_sk#13#26]
Join condition: (cast(ctr_total_return#14 as decimal(24,7)) > (CAST(avg(ctr_total_return) AS DECIMAL(21,6)) * CAST(1.2 AS DECIMAL(21,6)))#25)

(30) Project [codegen id : 9]
Output [2]: [ctr_customer_sk#12, ctr_store_sk#13]
Input [5]: [ctr_customer_sk#12, ctr_store_sk#13, ctr_total_return#14, (CAST(avg(ctr_total_return) AS DECIMAL(21,6)) * CAST(1.2 AS DECIMAL(21,6)))#25, ctr_store_sk#13#26]

(31) Scan parquet default.store
Output [2]: [s_store_sk#28, s_state#29]
Batched: true
Location [not included in comparison]/{warehouse_dir}/store]
PushedFilters: [IsNotNull(s_state), EqualTo(s_state,TN), IsNotNull(s_store_sk)]
ReadSchema: struct<s_store_sk:int,s_state:string>

(32) ColumnarToRow [codegen id : 7]
Input [2]: [s_store_sk#28, s_state#29]

(33) Filter [codegen id : 7]
Input [2]: [s_store_sk#28, s_state#29]
Condition : ((isnotnull(s_state#29) AND (s_state#29 = TN)) AND isnotnull(s_store_sk#28))

(34) Project [codegen id : 7]
Output [1]: [s_store_sk#28]
Input [2]: [s_store_sk#28, s_state#29]

(35) BroadcastExchange
Input [1]: [s_store_sk#28]
Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#30]

(36) BroadcastHashJoin [codegen id : 9]
Left keys [1]: [ctr_store_sk#13]
Right keys [1]: [cast(s_store_sk#28 as bigint)]
Join condition: None

(37) Project [codegen id : 9]
Output [1]: [ctr_customer_sk#12]
Input [3]: [ctr_customer_sk#12, ctr_store_sk#13, s_store_sk#28]

(38) Scan parquet default.customer
Output [2]: [c_customer_sk#31, c_customer_id#32]
Batched: true
Location [not included in comparison]/{warehouse_dir}/customer]
PushedFilters: [IsNotNull(c_customer_sk)]
ReadSchema: struct<c_customer_sk:int,c_customer_id:string>

(39) ColumnarToRow [codegen id : 8]
Input [2]: [c_customer_sk#31, c_customer_id#32]

(40) Filter [codegen id : 8]
Input [2]: [c_customer_sk#31, c_customer_id#32]
Condition : isnotnull(c_customer_sk#31)

(41) BroadcastExchange
Input [2]: [c_customer_sk#31, c_customer_id#32]
Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [id=#33]

(42) BroadcastHashJoin [codegen id : 9]
Left keys [1]: [ctr_customer_sk#12]
Right keys [1]: [cast(c_customer_sk#31 as bigint)]
Join condition: None

(43) Project [codegen id : 9]
Output [1]: [c_customer_id#32]
Input [3]: [ctr_customer_sk#12, c_customer_sk#31, c_customer_id#32]

(44) TakeOrderedAndProject
Input [1]: [c_customer_id#32]
Arguments: 100, [c_customer_id#32 ASC NULLS FIRST], [c_customer_id#32]

TakeOrderedAndProject(limit=100, orderBy=[c_customer_id#1 ASC NULLS FIRST], output=[c_customer_id#1])
+- *(9) Project [c_customer_id#1]
+- *(9) BroadcastHashJoin [ctr_customer_sk#2], [cast(c_customer_sk#3 as bigint)], Inner, BuildRight
:- *(9) Project [ctr_customer_sk#2]
: +- *(9) BroadcastHashJoin [ctr_store_sk#4], [cast(s_store_sk#5 as bigint)], Inner, BuildRight
: :- *(9) Project [ctr_customer_sk#2, ctr_store_sk#4]
: : +- *(9) BroadcastHashJoin [ctr_store_sk#4], [ctr_store_sk#4#6], Inner, BuildRight, (cast(ctr_total_return#7 as decimal(24,7)) > (CAST(avg(ctr_total_return) AS DECIMAL(21,6)) * CAST(1.2 AS DECIMAL(21,6)))#8)
: : :- *(9) Filter isnotnull(ctr_total_return#7)
: : : +- *(9) HashAggregate(keys=[sr_customer_sk#9, sr_store_sk#10], functions=[sum(UnscaledValue(sr_return_amt#11))])
: : : +- Exchange hashpartitioning(sr_customer_sk#9, sr_store_sk#10, 5)
: : : +- *(2) HashAggregate(keys=[sr_customer_sk#9, sr_store_sk#10], functions=[partial_sum(UnscaledValue(sr_return_amt#11))])
: : : +- *(2) Project [sr_customer_sk#9, sr_store_sk#10, sr_return_amt#11]
: : : +- *(2) BroadcastHashJoin [sr_returned_date_sk#12], [cast(d_date_sk#13 as bigint)], Inner, BuildRight
: : : :- *(2) Project [sr_returned_date_sk#12, sr_customer_sk#9, sr_store_sk#10, sr_return_amt#11]
: : : : +- *(2) Filter ((isnotnull(sr_returned_date_sk#12) && isnotnull(sr_store_sk#10)) && isnotnull(sr_customer_sk#9))
: : : : +- *(2) FileScan parquet default.store_returns[sr_returned_date_sk#12,sr_customer_sk#9,sr_store_sk#10,sr_return_amt#11] Batched: true, Format: Parquet, Location [not included in comparison]/{warehouse_dir}/store_returns], PartitionFilters: [], PushedFilters: [IsNotNull(sr_returned_date_sk), IsNotNull(sr_store_sk), IsNotNull(sr_customer_sk)], ReadSchema: struct<sr_returned_date_sk:bigint,sr_customer_sk:bigint,sr_store_sk:bigint,sr_return_amt:decimal(...
: : : +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)))
: : : +- *(1) Project [d_date_sk#13]
: : : +- *(1) Filter ((isnotnull(d_year#14) && (d_year#14 = 2000)) && isnotnull(d_date_sk#13))
: : : +- *(1) FileScan parquet default.date_dim[d_date_sk#13,d_year#14] Batched: true, Format: Parquet, Location [not included in comparison]/{warehouse_dir}/date_dim], PartitionFilters: [], PushedFilters: [IsNotNull(d_year), EqualTo(d_year,2000), IsNotNull(d_date_sk)], ReadSchema: struct<d_date_sk:int,d_year:int>
: : +- BroadcastExchange HashedRelationBroadcastMode(List(input[1, bigint, true]))
: : +- *(6) Filter isnotnull((CAST(avg(ctr_total_return) AS DECIMAL(21,6)) * CAST(1.2 AS DECIMAL(21,6)))#8)
: : +- *(6) HashAggregate(keys=[ctr_store_sk#4], functions=[avg(ctr_total_return#7)])
: : +- Exchange hashpartitioning(ctr_store_sk#4, 5)
: : +- *(5) HashAggregate(keys=[ctr_store_sk#4], functions=[partial_avg(ctr_total_return#7)])
: : +- *(5) HashAggregate(keys=[sr_customer_sk#9, sr_store_sk#10], functions=[sum(UnscaledValue(sr_return_amt#11))])
: : +- Exchange hashpartitioning(sr_customer_sk#9, sr_store_sk#10, 5)
: : +- *(4) HashAggregate(keys=[sr_customer_sk#9, sr_store_sk#10], functions=[partial_sum(UnscaledValue(sr_return_amt#11))])
: : +- *(4) Project [sr_customer_sk#9, sr_store_sk#10, sr_return_amt#11]
: : +- *(4) BroadcastHashJoin [sr_returned_date_sk#12], [cast(d_date_sk#13 as bigint)], Inner, BuildRight
: : :- *(4) Project [sr_returned_date_sk#12, sr_customer_sk#9, sr_store_sk#10, sr_return_amt#11]
: : : +- *(4) Filter (isnotnull(sr_returned_date_sk#12) && isnotnull(sr_store_sk#10))
: : : +- *(4) FileScan parquet default.store_returns[sr_returned_date_sk#12,sr_customer_sk#9,sr_store_sk#10,sr_return_amt#11] Batched: true, Format: Parquet, Location [not included in comparison]/{warehouse_dir}/store_returns], PartitionFilters: [], PushedFilters: [IsNotNull(sr_returned_date_sk), IsNotNull(sr_store_sk)], ReadSchema: struct<sr_returned_date_sk:bigint,sr_customer_sk:bigint,sr_store_sk:bigint,sr_return_amt:decimal(...
: : +- ReusedExchange [d_date_sk#13], BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)))
: +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)))
: +- *(7) Project [s_store_sk#5]
: +- *(7) Filter ((isnotnull(s_state#15) && (s_state#15 = TN)) && isnotnull(s_store_sk#5))
: +- *(7) FileScan parquet default.store[s_store_sk#5,s_state#15] Batched: true, Format: Parquet, Location [not included in comparison]/{warehouse_dir}/store], PartitionFilters: [], PushedFilters: [IsNotNull(s_state), EqualTo(s_state,TN), IsNotNull(s_store_sk)], ReadSchema: struct<s_store_sk:int,s_state:string>
+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)))
+- *(8) Project [c_customer_sk#3, c_customer_id#1]
+- *(8) Filter isnotnull(c_customer_sk#3)
+- *(8) FileScan parquet default.customer[c_customer_sk#3,c_customer_id#1] Batched: true, Format: Parquet, Location [not included in comparison]/{warehouse_dir}/customer], PartitionFilters: [], PushedFilters: [IsNotNull(c_customer_sk)], ReadSchema: struct<c_customer_sk:int,c_customer_id:string>
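
The approved plan above uses small sequential expression IDs (`#1`, `#2`, ...) and a placeholder `Location [not included in comparison]/{warehouse_dir}/...`, which suggests the raw `explain` output is normalized before comparison. The following is a minimal Python sketch of that kind of normalization, not the code from this PR: the function names, the regular expressions, and the assumption that raw locations contain a `spark-warehouse/` path segment are all hypothetical and chosen only for illustration.

```python
import re

# Assumption: expression IDs appear as '#' followed by digits, e.g. d_date_sk#42.
EXPR_ID = re.compile(r"#\d+")
# Assumption: raw scan locations contain a 'spark-warehouse/' path segment.
LOCATION = re.compile(r"Location.*spark-warehouse/")


def normalize_ids(plan: str) -> str:
    """Renumber expression IDs as #1, #2, ... in order of first appearance."""
    mapping = {}

    def renumber(match):
        old = match.group(0)
        if old not in mapping:
            mapping[old] = f"#{len(mapping) + 1}"
        return mapping[old]

    return EXPR_ID.sub(renumber, plan)


def normalize_location(plan: str) -> str:
    """Replace the machine-specific warehouse path with a fixed placeholder."""
    placeholder = "Location [not included in comparison]/{warehouse_dir}/"
    return LOCATION.sub(placeholder, plan)


# Hypothetical raw plan fragment; the IDs and the path are made up.
raw = (
    "*(1) Filter isnotnull(d_date_sk#42)\n"
    "+- *(1) FileScan parquet default.date_dim[d_date_sk#42,d_year#57] "
    "Location: InMemoryFileIndex[file:/tmp/spark-warehouse/date_dim]"
)
normalized = normalize_location(normalize_ids(raw))
print(normalized)
```

Renumbering by first appearance keeps the approved file stable when unrelated changes shift Spark's internal ID counter, and masking the location keeps it independent of the machine that generated the plan.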