[Refactor] New Tests for Query Parser/Formatter#97
[Refactor] New Tests for Query Parser/Formatter#97HazelYuAhiru wants to merge 3 commits intomainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request adds 40 new test cases for the query parser to document expected AST representations and known parser limitations. The tests cover various SQL constructs including CAST operations, self-joins, subqueries, complex WHERE conditions, aggregations with CASE statements, and advanced features like DISTINCT and UNION.
Changes:
- Added 40 new test functions in
test_query_parser.py, documenting expected AST structures for real-world SQL queries - Included TODO comments throughout the tests identifying known parser limitations that need to be addressed (e.g., TYPE keywords as FunctionNodes, missing UNION support, missing DISTINCT support)
- Most test assertions are commented out since many features are not yet fully supported by the parser
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| [ | ||
| LiteralNode(0), | ||
| LiteralNode(1), | ||
| LiteralNode(2), | ||
| LiteralNode(3), | ||
| LiteralNode(4), | ||
| LiteralNode(5), | ||
| LiteralNode(6), | ||
| ], |
There was a problem hiding this comment.
Inconsistent representation of IN operator with list values. Lines 1413 and 1436 wrap the list in LiteralNode (e.g., LiteralNode(["FCM", "ONE_SIGNAL"])), but lines 1423-1431 use a plain Python list (e.g., [LiteralNode(0), LiteralNode(1), ...]). According to OperatorNode's signature, the _right parameter must be a Node. Additionally, LiteralNode's type hint indicates _value should be str|int|float|bool|datetime|None, which doesn't include list. For consistency, choose one approach: either wrap all IN lists in LiteralNode (and update LiteralNode's type hint to include list), or use a dedicated structure for representing IN lists.
| [ | |
| LiteralNode(0), | |
| LiteralNode(1), | |
| LiteralNode(2), | |
| LiteralNode(3), | |
| LiteralNode(4), | |
| LiteralNode(5), | |
| LiteralNode(6), | |
| ], | |
| LiteralNode([0, 1, 2, 3, 4, 5, 6]), |
There was a problem hiding this comment.
I think the old way semantically makes more sense. I will modify Lines 1413 and 1436 instead.
Overview
Add tests for the query parser. After this PR, the reviewed test cases can be copied to the formatter and end-to-end test files.
Code Changes
test_query_parser.py.Issues Detected
TEXT, INTEGER, DATE, NULL) areFunctionNodes; consider a keyword/node type (tests 1–5, 41-43).UNIONnot supported (queries 15, 29, 32).DISTINCT(tests 11, 18, 35)/DISTINCT ON(test 40) not supported.IS NULLemitted asFunctionNode("MISSING"); desired form isOperatorNode(col, "IS", [NULL])(test 41).FunctionNode ADD/SUB/MUL/DIV; could map toOperatorNode(tests 27, 42).INshould be a list of literals, not oneLiteralNodewhose value is a list. (test 20)Design Questions
CASE:CASEcurrently is underFunctionNode; consider a dedicated CaseNode? (test 31)GROUP BY 1, 2handling to be confirmed. Currently are literals1,2; consider replace as column refs? (test 43)INTERVALhandling to be confirmed (test 42, 43).