reconcile_schemas: Parameter "schema" column names should be lowercased before column differential check #18

operators/s3_to_redshift_operator.py (Lines 189-191)
------------------------------------
```
pg_query = \
            """
            SELECT column_name, udt_name
            FROM information_schema.columns
            WHERE table_schema = '{0}' AND table_name = '{1}';
            """.format(self.redshift_schema, self.table)
pg_schema = dict(pg_hook.get_records(pg_query))
incoming_keys = [column['name'] for column in schema]
diff = list(set(incoming_keys) - set(pg_schema.keys()))
```
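In the snippet above, if any column name in "schema" contains an uppercase character, the column differential (diff) will erroneously be a non-empty list, because the incoming names are compared case-sensitively against the lowercased names returned by pg_query. This in turn causes the subsequent logic to attempt to add a column that is already present in the created table.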

Example
--------
Assume schema = [{"name": "ColumnName", "type": _ }]

pg_query will report column_name == "columnname" (Redshift lowercases it automatically), but incoming_keys will contain column['name'] == "ColumnName", so:

```
In [1]: diff = list(set(["ColumnName"]) - set(["columnname"]))
In [2]: diff
Out[2]: ['ColumnName']
```

The subsequent logic will then try to add a new column called 'ColumnName', which fails because 'columnname' already exists in the created table.
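
One possible fix is to lowercase the incoming names before computing the differential. The following is a minimal sketch, not the operator's actual code: the helper name `missing_columns` and the sample type values are hypothetical, and it assumes Redshift is running with its default behaviour of folding identifiers to lowercase.

```python
def missing_columns(schema, existing_columns):
    """Return incoming column names that are absent from the table.

    Names are compared case-insensitively by lowercasing both sides.
    `missing_columns` is a hypothetical helper used only for illustration.
    """
    existing = {name.lower() for name in existing_columns}
    seen = set()
    missing = []
    for column in schema:
        key = column["name"].lower()
        # Skip columns already in the table and duplicates differing only by case.
        if key not in existing and key not in seen:
            seen.add(key)
            missing.append(key)
    return missing


# Illustrative data: "ColumnName" already exists as "columnname".
schema = [
    {"name": "ColumnName", "type": "varchar"},
    {"name": "NewColumn", "type": "int4"},
]
pg_schema = {"columnname": "varchar"}

diff = missing_columns(schema, pg_schema.keys())
print(diff)  # ['newcolumn']
```

Unlike the set difference in the original snippet, this version keeps the order of the incoming schema, so any columns added afterwards would be created in a predictable order.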

