operators/s3_to_redshift_operator.py (Lines 189-191)
pg_query = \
"""
SELECT column_name, udt_name
FROM information_schema.columns
WHERE table_schema = '{0}' AND table_name = '{1}';
""".format(self.redshift_schema, self.table)
pg_schema = dict(pg_hook.get_records(pg_query))
incoming_keys = [column['name'] for column in schema]
diff = list(set(incoming_keys) - set(pg_schema.keys()))
In above snippet:
If "schema" column name contains any uppercase character, the column differential (diff) will erroneously be a non-empty set. This will in turn cause logic to attempt to insert a column that is already present in created table.
Example
Assume schema = {"name": "ColumnName", "type": _ }
pg_query will report column_name == "columnname" (automatically lowercased by redshift) but incoming keys will leverage column['name'] == "ColumnName" so:
In [1]: diff = list(set(["ColumnName"]) - set(["columnname"]))
In [2]: diff
Out[2]: ['ColumnName']
This will cause subsequent logic to try to insert a new column called 'ColumnName' which will fail since 'columnname' already exists in created table.