Labels: bug, status-triage_done (Initial triage done, will be further handled by the driver team), triaged
Description
Python version
Python 3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0]
Operating system and processor architecture
Linux-5.4.0-165-generic-x86_64-with-glibc2.31
Installed packages
numba==0.58.1
numpy @ file:///work/mkl/numpy_and_numpy_base_1682953417311/work
pandas==2.1.4
python-dateutil @ file:///tmp/build/80754af9/python-dateutil_1626374649649/work
pytz==2022.7.1
requests==2.31.0
snowballstemmer @ file:///tmp/build/80754af9/snowballstemmer_1637937080595/work
snowflake-connector-python==3.7.0
snowflake-sqlalchemy==1.5.1
SQLAlchemy==1.4.50
tqdm==4.66.1

What did you do?
Selected from a TIMESTAMP_NTZ(9) column with values that overflow int64 at ns precision (e.g. '9999-12-31 00:00:00.000').
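For context, a minimal reproduction sketch. This is an assumption-laden illustration, not from the original report: the connection parameters and table name are placeholders, and the schema mismatch may only surface when the result set is large enough to span multiple batches.

import snowflake.connector

# Placeholder credentials; replace with a real account.
conn = snowflake.connector.connect(
    account="...", user="...", password="...",
    warehouse="...", database="...", schema="...",
)
cur = conn.cursor()
cur.execute("CREATE OR REPLACE TEMPORARY TABLE ts_repro (dt TIMESTAMP_NTZ(9))")
# One value representable as timestamp[ns], one that overflows int64 at ns precision:
cur.execute(
    "INSERT INTO ts_repro VALUES"
    " ('1987-01-30 23:59:59.000'), ('9999-12-31 00:00:00.000')"
)
cur.execute("SELECT dt FROM ts_repro")
df = cur.fetch_pandas_all()  # raises pyarrow.lib.ArrowInvalid (traceback below)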
What did you expect to see?
I tried to fetch a column of TIMESTAMP_NTZ(9) dtype whose maximum datetime is '9999-12-31 00:00:00.000' and minimum is '1987-01-30 23:59:59.000'. I get the following error when I select from that column:
File "/home/jwyang/anaconda3/lib/python3.11/site-packages/snowflake/connector/result_batch.py", line 79, in _create_nanoarrow_iterator
else PyArrowTableIterator(
^^^^^^^^^^^^^^^^^^^^^
File "src/snowflake/connector/nanoarrow_cpp/ArrowIterator/nanoarrow_arrow_iterator.pyx", line 239, in snowflake.connector.nanoarrow_arrow_iterator.PyArrowTableIterator.__cinit__
File "pyarrow/table.pxi", line 4116, in pyarrow.lib.Table.from_batches
File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Schema at index 2 was different:
DT: timestamp[us]
vs
DT: timestamp[ns]
Because '9999-12-31 00:00:00.000' doesn't fit in an int64 at ns precision, it seems the values are downcast to us precision on a per-batch basis by the overflow check at line 562 in 6a2a5b6:

if (epoch > (INT64_MAX / powTenSB4) ||

I am guessing the downcast is not applied to every batch, so the batches end up with different data types, which pyarrow does not allow when combining them into a single table.
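Both halves of that theory are easy to check in isolation. A small sketch, independent of the connector (pure arithmetic and pyarrow):

import pyarrow as pa

# 9999-12-31 00:00:00 UTC is 253402214400 seconds after the Unix epoch;
# at ns precision that is ~2.5e20, well past INT64_MAX (~9.2e18), so the
# value cannot be represented as timestamp[ns].
print(253402214400 * 10**9 > 2**63 - 1)  # True

# And pyarrow refuses to build a Table from batches whose schemas disagree,
# e.g. timestamp[ns] in one batch and timestamp[us] in another:
batch_ns = pa.RecordBatch.from_arrays(
    [pa.array([0], type=pa.timestamp("ns"))], names=["DT"]
)
batch_us = pa.RecordBatch.from_arrays(
    [pa.array([0], type=pa.timestamp("us"))], names=["DT"]
)
pa.Table.from_batches([batch_ns, batch_us])
# -> pyarrow.lib.ArrowInvalid: Schema at index 1 was different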
Can you set logging to DEBUG and collect the logs?
import logging

# Enable verbose connector logging and send it to the console.
for logger_name in ('snowflake.connector',):
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter(
        '%(asctime)s - %(threadName)s %(filename)s:%(lineno)d'
        ' - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)