Search code examples
dataframecastingsnowflake-cloud-data-platformsqldatatypes

Is there a way to revert a numeric datatype back to date time? (python pandas, snowflake)


I was converting a pandas dataframe column from a string ("YYYY-MM-DDTHH:MM:SSZ") to date time. After the conversion I stored it in a snowflake database where it ended up being a numerical value that did not reflect the timestamp at all. On my first run of this program it worked as intended (correctly converting to a timestamp in snowflake) but on a later run, when a new table was concatenated, the result column ended up being in this numeric format.

"2023-07-20T14:43:42Z" -> 1,689,864,222,000,000

Just wondering if anyone's seen this before, and maybe knows what is happening.

The process grabs the existing table in snowflake, and then converts all columns to string type (in order to concatenate with possible miss-matched data types) and then reuploads the new table after dropping duplicates. It was done this way as occasionally, the new table would contain a column which was missing from the existing table.

import pandas as pd
from snowflake.snowpark import Session

# session = create snowflake session
# newdf = new data into a new dataframe
 
existingdf = session.table(tablename).toPandas()

existingdf = existingdf.astype(str)
newdf = newdf.astype(str)

df = pd.concat([existingdf,newdf],ignore_index=True,sort=False)
df = df.loc[df.astype(str).drop_duplicates().index]

df["CREATIONTIME"] = pd.to_datetime(df["CREATIONTIME"], utc=True)

df = session.create_dataframe(df)
df.write.mode(action).save_as_table("TABLE_NAME")

In snowflake, the "CREATIONTIME" column is a fixed numeric snowflake result column

Whereas the original csv is in this format csv column

The above should be displaying the same results (just with a different datatype) rather than what actually occurred

"2023-07-20T14:43:42Z" -> 1,689,864,222,000,000

** edit: the number in snowflake is the timestamp in microseconds with a 6 hour timezone difference


Solution

  • Snowpark currently has a bug related to Pandas datetime64[ns]. See SNOW-886649: write_pandas inserts datetime64[ns] to Snowflake as an Invalid Date. The work around is to write a string to the Snowflake timestamp column:

    for column in df_columns:
        if is_datetime64_dtype(df[column]):
            df[column] = df[column].dt.strftime('%Y-%m-%d %H:%M:%S.%f')