Search code examples
pandassnowflake-cloud-data-platformsnowflake-connector

snowflake.connector SQL compilation error invalid identifier from pandas dataframe


I'm trying to ingest a df I created from a json response into an existing table (the table is currently empty because I can't seem to get this to work)

The df looks something like the below table:

index clicks_affiliated
0 3214
1 2221

but I'm seeing the following error:

snowflake.connector.errors.ProgrammingError: 000904 (42000): SQL compilation error: error line 1 at position 94 invalid identifier '"clicks_affiliated"'

and the column names in snowflake match to the columns in my dataframe.

This is my code:

import pandas as pd
from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas, pd_writer
from pandas import json_normalize
import requests

  
df_norm = json_normalize(json_response, 'reports')

#I've tried also adding the below line (and removing it) but I see the same error
df = df_norm.reset_index(drop=True)

   
def create_db_engine(db_name, schema_name):
    engine = URL(
        account="ab12345.us-west-2",
        user="my_user",
        password="my_pw",
        database="DB",
        schema="PUBLIC",
        warehouse="WH1",
        role="DEV"
    )
    return engine


def create_table(out_df, table_name, idx=False):
    url = create_db_engine(db_name="DB", schema_name="PUBLIC")
    engine = create_engine(url)
    connection = engine.connect()

    try:
        out_df.to_sql(
            table_name, connection, if_exists="append", index=idx, method=pd_writer
        )

    except ConnectionError:
        print("Unable to connect to database!")

    finally:
        connection.close()
        engine.dispose()

    return True

print(df.head)

create_table(df, "reporting")

Solution

  • So... it turns out I needed to change my columns in my dataframe to uppercase

    I've added this after the dataframe creation to do so and it worked:

    df.columns = map(lambda x: str(x).upper(), df.columns)