Search code examples
pythonsqlpandassnowflake-connector

Trying to query Snowflake with a list of values from an Excel List using Python and Snowflake-Connector


I have a project I received at work trying to supersede part numbers from a bill of material in an excel file. I have a query that I wrote to run the parts through a self-join that would essentially supersede the parts through. I can manually manipulate the excel file and make the query return what I need through snowflake. We'll likely get more requests of this type if we can successfully turn this into an automation. It would save SO much time in manual labor. Once I can figure this snowflake connector issue, I want to load the result into a new column in the excel spreadsheet.

I have successfully extracted the column with the list of part numbers from the excel file, authenticated to snowflake using sqlalchemy URL. Now I'm trying to apply my SQL query and I'm running into a roadblock. I've gone through the SQLAlchemy and Snowflake Python Connector documentation and I just can't seem to quite get it right. Even after adding:

conn.cursor().execute("USE WAREHOUSE XS_WAREHOUSE")
conn.cursor().execute("USE DATABASE DB")
conn.cursor().execute("USE SCHEMA db_mg.schemado you")"""



Here is the current code I am working with:

import pandas as pd 
from snowflake.sqlalchemy import URL 
from sqlalchemy import create_engine 


"""Extract the list of part numbers from the Part Number column and input it into a pandas 
dataframe"""
df = pd.read_excel(r'C:/Users/[MyUsername]/Downloads/EAS Working Candidates.xlsx') 
part_num_col = df['PartNumber'].tolist() 

"""SQL Query for Parts List built from .XLSX Part File.
Part supersession is done through AS400 manually. The Query loops the part numbers back 
through the query until a null is returned. Once a null is returned, the part number from the 
previous loop is then placed into a column 'final_actual'."""

query = r"SELECT r.itmid, rg.cstsku as final_part, i.item_desc as final_part_desc,  
CASE WHEN rg.cstsku is not null THEN rg.cstsku 
WHEN rf.cstsku is not null THEN rf.cstsku 
WHEN re.cstsku is not null THEN re.cstsku 
WHEN rd.cstsku is not null THEN rd.cstsku 
WHEN rc.cstsku is not null THEN rc.cstsku 
WHEN rb.cstsku is not null THEN rb.cstsku 
WHEN ra.cstsku is not null THEN ra.cstsku 
WHEN r.cstsku is not null THEN r.cstsku 
ELSE r.itmid END as final_actual 
FROM DB.AS400.VC_DOPCIXREF r 
LEFT JOIN DB.AS400.VC_DOPCIXREF ra 
  ON r.cstsku = ra.itmid 
LEFT JOIN DB.AS400.VC_DOPCIXREF rb 
  ON ra.cstsku = rb.itmid 
LEFT JOIN DB.AS400.VC_DOPCIXREF rc 
  ON rb.cstsku = rc.itmid 
LEFT JOIN DB.AS400.VC_DOPCIXREF rd 
  ON rc.cstsku = rd.itmid 
LEFT JOIN DB.AS400.VC_DOPCIXREF re 
  ON rd.cstsku = re.itmid 
LEFT JOIN DB.AS400.VC_DOPCIXREF rf 
  ON re.cstsku = rf.itmid 
LEFT JOIN DB.AS400.VC_DOPCIXREF rg 
  ON rf.cstsku = rg.itmid 
LEFT JOIN DB.AS400.VC_ITEM i 
  ON rg.cstsku = i.item_id WHERE r.cstsku IN (" + "'" + "','".join(map(str, part_num_col)) + "'" + ");"

#Connect to the Snowflake Data Warehouse 
engine = create_engine(URL( 
    user='[MyUsername]@domain.com', 
    account='account', 
    role='ANALYST', 
    authenticator='EXTERNALBROWSER', 
    warehouse='DB', 
    database='AS400', 
)) 
connection = engine.connect() 
cur = connection.cursor() 
try: 
    cur.execute(query) 
    df = pd.read_sql(query, engine) 
finally: 
    connection.close() 
    engine.dispose()

I receive the following error:

Traceback (most recent call last): 
  File "C:\Users\[MyUsername]\PycharmProjects\PTC EAS Report\PTC EAS Part List.py", line 24, in <br /> <module> 
    cur = Connection.Cursor() 
AttributeError: 'Connection' object has no attribute 'Cursor'



Thanks in Advance for taking a look! (Also, if I can present this in a better, more readable way next time please let me know! I'm still learning!)


Solution

  • There is no cursor attribute associated with the connection object in the sqlalchemy snowflake connector.

    The correct usage is as below

    from snowflake.sqlalchemy import URL
    from sqlalchemy import create_engine
    
    engine = create_engine(URL(
    account = 'myorganization-myaccount',
    user = 'testuser1',
    password = '0123456',
    database = 'testdb',
    schema = 'public',
    warehouse = 'testwh',
    role='myrole',
    ))
    try:
    connection = engine.connect()
    results = connection.execute('select 
    current_version()').fetchone()
    print(results[0])
    
    or
    
    rows = r.fetchall()
    results = connection.execute('select 
    current_version()').fetchall()
    print(rows)
    
    finally:
    connection.close()
    engine.dispose()
    

    Regards, Sujan