Pandas DF From and Back to MySQL


import config
import pandas as pd
import pymysql

username = config.username
dbpassword = config.dbpassword
dbhost = config.dburl
engine =  pymysql.connect(host=dbhost, port=3306,user=username,password=dbpassword,db='db',autocommit=True) 

tableBuilder1='''SELECT b.`IssueId` AS `Id`, b.`ShortId` AS `ShortId`, b.`Path` AS `Path`, b.`Data` AS `Data`,  b.`Actual Create Date` AS `Actual Create Date` FROM `SIM_FE_Audit_Data` b WHERE b.`Data` IN ( 'Open', 'Comment', 'Pending Others', 'Work in Progress', 'Resolved') AND NOT b.`IssueId` IN (SELECT c.`IssueId` FROM `SIM_FE_Audit_Data` c WHERE b.`Actual Create Date` = c.`Actual Create Date` AND b.`Data` = 'Comment' AND c.`Data` = 'Open') ORDER BY b.`IssueId`, b.`Actual Create Date`'''

df = pd.read_sql(tableBuilder1, con=engine)
df.to_sql('SIM_FE_Audit_Durations_No_First_Comment', con=engine, if_exists='replace',index=False)

The code above is being developed to replace views that take 15+ minutes to render and cause Tableau dashboards to fail. This first part builds the first table in a series of three. However, df.to_sql currently fails with:

DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': not all arguments converted during string formatting

I ran print(df) to verify that the SQL is being read, and it is. The error occurs only when writing the result back to the new table, and I don't understand why.


Solution

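  • Switching to SQLAlchemy directly seems to have fixed it. pandas' to_sql only supports SQLAlchemy connectables and sqlite3 connections, so the raw pymysql connection was treated as SQLite, which is why the failing query references sqlite_master: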

    import config
    import pandas as pd
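    from sqlalchemy import create_engine
    
    username = config.username
    dbpassword = config.dbpassword
    dbhost = config.dburl
    # mysql+pymysql:// selects the PyMySQL driver used in the question; a bare mysql:// defaults to mysqlclient (MySQLdb).
    # The encoding= argument is omitted because SQLAlchemy 2.0 no longer accepts it; the charset is set in the URL instead.
    engine = create_engine('mysql+pymysql://%s:%s@%s/db?charset=utf8' % (username, dbpassword, dbhost))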
    
    tableBuilder1='''SELECT b.`IssueId` AS `Id`, b.`ShortId` AS `ShortId`, b.`Path` AS `Path`, b.`Data` AS `Data`,  b.`Actual Create Date` AS `Actual Create Date` FROM `SIM_FE_Audit_Data` b WHERE b.`Data` IN ( 'Open', 'Comment', 'Pending Others', 'Work in Progress', 'Resolved') AND NOT b.`IssueId` IN (SELECT c.`IssueId` FROM `SIM_FE_Audit_Data` c WHERE b.`Actual Create Date` = c.`Actual Create Date` AND b.`Data` = 'Comment' AND c.`Data` = 'Open') ORDER BY b.`IssueId`, b.`Actual Create Date`'''
    
    df = pd.read_sql(tableBuilder1, con=engine)
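    # index=False, as in the question, keeps the DataFrame index from being written as an extra column
    df.to_sql('SIM_FE_Audit_Durations_No_First_Comment', con=engine, if_exists='replace', index=False)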
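
For anyone wondering where the "not all arguments converted during string formatting" message comes from, here is a rough sketch of the mechanism. With a raw DBAPI connection, pandas checks for the table using sqlite3's `?` placeholder, while PyMySQL fills in parameters with `%`-style formatting. The helper names below (`interpolate_like_pymysql`, `try_interpolate`) are made up for illustration and only approximate what the driver does; the real driver also escapes values properly. No database is needed to run it.

```python
# The existence check pandas falls back to for a non-SQLAlchemy connection
# (copied from the error message in the question).
SQLITE_CHECK = "SELECT name FROM sqlite_master WHERE type='table' AND name=?;"


def interpolate_like_pymysql(query, args):
    """Simplified stand-in for PyMySQL's %-style parameter interpolation.

    Hypothetical helper for illustration only; repr() is not real SQL escaping.
    """
    return query % tuple(repr(a) for a in args)


def try_interpolate(query, args):
    """Return the interpolated query, or the error text if formatting fails."""
    try:
        return interpolate_like_pymysql(query, args)
    except TypeError as exc:
        return "TypeError: %s" % exc


# '?' is sqlite3's placeholder, so %-formatting has nowhere to put the argument.
sqlite_style = try_interpolate(
    SQLITE_CHECK, ["SIM_FE_Audit_Durations_No_First_Comment"]
)

# The same kind of query written with a '%s' placeholder formats cleanly.
# The table name t is a placeholder for this demo.
mysql_style = try_interpolate("SELECT 1 FROM t WHERE name=%s;", ["demo"])

print(sqlite_style)
print(mysql_style)
```

The first line printed carries the same message as the DatabaseError in the question, which suggests the read worked only because read_sql never needs that sqlite_master check, whereas to_sql does.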