My goal is to query data from redshift view where the view name is appended by monthdt like view_name_202009 and to query this view month by month for 13 months back, create dataframe and perform transformation. After the first dataframe(df1) is created the previous months transformed data should be appended to df1.
As a step by step approach:
Below is the function I wrote to create the sql statements for 13months but I'm failing on passing them to the read_sql squetnailly and create dataframe.
def last_n_months(num_of_months, start_date=datetime.now(), include_curr=True):
f = "%Y%m"
curr = datetime.now()
if include_curr:
yield curr.strftime(f)
for num in range(num_of_months):
curr = curr.replace(day=1) - timedelta(days=1)
sql = """select col1, col2,
from schema.view_name"""+str(curr.strftime(f))+""" where col1 in ('01','02','1ST','2ND')"""
# df = pd.read_sql(sql, con)
yield sql
m_lst = [sql for sql in last_n_months(13, include_curr=False)]
df = pd.DataFrame()
for i in m_lst:
df = df.append(pd.read_sql(m_lst, con))
I printed out the m_lst and it looks like below:
["select col1, col2 from schema.view_name_202010 where col1 in ('01','02','1ST','2ND')",["select col1, col2 from schema.view_name_202009 where col1 in ('01','02','1ST','2ND')",["select col1, col2 from schema.view_name_202008 where col1 in ('01','02','1ST','2ND')"]
I am getting an error as below:
pandas.io.sql.DatabaseError: Execution failed on sql : argument 1 must be a string or unicode object: got list instead
Can anyone help me where I'm doing wrong and if there is a better way of doing this?
Change this code bellow
df = pd.DataFrame()
for i in m_lst:
df = df.append(pd.read_sql(i, con))