I'm trying to understand the following code snip:
improt pandasql
data_sql = data[['account_id', 'id', 'date', 'amount']]
# data_sql is a table has the above columns
data_sql.loc[:, 'date_hist_min'] = data_sql.date.apply(lambda x: x + pd.DateOffset(months=-6))
# add one more column, 'date_hist_min', it is from the column 'data' with the month minus 6
sqlcode = '''
SELECT t1.id,
t1.date,
t2.account_id as "account_id_hist",
t2.date as "date_hist",
t2.amount as "amount_hist"
FROM data_sql as t1 JOIN data_sql as t2
ON (cast(strftime('%s', t2.date) as integer) BETWEEN
(cast(strftime('%s', t1.date_hist_min) as integer))
AND (cast(strftime('%s', t1.date) as integer)))
AND (t1.{0} == t2.{0})
'''
# perform the SQL query on the table with sqlcode:
newdf = pandasql.sqldf(sqlcode.format(column), locals())
The code is with Python pandasql. It manipulates dataframe as SQL table. You can assume the above dataframe as SQL table.
The definition of the table is in the comments.
What's the meaning of t1.{0} == t2.{0}
? What does {0}
stand for in the context?
sqlcode.format(column)
is going format the string and inject the columns into {0}
The 0
means format will use the first parameter.
print("This {1} a {0}".format("string", "is"))
would print "This is a string"