I have two data frames as below,
listCode= ['A1','A2','A3','A4','A5','A6']
dfLookup = pd.DataFrame({'ID':listCode})
data = [['Chicago', 'B1'], ['Madsion', 'A1'], ['NY', 'A4']]
dftest = pd.DataFrame(data, columns=['City', 'Code', ])
dftest['Found'] =''
dftest
City Code Found
Chicago B1
Madsion A1
NY A4
Expected Result:
City Code Found
Chicago B1 0
Madsion A1 1
NY A4 1
I am trying to join the data frames using the below, but not sure how to use the case statements.
import pandasql as pdsql
import sqldf
sQuery = """
SELECT dftest.City, dftest.Code, dftest.Found
FROM dftest
LEFT JOIN dfLookup
ON dftest.Code= dfLookup.ID"""
sqlResult = sqldf.run(sQuery )
Thanks for your help.
Another possible solution (there is no need to use pandasql
):
(dftest.merge(dfLookup, how='left', left_on='Code', right_on='ID')
.assign(Found = lambda x: x['ID'].notnull().astype(int))
.drop('ID', axis=1))
In case you really want to use pandasql
:
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
q = '''
SELECT dftest.City, dftest.Code,
CASE WHEN dfLookup.ID IS NOT NULL THEN 1 ELSE 0 END AS Found
FROM dftest
LEFT JOIN dfLookup ON dftest.Code = dfLookup.ID
'''
pysqldf(q)
Output:
City Code Found
0 Chicago B1 0
1 Madsion A1 1
2 NY A4 1