Search code examples
pandasdataframepandasql

How to Join two data frames using pandasql with condition?


I have two data frames as below,

    listCode= ['A1','A2','A3','A4','A5','A6']
    dfLookup = pd.DataFrame({'ID':listCode})

    data = [['Chicago', 'B1'], ['Madsion', 'A1'], ['NY', 'A4']]
    dftest = pd.DataFrame(data, columns=['City', 'Code', ])
    dftest['Found'] =''
    dftest

    City    Code    Found
    Chicago B1  
    Madsion A1  
    NY      A4  

Expected Result:

    City    Code    Found
    Chicago B1       0  
    Madsion A1       1
    NY      A4       1

I am trying to join the data frames using the below, but not sure how to use the case statements.

    import pandasql as pdsql
    import sqldf
    sQuery = """
       SELECT   dftest.City, dftest.Code, dftest.Found 
       FROM  dftest 
       LEFT JOIN dfLookup 
       ON dftest.Code= dfLookup.ID"""
       sqlResult = sqldf.run(sQuery )

Thanks for your help.


Solution

  • Another possible solution (there is no need to use pandasql):

    (dftest.merge(dfLookup, how='left', left_on='Code', right_on='ID')
     .assign(Found = lambda x: x['ID'].notnull().astype(int))
     .drop('ID', axis=1))
    

    In case you really want to use pandasql:

    from pandasql import sqldf
    
    pysqldf = lambda q: sqldf(q, globals())
    
    q = '''
    SELECT dftest.City, dftest.Code, 
           CASE WHEN dfLookup.ID IS NOT NULL THEN 1 ELSE 0 END AS Found
    FROM dftest
    LEFT JOIN dfLookup ON dftest.Code = dfLookup.ID
    '''
    
    pysqldf(q)
    

    Output:

          City Code  Found
    0  Chicago   B1      0
    1  Madsion   A1      1
    2       NY   A4      1