Tags: python, pandas, dataframe, change-data-capture

Implementing CDC but getting a ValueError in Python pandas


I am trying to perform a CDC (change data capture) operation in Python: I want the union of the unchanged rows of the master file (base table) with the new file (delta file).

Below is the function I have written:

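import pandas as pd

def processInputdata():
    df1 = pd.read_csv('master.csv')   # master file / base table
    df2 = pd.read_csv('delta.csv')    # delta file
    df = pd.merge(df1, df2, on='cust_id', how="outer", indicator=True)
    dfo = df[df['_merge'] == 'left_only']   # master rows with no match in the delta
    # second merge: this is the line that raises the ValueError below
    dfT = pd.merge(dfo, df2, on='cust_id', how="right", indicator=True)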

This is not working. The second merge fails with this error:

ValueError: Cannot use name of an existing column for indicator column
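The message comes from the second merge: `dfo` is a slice of the first merge's result, so it still carries the `_merge` column that `indicator=True` created, and pandas refuses to create a second column with that name. A small sketch with made-up two-row frames reproduces the error and shows two ways around it (`left`, `right`, `first` and the indicator name `_merge_2` are illustrative only):

```python
import pandas as pd

# Illustrative frames, not the real master/delta files
left = pd.DataFrame({"cust_id": [111, 222], "cust_name": ["a", "b"]})
right = pd.DataFrame({"cust_id": [222, 666], "cust_name": ["b", "f"]})

# indicator=True adds a column named "_merge" to the result
first = pd.merge(left, right, on="cust_id", how="outer", indicator=True)

# Merging again with indicator=True tries to add another "_merge" column
try:
    pd.merge(first, right, on="cust_id", how="right", indicator=True)
    raised = False
except ValueError:
    raised = True

# Fix 1: drop the old indicator column before the next merge
fixed_a = pd.merge(first.drop(columns="_merge"), right,
                   on="cust_id", how="right", indicator=True)

# Fix 2: give the new indicator column a different name (hypothetical "_merge_2")
fixed_b = pd.merge(first, right, on="cust_id", how="right", indicator="_merge_2")
```

Dropping the old column is usually cleaner when the first flag is no longer needed; passing a string to `indicator` is useful when both flags must be kept.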

Is there a simpler or better approach to performing CDC?

Sample data:

Master file:

   cust_id cust_name  cust_income cust_phone
0      111         a        78000       sony
1      222         b         8000        jio
2      333         c       108000     iphone
3      444         d       200000    iphoneX
4      555         e        20000    samsung

Delta file:

   cust_id cust_name  cust_income cust_phone
0      222         b        20000        jio
1      333         c       120000    iphoneX
2      666         f        76000    oneplus

Expected output:

   cust_id cust_name  cust_income cust_phone
0      111         a        78000       sony
1      222         b        20000        jio
2      333         c       120000    iphoneX
3      444         d       200000    iphoneX
4      555         e        20000    samsung
5      666         f        76000    oneplus
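To make the problem reproducible without the CSV files, the sample tables can be typed in directly as DataFrames. This is only a convenience sketch; the names `columns`, `master`, `delta` and `expected` are chosen here for illustration (the question itself reads `master.csv` and `delta.csv`).

```python
import pandas as pd

columns = ["cust_id", "cust_name", "cust_income", "cust_phone"]

# Master file / base table
master = pd.DataFrame(
    [[111, "a", 78000, "sony"],
     [222, "b", 8000, "jio"],
     [333, "c", 108000, "iphone"],
     [444, "d", 200000, "iphoneX"],
     [555, "e", 20000, "samsung"]],
    columns=columns,
)

# Delta file: two changed customers (222, 333) and one new customer (666)
delta = pd.DataFrame(
    [[222, "b", 20000, "jio"],
     [333, "c", 120000, "iphoneX"],
     [666, "f", 76000, "oneplus"]],
    columns=columns,
)

# Expected result after applying the delta to the master table
expected = pd.DataFrame(
    [[111, "a", 78000, "sony"],
     [222, "b", 20000, "jio"],
     [333, "c", 120000, "iphoneX"],
     [444, "d", 200000, "iphoneX"],
     [555, "e", 20000, "samsung"],
     [666, "f", 76000, "oneplus"]],
    columns=columns,
)
```

Any candidate result can then be compared against `expected` with `pd.testing.assert_frame_equal`.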

Solution

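  • Using pd.concat with drop_duplicates on the key column and keep='last', so that a delta row replaces the master row with the same cust_id (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead):

    import pandas as pd

    master, delta = pd.read_csv('master.csv'), pd.read_csv('delta.csv')
    # delta rows come after master rows, so keep='last' keeps the delta version
    df = (pd.concat([master, delta])
            .drop_duplicates(subset='cust_id', keep='last')
            .sort_values('cust_id').reset_index(drop=True))
    
       cust_id cust_name  cust_income cust_phone
    0      111         a        78000       sony
    1      222         b        20000        jio
    2      333         c       120000    iphoneX
    3      444         d       200000    iphoneX
    4      555         e        20000    samsung
    5      666         f        76000    oneplus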
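If you prefer to stay close to the merge-based approach from the question, the same result can be sketched as an anti-join: flag the master rows whose `cust_id` does not occur in the delta, keep only those, and concatenate them with the whole delta. `apply_delta` is a hypothetical helper name, and the sketch assumes `cust_id` is unique within the delta file.

```python
import pandas as pd


def apply_delta(master: pd.DataFrame, delta: pd.DataFrame,
                key: str = "cust_id") -> pd.DataFrame:
    """Return master with delta applied (hypothetical helper).

    Assumes `key` is unique in `delta`; handles inserts and updates only.
    """
    # Merge against the delta's key column only, so no _x/_y suffixes appear;
    # indicator=True adds "_merge" telling us which master rows had no match.
    flagged = master.merge(delta[[key]], on=key, how="left", indicator=True)

    # Anti-join: master rows untouched by the delta, without the helper column
    unchanged = flagged[flagged["_merge"] == "left_only"].drop(columns="_merge")

    # Union of unchanged master rows and every delta row (updates + inserts)
    return (pd.concat([unchanged, delta], ignore_index=True)
              .sort_values(key)
              .reset_index(drop=True))
```

Like the `drop_duplicates` version, this covers inserts and updates only; the sample data has no marker for deleted records. Merging against `delta[[key]]` rather than the full delta also avoids the `_x`/`_y` column suffixes that the outer merge in the question produced.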