python dataframe data-analysis

How are axis and index_y determined in the merge?


I am new to data analysis and Python. I came across the following

df1.merge(df2, on='cnpj', suffixes=('', '_y')).drop('index_y', axis=1)

for merging DataFrames. I would like to know how axis and index_y are determined.


Solution

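  • Columns that appear in both df1 and df2 (other than the join key "cnpj") are renamed according to the specified suffixes, so "nothing" for df1 ("") and "_y" for df2.

    Afterwards you drop the column "index_y". This is the column named "index" from df2 with the suffix "_y" appended; it most likely exists because the index of each DataFrame was turned into a regular column (for example with reset_index()) before the merge. Setting axis=1 tells drop to look for the label among the columns, whereas axis=0 would look among the row labels. This operation is performed after the merge, most likely to get rid of redundant information.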
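
    A minimal sketch of where "index_y" comes from. The data is made up, and the call to reset_index() is an assumption about how the original frames got their "index" column; the question does not show that step.

```python
import pandas as pd

# Hypothetical frames. reset_index() turns each frame's index into a
# regular column named "index", so both frames share that column name.
df1 = pd.DataFrame({"cnpj": [11, 22, 33], "name": ["A", "B", "C"]}).reset_index()
df2 = pd.DataFrame({"cnpj": [22, 33, 44], "city": ["X", "Y", "Z"]}).reset_index()

# "cnpj" is the join key and is not suffixed. The shared non-key column
# "index" gets "" on the left side and "_y" on the right side.
merged = df1.merge(df2, on="cnpj", suffixes=("", "_y"))
merged_columns = list(merged.columns)
# ['index', 'cnpj', 'name', 'index_y', 'city']

# axis=1 means "index_y" is looked up among the column labels.
result = merged.drop("index_y", axis=1)
result_columns = list(result.columns)
# ['index', 'cnpj', 'name', 'city']

# Equivalent spelling that avoids the axis argument altogether.
same = merged.drop(columns="index_y")
```

    With axis=0 the same call would search the row labels for "index_y" and raise a KeyError here, since no row has that label.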