Search code examples
pythonpandasdataframesumpython-2.x

Calculate sum of columns of same name pandas


How can i search for duplicate columns in a dataframe and then create a new column with same name. the new column is result of 'OR' operator of these columns. Then drop old duplicated columns.

Example:

For that, I tried to create a unique column 'job' that is the result of 'OR' operator of the two 'job' columns in the table bellow.

There is my table look like:

name job maried children job
John True True True True
Peter True False True True
Karl False True True True
jack False False False False

the result that I want is:

name job maried children
John True True True
Peter True False True
Karl True True True
jack False False False

I tried to do this (df1 is my table):

df_join = pd.DataFrame()
df1_dulp = pd.DataFrame()
df_tmp = pd.DataFrame()

for column in df1.columns:
    df1_dulp = df1.filter(like=str(column))
    if df1_dulp.shape[1] >= 2:
        for i in range(0, df1_dulp.shape[1]):
            df_tmp += df1_dulp.iloc[:,i]
        if column in df1_dulp.columns:
            df1_dulp.drop(column, axis=1, inplace=True)
        df_join = df_join.join(df1_dulp, how = 'left', lsuffix='left', rsuffix='right')

The result is an empty table (df_join).


Solution

  • You can select the boolean columns with select_dtypes, then aggregate as OR with groupby.any on columns:

    out = (df
       .select_dtypes(exclude='bool')
       .join(df.select_dtypes('bool')
               .groupby(level=0, axis=1, sort=False).any()
            )
    )
    

    output:

        name    job  maried  children
    0   John   True    True      True
    1  Peter   True   False      True
    2   Karl   True    True      True
    3   jack  False   False     False