Search code examples
pythonpython-3.xpandasdataframestring-concatenation

How to unite values and column names in a new column (pandas)?


Given a dataframe (df_original) that potentially can have a variable amount of rows and columns, how can I add one column that "merges" all values of the columns seperated by a character (e.g. by _) ?

The name of that column should also consist out of a merge of the other column names. The output should look like df_final in the example code.

Example code:

import pandas as pd
d = {'col1': ["a", "b", "c"], 'col2': ["a", "b", "c"], 'col3': ["a", "b", "c"], 'col99': ["a", "b", "c"]}
df_original = pd.DataFrame(data=d)

d2 = {'col1': ["a", "b", "c"], 'col2': ["a", "b", "c"], 'col3': ["a", "b", "c"], 'col99': ["a", "b", "c"], 'col1_col2_col3_col99' : ["a_a_a_a", "b_b_b_b", "c_c_c_c"]}
df2 = pd.DataFrame(data=d2)
cols = ["col1","col2","col3","col99","col1_col2_col3_col99"]
df_final = df2[cols]

Solution

  • Using pd.DataFrame.apply:

    df['_'.join(df.columns)] = df.apply('_'.join, axis=1)
    
    print(df)
    
      col1 col2 col3 col99 col1_col2_col3_col99
    0    a    a    a     a              a_a_a_a
    1    b    b    b     b              b_b_b_b
    2    c    c    c     c              c_c_c_c