How to add multiple columns to pandas dataframe in one assignment


I'm trying to figure out how to add multiple columns to a pandas DataFrame simultaneously. I would like to do this in one step rather than in several repeated steps.

import numpy as np
import pandas as pd

data = {'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]}
df = pd.DataFrame(data)

I thought this would work here...

df[['column_new_1', 'column_new_2', 'column_new_3']] = [np.nan, 'dogs', 3]

Solution

  • I would have expected your syntax to work too. The problem arises because when you create new columns with the column-list syntax (df[[new1, new2]] = ...), pandas requires that the right-hand side be a DataFrame (note that it doesn't actually matter whether the columns of that DataFrame have the same names as the columns you are creating). Newer pandas releases may accept the list form directly, so it is worth testing against your installed version.

    Your syntax works fine for assigning scalar values to existing columns, and pandas is also happy to assign scalar values to a new column using the single-column syntax (df[new1] = ...). So the solution is either to convert this into several single-column assignments, or create a suitable DataFrame for the right-hand side.
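
    As a minimal sketch of the two cases that do work, the snippet below assigns scalars to existing columns with the column-list syntax and then creates one new column with the single-column syntax. The values 10 and 20 are arbitrary and chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3], 'col_2': [4, 5, 6, 7]})

# Column-list syntax on columns that already exist:
# each scalar is broadcast down its own column.
df[['col_1', 'col_2']] = [10, 20]

# Single-column syntax creates a new column from a scalar.
df['column_new_1'] = 'dogs'

print(df)
```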

    Here are several approaches that will work:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({
        'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]
    })
    

    Then one of the following:

    1) Three assignments in one, using tuple unpacking

    df['column_new_1'], df['column_new_2'], df['column_new_3'] = np.nan, 'dogs', 3
    

    2) Use DataFrame() to expand a single row to match the index

    df[['column_new_1', 'column_new_2', 'column_new_3']] = pd.DataFrame([[np.nan, 'dogs', 3]], index=df.index)
    

    3) Combine with a temporary DataFrame using pd.concat

    df = pd.concat(
        [
            df,
            pd.DataFrame(
                [[np.nan, 'dogs', 3]], 
                index=df.index, 
                columns=['column_new_1', 'column_new_2', 'column_new_3']
            )
        ], axis=1
    )
    

    4) Combine with a temporary DataFrame using .join

    This is similar to 3, but may be less efficient.

    df = df.join(pd.DataFrame(
        [[np.nan, 'dogs', 3]], 
        index=df.index, 
        columns=['column_new_1', 'column_new_2', 'column_new_3']
    ))
    

    5) Use a dictionary instead of the lists used in 3 and 4

    This is a more "natural" way to create the temporary DataFrame than the previous two. Note that in Python 3.5 or earlier, the new columns will be sorted alphabetically.

    df = df.join(pd.DataFrame(
        {
            'column_new_1': np.nan,
            'column_new_2': 'dogs',
            'column_new_3': 3
        }, index=df.index
    ))
    

    6) Use .assign() with multiple column arguments

    This may be the cleanest option in Python 3.6+, where keyword-argument order is preserved. As with the previous approach, though, the new columns will be sorted alphabetically in earlier versions of Python.

    df = df.assign(column_new_1=np.nan, column_new_2='dogs', column_new_3=3)
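
    If the new column names and values are already held in a dictionary, the same .assign() call can be written with keyword unpacking. This is only a sketch: the name new_values is a hypothetical placeholder, and the keys must be strings because they become keyword arguments.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3], 'col_2': [4, 5, 6, 7]})

# Hypothetical mapping of new column names to the scalar each should hold.
new_values = {'column_new_1': np.nan, 'column_new_2': 'dogs', 'column_new_3': 3}

# ** unpacks the dict into keyword arguments; assign returns a new DataFrame
# and leaves the original df untouched.
result = df.assign(**new_values)

print(result)
```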
    

    7) Create new columns, then assign all values at once

    Based on this answer. This is interesting, but I don't know when it would be worth the trouble.

    new_cols = ['column_new_1', 'column_new_2', 'column_new_3']
    new_vals = [np.nan, 'dogs', 3]
    df = df.reindex(columns=df.columns.tolist() + new_cols)   # add empty cols
    df[new_cols] = new_vals  # multi-column assignment works for existing cols
    

    8) Three separate assignments

    In the end, it's hard to beat this.

    df['column_new_1'] = np.nan
    df['column_new_2'] = 'dogs'
    df['column_new_3'] = 3
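
    To check that two of these approaches really produce the same result, one option is pd.testing.assert_frame_equal. The sketch below compares approach 6 with approach 8; the names base, via_assign and via_setitem are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

base = pd.DataFrame({'col_1': [0, 1, 2, 3], 'col_2': [4, 5, 6, 7]})

# Approach 6: .assign() returns a new DataFrame.
via_assign = base.assign(column_new_1=np.nan, column_new_2='dogs', column_new_3=3)

# Approach 8: three separate assignments, made on a copy so base is unchanged.
via_setitem = base.copy()
via_setitem['column_new_1'] = np.nan
via_setitem['column_new_2'] = 'dogs'
via_setitem['column_new_3'] = 3

# Raises AssertionError if values, dtypes, or column order differ.
pd.testing.assert_frame_equal(via_assign, via_setitem)
```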
    

    Note: many of these options have already been covered in other questions: