How to add multiple columns to pandas dataframe in one assignment

I'm trying to figure out how to add multiple columns to a pandas DataFrame simultaneously. I would like to do this in one step rather than in multiple repeated steps.

import pandas as pd
import numpy as np

data = {'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]}
df = pd.DataFrame(data)

I thought this would work here...

df[['column_new_1', 'column_new_2', 'column_new_3']] = [np.nan, 'dogs', 3]

Solution

I would have expected your syntax to work too. The problem arises because when you create new columns with the column-list syntax (df[[new1, new2]] = ...), pandas requires that the right-hand side be a DataFrame (note that it doesn't actually matter whether the columns of that DataFrame have the same names as the columns you are creating).
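As a minimal sketch of that point, the right-hand side below is a DataFrame whose column names ('a', 'b', 'c', chosen here purely for illustration) differ from the new column names. The values are matched to the new columns by position and aligned on the index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3], 'col_2': [4, 5, 6, 7]})

# Hypothetical right-hand side: the names 'a', 'b', 'c' are placeholders
# and are not carried over to df.
rhs = pd.DataFrame({'a': np.nan, 'b': 'dogs', 'c': 3}, index=df.index)

# Columns are paired by position: 'a' -> column_new_1, 'b' -> column_new_2, ...
df[['column_new_1', 'column_new_2', 'column_new_3']] = rhs

print(df.columns.tolist())
```

Only the position of each column in `rhs` matters here, so the placeholder names never appear in `df`.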

Your syntax works fine for assigning scalar values to existing columns, and pandas is also happy to assign scalar values to a new column using the single-column syntax (df[new1] = ...). So the solution is either to convert this into several single-column assignments, or to create a suitable DataFrame for the right-hand side.
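To illustrate the two cases that do work, here is a small sketch using the same example frame. The replacement values 10 and 20 are arbitrary and only there to make the result easy to see.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3], 'col_2': [4, 5, 6, 7]})

# Single-column syntax with a new column: the scalar is broadcast to every row.
df['column_new_1'] = 'dogs'

# Column-list syntax with columns that already exist: one scalar per column,
# each broadcast down its own column.
df[['col_1', 'col_2']] = [10, 20]

print(df)
```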

Here are several approaches that will work:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7]
})

Then one of the following:

1) Three assignments in one, using iterable unpacking

df['column_new_1'], df['column_new_2'], df['column_new_3'] = np.nan, 'dogs', 3

2) Use `DataFrame()` to expand a single row to match the index

df[['column_new_1', 'column_new_2', 'column_new_3']] = pd.DataFrame([[np.nan, 'dogs', 3]], index=df.index)

3) Combine with a temporary DataFrame using `pd.concat`

df = pd.concat(
    [
        df,
        pd.DataFrame(
            [[np.nan, 'dogs', 3]], 
            index=df.index, 
            columns=['column_new_1', 'column_new_2', 'column_new_3']
        )
    ], axis=1
)

4) Combine with a temporary DataFrame using `.join`

This is similar to 3, but may be less efficient.

df = df.join(pd.DataFrame(
    [[np.nan, 'dogs', 3]], 
    index=df.index, 
    columns=['column_new_1', 'column_new_2', 'column_new_3']
))

5) Use a dictionary instead of the lists used in 3 and 4

This is a more "natural" way to create the temporary DataFrame than the previous two. Note that in Python 3.5 or earlier, the new columns will be sorted alphabetically.

df = df.join(pd.DataFrame(
    {
        'column_new_1': np.nan,
        'column_new_2': 'dogs',
        'column_new_3': 3
    }, index=df.index
))

6) Use `.assign()` with multiple column arguments

This may be the winner in Python 3.6+. As with the previous approach, though, the new columns will be sorted alphabetically in earlier versions of Python.

df = df.assign(column_new_1=np.nan, column_new_2='dogs', column_new_3=3)
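If the column names and values are already held in lists, one possible variant is to zip them into a dictionary and unpack it into `.assign()`. This is only a sketch: `new_cols`, `new_vals`, and `result` are illustrative names, and it assumes the column names are valid keyword-argument strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_1': [0, 1, 2, 3], 'col_2': [4, 5, 6, 7]})

# Illustrative names: the new columns and the scalar value for each.
new_cols = ['column_new_1', 'column_new_2', 'column_new_3']
new_vals = [np.nan, 'dogs', 3]

# Build {'column_new_1': nan, ...} and pass it as keyword arguments.
# .assign() returns a new DataFrame and leaves df itself unchanged.
result = df.assign(**dict(zip(new_cols, new_vals)))

print(result.columns.tolist())
```

Because `.assign()` returns a copy, the result has to be bound to a name (or back to `df`) for the new columns to be kept.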

7) Create new columns, then assign all values at once

Based on this answer. This is interesting, but I don't know when it would be worth the trouble.

new_cols = ['column_new_1', 'column_new_2', 'column_new_3']
new_vals = [np.nan, 'dogs', 3]
df = df.reindex(columns=df.columns.tolist() + new_cols)   # add empty cols
df[new_cols] = new_vals  # multi-column assignment works for existing cols

8) Three separate assignments

In the end, it's hard to beat this.

df['column_new_1'] = np.nan
df['column_new_2'] = 'dogs'
df['column_new_3'] = 3

Note: many of these options have already been covered in other questions.