Pandas Series subtract Pandas Dataframe strange result

I'm wondering why subtracting a pandas DataFrame from a pandas Series produces such a strange result.

import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(10).reshape(2, 5), columns='a-b-c-d-e'.split('-'))
df.max(axis=1) - df[['b']]

What steps does pandas take to produce this result?

    b   0   1
0 NaN NaN NaN
1 NaN NaN NaN

Solution

By default, an operation between a DataFrame and a Series aligns the Series index with the DataFrame columns and broadcasts the Series over the rows. This makes it easy to combine a DataFrame with a per-column aggregation:

# let's subtract column b from the per-column max
df.max(axis=0) - df[['b']]

    a  b   c   d   e
0 NaN  5 NaN NaN NaN
1 NaN  0 NaN NaN NaN
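
As a minimal sketch of that default alignment, the following subtracts the whole DataFrame from its per-column max. The names `col_max` and `diff` are illustrative and not part of the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(2, 5), columns=list("abcde"))

# Series indexed by column name: a -> 5, b -> 6, c -> 7, d -> 8, e -> 9
col_max = df.max(axis=0)

# The Series index matches the column names, so every column is paired
# with its own max and the Series is repeated for each row
diff = col_max - df
```

Every column finds a matching label here, so no NaN appears in `diff`.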

Here, since you're aggregating per row, the Series index holds row labels rather than column names, so the default alignment no longer works. Use rsub with the parameter axis=0 to align on the row index instead:

df[['b']].rsub(df.max(axis=1), axis=0)

Output:

   b
0  3
1  3

Note that using two Series would also align the values:

df.max(axis=1) - df['b']

Output:

0    3
1    3
dtype: int64
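
For convenience, here is a self-contained sketch of both approaches side by side. The variable names `row_max`, `as_frame`, and `as_series` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(2, 5), columns=list("abcde"))

# Series indexed by row label: 0 -> 4, 1 -> 9
row_max = df.max(axis=1)

# DataFrame result: axis=0 aligns the Series with the row index
# instead of the column names (rsub computes row_max - df[["b"]])
as_frame = df[["b"]].rsub(row_max, axis=0)

# Series result: two Series align on their index
as_series = row_max - df["b"]
```

Which form to prefer mostly depends on whether a one-column DataFrame or a Series is more convenient downstream.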

Why 3 columns with `df.max(axis=1) - df[['b']]`?

First, let's have a look at each operand:

# df.max(axis=1)
0    4
1    9
dtype: int64

# df[['b']]
   b
0  1
1  6

Since df[['b']] is 2D (DataFrame) and df.max(axis=1) is 1D (Series), df.max(axis=1) is used as if it were a "wide" DataFrame whose columns are the Series index labels:

# df.max(axis=1).to_frame().T
   0  1
0  4  9

There are no columns in common, thus the output is only NaNs with the union of column names ({'b'}|{0, 1} -> {'b', 0, 1}).
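
A short sketch to check this on the original data. The names `result_columns` and `all_missing` are illustrative; the labels are compared as a set because the column order of mixed string and integer labels is not something to rely on.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(2, 5), columns=list("abcde"))
row_max = df.max(axis=1)

result = row_max - df[["b"]]

# Labels come from both operands: 'b' from the DataFrame columns,
# 0 and 1 from the Series index
result_columns = set(result.columns)

# No label exists on both sides, so every cell is missing
all_missing = bool(result.isna().all().all())
```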

Replacing the missing values with 0 during the operation (fill_value=0) makes it obvious how the values are used:

df[['b']].rsub(df.max(axis=1).to_frame().T, fill_value=0)

     b    0    1
0 -1.0  4.0  9.0
1 -6.0  NaN  NaN

Now let's check a different example in which one of the row indices has the same name as one of the selected columns:

df = pd.DataFrame(np.arange(10).reshape(2, 5),
                  columns=['a', 'b', 'c', 'd', 'e'],
                  index=['b', 1]
                 )
df.max(axis=1) - df[['b']]

Now the output has only 2 columns: b, the common label, and 1, the second index label of the Series ({'b', 1}|{'b'} -> {'b', 1}):

    1  b
b NaN  3
1 NaN -2
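
The same result can be checked with a small sketch, assuming the row labels 'b' and 1 shown in the output above. The names `df2`, `b_values`, and `other_is_nan` are illustrative.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.arange(10).reshape(2, 5),
                   columns=list("abcde"),
                   index=["b", 1])

# Series index: 'b' -> 4, 1 -> 9
row_max = df2.max(axis=1)

result = row_max - df2[["b"]]

# Only the label 'b' exists on both sides, so row_max['b'] (4) is
# subtracted from in every row of column b: 4 - 1 and 4 - 6
b_values = result["b"].tolist()

# The label 1 only exists in the Series, so that column stays missing
other_is_nan = bool(result[1].isna().all())
```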
