python pandas dataframe division

pandas columns division returns multiple columns


I am trying to simply divide two columns element-wise, but for some reason this returns two columns instead of one as I would expect.

I think it has something to do with the fact that I need to create the dataframe iteratively, so I opted to append rows one at a time. Here's some testing code:

import pandas as pd


df = pd.DataFrame(columns=['image_name partition zeros ones total'.split()])

# Create a DataFrame
data = {
    'dataset': ['177.png', '276.png', '208.png', '282.png'],
    'partition': ['green', 'green', 'green', 'green'],
    'zeros': [1896715, 1914720, 1913894, 1910815],
    'ones': [23285, 5280, 6106, 9185],
    'total': [1920000, 1920000, 1920000, 1920000]
}

for i in range(len(data['ones'])):
    row = []
    for k in data.keys():
        row.append(data[k][i])
    df = df.append(pd.Series(row, index=df.columns), ignore_index=True)

df_check = pd.DataFrame(data)
df_check["result"] = df_check["zeros"] / df_check["total"]

df["result"] = df["zeros"] / df["total"]
df

If you try to run this, you'll see that everything works as expected with df_check, but the code fails when it gets to df["result"] = df["zeros"] / df["total"]:

ValueError: Cannot set a DataFrame with multiple columns to the single column result

In fact, if I inspect the result of the division, I see two columns with all missing values:

>>> df["zeros"] / df["total"]

   total  zeros
0    NaN    NaN
1    NaN    NaN
2    NaN    NaN
3    NaN    NaN

Any suggestion why this happens and how to fix it?


Solution

  • The problem is the following line:

    df = pd.DataFrame(columns=['image_name partition zeros ones total'.split()])
    

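    split() already returns a list, so wrapping its result in another pair of brackets passes a list of lists. pandas interprets that as MultiIndex columns, so df["zeros"] returns a DataFrame rather than a Series, and dividing two DataFrames aligns them on column labels, which produces the two all-NaN columns. Drop the outer brackets: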

    df = pd.DataFrame(columns='image_name partition zeros ones total'.split())
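    To check the difference yourself, here is a minimal sketch. The variable names (names, nested, flat, rows) are made up for illustration, and only the first two rows of the question's data are used. It also collects the rows in a plain list and builds the frame once, because DataFrame.append was removed in pandas 2.0 and the loop in the question will not run on recent versions.

```python
import pandas as pd

names = 'image_name partition zeros ones total'.split()

# A list wrapped in another list is read as one level of a MultiIndex
nested = pd.DataFrame(columns=[names])
# A plain list of names gives ordinary, flat column labels
flat = pd.DataFrame(columns=names)

nested_is_multi = isinstance(nested.columns, pd.MultiIndex)
flat_is_multi = isinstance(flat.columns, pd.MultiIndex)
print(nested_is_multi, flat_is_multi)

# Collect rows in a list, then build the DataFrame in one step
rows = [
    ['177.png', 'green', 1896715, 23285, 1920000],
    ['276.png', 'green', 1914720, 5280, 1920000],
]
df = pd.DataFrame(rows, columns=names)

# With flat columns each selection is a Series, so the division is element-wise
df['result'] = df['zeros'] / df['total']
print(df['result'])
```

    If the rows really do arrive one at a time, appending each one to a Python list and calling pd.DataFrame on the list at the end is usually cheaper than growing the DataFrame inside the loop.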