Search code examples
pythonpandasnumpyetl

How do I create a new dataframe based on row values of multiple columns in python?


I have multiple columns that contained only 0s or 1s.

Apple Orange Pear
1 0 1
0 0 1
1 1 0

I would like to count and input the number of 0s (in "Wrong" column) and 1s (in "Correct" column) of each column in the new dataframe, and total them up into a table that looks like the following.

Fruit Correct Wrong
Apple 2 1
Orange 1 2
Pear 2 1

I tried a blend of value_counts(), groupby(), and pandas.pivot_table, but got stuck with the manipulation of the table.


Solution

  • Try this:

    df.apply(pd.Series.value_counts).rename(index={0:'Wrong', 1:'Correct'}).T
    

    Use pd.DataFrame.apply to "apply" pd.Series.value_counts to each column of the dataframe, then rename the index values using a dictionary for 0 and 1 to Wrong and Correct. Lastly, use T to transpose the dataframe.

    Output:

            Wrong  Correct
    Apple       1        2
    Orange      2        1
    Pear        1        2
    

    And, you can add .rename_axis('Fruit').reset_index() to get:

        Fruit  Wrong  Correct
    0   Apple      1        2
    1  Orange      2        1
    2    Pear      1        2