python pandas dataframe pandas-groupby

Select a single value from a column after grouping by other columns in Python


I am trying to select a single value from the class column for each group of my dataframe after grouping by the columns first_register and second_register, but it does not work.

Suppose I have a dataframe like this:

import numpy as np
import pandas as pd
# np.nan is used instead of np.NAN, an alias that was removed in NumPy 2.0
df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})

What I tried, which did not work:

group_by_df = df.groupby(["first_register", "second_register"])
label_class = group_by_df["class"].unique()
print(label_class)

How can I select the single class label from each group of the dataframe?

The desired output is an ordered list holding the class of each group, from the first group to the last:

label_class = [1, 2, 0, 1]

Solution

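  • Use GroupBy.first, and pass dropna=False to groupby. By default groupby drops every row whose key contains NaN, and here each row has NaN in one of the two key columns, so without it no groups are left: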

    out = df.groupby(["first_register", "second_register"], dropna=False)["class"].first()
    print(out)
    
    first_register  second_register
    70/20           NaN                1
    71/20           NaN                2
    NaN             72/20              0
                    73/20              1
    Name: class, dtype: int64
    
    
    label_class = out.tolist()
    print(label_class)
    [1, 2, 0, 1]
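
The following is a minimal, self-contained sketch of the same idea rather than a definitive implementation. It rebuilds the question's dataframe, shows that the default grouping leaves no groups, and then takes the first class per group. The names keys, dropped, grouped and classes_per_group are introduced here for illustration only. Two things go beyond the answer above and are assumptions: sort=False is added so groups come out in order of first appearance instead of sorted key order, and the nunique check assumes each group is meant to hold exactly one class.

```python
import numpy as np
import pandas as pd

# Same data as in the question, written compactly.
df = pd.DataFrame({
    "class": [1, 1, 1, 2, 2, 2, 0, 0, 1],
    "first_register": ["70/20"] * 3 + ["71/20"] * 3 + [np.nan] * 3,
    "second_register": [np.nan] * 6 + ["72/20", "72/20", "73/20"],
})

keys = ["first_register", "second_register"]

# Default behaviour (dropna=True): rows with NaN in any key are discarded.
# Every row here has a NaN key, so the result is empty.
dropped = df.groupby(keys)["class"].first()

# dropna=False keeps NaN as a valid key value.
# sort=False (an addition, not in the answer above) keeps order of appearance.
grouped = df.groupby(keys, dropna=False, sort=False)["class"]

# Optional sanity check: number of distinct classes in each group.
classes_per_group = grouped.nunique()

label_class = grouped.first().tolist()
print(label_class)  # [1, 2, 0, 1]
```

If classes_per_group contains a value greater than 1, first() silently picks whichever class appears first in that group, so it is worth checking before relying on the list.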