python pandas dataframe pandas-groupby

Selecting columns outside of .groupby()


I'm working with a dataset in Python that has three columns: the state (ST), the city (City), and the number of certificates (CERT). I have to find the minimum number of certificates for each state, which I have done with the following code:

df.groupby(["ST"])["CERT"].min()

but this returns only the state and the number of certificates, and I also need the city in the output. Specifically, I need the city in each state that has the minimum number of certificates. When I run the following code:

df.groupby(["ST", "City"])["CERT"].min()

I get the minimum for every single city, when what I need is the minimum for each state along with the city that minimum belongs to. Is there a way to include the City column outside of the .groupby function?


Solution

  • You could use groupby + idxmin to get the index of the minimum values, then filter df with it using loc:

    index_of_min_certificates = df.groupby('ST')['CERT'].idxmin()
    out = df.loc[index_of_min_certificates]
    

    For example, for df like below:

       ST    City  CERT
    0  CA      LA     0
    1  CA      SF     1
    2  NY     NYC     2
    3  NY  Albany     3
    

    the above code produces:

       ST City  CERT
    0  CA   LA     0
    2  NY  NYC     2
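
As a runnable sketch of the approach above, the snippet below rebuilds the example df and applies groupby + idxmin + loc. One caveat: idxmin returns the index label of the first minimum in each group, so if two cities in a state tie for the minimum, only one of them is kept. The second half shows one possible way to keep all tied rows using transform("min"); the extra "San Diego" row and the names df_ties and all_min are hypothetical, added only to illustrate a tie.

```python
import pandas as pd

# Example data from the answer above.
df = pd.DataFrame({
    "ST": ["CA", "CA", "NY", "NY"],
    "City": ["LA", "SF", "NYC", "Albany"],
    "CERT": [0, 1, 2, 3],
})

# idxmin gives, per state, the index label of the row with the smallest CERT;
# loc then pulls those full rows, so City comes along.
index_of_min_certificates = df.groupby("ST")["CERT"].idxmin()
out = df.loc[index_of_min_certificates]
print(out)

# Hypothetical extra row that ties with LA for the CA minimum.
extra = pd.DataFrame({"ST": ["CA"], "City": ["San Diego"], "CERT": [0]})
df_ties = pd.concat([df, extra], ignore_index=True)

# transform("min") broadcasts each state's minimum back onto every row,
# so the comparison keeps every row that equals its state's minimum.
state_min = df_ties.groupby("ST")["CERT"].transform("min")
all_min = df_ties[df_ties["CERT"] == state_min]
print(all_min)
```

If exactly one row per state is wanted, the idxmin version is the simpler choice; the transform version is only worth it when ties need to be reported.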