Search code examples
pandassorting

How to sort dataframe by specific data in a column


So, I am studying pandas and one of the tasks was to sort values of the table by borough and holidays(data in this column is only "Y"(if it is holiday) and "N"(if not). Also I avereged pickups(as task asked), but I wanna show data specifically on "Y".

import pandas as pd
df = pd.read_csv("2_taxi_nyc.csv")
df.groupby(["borough", "hday"]).agg({"pickups": 'mean'})type here

Result:

enter image description here

I tried using df.query, but it didnt work (error was smth like exp is not supposed to be a "class bool" and in another try a "list"). After that Ive been stuck and still can`t find a solution or even an idea how to solve this in the Net. I hope someone will help me. I apologize for my poor english(not a first language). Thanks in advance.


Solution

  • You have 2 ways to do that:

    1. filter results after aggregation
    2. filter data before aggregation

    Case 1:

    >>> (df.groupby(['borough', 'hday'], as_index=False)  # <- group labels as columns
           .agg({'pickups': 'mean'}).query("hday == 'Y'"))  # <- filter with query
    
              borough hday      pickups
    1           Bronx    Y    48.065868
    3        Brooklyn    Y   527.011976
    5             EWR    Y     0.041916
    7       Manhattan    Y  2035.928144
    9          Queens    Y   320.730539
    11  Staten Island    Y     1.497006
    

    Case 2:

    >>> df[df['hday'] == 'Y'].groupby('borough', as_index=False)['pickups'].mean()
             borough      pickups
    0          Bronx    48.065868
    1       Brooklyn   527.011976
    2            EWR     0.041916
    3      Manhattan  2035.928144
    4         Queens   320.730539
    5  Staten Island     1.497006
    

    Source dataset: uber_nyc_enriched.csv