I have tried the following code and it works, but the output includes extra columns that I don't need:
import pandas as pd
df = pd.read_csv("data.csv")
df = df.groupby(['City1', 'City2']).sum('PassengerTrips')
df['Vacancy'] = 1-df['PassengerTrips'] / df['Seats']
df = df.groupby(['City1','City2']).max('Vacancy')
df = df.sort_values('Vacancy', ascending=False)
print('The 10 routes with the highest proportion of vacant seats:')
print(df.head(10))
I tried adding the following line after sorting by Vacancy, but it raises an error:
df = df[['City1', 'City2', 'Vacancy']]
City1 and City2 are in the index because you applied a groupby on them, so they are no longer available as columns.
You can move them back into columns using reset_index to get the expected result:
df = df.reset_index(drop=False)
df = df[['City1', 'City2', 'Vacancy']]
Or, if you want to keep City1 and City2 in the index, you can do as @Corralien said in his comment: df = df['Vacancy']
Or even df = df['Vacancy'].to_frame() to get a DataFrame instead of a Series.
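For reference, here is a minimal sketch of both options on a small made-up dataset. The column names follow the question, but the values and the variable names (raw, grouped, flat, vacancy_series, vacancy_frame) are assumptions for illustration only, not the asker's actual data.

```python
import pandas as pd

# Hypothetical sample data: column names match the question, values are invented.
raw = pd.DataFrame({
    "City1": ["A", "A", "B", "B"],
    "City2": ["X", "X", "Y", "Z"],
    "PassengerTrips": [50, 30, 20, 90],
    "Seats": [100, 100, 80, 100],
})

# Grouping moves City1 and City2 out of the columns and into a MultiIndex.
grouped = raw.groupby(["City1", "City2"]).sum()
grouped["Vacancy"] = 1 - grouped["PassengerTrips"] / grouped["Seats"]
grouped = grouped.sort_values("Vacancy", ascending=False)

# Option 1: reset_index turns the index levels back into ordinary columns,
# so selecting them by name works again.
flat = grouped.reset_index()[["City1", "City2", "Vacancy"]]

# Option 2: keep the cities in the index and select only Vacancy.
vacancy_series = grouped["Vacancy"]             # a Series
vacancy_frame = grouped["Vacancy"].to_frame()   # a one-column DataFrame

print(flat.head(10))
```

Which option fits best probably depends on what you do next: the flat version is easier to export or merge, while the indexed version is handy for lookups by route.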