
How can I show only some columns using Python Pandas?


I have tried the following code, and it works; however, the output includes extra columns that I don't need.

    import pandas as pd

    df = pd.read_csv("data.csv")
    df = df.groupby(['City1', 'City2']).sum('PassengerTrips')
    df['Vacancy'] = 1 - df['PassengerTrips'] / df['Seats']
    df = df.groupby(['City1', 'City2']).max('Vacancy')
    df = df.sort_values('Vacancy', ascending=False)
    print('The 10 routes with the highest proportion of vacant seats:')
    print(df.head(10))
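Since data.csv isn't included, here is a minimal reproduction built on an assumption: a few made-up rows, plus a hypothetical numeric column `Distance` standing in for whatever other columns the real file has. It shows where the extra columns come from: `groupby(...).sum(...)` aggregates every numeric column, and the grouping keys move into the index. (In current pandas the first positional parameter of `GroupBy.sum` is `numeric_only`, so passing the string `'PassengerTrips'` does not select that column.)

```python
import pandas as pd

# Hypothetical sample data standing in for data.csv (not from the question)
df = pd.DataFrame({
    "City1": ["A", "A", "B", "B"],
    "City2": ["X", "X", "Y", "Z"],
    "PassengerTrips": [50, 30, 90, 10],
    "Seats": [100, 100, 100, 40],
    "Distance": [300, 300, 500, 200],  # hypothetical extra numeric column
})

# Summing after groupby aggregates every numeric column, not only PassengerTrips
summed = df.groupby(["City1", "City2"]).sum(numeric_only=True)
summed["Vacancy"] = 1 - summed["PassengerTrips"] / summed["Seats"]

print(summed.columns.tolist())   # ['PassengerTrips', 'Seats', 'Distance', 'Vacancy']
print(list(summed.index.names))  # ['City1', 'City2']
```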

I tried adding the following line after sorting by the Vacancy values, but it gives me an error:

    df = df[['City1', 'City2', 'Vacancy']]
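The question doesn't show the error message, so this sketch reproduces the step with made-up rows (an assumption; the real data.csv isn't available). With this data, pandas raises a `KeyError`, consistent with City1 and City2 having become index levels rather than columns after the groupby.

```python
import pandas as pd

# Hypothetical sample data (made up for illustration)
df = pd.DataFrame({
    "City1": ["A", "A", "B"],
    "City2": ["X", "X", "Y"],
    "PassengerTrips": [50, 30, 90],
    "Seats": [100, 100, 100],
})

grouped = df.groupby(["City1", "City2"]).sum(numeric_only=True)
grouped["Vacancy"] = 1 - grouped["PassengerTrips"] / grouped["Seats"]

# City1 and City2 are now index levels, not columns, so this selection fails
try:
    grouped[["City1", "City2", "Vacancy"]]
except KeyError as err:
    print("KeyError:", err)
```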

Solution

  • City1 and City2 are in the index because you applied a groupby on them. You can move them back into columns with reset_index to get the expected result:

    df = df.reset_index(drop=False)
    df = df[['City1', 'City2', 'Vacancy']]
    

    Or, if you want to keep City1 and City2 in the index, you can do as @Corralien suggested in his comment: df = df['Vacancy']

    You can also use df = df['Vacancy'].to_frame() to get a DataFrame instead of a Series.
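A self-contained sketch of both suggestions, using made-up sample rows in place of data.csv (the city names and numbers are hypothetical). It prints with `head(10)` so that at most ten routes are shown.

```python
import pandas as pd

# Hypothetical sample data (made up for illustration, not from the question)
df = pd.DataFrame({
    "City1": ["A", "A", "B", "B"],
    "City2": ["X", "X", "Y", "Z"],
    "PassengerTrips": [50, 30, 90, 10],
    "Seats": [100, 100, 100, 40],
})

routes = df.groupby(["City1", "City2"]).sum(numeric_only=True)
routes["Vacancy"] = 1 - routes["PassengerTrips"] / routes["Seats"]
routes = routes.sort_values("Vacancy", ascending=False)

# Option 1: move City1/City2 back into columns, then select the three columns
flat = routes.reset_index()[["City1", "City2", "Vacancy"]]
print(flat.head(10))

# Option 2: keep City1/City2 in the index and select only Vacancy
vacancy = routes["Vacancy"]                 # Series with a (City1, City2) MultiIndex
vacancy_df = routes["Vacancy"].to_frame()   # same data as a one-column DataFrame
print(vacancy_df.head(10))
```

The second `groupby(...).max(...)` from the question is left out here: after the first groupby, each (City1, City2) pair already appears only once, so grouping again does not change the values.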