python pandas series mode

Not able to understand the use of .mode() in python


I need to find the most popular start hour. The following code gives me the correct result.

import time
import pandas as pd
import numpy as np

# bunch of code comes
# here
# that help in reaching the following steps

df = pd.read_csv(CITY_DATA[selected_city])

# convert the Start Time column to datetime
df['Start Time'] = pd.to_datetime(df['Start Time'])

# extract hour from the Start Time column to create an hour column
df['hour'] = df['Start Time'].dt.hour

# extract month and day of week from Start Time to create new columns
df['month'] = df['Start Time'].dt.month

df['day_of_week'] = df['Start Time'].dt.day_name()  # .dt.weekday_name was removed in pandas 1.0

# find the most popular hour
popular_hour = df['hour'].mode()[0]

Here is a sample of the output I get when I run

print(df['hour'])

0         15
1         17
2          8
3         13
4         14
5          9
6          9
7         17
8         16
9         17
10         7
11        17
Name: hour, Length: 300000, dtype: int64

The output I get when I run

print(type(df['hour']))

<class 'pandas.core.series.Series'>

The most popular start hour is stored in popular_hour, which equals 17 (the correct value).

However, I do not understand the .mode()[0] part.

What does .mode() do, and why the [0]?

And does the same approach work for finding the most popular month and the most popular day of the week, irrespective of their datatype?


Solution

  • mode returns a Series:

    df['hour'].mode()
    0    17
    dtype: int64
    

    From this, you take the first item by calling

    df['hour'].mode()[0]
    17
    

    Note that a Series is always returned, because the mode need not be unique. If several values tie for the highest count, all of them are returned:

    pd.Series([1, 1, 2, 2, 3, 3]).mode()
    0    1
    1    2
    2    3
    dtype: int64
    

    With [0] you still take only the first value and discard the rest. When several modes are returned, they are sorted, so [0] gives the smallest of the tied values.

    Read the documentation on mode for more info.
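
On the last part of the question (month and day of week, whatever their datatype): mode() counts values, so it should behave the same way on integer and string columns. The sketch below illustrates this on a small made-up DataFrame that stands in for the question's CSV; the four timestamps are invented for the example, and .dt.day_name() is used for the weekday on the assumption that pandas 1.0 or later is installed.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's CSV file.
df = pd.DataFrame({
    "Start Time": pd.to_datetime([
        "2017-01-02 17:05:00",  # Monday
        "2017-01-03 17:30:00",  # Tuesday
        "2017-01-09 08:15:00",  # Monday
        "2017-02-06 17:45:00",  # Monday
    ])
})

# Same derived columns as in the question.
df["hour"] = df["Start Time"].dt.hour
df["month"] = df["Start Time"].dt.month
df["day_of_week"] = df["Start Time"].dt.day_name()

# mode() returns a Series of the most frequent value(s); [0] picks the first.
popular_hour = df["hour"].mode()[0]         # integer column
popular_month = df["month"].mode()[0]       # integer column
popular_day = df["day_of_week"].mode()[0]   # string column

print(popular_hour, popular_month, popular_day)

# With a tie, every tied value comes back, sorted; [0] keeps only the first.
tied = pd.Series(["b", "a", "b", "a"]).mode()
print(list(tied))
```

If ties matter for your data, inspecting the full result of mode(), or df["hour"].value_counts(), before taking [0] shows whether more than one value shares the top count.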