I need to find the most popular start hour in my data. The following code gives me the correct result.
import time
import pandas as pd
import numpy as np
# bunch of code comes
# here
# that help in reaching the following steps
df = pd.read_csv(CITY_DATA[selected_city])
# convert the Start Time column to datetime
df['Start Time'] = pd.to_datetime(df['Start Time'])
# extract hour from the Start Time column to create an hour column
df['hour'] = df['Start Time'].dt.hour
# extract month and day of week from Start Time to create new columns
df['month'] = df['Start Time'].dt.month
df['day_of_week'] = df['Start Time'].dt.day_name()  # .dt.weekday_name was removed in pandas 1.0
# find the most popular hour
popular_hour = df['hour'].mode()[0]
Here is a sample of the output I get when I run
print(df['hour'])
0 15
1 17
2 8
3 13
4 14
5 9
6 9
7 17
8 16
9 17
10 7
11 17
Name: hour, Length: 300000, dtype: int64
The output I get when I use
print(type(df['hour']))
<class 'pandas.core.series.Series'>
The most popular start hour is stored in popular_hour, which equals 17 (the correct value).
However, I don't understand the .mode()[0] part.
What does .mode() do, and why the [0]?
And does the same approach work for calculating the most popular month and day of the week, regardless of their data type?
mode() returns a Series:
df['hour'].mode()
0 17
dtype: int64
From this, you take the first item by calling
df['hour'].mode()[0]
17
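To make the two steps visible, here is a minimal sketch on a small made-up Series (the values are illustrative, not your real data). It also uses value_counts() as an independent check of which hour occurs most often.

```python
import pandas as pd

# Hypothetical sample of start hours (not the real dataset)
hours = pd.Series([15, 17, 8, 13, 17, 9, 9, 17], name="hour")

# Step 1: mode() returns a Series holding the most frequent value(s)
modes = hours.mode()

# Step 2: [0] picks the entry labelled 0, i.e. the first mode
popular_hour = modes[0]

# Cross-check: count how often each hour occurs
counts = hours.value_counts()

print(modes)
print(popular_hour)
print(counts)
```

Here 17 appears three times, more than any other hour, so modes holds a single value and popular_hour is 17.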
Note that a Series is always returned, and if several values tie for the mode, all of them are returned:
pd.Series([1, 1, 2, 2, 3, 3]).mode()
0 1
1 2
2 3
dtype: int64
You would still take the first value each time and discard the rest. Note that when multiple modes are returned, they are always sorted.
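As for the month and day-of-week columns, the same call should work regardless of dtype, since mode() only counts occurrences. The following is a rough sketch with made-up values (the names months and days are hypothetical stand-ins for your columns), including a tie between two day names:

```python
import pandas as pd

# Hypothetical stand-ins for df['month'] and df['day_of_week']
months = pd.Series([1, 6, 6, 3, 6, 2])
days = pd.Series(["Monday", "Friday", "Friday", "Monday", "Sunday"])

# Integer column: a single mode
popular_month = months.mode()[0]

# String column: "Friday" and "Monday" both appear twice, so both are returned
day_modes = days.mode()

# Taking the first entry keeps the tied value that sorts first
popular_day = day_modes[0]

print(popular_month)
print(day_modes.tolist())
print(popular_day)
```

In this sample the tie is resolved alphabetically, so popular_day is "Friday". If ties matter for your analysis, inspect the whole Series that mode() returns instead of only its first entry.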
Read the documentation on mode for more info.