I am trying to pull data for 3 cities. How can I read all 3 files at once instead of one by one, as below? Is my file-reading code duplicated? And how do I read the data through the dictionary without getting the error shown? Thanks so much.
import csv
with open('C:\\Users\\jasch\\chicago.csv') as chicago_data:
    csvReader = csv.reader(chicago_data)
import csv
with open('C:\\Users\\jasch\\new_york_city.csv') as new_york_data:
    csvReader = csv.reader(new_york_data)
import csv
with open('C:\\Users\\jasch\\washington.csv') as washington_data:
    csvReader = csv.reader(washington_data)
import time
import pandas as pd
import numpy as np
CITY_DATA = { 'chicago': 'chicago.csv',
              'new york city': 'new_york_city.csv',
              'washington': 'washington.csv' }
df = pd.read_csv(CITY_DATA[city])
df['Start Time'] = pd.to_datetime(df['Start Time'])
df['month'] = df['Start Time'].dt.month
print(df['month'])
NameError Traceback (most recent call last)
<ipython-input-16-b1588646f194> in <module>()
7 'washington': 'washington.csv' }
8
----> 9 df = pd.read_csv(CITY_DATA[city])
10
11 df['Start Time'] = pd.to_datetime(df['Start Time'])
NameError: name 'city' is not defined
The CSV files for the three cities have almost the same column names, shown below.
   Start Time           End Time             Trip Duration  \
0  2017-05-29 18:36:27  2017-05-29 18:49:27  780
1  2017-06-12 19:00:33  2017-06-12 19:24:22  1429
2  2017-02-13 17:02:02  2017-02-13 17:20:10  1088
3  2017-04-24 18:39:45  2017-04-24 18:54:59  914
4  2017-01-26 15:36:07  2017-01-26 15:43:21  434

   Start Station                 End Station  \
0  Columbus Dr & Randolph St     Federal St & Polk St
1  Kingsbury St & Erie St        Orleans St & Merchandise Mart Plaza
2  Canal St & Madison St         Paulina Ave & North Ave
3  Spaulding Ave & Armitage Ave  California Ave & Milwaukee Ave
4  Clark St & Randolph St        Financial Pl & Congress Pkwy

   User Type   Gender  Birth Year
0  Subscriber  Male    1991.0
1  Customer    NaN     NaN
2  Subscriber  Female  1982.0
3  Subscriber  Male    1966.0
4  Subscriber  Female  1983.0
I think you don't need to go through the trouble of reading the files with the csv module first. You are also reassigning csvReader twice, so once all three with blocks have run, nothing refers to the readers for the first two files (Chicago and New York).
Below is the pandas way of reading multiple files and combining them into one DataFrame:
import pandas as pd
import os
city_data_files = ['C:\\Users\\jasch\\chicago.csv','C:\\Users\\jasch\\new_york_city.csv', 'C:\\Users\\jasch\\washington.csv']
In the code below, we loop through the list of file paths and create a DataFrame for each one, which leaves us with a list of DataFrames. We also use the .assign() method to add a column with the filename, so that after combining the DataFrames we can still tell which row came from which file.
dfs = [
    pd.read_csv(city_data_file, parse_dates=['Start Time'])
      .assign(filename=os.path.basename(city_data_file))
    for city_data_file in city_data_files
]
Now we can go ahead and combine all the DataFrames into one DataFrame.
df = pd.concat(dfs) # this line combines the contents of the files
df['month'] = df['Start Time'].dt.month
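To see how the filename column can be used once the files are combined, here is a minimal sketch. It assumes tiny in-memory stand-ins for the three CSV files (the sample_files dictionary and its rows are hypothetical, so the snippet runs without the real files), and it passes ignore_index=True to pd.concat, which is an optional addition that gives the combined DataFrame a fresh index instead of repeating each file's row numbers.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the three CSV files, kept in memory so the
# sketch runs without the real data on disk.
sample_files = {
    'chicago.csv': (
        "Start Time,Trip Duration\n"
        "2017-05-29 18:36:27,780\n"
        "2017-06-12 19:00:33,1429\n"
    ),
    'new_york_city.csv': "Start Time,Trip Duration\n2017-02-13 17:02:02,1088\n",
    'washington.csv': "Start Time,Trip Duration\n2017-04-24 18:39:45,914\n",
}

# One DataFrame per file, each tagged with the file it came from
dfs = [
    pd.read_csv(io.StringIO(text), parse_dates=['Start Time']).assign(filename=name)
    for name, text in sample_files.items()
]

# ignore_index=True renumbers the rows 0..n-1 across all files
combined = pd.concat(dfs, ignore_index=True)
combined['month'] = combined['Start Time'].dt.month

# Count rows per source file, or pull out a single city's rows
trips_per_file = combined.groupby('filename').size()
chicago_only = combined[combined['filename'] == 'chicago.csv']
```

With the real files, the same groupby and filter lines should work unchanged on the df built above, since it carries the same filename column.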
As for your error - the stack trace is telling you exactly what the problem is:
----> 9 df = pd.read_csv(CITY_DATA[city])
NameError: name 'city' is not defined
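You are using the variable city but have never defined it anywhere in your code. Assign it one of the keys of CITY_DATA (for example 'chicago') before that line, or loop over the dictionary's keys.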
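If you would rather keep the CITY_DATA dictionary, a rough sketch of both ways to avoid the NameError follows. The load_city helper, the data_dir folder, and the one-row sample files are all hypothetical and exist only so the snippet runs on its own; with your real data you would point data_dir at the folder that holds the CSV files.

```python
import os
import tempfile
import pandas as pd

CITY_DATA = {'chicago': 'chicago.csv',
             'new york city': 'new_york_city.csv',
             'washington': 'washington.csv'}

# Hypothetical setup: write a one-row sample file per city into a temp folder
data_dir = tempfile.mkdtemp()
sample = "Start Time,Trip Duration\n2017-05-29 18:36:27,780\n"
for filename in CITY_DATA.values():
    with open(os.path.join(data_dir, filename), 'w') as f:
        f.write(sample)

def load_city(city):
    """Read one city's CSV, looked up by its key in CITY_DATA."""
    path = os.path.join(data_dir, CITY_DATA[city])
    df = pd.read_csv(path, parse_dates=['Start Time'])
    df['month'] = df['Start Time'].dt.month
    return df

# Option 1: define `city` before using it as a dictionary key
city = 'chicago'
df = load_city(city)

# Option 2: loop over the dictionary's keys to load every city
city_frames = {name: load_city(name) for name in CITY_DATA}
```

Option 2 leaves you with one DataFrame per city, keyed by city name, which is handy if you want to keep the cities separate rather than concatenating them.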