I have a code with python that cleans a .csv up before I append it to another data set. It is missing a couple columns so I have been trying to figure how to use Pandas to add the column and fill the rows.
I currently have a column DiscoveredDate in a format of 10/1/2017 12:49.
What I'm trying to do is take that column and anything from the date range 10/1/2016-10/1/2017 have a column FedFY have its row filled with 2017 and like wise for 2018.
Below is my current script minus a few different column cleanups.
import os
import re
import pandas as pd
import Tkinter
import numpy as np
outpath = os.path.join(os.getcwd(), "CSV Altered")
# TK asks user what file to assimilate
from Tkinter import Tk
from tkFileDialog import askopenfilename
Tk().withdraw()
filepath = askopenfilename() # show an "Open" dialog box and return the path to the selected file
#Filepath is acknowledged and disseminated with the following totally human protocols
filenames = os.path.basename(filepath)
filename = [filenames]
for f in filename:
name = f
df = pd.read_csv(f)
# Make Longitude values negative if they aren't already.
df['Longitude'] = - df['Longitude'].abs()
# Add Federal Fiscal Year Field (FedFY)
df['FedFY'] = df['DiscoveredDate']
df['FedFY'] = df['FedFY'].replace({df['FedFY'].date_range(10/1/2016 1:00,10/1/2017 1:00): "2017",df['FedFY'].date_range(10/1/2017 1:00, 10/1/2018 1:00): "2018"})
I also tried this but figured I was completely fudging it up.
for rows in df['FedFY']:
if rows = df['FedFY'].date_range(10/1/2016 1:00, 10/1/2017 1:00):
then df['FedFY'] = df['FedFY'].replace({rows : "2017"})
elif df['FedFY'] = df['FedFY'].replace({rows : "2018"})
How should I go about this efficiently? Is it just my syntax messing me up? Or do I have it all wrong?
[Edited for clarity in title and throughout.]
If you are concerned only with these two FYs, you can compare your date directly to the start/end dates:
df["FedFY"] = np.where((df.DiscoveredDate < pd.to_datetime("10/1/2017")) &\
(df.DiscoveredDate > pd.to_datetime("10/1/2016")),
2017, 2018)
Any date before 10/1/2016 will be labeled incorrectly! (You can fix this by adding another np.where
).
Make sure that the start/end dates are correctly included or not included (change <
and/or >
to <=
and >=
, if necessary).