Search code examples
pythonpandasdaterangefiscal

How to use Pandas within Python to create a Fiscal Year Column?


I have a code with python that cleans a .csv up before I append it to another data set. It is missing a couple columns so I have been trying to figure how to use Pandas to add the column and fill the rows.

I currently have a column DiscoveredDate in a format of 10/1/2017 12:49.

What I'm trying to do is take that column and anything from the date range 10/1/2016-10/1/2017 have a column FedFY have its row filled with 2017 and like wise for 2018.

Below is my current script minus a few different column cleanups.

    import os
    import re
    import pandas as pd
    import Tkinter
    import numpy as np

    outpath = os.path.join(os.getcwd(), "CSV Altered")

    # TK asks user what file to assimilate
    from Tkinter import Tk
    from tkFileDialog import askopenfilename

    Tk().withdraw()
    filepath = askopenfilename() # show an "Open" dialog box and return the path to the selected file

    #Filepath is acknowledged and disseminated with the following totally human protocols
    filenames = os.path.basename(filepath)

    filename = [filenames]

    for f in filename:
    name = f
    df = pd.read_csv(f)

        # Make Longitude values negative if they aren't already.
        df['Longitude'] = - df['Longitude'].abs()

        # Add Federal Fiscal Year Field (FedFY)
        df['FedFY'] = df['DiscoveredDate']
        df['FedFY'] = df['FedFY'].replace({df['FedFY'].date_range(10/1/2016 1:00,10/1/2017 1:00): "2017",df['FedFY'].date_range(10/1/2017 1:00, 10/1/2018 1:00): "2018"})

I also tried this but figured I was completely fudging it up.

 for rows in df['FedFY']:
    if rows = df['FedFY'].date_range(10/1/2016 1:00, 10/1/2017 1:00):
        then df['FedFY'] =  df['FedFY'].replace({rows : "2017"})
    elif df['FedFY'] =  df['FedFY'].replace({rows : "2018"})

How should I go about this efficiently? Is it just my syntax messing me up? Or do I have it all wrong?

[Edited for clarity in title and throughout.]


Solution

  • If you are concerned only with these two FYs, you can compare your date directly to the start/end dates:

    df["FedFY"] = np.where((df.DiscoveredDate < pd.to_datetime("10/1/2017")) &\
                           (df.DiscoveredDate > pd.to_datetime("10/1/2016")), 
                           2017, 2018)
    

    Any date before 10/1/2016 will be labeled incorrectly! (You can fix this by adding another np.where).

    Make sure that the start/end dates are correctly included or not included (change < and/or > to <= and >=, if necessary).