Search code examples
pandasdatearcpy

ArcPy & Python - Get Latest TWO dates, grouped by Value


I've looked around for the last week to an answer but only see partial answers. Being new to python, I really could use some assistance. I have two fields in a table [number] and [date]. The date format is date and time, so: 07/09/2018 3:30:30 PM. The [number] field is just an integer, but each row may have the same number.

I have tried a few options to gain access to the LATEST date, and I can get these using Pandas:

myarray = arcpy.da.FeatureClassToNumPyArray (fc, ['number', 'date'])
mydf = pd.DataFrame(myarray)
date_index = mydf.groupby(['number'])['date'].transform(max)==mydf['date']

However, I need the latest TWO dates. I've moved on to trying an "IF" statement because I feel arcpy.da.UpdateCursor is better suited to look through the record and update another field by grouping by NUMBER and returning the rows with the latest TWO dates.

End result would like to see the following table grouped by number, latest two dates (as examples):

Number : Date
1       7/29/2018 4:30:44 PM
1       7/30/2018 5:55:34 PM
2       8/2/2018  5:45:23 PM
2       8/3/2018  6:34:32 PM

Solution

  • Try this.

    import pandas as pd
    import numpy as np
    
    # Some data.
    
    data = pd.DataFrame({'number': np.random.randint(3, size = 15), 'date': pd.date_range('2018-01-01', '2018-01-15')})
    
    # Look at the data.
    
    data
    

    Which gives some sample data like this:

    enter image description here

    So in our output we'd expect to see number 0 with the 5th and the 9th, 1 with the 14th and 15th, and 2 with the 6th and the 12th.

    Then we group by number, grab the last two rows, and set and sort the index.

    # Group and label the index.
    
    last_2 = data.groupby('number').tail(2).set_index('number').sort_index()
    
    last_2
    

    Which gives us what we expect.

    enter image description here