Search code examples
python-3.xpandasdataframedate-comparison

Compare date present in different rows of a data-frame using Python


I am reading a document from mongodb and a csv file and merging them both to retrieve duplicate records. I have the following code. Now I want to compare the date (LastUpdate) between these records and return a row which has a latest date. can somebody help?

Code:

import json
import pandas as pd
import xlrd
from pymongo import MongoClient
from functools import reduce

try: 
    client = MongoClient() 
    print("Connected successfully!!!") 
except:   
    print("Could not connect to MongoDB") 

# database 
db = client.conn
collection = db.contactReg

df = pd.DataFrame(list(collection.find()))
print(df)

df1 = df[df.duplicated(['name'], keep = False)]
print(df1)

# reading the csv file
df2 = pd.read_csv(r'C:\Users\swetha1\Desktop\rules.csv')
print(df2)

df3 = pd.merge(df1,df2,on="source")
print(df3)
print(df3.dtypes)

Output:

Connected successfully!!!
data from mongo
    LastUpdate                       _id    name  nameId  source sourceId
0  10-Oct-2018  5bbc86e5c16a27f1e1bd39f8  swetha   123.0   Blore       10
1  11-Oct-2018  5bbc86e5c16a27f1e1bd39f9  swetha   123.0   Mlore       11
2   9-Oct-2018  5bbc86e5c16a27f1e1bd39fa  swathi   124.0   Mlore       11

fetching duplicates
    LastUpdate                       _id    name  nameId  source sourceId
0  10-Oct-2018  5bbc86e5c16a27f1e1bd39f8  swetha   123.0  Blore       10
1  11-Oct-2018  5bbc86e5c16a27f1e1bd39f9  swetha   123.0  Mlore       11

reading CSV file
   source  P.weight  N.weight  Tolerance(days)  Durability(Days)
0  Blore       100      -100                0                 0
1  Mlore       200      -200               30               365

merging
    LastUpdate                       _id    name  nameId  source sourceId  
P.weight  N.weight  Tolerance(days)  Durability(Days)
0  10-Oct-2018  5bbc86e5c16a27f1e1bd39f8  swetha   123.0  Blore       10       
100      -100                0                 0
1  11-Oct-2018  5bbc86e5c16a27f1e1bd39f9  swetha   123.0  Mlore       11       
200      -200               30               365

Solution

  • First convert column to_datetime and then filter by boolean indexing:

    df3['LastUpdate'] = pd.to_datetime(df3['LastUpdate'])
    df4 = df3[df3['LastUpdate'] == df3['LastUpdate'].max()]
    

    Or use idxmax :

    df3['LastUpdate'] = pd.to_datetime(df3['LastUpdate'])
    df4 = df3.loc[[df3['LastUpdate'].idxmax()]]