python dataframe datetime azure-databricks

How to compare dates (in different formats) from a list of variables in Python


I need to extract the string variable with the latest timestamp from a list.

The variables are in the format below:

|Name|
|:---|
|First_Record2022-10-11_NameofRecord.txt|
|Second_Record_20221017.txt|

For now, I am fetching these in a list and iterating over it in a for loop to get the latest date from the two records, using the code below:

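```python
import re
import datetime as DT

for index, rows in df.iterrows():  # iterrows is a method, so it must be called
    # Drop hyphens so that 2022-10-11 and 20221017 both become 8 digits
    datestr = rows['Name'].replace('-', '')
    datestr = re.search(r'\d{8}|\d{6}', datestr).group()
    date = DT.datetime.strptime(datestr, '%Y%m%d')
    print('{:23}-->{}'.format(rows['Name'], date))
```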

But this only gives me the dates back. How do I compare the two strings and find the one with the latest date? For example, when comparing "First_Record2022-10-11_NameofRecord.txt" and "Second_Record_20221017.txt", I should get "Second_Record_20221017.txt" as the result.


Solution

  • IIUC, is this what you're looking for?

    df['date']= df['Name'].str.extract(r'(\d{4}.*?(?=[_|\.]))').replace(r'-','',regex=True)
    df.sort_values('date').tail(1)['Name'].squeeze()
    
    
    'Second_Record_20221017.txt'
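
If the names are in a plain Python list rather than a DataFrame, the same idea can be sketched with only the standard library. This is a minimal sketch, not the answer's method: the helper names `extract_date` and `latest_name` are hypothetical, and it assumes every date appears either as `YYYY-MM-DD` or as `YYYYMMDD`.

```python
import re
from datetime import datetime

# Assumption: dates appear only as 2022-10-11 or 20221017
DATE_RE = re.compile(r'(\d{4})-?(\d{2})-?(\d{2})')

def extract_date(name):
    """Return the first date found in name as a datetime, or None if absent."""
    match = DATE_RE.search(name)
    if match is None:
        return None
    # Join year, month and day so both layouts parse with one format string
    return datetime.strptime(''.join(match.groups()), '%Y%m%d')

def latest_name(names):
    """Return the name carrying the most recent date, skipping undated names."""
    dated = [(extract_date(n), n) for n in names]
    dated = [(d, n) for d, n in dated if d is not None]
    if not dated:
        return None
    return max(dated, key=lambda pair: pair[0])[1]

names = [
    'First_Record2022-10-11_NameofRecord.txt',
    'Second_Record_20221017.txt',
]
print(latest_name(names))  # Second_Record_20221017.txt
```

Comparing real `datetime` objects instead of digit strings means that an impossible date (for example month 13) raises a `ValueError` in `strptime` rather than being sorted silently.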