Tags: python-3.x, datetime, python-dateutil, datefinder

Why am I getting multiple dates from a string that contains only a single date and time? (Python)


I am using the datefinder module in Python to extract the datetime from a string, but I am getting extra dates back from a string in which each line contains only one date and time.

Code:

import datefinder

def date_using_datefinder(date_string):
    # find_dates() returns a generator of datetime objects found in the text
    matches = datefinder.find_dates(date_string)
    for match in matches:
        print(match)

Input:

test3='''
[26/08/2018 06:58:29.126900]
[26/03/2004 06:58:29.126985][SDAP_CODEC][JET_AT_JUMP][43][HTXXHTXX] DUST_RPOP:QFI:9
[02/06/2003 06:58:29.254621][SDAP_CODEC][JET_AT_JUMP][43][HTXXHTXX] DUST_RPOP:QFI:9
[20/05/2022 06:58:29.124898][SDAP_CODEC][JET_AT_JUMP][43][HTXX] DUST_RPOP:QFI:9
[26/08/2020 06:58:29.136579][ALST][stx][29][ggg] JET_AT_JUMP:TRUX_MSGD_HTXX:13265261686865256:QWERT_DUMPING_TDD:45:DUST_RPOP_CVX:32:AXTP_DI:65576
'''

Output:

2018-08-26 06:58:29.126900
2004-03-26 06:58:29.126985
2003-02-06 06:58:29.254621
2022-05-20 06:58:29.124898
2020-08-26 06:58:29.136579
2045-08-02 00:00:00
2032-08-02 00:00:00

Why do these last two dates appear? As far as I can tell, they are nowhere in the string.
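A likely explanation, which the question does not confirm, is that datefinder matched the bare numbers `45` and `32` in the last line (`QWERT_DUMPING_TDD:45` and `DUST_RPOP_CVX:32`). dateutil, which datefinder uses for the actual parsing, treats a lone number above 31 as a year, expands a two-digit year to 2045 or 2032, and fills the missing month, day, and time from its default (today's date at midnight), which would explain the `00:00:00` times. datefinder can also return the matched text with each result (`find_dates(text, source=True)` yields `(datetime, matched_text)` pairs), so one hypothetical workaround is to discard matches whose text has too few digits to be a real timestamp. The helper name `drop_short_matches` and the 6-digit threshold are assumptions for illustration; the sketch uses hand-built pairs so it runs without datefinder installed.

```python
from datetime import datetime


def drop_short_matches(matches, min_digits=6):
    """Keep the datetimes whose matched source text has at least `min_digits` digits.

    Hypothetical helper: a real timestamp such as "26/08/2020 06:58:29.136579"
    contains many digits, while a stray field value such as "45" does not.
    """
    kept = []
    for dt, source in matches:
        digit_count = sum(ch.isdigit() for ch in source)
        if digit_count >= min_digits:
            kept.append(dt)
    return kept


# Hand-built pairs standing in for what find_dates(..., source=True) might yield.
sample = [
    (datetime(2020, 8, 26, 6, 58, 29, 136579), "26/08/2020 06:58:29.136579"),
    (datetime(2045, 8, 2), "45"),
    (datetime(2032, 8, 2), "32"),
]

for dt in drop_short_matches(sample):
    print(dt)  # prints 2020-08-26 06:58:29.136579
```

With datefinder installed, the call would look like `drop_short_matches(datefinder.find_dates(line, source=True))`. This is only a heuristic: it assumes the spurious matches really are short fragments, and a long numeric field such as `13265261686865256` would still pass the digit test if it were ever matched.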

PS: I also tried the dateutil module, but it raises a ParserError.

Just for reference, the code is:

import warnings

import dateutil.parser as dparser

warnings.filterwarnings('ignore')

def date_Using_UtilModule(date_string):
    # fuzzy=True lets dateutil skip tokens it does not recognise as date parts
    res = dparser.parse(date_string, fuzzy=True)
    return res

res = date_Using_UtilModule("[26/08/2020 06:58:29.136579][ALST][stx][29][ggg] JET_AT_JUMP:TRUX_MSGD_HTXX:13265261686865256:QWERT_DUMPING_TDD:45:DUST_RPOP_CVX:32:AXTP_DI:65576")
print(res)

Output:
ParserError: Unknown string format: [26/08/2020 06:58:29.136579][ALST][stx][29][ggg] JET_AT_JUMP:TRUX_MSGD_HTXX:13265261686865256:QWERT_DUMPING_TDD:45:DUST_RPOP_CVX:32:AXTP_DI:65576
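If every log line starts with a bracketed timestamp, as all the sample lines do, fuzzy parsing may not be needed at all: cut out the text between the first `[` and `]` with plain string methods and pass only that to `datetime.strptime`. The format string `%d/%m/%Y %H:%M:%S.%f` is an assumption matched to the sample lines (day first), and `parse_leading_timestamp` is a hypothetical name.

```python
from datetime import datetime

# Assumed layout of the timestamps in the sample log (day first, microseconds).
LOG_TIMESTAMP_FORMAT = "%d/%m/%Y %H:%M:%S.%f"


def parse_leading_timestamp(line):
    """Parse the timestamp inside the first [...] of a log line.

    Returns None when the line has no bracketed prefix or the bracketed
    text does not match LOG_TIMESTAMP_FORMAT.
    """
    line = line.strip()
    if not line.startswith("["):
        return None
    # partition() splits on the first "]" without using a regex.
    token, sep, _rest = line[1:].partition("]")
    if not sep:
        return None
    try:
        return datetime.strptime(token, LOG_TIMESTAMP_FORMAT)
    except ValueError:
        return None


line = ("[26/08/2020 06:58:29.136579][ALST][stx][29][ggg] JET_AT_JUMP:"
        "TRUX_MSGD_HTXX:13265261686865256:QWERT_DUMPING_TDD:45")
print(parse_leading_timestamp(line))  # 2020-08-26 06:58:29.136579
```

The trimmed token should also parse with `dparser.parse(token, dayfirst=True)` if you prefer to stay with dateutil; the point is that the long numeric fields after the timestamp never reach the parser.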

Note: using a regex will not work in my case, because my log lines can follow arbitrary patterns and contain any datetime format; in other words, I do not want to use a regex.
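A regex-free option, assuming (as in every sample line) that timestamps sit inside square brackets, is to walk the line with `str.partition`, try each bracketed token against a list of known `strptime` layouts, and keep the tokens that parse. It only recognises the formats you list, so it trades datefinder's guessing for predictability. `CANDIDATE_FORMATS`, `try_formats`, and `find_bracketed_dates` are hypothetical names, and the format list is only an example.

```python
from datetime import datetime

# Hypothetical list of layouts to try, in order; extend it with the formats your logs use.
CANDIDATE_FORMATS = (
    "%d/%m/%Y %H:%M:%S.%f",
    "%d/%m/%Y %H:%M:%S",
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%d %H:%M:%S",
)


def try_formats(text):
    """Return the first successful strptime() parse of text, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def find_bracketed_dates(line):
    """Collect every [...] token in the line that parses as a datetime."""
    found = []
    rest = line
    while True:
        _before, sep, rest = rest.partition("[")
        if not sep:
            break  # no more opening brackets
        token, sep, rest = rest.partition("]")
        if not sep:
            break  # unmatched "[": stop scanning
        parsed = try_formats(token.strip())
        if parsed is not None:
            found.append(parsed)
    return found


line = ("[02/06/2003 06:58:29.254621][SDAP_CODEC][JET_AT_JUMP][43][HTXXHTXX] "
        "DUST_RPOP:QFI:9[26/03/2036 06:58:29.126985]")
for dt in find_bracketed_dates(line):
    print(dt)
```

Because the day-first layouts are listed, `02/06/2003` comes out as 2 June 2003. Both the datefinder output above and the library demo below show `2003-02-06`, which is the month-first reading; which one is correct depends on your logs, and an explicit format list makes that choice visible instead of leaving it to the parser.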


Solution

  • Let me help myself

    I have created a Python library for this task; anyone else who needs it can use it too. I have tried to cover most date-time formats and will add more.

    It's time to use our own library

    Installation -> pip install MyDateTimeLib==0.1.2
    Importing as -> from MyDateTimeLib import myfunction
    How to use?  -> myfunction.date_find("your date string")
    Returns      -> a dictionary containing all the dates found in the string, or an empty dict if there are none.
    Check on     -> https://pypi.org/project/MyDateTimeLib/0.1.2/
    

    DEMO:

    data_for_date = '''
    [26/08/2018 06:58:29.126900]
    [26/03/2004 06:58:29.126985][SDAP_CODEC][JET_AT_JUMP][43][HTXXHTXX] DUST_RPOP:QFI:9
    [02/06/2003 06:58:29.254621][SDAP_CODEC][JET_AT_JUMP][43][HTXXHTXX] DUST_RPOP:QFI:9[26/03/2036 06:58:29.126985]
    [20/05/2022 06:58:29.124898][SDAP_CODEC][JET_AT_JUMP][43][HTXX] DUST_RPOP:QFI:9
    [26/08/2020 06:58:29.136579][ALST][stx][29][ggg] JET_AT_JUMP:TRUX_MSGD_HTXX:13265261686865256:QWERT_DUMPING_TDD:45:DUST_RPOP_CVX:32:AXTP_DI:65576
    '''
    

    CODE:

    from MyDateTimeLib import myfunction
    for x in data_for_date.splitlines():
        if len(x)>1:
            dic = myfunction.date_find(x)
            print()
            for k,v in dic.items():
                print(k,v)
                
    

    OUTPUT:

    Date:0 2018-08-26 06:58:29.126900
    
    Date:0 2004-03-26 06:58:29.126985
    
    Date:0 2003-02-06 06:58:29.254621
    Date:1 2036-03-26 06:58:29.126985
    
    Date:0 2022-05-20 06:58:29.124898
    
    Date:0 2020-08-26 06:58:29.136579