python datetime parsing strptime

How to retrieve the strptime format given a string and its datetime object?


Suppose I have a large set of strings that I want to parse into a set of datetime objects. I could call dateutil.parser.parse on each string, but that is more computationally expensive and slower than parsing one string, retrieving the strptime format that was applied, and then calling datetime.strptime(string, model) on the rest.

I wanted to create a function, a bit like the following:

def retrieve_format(datetime_object, string):
    #do some things
    return model

with the model being a string.

I have found nothing that explains the inner workings of the dateutil parser, and I believe the developers would be able to add such a feature.

Any idea how to do it? It would save time and computing power.

Example

Suppose I have a set of strings that are all formatted the same way as this one:

myStr = '27/03/2020 - 16:20'

I could do

import dateutil.parser
myDate = dateutil.parser.parse(myStr)

and get myDate as

datetime.datetime(2020, 3, 27, 16, 20)

With the function I have in mind, I could then do

>>> model = retrieve_format(myDate, myStr)
>>> print(model)
%d/%m/%Y - %H:%M

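and then

import datetime

datetime_set = set()  # {} would create a dict, which has no add()
for formatted_string in string_set:  # string_set: the set of strings to parse
    raw = datetime.datetime.strptime(formatted_string, model)
    datetime_set.add(raw)

to process all the other elements very efficiently.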
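To make the idea concrete, here is a minimal sketch of what such a retrieve_format could look like using only the standard library. It is not how dateutil works internally. It assumes the string contains only numeric fields, tries the directives listed in the hypothetical _DIRECTIVES table for each run of digits, and keeps the first combination that round-trips through strptime. When two fields share a value (say, minutes and seconds both 0), it simply returns the first match.

```python
import itertools
import re
from datetime import datetime

# Hypothetical lookup table: numeric strptime directives and the field each encodes.
_DIRECTIVES = {
    "%Y": lambda dt: dt.year,
    "%y": lambda dt: dt.year % 100,
    "%m": lambda dt: dt.month,
    "%d": lambda dt: dt.day,
    "%H": lambda dt: dt.hour,
    "%M": lambda dt: dt.minute,
    "%S": lambda dt: dt.second,
}


def retrieve_format(datetime_object, string):
    """Return a strptime format that maps `string` to `datetime_object`, or None."""
    # Split into runs of digits and runs of non-digits, e.g. ['27', '/', '03', ...].
    tokens = re.findall(r"\d+|\D+", string)
    options = []
    for token in tokens:
        if token.isdigit():
            # Directives whose field value equals this number.
            matches = [d for d, field in _DIRECTIVES.items()
                       if int(token) == field(datetime_object)]
            if not matches:
                return None
            options.append(matches)
        else:
            # Separators are kept literally; '%' must be escaped for strptime.
            options.append([token.replace("%", "%%")])

    for combo in itertools.product(*options):
        used = [part for part in combo if part in _DIRECTIVES]
        if len(used) != len(set(used)):
            continue  # skip formats that repeat a directive
        model = "".join(combo)
        try:
            if datetime.strptime(string, model) == datetime_object:
                return model
        except ValueError:
            continue
    return None


print(retrieve_format(datetime(2020, 3, 27, 16, 20), "27/03/2020 - 16:20"))
```

Month names, AM/PM markers and time zones are not handled by this sketch; they would need extra entries and locale-aware comparisons.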


Solution

  • Thanks to snakecharmerb's comment on my question, I found this comment, which uses the dateinfer library. It needs only the strings, not the datetime object. It can be installed with pip:

    pip install pydateinfer
    

    A working example:

    import dateinfer
    dateinfer.infer(['27/03/2020 - 16:20', '28/03/2020 - 14:56'])
    

    and the output is

    '%d/%m/%Y - %H:%M'
    

    The input is always a list, even if it contains only one element. The more ambiguous the strings are, the more elements the list needs. For example, '04/04/2020' alone gives no way of telling the day from the month.
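The ambiguity point can be checked with the standard library alone. The sketch below is only an illustration, not what dateinfer does internally: CANDIDATES and consistent_formats are hypothetical names, and the function simply keeps the candidate formats that parse every string it is given.

```python
from datetime import datetime

# Hypothetical candidate list, chosen only to illustrate the day/month ambiguity.
CANDIDATES = ["%d/%m/%Y", "%m/%d/%Y"]


def consistent_formats(strings, candidates=CANDIDATES):
    """Return the candidate formats that successfully parse every string."""
    survivors = []
    for fmt in candidates:
        try:
            for s in strings:
                datetime.strptime(s, fmt)
        except ValueError:
            continue  # at least one string does not fit this format
        survivors.append(fmt)
    return survivors


# A single ambiguous string leaves both readings open.
print(consistent_formats(["04/04/2020"]))
# A second string with a day above 12 rules out the month-first reading.
print(consistent_formats(["04/04/2020", "27/03/2020"]))
```

With only '04/04/2020' both formats survive; adding '27/03/2020' leaves just '%d/%m/%Y', which is why a longer input list helps.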