Suppose I have a large set of strings that I want to parse into a set of datetime objects. I could use dateutil.parser and iterate through the set, but that is more computationally intensive and slower than parsing one string, retrieving the strptime format that was applied, and then calling datetime.strptime(string, model) on the rest.
I wanted to create a function, a bit like the following:
def retrieve_format(datetime_object, string):
    # do some things
    return model
with the model being a string.
I have found nothing that explains the inner workings of the dateutil parser, and I believe its developers would be able to add such a feature.
Any idea on how to do it? It would save time and computing power.
Example
Suppose I have a set of strings that are all formatted the same way as this one:
myStr = '27/03/2020 - 16:20'
I could do
myDate = dateutil.parser.parse(myStr)
and get 'myDate' as being
datetime.datetime(2020, 3, 27, 16, 20)
but now I could use my function as such
>>> model = retrieve_format(myDate, myStr)
>>> print(model)
%d/%m/%Y - %H:%M
I could then do
# strings is the collection of identically formatted date strings
datetime_set = set()
for formatted_string in strings:
    raw = datetime.datetime.strptime(formatted_string, model)
    datetime_set.add(raw)
to treat all the other elements very efficiently.
Thanks to snakecharmerb's comment on my question, I found this comment, which uses the dateinfer library. Here, only the string is needed. It can be installed with pip:
pip install pydateinfer
A working example would be the following
import dateinfer
dateinfer.infer(['27/03/2020 - 16:20', '28/03/2020 - 14:56'])
and the output is
'%d/%m/%Y - %H:%M'
The input is always a list, even if it contains only one element. The more ambiguous the strings are, the more elements the list needs. For example, in '04/04/2020' there is no way to distinguish the day from the month, so a second sample such as '27/03/2020' is needed to settle the order.
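To illustrate why extra samples help, here is a minimal sketch that uses only the standard library. It is not how dateinfer works internally; the CANDIDATES list and the compatible_formats helper are hypothetical names made up for this example. It simply keeps the candidate formats that parse every sample.

```python
from datetime import datetime

# Hypothetical candidate formats, chosen only for this illustration.
CANDIDATES = ["%d/%m/%Y", "%m/%d/%Y"]


def compatible_formats(samples, candidates=CANDIDATES):
    """Return the candidate formats that successfully parse every sample."""
    matches = []
    for fmt in candidates:
        try:
            for sample in samples:
                datetime.strptime(sample, fmt)
        except ValueError:
            # At least one sample does not fit this format, so discard it.
            continue
        matches.append(fmt)
    return matches


# A single ambiguous string fits both orders.
print(compatible_formats(["04/04/2020"]))
# Adding a string whose day is above 12 leaves only day-first.
print(compatible_formats(["04/04/2020", "27/03/2020"]))
```

With one ambiguous string both candidates survive; once a sample with a day greater than 12 is included, only '%d/%m/%Y' remains.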