python-3.x datetime parsing python-dateutil

How to get only the date string from a long string

I know there are many Q&As about extracting a datetime from a string, for example with dateutil.parser:

import dateutil.parser as dparser
dparser.parse('something sep 28 2017 something',fuzzy=True).date()

output: datetime.date(2017, 9, 28)

But my question is how to find out which part of the string produced this result. For example, I want a function that also returns 'sep 28 2017':

datetime, datetime_str = get_date_str('something sep 28 2017 something')
outputs: datetime.date(2017, 9, 28), 'sep 28 2017'

Any clues or directions I could look into?

Solution

Following the discussion with @Paul and building on the solution from @alecxe, I propose the following solution, which works on a number of test cases. I've made the problem slightly more challenging:

Step 1: get excluded tokens

import dateutil.parser as dparser

ostr = 'something sep 28 2017 something abcd'
_, excl_str = dparser.parse(ostr,fuzzy_with_tokens=True)

gives outputs of:

excl_str:     ('something ', ' ', 'something abcd')

Step 2: rank tokens by length

excl_str = list(excl_str)
excl_str.sort(reverse=True,key = len)

gives a sorted token list:

excl_str:   ['something abcd', 'something ', ' ']

Step 3: delete tokens and ignore space element

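# Remove each excluded token, longest first; the single-space token is skipped
# so that the spaces inside the date itself survive.
for token in excl_str:
    if token != ' ':
        ostr = ostr.replace(token, '')
# ostr now holds the remaining date substring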

gives the final output:

ostr:    'sep 28 2017 '
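The three steps can be folded into one function shaped like the get_date_str asked for in the question. This is only a sketch: strip_tokens is a hypothetical helper name, it skips any whitespace-only token (slightly broader than the `!= ' '` check above), and it trims the trailing space from the result, which the steps above do not do.

```python
def strip_tokens(text, tokens):
    """Remove every excluded token from text, longest token first."""
    for token in sorted(tokens, key=len, reverse=True):
        if token.strip():  # assumption: skip whitespace-only tokens
            text = text.replace(token, '')
    return text.strip()  # assumption: trim leftover leading/trailing spaces


def get_date_str(text):
    """Return (date, matched substring). Requires python-dateutil."""
    import dateutil.parser as dparser  # third-party, imported lazily
    parsed, excluded = dparser.parse(text, fuzzy_with_tokens=True)
    return parsed.date(), strip_tokens(text, excluded)


# Token tuple copied from the Step 1 output above
tokens = ('something ', ' ', 'something abcd')
date_str = strip_tokens('something sep 28 2017 something abcd', tokens)
```

Because str.replace removes every occurrence of a token, this approach can still misbehave if an excluded token also appears inside the date portion of the string.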

Note: step 2 is required because deletion goes wrong if a shorter token is a substring of a longer one. In this case, if deletion followed the order ('something ', ' ', 'something abcd'), the replacement would first remove 'something ' from 'something abcd', so 'something abcd' would no longer match and abcd would never be deleted. The result would be 'sep 28 2017 abcd'.
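The ordering problem can be reproduced with plain str.replace, without dateutil. In this small sketch, remove_in_order is a hypothetical helper that mirrors the Step 3 loop, and the token list is copied from the Step 1 output:

```python
def remove_in_order(text, tokens):
    """Delete tokens from text in the given order, skipping the lone space."""
    for token in tokens:
        if token != ' ':
            text = text.replace(token, '')
    return text


ostr = 'something sep 28 2017 something abcd'
tokens = ['something ', ' ', 'something abcd']

# Original order: 'something ' is removed first, so 'something abcd' never matches
unsorted_result = remove_in_order(ostr, tokens)

# Longest first: 'something abcd' is removed before its prefix 'something '
sorted_result = remove_in_order(ostr, sorted(tokens, key=len, reverse=True))

print(repr(unsorted_result))  # 'sep 28 2017 abcd'
print(repr(sorted_result))    # 'sep 28 2017 '
```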