Search code examples
python-3.xpandasdataframemachine-learningfeature-engineering

Extract Datetime information from a string in a DataFrame column


So I have the Edition Column which contains data in unevenly pattern, as some have ',' followed by the date and some have ',-' pattern.

df.head()

17                Paperback,– 1 Nov 2016
18    Mass Market Paperback,– 1 Jan 1991
19                      Paperback,– 2016
20               Hardcover,– 24 Nov 2018
21        Paperback,– Import, 4 Oct 2018

How can I extract the date to a separate column. I tried using str.split() but can't find specific pattern to extract.Is there any method I could do it?


Solution

  • obj = df['Edition']
    obj.str.split('((?:\d+\s+\w+\s+)?\d{4}$)', expand=True)
    

    or

    obj.str.split('[,–]+').str[0]
    obj.str.split('[,–]+').str[-1] # date