Search code examples
pythonpandaspython-dateutil

Multiple Date Formate to a single date pattern in pandas dataframe


I have a pandas date column in the following format

                                    Date
0                         March 13, 2020, March 13, 2020
1      3.9.2021, 3.9.2021, 03.09.2021, 3. September 2021
2                                                    NaN
3                             May 20, 2022, May 21, 2022

I tried to convert the pattern to a single pattern to store to a new column.

import pandas as pd
import dateutil.parser

# initialise data of lists.
data = {'Date':['March 13, 2020, March 13, 2020', '3.9.2021, 3.9.2021, 03.09.2021, 3. September 2021', 'NaN','May 20, 2022, May 21, 2022']}
 
# Create DataFrame
df = pd.DataFrame(data)
 
df["FormattedDate"] = df.Date.apply(lambda x: dateutil.parser.parse(x.strftime("%Y-%m-%d") ))

But i am getting an error AttributeError: 'str' object has no attribute 'strftime'

Desired Output

                                    Date                           DateFormatted
0                         March 13, 2020, March 13, 2020       2020-03-13, 2020-03-13   
1      3.9.2021, 3.9.2021, 03.09.2021, 3. September 2021       2021-03-09, 2021-03-09, 2021-03-09, 2021-09-03
2                                                    NaN       NaN                     
3                             May 20, 2022, May 21, 2022       2022-05-20, 2022-05-21

Solution

  • I was authot of previous solution, so possible solution is change also it for avoid , like separator and like value in date strings is used Series.str.extractall, converting to datetimes and last is aggregate join:

    format_list = ["[0-9]{1,2}(?:\,|\.|\/|\-)(?:\s)?[0-9]{1,2}(?:\,|\.|\/|\-)(?:\s)?[0-9]{2,4}",
                  "[0-9]{1,2}(?:\.)(?:\s)?(?:(?:(?:j|J)a)|(?:(?:f|F)e)|(?:(?:m|M)a)|(?:(?:a|A)p)|(?:(?:m|M)a)|(?:(?:j|J)u)|(?:(?:a|A)u)|(?:(?:s|S)e)|(?:(?:o|O)c)|(?:(?:n|N)o)|(?:(?:d|D)e))\w*(?:\s)?[0-9]{2,4}",
                  "(?:(?:(?:j|J)an)|(?:(?:f|F)eb)|(?:(?:m|M)ar)|(?:(?:a|A)pr)|(?:(?:m|M)ay)|(?:(?:j|J)un)|(?:(?:j|J)ul)|(?:(?:a|A)ug)|(?:(?:s|S)ep)|(?:(?:o|O)ct)|(?:(?:n|N)ov)|(?:(?:d|D)ec))\w*(?:\s)?(?:\n)?[0-9]{1,2}(?:\s)?(?:\,|\.|\/|\-)?(?:\s)?[0-9]{2,4}(?:\,|\.|\/|\-)?(?:\s)?[0-9]{2,4}",
                  "[0-9]{1,2}(?:\.)?(?:\s)?(?:\n)?(?:(?:(?:j|J)a)|(?:(?:f|F)e)|(?:(?:m|M)a)|(?:(?:a|A)p)|(?:(?:m|M)a)|(?:(?:j|J)u)|(?:(?:a|A)u)|(?:(?:s|S)e)|(?:(?:o|O)c)|(?:(?:n|N)o)|(?:(?:d|D)e))\w*(?:\,|\.|\/|\-)?(?:\s)?[0-9]{2,4}"]
    
    # initialise data of lists.
    data = {'Name':['Today is 09 September 2021', np.nan, '25 December 2021 is christmas', '01/01/2022 is newyear and will be holiday on 02.01.2022 also']}
    
    # Create DataFrame
    df = pd.DataFrame(data)
    
    import dateutil.parser
    f = lambda x: dateutil.parser.parse(x).strftime("%Y-%m-%d")
                                    
    df['DateFormatted'] = df['Name'].str.extractall(f'({"|".join(format_list)})')[0].apply(f).groupby(level=0).agg(','.join)
    print (df)
                                                    Name          DateFormatted
    0                         Today is 09 September 2021             2021-09-09
    1                                                NaN                    NaN
    2                      25 December 2021 is christmas             2021-12-25
    3  01/01/2022 is newyear and will be holiday on 0...  2022-01-01,2022-02-01
    

    Another alternative is processing lists after remove missing values in generato comprehension with join:

    import dateutil.parser
    f = lambda x: dateutil.parser.parse(x).strftime("%Y-%m-%d")
                                    
    df['Date'] = df['Name'].str.findall("|".join(format_list)).dropna().apply(lambda y: ','.join(f(x) for x in y))
    print (df)
                                                    Name                   Date
    0                         Today is 09 September 2021             2021-09-09
    1                                                NaN                    NaN
    2                      25 December 2021 is christmas             2021-12-25
    3  01/01/2022 is newyear and will be holiday on 0...  2022-01-01,2022-02-01