Remove noise (hours) when parsing dates in Y/M/D format

I am parsing the dates in my dataset, but I keep running into ParserError exceptions because the hours are often malformed. I've decided to skip the hours and focus only on the year, month, and day.

These are the variants I have for date:

| Startdate |
| --- |
| March 23, 2022 6:00 |
| March 23, 2022 7:0 |
| March 23, 2022 7: |
| March 23, 2022 7 |

For now, only the first row parses successfully. I currently skip the other rows, but I would like to include them as well by simply ignoring the hours.

for date in df_en['Startdate']:



    except Exception:


What is the right way to still parse the other dates without having to bother with hours?

I've tried to convert the time into a valid hours format. Using pd.to_datetime did not work because the month was a string (March) rather than a number (3). When I manually changed it to 3, it still raised ValueError: unconverted data remains: :00. Since the hours are not relevant, I just want to skip them.


    dates = ['December 1, 2021 6:00', 'March 23, 2022 6']
    for date in dates:
        date.replace(' (\d{1})', ' 0\\1')
        pd.to_datetime(date, format='%m %d, %Y %H')



Expected output:

| Year | Month | Day |
| --- | --- | --- |
| 2022 | March | 23 |
| 2022 | March | 23 |


  • If you just need year/month/day columns, there's no need to parse to datetime at all. Just split the strings and rearrange the pieces, for example:

    import pandas as pd
    df = pd.DataFrame({'Startdate': ['December 1, 2021 6:00', 'March 23, 2022 6']})
    # split on a single space or on ", " (raw string avoids the invalid "\ " escape)
    parts = df['Startdate'].str.split(r' |, ')
    df['year'], df['month'], df['day'] = parts.str[2], parts.str[0], parts.str[1]
    #                Startdate  year     month day
    # 0  December 1, 2021 6:00  2021  December   1
    # 1       March 23, 2022 6  2022     March  23
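
If an actual datetime column is needed rather than string columns, one option is to keep only the leading "Month day, year" part with a regular expression and parse just that. This is a minimal sketch, assuming every row starts with a full English month name followed by the day, a comma, and a four-digit year. The sample rows come from the question; the `date`, `year`, `month`, and `day` column names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Startdate': [
    'March 23, 2022 6:00',
    'March 23, 2022 7:0',
    'March 23, 2022 7:',
    'March 23, 2022 7',
    'December 1, 2021 6:00',
]})

# Keep only the leading "Month day, year" part; whatever follows (the hours) is dropped.
date_only = df['Startdate'].str.extract(r'^(\w+ \d{1,2}, \d{4})', expand=False)

# %B matches a full month name (locale-dependent; English is assumed here).
# errors='coerce' turns any row that still fails to parse into NaT instead of raising.
df['date'] = pd.to_datetime(date_only, format='%B %d, %Y', errors='coerce')

# Derive the separate columns from the parsed dates.
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month_name()
df['day'] = df['date'].dt.day
```

Unlike the string-splitting approach, this gives numeric `year` and `day` columns, and rows that do not match the expected pattern end up as NaT in `date`, so they can be inspected rather than silently skipped.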