Search code examples
pythonpandasvalueerror

ValueError: time data "2020-04-29T17:18:25.574824Z" doesn't match format "%Y-%m-%dT%H:%M:%S%z", at position 3 - I am changing object dtype to datetime


I am trying change object dtype to datetime

# My Code:

import pandas as pd

# Sample data
data = {'Channel_Started': ['2013-05-18T04:46:00Z', '2018-01-16T15:55:22Z', '2016-12-12T05:00:55Z', '2020-04-29T17:18:25.574824Z', '2023-11-24T21:56:18.571502Z', '2020-06-13T05:20:37.182391Z', '2015-08-22T13:22:51Z', '2016-11-16T14:52:31Z', '2020-05-09T20:35:27.233665Z', '2022-03-16T22:09:57.246468Z', '2023-02-11T05:55:01.369504Z', '2023-03-10T12:18:40.189285Z', '2005-12-16T09:01:28Z', '2013-09-05T01:15:06Z', '2017-07-13T09:30:23Z', '2020-08-05T16:09:28.304314Z']}

# Create DataFrame
df = pd.DataFrame(data)

# Convert to datetime
df['Channel_Started'] = pd.to_datetime(df['Channel_Started'])

# Extract date
df['Channel_Started'] = df['Channel_Started'].dt.date

print(df)

Output:

# ValueError: time data "2020-04-29T17:18:25.574824Z" doesn't match format "%Y-%m-%dT%H:%M:%S%z", at position 3. You might want to try:

- passing `format` if your strings have a consistent format;
- passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
- passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.

Based on the error output provided, it seems that there is an issue with the format of the datetime string "2020-04-29T17:18:25.574824Z" not matching the expected format "%Y-%m-%dT%H:%M:%S%z". The code is attempting to convert this string to a datetime object using the Pandas library in Python, but it is encountering a ValueError because the format specified does not match the actual datetime string.

I expecting This output:

   Channel_Started
0       2013-05-18
1       2018-01-16
2       2016-12-12
3       2020-04-29
4       2023-11-24
5       2020-06-13
6       2015-08-22
7       2016-11-16
8       2020-05-09
9       2022-03-16
10      2023-02-11
11      2023-03-10
12      2005-12-16
13      2013-09-05
14      2017-07-13
15      2020-08-05

Solution

  • You have mixed formats. The actual format can be inferred for each value as follows:

    import pandas as pd
    
    data = {
        "Channel_Started": [
            "2013-05-18T04:46:00Z",
            "2018-01-16T15:55:22Z",
            "2016-12-12T05:00:55Z",
            "2020-04-29T17:18:25.574824Z",
            "2023-11-24T21:56:18.571502Z",
            "2020-06-13T05:20:37.182391Z",
            "2015-08-22T13:22:51Z",
            "2016-11-16T14:52:31Z",
            "2020-05-09T20:35:27.233665Z",
            "2022-03-16T22:09:57.246468Z",
            "2023-02-11T05:55:01.369504Z",
            "2023-03-10T12:18:40.189285Z",
            "2005-12-16T09:01:28Z",
            "2013-09-05T01:15:06Z",
            "2017-07-13T09:30:23Z",
            "2020-08-05T16:09:28.304314Z",
        ]
    }
    
    df = pd.DataFrame(data)
    
    df["Channel_Started"] = pd.to_datetime(df["Channel_Started"], format="mixed")
    
    print(df)
    

    Output:

                        Channel_Started
    0         2013-05-18 04:46:00+00:00
    1         2018-01-16 15:55:22+00:00
    2         2016-12-12 05:00:55+00:00
    3  2020-04-29 17:18:25.574824+00:00
    4  2023-11-24 21:56:18.571502+00:00
    5  2020-06-13 05:20:37.182391+00:00
    6         2015-08-22 13:22:51+00:00
    7         2016-11-16 14:52:31+00:00
    8  2020-05-09 20:35:27.233665+00:00
    9  2022-03-16 22:09:57.246468+00:00
    10 2023-02-11 05:55:01.369504+00:00
    11 2023-03-10 12:18:40.189285+00:00
    12        2005-12-16 09:01:28+00:00
    13        2013-09-05 01:15:06+00:00
    14        2017-07-13 09:30:23+00:00
    15 2020-08-05 16:09:28.304314+00:00