Search code examples
pythonpandasvalueerror

Python ValueError: Both dates must have the same UTC offset with .loc


I am having trouble using loc to get all entries in a pandas DataFrame between two periods. For example, the two lines below both give me a value:

periods.loc[start]

periods.loc[end]

However, when i run the following, I get a ValueError: "Both dates must have the same UTC offset":

periods.loc[start:end]

I was expecting to get a DataFrame with all of the entries that fall between those two dates. All of the entries are a string in the form of "yyyy-mm-dd". Here are the first 10 entries of periods:

0    2007-01-25 09:10:02
1    2007-01-26 07:03:01
2    2007-02-02 04:50:51
3    2007-02-06 07:54:35
4    2007-02-07 06:31:05
5    2007-02-07 09:09:47
6    2007-02-07 09:43:12
7    2007-02-09 07:34:55
8    2007-02-13 04:32:04
9    2007-02-15 06:30:51

Solution

  • You need to convert your data into datetime.datetime objects. Here is a complete example of how you can do that:

    >>> import pandas as pd
    >>> from datetime import datetime
    
    
    >>> df = pd.DataFrame({"date": ["2007-01-25 09:10:02", "2007-01-26 07:03:01",
    ...                             "2007-02-02 04:50:51", "2007-02-06 07:54:35",
    ...                             "2007-02-07 06:31:05", "2007-02-07 09:09:47",
    ...                             "2007-02-07 09:43:12", "2007-02-09 07:34:55",
        ...                         "2007-02-13 04:32:04", "2007-02-15 06:30:51"]})
    >>> # convert the date column to datetime object
    >>> df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d %H:%M:%S")
    >>> df
                     date
    0 2007-01-25 09:10:02
    1 2007-01-26 07:03:01
    2 2007-02-02 04:50:51
    3 2007-02-06 07:54:35
    4 2007-02-07 06:31:05
    5 2007-02-07 09:09:47
    6 2007-02-07 09:43:12
    7 2007-02-09 07:34:55
    8 2007-02-13 04:32:04
    9 2007-02-15 06:30:51
    

    Now, let's slice it using start and end which are both dates:

    >>> start = "2007-01-25 09:10:02"
    >>> end = "2007-02-07 08:53:51"
    
    >>> # convert start and end from string to datetime object
    >>> start = datetime.strptime(start, '%Y-%m-%d %H:%M:%S')
    >>> end = datetime.strptime(end, '%Y-%m-%d %H:%M:%S')
    
    >>> # let's access some values
    >>> df.loc[(df["date"] >= start) & (df["date"] < end)] #mimics the slice object
                     date
    0 2007-01-25 09:10:02
    1 2007-01-26 07:03:01
    2 2007-02-02 04:50:51
    3 2007-02-06 07:54:35
    4 2007-02-07 06:31:05
    

    Also, you can access any particular date using loc:

    >>> new_date = "2007-02-07 06:31:05"
    >>> new_date = datetime.strptime(new_date, '%Y-%m-%d %H:%M:%S')
    >>> df.loc[df["date"] == new_date]
                     date
    4 2007-02-07 06:31:05