Tags: python, pandas, datetime, timestamp

Out-of-bounds timestamps in pandas


I need to rewrite some SQL code in Python, and the problem is that I have to calculate differences between dates in days. In SQL, the dates are subtracted without any problem, even for rows where final_pmt_date is '9999-12-31'.

But in pandas the datetime64 type has a limited range, so I get an exception.
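For context, here is a small sketch of where that limit comes from, assuming (as pandas documents) that datetime64[ns] values are stored as signed 64-bit counts of nanoseconds since 1970-01-01, with the smallest value reserved for NaT. It uses only the standard library, so the bounds come out at microsecond precision; the names INT64_MAX, EPOCH, NS_MAX and NS_MIN are illustrative only.

```python
from datetime import datetime, timedelta

# datetime64[ns] is a signed 64-bit count of nanoseconds since the Unix epoch.
INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1)

# Convert the largest count to microseconds (the finest unit the standard
# library supports), truncating towards zero so we stay inside the range.
span = timedelta(microseconds=INT64_MAX // 1000)

NS_MAX = EPOCH + span  # latest instant datetime64[ns] can hold (to the microsecond)
NS_MIN = EPOCH - span  # mirror image; pandas reserves -2**63 itself for NaT

print(NS_MAX)  # 2262-04-11 23:47:16.854775
print(NS_MIN)  # 1677-09-21 00:12:43.145225
```

So anything past April 2262, including 9999-12-31, simply has no nanosecond representation, which is why the conversion fails rather than the subtraction.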

All the answers I found suggest converting such dates to NaT (with errors='coerce'). But I need to calculate the number of days for these dates as well.
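For reference, the standard library does not share this limit: datetime runs from year 1 to 9999, so the SQL sentinel '9999-12-31' is still representable. A minimal sketch of an integer day count, assuming the values are strings in the 'YYYY-MM-DD HH:MM:SS.fff' layout and fall at midnight; the helper days_between and the sample values are hypothetical.

```python
from datetime import datetime

print(datetime.max)  # 9999-12-31 23:59:59.999999


def days_between(start: str, end: str) -> int:
    """Whole days from start to end, e.g. for '2021-12-27 00:00:00.000'."""
    # fromisoformat (Python 3.7+) accepts the 'YYYY-MM-DD HH:MM:SS.fff' layout.
    # .days is the whole-day part of the timedelta; with midnight timestamps
    # that is the full difference.
    return (datetime.fromisoformat(end) - datetime.fromisoformat(start)).days


print(days_between("2021-12-27 00:00:00.000", "9999-12-31 00:00:00.000"))  # 2913908
print(days_between("2019-03-06 00:00:00.000", "2022-04-06 00:00:00.000"))  # 1127
```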

Thank you in advance


Solution

  • A date like 9999-12-31 is out of range for pandas' default datetime64[ns] type, which can only represent timestamps between roughly 1677-09-21 and 2262-04-11.

    Using the standard-library datetime module might be an alternative here, for example:

    from datetime import datetime
    
    import pandas as pd
    
    df = pd.DataFrame(
        {
            "open": ["2021-12-27 00:00:00.000", "2019-03-06 00:00:00.000"],
            "close": ["9999-12-31 00:00:00.000", "2022-04-06 00:00:00.000"],
        }
    )
    
    # Parse each row with the standard library: datetime covers years 1 to 9999,
    # so "9999-12-31" does not overflow the way datetime64[ns] does.
    df["delta"] = df.apply(
        lambda row: datetime.fromisoformat(row["close"])
        - datetime.fromisoformat(row["open"]),
        axis=1,
    )
    
    df
                          open                    close                  delta
    0  2021-12-27 00:00:00.000  9999-12-31 00:00:00.000  2913908 days, 0:00:00
    1  2019-03-06 00:00:00.000  2022-04-06 00:00:00.000     1127 days 00:00:00
    

    However, note that you'll have to use apply, which is much slower than pandas' vectorized datetime methods. Maybe using NaT as an "invalid value" identifier is an option after all?
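If the per-row apply turns out to be too slow, one possible vectorized alternative is to stay in NumPy with a coarser unit. This is a sketch under the assumption that NumPy accepts the space-separated 'YYYY-MM-DD HH:MM:SS.fff' strings; datetime64[ms] covers roughly +/- 292 million years around 1970, so 9999-12-31 fits easily. The variable names are illustrative.

```python
import numpy as np

# Millisecond resolution keeps sub-second precision but has a far wider range
# than nanoseconds, so the 9999-12-31 sentinel parses without overflow.
open_ = np.array(
    ["2021-12-27 00:00:00.000", "2019-03-06 00:00:00.000"], dtype="datetime64[ms]"
)
close = np.array(
    ["9999-12-31 00:00:00.000", "2022-04-06 00:00:00.000"], dtype="datetime64[ms]"
)

delta = close - open_                    # timedelta64[ms], computed element-wise
days = delta // np.timedelta64(1, "D")   # floor-divide to whole days (int64 array)

print(days.tolist())  # [2913908, 1127]
```

With the DataFrame above, the arrays could be built with np.array(df["open"].tolist(), dtype="datetime64[ms]") and the result stored with df["days"] = days, since there is one entry per row. pandas 2.0 and later can also hold non-nanosecond units such as datetime64[ms] in a Series, which may let you keep this inside pandas; check the behaviour on your version.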