Pandas Series: Decrement DateTime by 100 Years


I have a pandas Series as follows:

0  2039-03-16 
1  2056-01-21 
2  2051-11-18 
3  2064-03-05 
4  2048-06-05

Name: BIRTH, dtype: datetime64

It was created from string data as follows:

s = data['BIRTH']
s = pd.to_datetime(s)
s

I want to shift all dates after the year 2040 back by 100 years (so 2056 becomes 1956).

I can do this for a single record as follows:

s.iloc[0].replace(year=s.iloc[0].year - 100)

but I really want to run it over the whole Series, and I can't work out how.

PS: I know there are ways to do this outside pandas using Python's datetime module, but I'd like to learn how to do it within pandas.


Solution

  • Using DateOffset is the obvious choice here:

    df['date'] - pd.offsets.DateOffset(years=100)
    
    0   1939-03-16
    1   1956-01-21
    2   1951-11-18
    3   1964-03-05
    4   1948-06-05
    Name: date, dtype: datetime64[ns]
    

    Assign it back:

    df['date'] -= pd.offsets.DateOffset(years=100)
    df
            date
    0 1939-03-16
    1 1956-01-21
    2 1951-11-18
    3 1964-03-05
    4 1948-06-05
    

    The offsets module exists to handle non-fixed frequencies such as years and months, and it comes in handy in situations like this.


    To fix your code, you would need to apply datetime.replace row-wise using apply (not recommended):

    df['date'].apply(lambda x: x.replace(year=x.year-100))
    
    0   1939-03-16
    1   1956-01-21
    2   1951-11-18
    3   1964-03-05
    4   1948-06-05
    Name: date, dtype: datetime64[ns]
    

    Or, using a list comprehension:

    df.assign(date=[x.replace(year=x.year-100) for x in df['date']])
    
            date
    0 1939-03-16
    1 1956-01-21
    2 1951-11-18
    3 1964-03-05
    4 1948-06-05
    

    Neither of these handles NaT entries very well.
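
If only some of the dates should move (the question mentions dates after a particular year), one option is to combine the DateOffset subtraction with Series.mask. The following is a minimal sketch rather than part of the original answer: the sample values, the extra missing entry, and the CUTOFF_YEAR threshold are assumptions made for illustration.

```python
import pandas as pd

# Sample strings modelled on the question, plus a missing value and an
# already-correct date (both added here for illustration only).
raw = ["2039-03-16", "2056-01-21", None, "2064-03-05", "1985-06-05"]
s = pd.to_datetime(pd.Series(raw, name="BIRTH"))

# Hypothetical threshold: any year later than this is treated as a
# two-digit year that was parsed into the wrong century.
CUTOFF_YEAR = 2038

# Vectorised subtraction of 100 calendar years from every row.
shifted = s - pd.DateOffset(years=100)

# mask() swaps in the shifted value only where the condition is True;
# every other row, including missing ones, keeps its original value.
fixed = s.mask(s.dt.year > CUTOFF_YEAR, shifted)

print(fixed)
```

Rows holding NaT stay NaT here because the year comparison evaluates to False for them, so mask leaves them untouched. Dates at or before the cutoff, such as the 1985 entry, are also left alone.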