pandas pyarrow

get seconds from pandas timedelta with pyarrow dtype


I have a dataframe with pyarrow dtypes such as `duration[ns][pyarrow]`.

Using good old numpy dtypes, I can get the seconds using

foo['DURATION_NEW'].dt.total_seconds()

but the pyarrow equivalent raises `AttributeError: Can only use .dt accessor with datetimelike values`.

Sadly, the usually helpful pandas documentation is rather short regarding pyarrow-dtype differences. https://pandas.pydata.org/docs/user_guide/pyarrow.html

Furthermore, I couldn't find an (official/helpful) migration guide from the numpy to the pyarrow backend covering this case. I am using pandas 2.1.3.


Solution

  • The error can be reproduced with the example below:

    import pandas as pd
    import pyarrow as pa

    # Build a column backed by pyarrow's duration type
    df = pd.DataFrame(
        {"DURATION_NEW": [pd.Timedelta(minutes=60), pd.Timedelta(seconds=1000)]},
        dtype=pd.ArrowDtype(pa.duration("ns"))
    )
    
    df.dtypes
    # DURATION_NEW    duration[ns][pyarrow]
    
    df["DURATION_NEW"].dt.total_seconds()
    # AttributeError: Can only use .dt accessor with datetimelike values
    

    Unfortunately, this is a known limitation tracked in an open issue (see apache/arrow#33962): pyarrow can't compute functions for timedeltas yet (see pandas-dev/pandas#52284).

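    As a workaround, you can use apply with Timedelta.seconds (this is only the seconds component, 0 to 86399, so for durations of a day or more use td.total_seconds() in the lambda instead):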

    df["DURATION_NEW"].apply(lambda td: td.seconds)
    
    # 0    3600
    # 1    1000
    # Name: DURATION_NEW, dtype: int64
    

    Or with a list comprehension:

    [td.seconds for td in df["DURATION_NEW"]]
    # [3600, 1000]