Search code examples
pythonpandastimestamp-with-timezone

Get all columns in a pandas dataframe that is a date-column when different time-zones are present in different columns


(Note, this SO question does not take into account the different time-zones i.e it does not catch a date when a time-zone is present).

I have a dataframe where some column dtypes are datetime64[ns] and some are datetime64[ns,UTC].

Note, they have all been converted using

df["some_time_col"] = pd.to_datetime(df["some_time_col"]) but since df is a merge of data across different databases, the time-zones are different.

Is there a way to get all columns that are a date-column i.e something like this

dt_cols = df.select_dtypes(include=[pd.datetime]).columns

instead of having to specify all time-zones like

dt_cols = df.select_dtypes(include=["datetim64","datetime64[UTC]","datetime64[UTC+1]"...).columns

Using either of

dt_cols = df.select_dtypes(include=[np.datetime64]).columns
dt_cols = df.select_dtypes(include=["datetime64"]).columns

does not catch datetime64[ns,UTC] but only datetime64[ns]


Solution

  • Use:

    df.select_dtypes('datetimetz')
    

    From select_dtypes docs:

    To select Pandas datetimetz dtypes, use 'datetimetz' (new in 0.20.0) or 'datetime64[ns, tz]'

    For selecting both datetime without tz and with tz, do:

    df.select_dtypes(['datetimetz', 'np.datetime64'])