Search code examples
pythonpandasdataframedatetimemask

masking a dataframe to convert strings to datetimes is turning some entries to NaT


I have a dataframe something like this:

Index Hour  type    Horizon      Day    BZ  Value   
0      1    ND       D-2    17-12-2022  FI  7258    
1      1    ND       D-1    16-12-2022  FI  3702    
2      1    ND       D-1    15-12-2022  FI  3702    
3      1    ND       D-1    14-12-2022  FI  3702    
4      1    ND       D-1    13-12-2022  FI  3702    
5      1    ND       D-1    10-12-2022  FI  3702    
6      1    Selected               7    FI  4885    

I tried masking it, like you can see in the code below, then using that mask to convert entries in another column into datetimes but it is also selecting entries for which the mask condition is false and converting them into 'NaT'. I have no idea why.

mask = df_long['type'] == 'ND'
df_long['Day2']= pd.to_datetime(df_long['Day'][mask],format='%d-%m-%Y')

Like so:

Index Hour  type    Horizon      Day    BZ  Value   Day2
0      1    ND       D-2    17-12-2022  FI  7258    2022-12-17 00:00:00
1      1    ND       D-1    16-12-2022  FI  3702    2022-12-16 00:00:00
2      1    ND       D-1    15-12-2022  FI  3702    2022-12-15 00:00:00
3      1    ND       D-1    14-12-2022  FI  3702    2022-12-14 00:00:00
4      1    ND       D-1    13-12-2022  FI  3702    2022-12-13 00:00:00
5      1    ND       D-1    10-12-2022  FI  3702    2022-12-10 00:00:00
6      1    Selected               7    FI  4885    NaT

I would like them to stay as they are.


Solution

  • As said in comment, if you have a column with datetimes, it doesn't really make sense to have other types (such as integers).

    You can do it if you really need to, but this would force an object dtype:

    df_long['Day2'] = df_long['Day'].mask(mask, pd.to_datetime(df_long['Day'].where(mask), format='%d-%m-%Y').astype(object))
    

    Output:

       Index  Hour      type Horizon         Day  BZ  Value                 Day2
    0      0     1        ND     D-2  17-12-2022  FI   7258  2022-12-17 00:00:00
    1      1     1        ND     D-1  16-12-2022  FI   3702  2022-12-16 00:00:00
    2      2     1        ND     D-1  15-12-2022  FI   3702  2022-12-15 00:00:00
    3      3     1        ND     D-1  14-12-2022  FI   3702  2022-12-14 00:00:00
    4      4     1        ND     D-1  13-12-2022  FI   3702  2022-12-13 00:00:00
    5      5     1        ND     D-1  10-12-2022  FI   3702  2022-12-10 00:00:00
    6      6     1  Selected     NaN           7  FI   4885                    7
    

    dtypes:

    df_long.dtypes
    
    Index       int64
    Hour        int64
    type       object
    Horizon    object
    Day        object
    BZ         object
    Value       int64
    Day2       object  # not datetime64
    dtype: object