python numpy datetime-format unix-timestamp

Converting np.array of unix timestamps (dtype '<U21') to np.datetime64


I am looking to process a large amount of data, so I am interested in the fastest way to compute the following:

I have the array shown at the end of this question as one column of a 2-D np.ndarray, and I would like to convert it from dtype '<U21' to np.datetime64 with millisecond precision.

When I execute the following code on one entry, it works:

tmp_array[:,0][0].astype(int).astype('datetime64[ms]')

Result: numpy.datetime64('2019-10-09T22:54:00.000')

When I execute the same on the sub-array like so:

tmp_array[:,0] = tmp_array[:,0].astype(int).astype('datetime64[ms]')

I always get the following error:

RuntimeError: The string provided for NumPy ISO datetime formatting was too short, with length 21

numpy version 1.22.4

array(['1570661640000', '1570661700000', '1570661760000'],dtype='<U21')

Solution

  • I am sure there is a way to use the power of numpy to do this more efficiently, but this approach works. Given your tmp_array of the form:

    array(['1570661640000', '1570661700000', '1570661760000'], dtype='<U21')  
    

    express the Unix epoch (the base date) as:

    db = np.datetime64('1970-01-01')  
    

    then create the desired datetime array by:

    cnvrt_array = np.array([db + np.timedelta64(int(x), 'ms') for x in tmp_array])  
    

    This yields the array:

    array(['2019-10-09T22:54:00.000', '2019-10-09T22:55:00.000',
           '2019-10-09T22:56:00.000'], dtype='datetime64[ms]')  
    

    As suggested by @FObersteiner, you can instead convert the whole array with vectorized numpy casts, which is an order of magnitude faster than the list comprehension approach:

    cvrted_array = tmp_array.astype(np.longlong).astype("datetime64[ms]")
    

    This yields the same result as the list comprehension.
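A note on the error in the question: it most likely comes from the assignment rather than the conversion. Writing the result back into tmp_array[:,0] asks numpy to cast each datetime64 value to the array's '<U21' string dtype, and the ISO form '2019-10-09T22:54:00.000' is 23 characters long, which does not fit in 21. The sketch below keeps the converted column in its own array instead. The 2-D layout and the second column are hypothetical stand-ins for the question's data, and np.int64 is used because a plain int can be 32-bit on some platforms with numpy versions before 2.0, which is too small for millisecond timestamps.

```python
import numpy as np

# Hypothetical stand-in for the question's data: a 2-D string array whose
# first column holds Unix timestamps in milliseconds.
tmp_array = np.array(
    [["1570661640000", "a"],
     ["1570661700000", "b"],
     ["1570661760000", "c"]],
    dtype="<U21",
)

# Vectorized conversion: string -> 64-bit integer -> datetime64[ms].
# The result is stored in a new array; tmp_array itself stays '<U21'.
timestamps = tmp_array[:, 0].astype(np.int64).astype("datetime64[ms]")

print(timestamps.dtype)
print(timestamps)
```

If the timestamps need to live alongside the other columns, one option is to keep them in a separate array of the same length (as above) or to move the data into a structured array or a pandas DataFrame, since a single ndarray has only one dtype.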