Search code examples
pythonpandasmatplotlibunix-timestampcategorical-data

Unix time conversion in python


I have a machine data set I need to analyze but in parsing an excel file from the line controller PC, the date info (which starts out formatted like "10/09/2020") ends up as what I assume is a unix time stamp that's longer than I've ever seen or can find an example of on SO (e.g. 1599696000000000000).

Efforts I have made to covert it have failed. I am doing a python coursera program and the lecturer did say it is a difficult proposition as there's no standard way and none that I have tried have worked

Anyway, the excel file looks like this

Datum                WKZ_code    Time        Rad_t1   Not Important   Cut_Event
1599696000000000000  W03          00:00:00    100           250           /
1599696000000000000  W03          00:00:01    100           250           /
1599696000000000000  W03          00:00:02    100           250           /
1599696000000000000  W03          00:00:03    100           250           /
1599696000000000000  W03          00:00:04    100           250           /
1599696000000000000  W03          00:00:00    100           250           Speed Cut

I need to be able to bring the Datum column back to "DD/MM/YYYY" format otherwise my plots of the cut data fail as python seems to be unable to determine where to put them on a y axis.

What I tried that appeared the neatest solution was

pd.Timestamp.combine(pd.Timestamp.utcfromtimestamp(df['DATUM'][0]/1E9), pd.to_datetime(df['Time'][0]))

but that produced the error

TypeError                                 Traceback (most recent call last)
<ipython-input-102-b9d1b1186db5> in <module>
----> 1 pd.Timestamp.combine(pd.Timestamp.utcfromtimestamp(df['DATUM'][0]/1E9), pd.to_datetime(df['Time'][0]))

pandas\_libs\tslibs\timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.combine()

TypeError: combine() argument 2 must be datetime.time, not Timestamp

A python developer in another department suggested:

df = pd.read_excel("tool W03 with cuts and dates.xlsx",
                   dtype=object)
df = df[['DATUM']]
df = df['DATUM']=(pd.to_datetime(df['DATUM'],unit='s')) 
df.head()

Which also threw an error around the unit, but I have been unable to come up with an a cure

pandas\_libs\tslib.pyx in pandas._libs.tslib.array_with_unit_to_datetime()

OutOfBoundsDatetime: cannot convert input 1599696000000000000 with the unit 's

If anyone can provide a solution i'd be very, very grateful. I've tried hard to keep to conventions with asking this, apologies in advance for errors.


Solution

  • Use to_datetime with default unit='ns' and convert times to timedeltas by to_timedelta:

    df['Datum'] = pd.to_datetime(df['Datum']) + pd.to_timedelta(df['Time'])
    print (df)
                    Datum WKZ_code      Time  Rad_t1  Not Important  Cut_Event
    0 2020-09-10 00:00:00      W03  00:00:00     100            250          /
    1 2020-09-10 00:00:01      W03  00:00:01     100            250          /
    2 2020-09-10 00:00:02      W03  00:00:02     100            250          /
    3 2020-09-10 00:00:03      W03  00:00:03     100            250          /
    4 2020-09-10 00:00:04      W03  00:00:04     100            250          /
    5 2020-09-10 00:00:00      W03  00:00:00     100            250  Speed Cut