Search code examples
pythonpandastimestamppython-dateutil

How to interpret values in a .txt data file as a time series


I have a data file that has values in it like this:

@ DD MM YYYY HH MN SS Hs Hrms Hma x Tz Ts Tc THmax EP S T0 2 Tp Hrms EPS

29 11 2000 13 17 56 2.44 1.71 3.12 9.12 11.94 5.03 12.74 .83 8.95 15.03 1.80 .86
29 11 2000 13 31 16 2.43 1.74 4.16 9.17 11.30 4.96 11.70 .84 8.84 11.86 1.80 .87

I use the following to get the data in:

infile = open ("testfile.txt", 'r')
data = np.genfromtxt(infile,skiprows=2) 

which gives me a numpy.ndarray

I want to be able to interpret the first 0-5 columns as a timestamp (DD:MM:YYY:HH:MN:SS), but this is where I get stumped - there seems to be a million ways to do it and I don't know what's best.

I've been looking at dateutil and pandas - I know there is something blindingly obvious I should do, but am at a loss. Should I convert to a csv format first? Somehow concatenate the values from each row (cols 0-5) using a for loop?

After this I'll plot values from other columns against the timestamps/deltas.

I'm totally new to python, so any pointers appreciated :)


Solution

  • Here's a pandas solution for you:

    test.csv:

    29 11 2000 13 17 56 2.44 1.71 3.12 9.12 11.94 5.03 12.74 .83 8.95 15.03 1.80 .86
    29 11 2000 13 31 16 2.43 1.74 4.16 9.17 11.30 4.96 11.70 .84 8.84 11.86 1.80 .87
    

    pandas provide a read_csv util for reading the csv, you should give the following parameters to parse your file:

    1. delimiter: the default one is comma, so you need to set it as a space
    2. parse_dates: those date columns (order sensitive)
    3. date_parser: the default is dateutil.parser.parse, but seems it doesn't work for your case, so you should implement your own parser
    4. header: if your csv doesn't have the column name, you should set it as None

    Finally, here the sample code:

    In [131]: import datetime as dt
    
    In [132]: import pandas as pd
    
    In [133]: pd.read_csv('test.csv', 
                           parse_dates=[[2,1,0,3,4,5]], 
                           date_parser=lambda *arr:dt.datetime(*[int(x) for x in arr]),
                           delimiter=' ', 
                           header=None)
    Out[133]:
              2_1_0_3_4_5     6     7     8     9     10    11     12    13    14  \
    0 2000-11-29 13:17:56  2.44  1.71  3.12  9.12  11.94  5.03  12.74  0.83  8.95
    1 2000-11-29 13:31:16  2.43  1.74  4.16  9.17  11.30  4.96  11.70  0.84  8.84
    
          15   16    17
    0  15.03  1.8  0.86
    1  11.86  1.8  0.87