
Time series with alternative choices of index


I am processing time-series-like data with the following shape:

It has three columns, say t_1, t_2 and att. t_1 and t_2 are ordered observations of time, and att is a numerical value.

Toy example of data:

    t_1           t_2        att
    12:30:32      12:33:12   1
    12:30:55      12:33:43   3
    12:31:21      12:34:34   2

The object I want to build should follow these rules:

  1. If t_1 is "continuous", build a time series object with t_1 as the time index and att and t_2 as values.

  2. If t_1 is not "continuous" but t_2 is, build a time series object with t_2 as the time index and t_1 and att as values.

  3. If neither t_1 nor t_2 is continuous, report a message and build nothing.

  4. A column counts as "continuous" if the interval between consecutive observations is less than, say, 1 hour.

An example of non-continuous t_1 but continuous t_2:

    t_1           t_2        att
    12:30:32      12:33:12   1
    12:30:55      12:33:43   3
    14:31:21      12:34:34   2
    14:33:24      12:35:34   -12
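
To make the example concrete, the gaps between consecutive rows can be computed directly. This is only a minimal sketch using the standard library; the helper name `gaps_in_seconds` is made up for illustration, and it assumes all times fall on the same day and are already in order.

```python
from datetime import datetime

def gaps_in_seconds(times, fmt="%H:%M:%S"):
    """Return the gaps, in seconds, between consecutive time strings."""
    parsed = [datetime.strptime(t, fmt) for t in times]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]

# The two time columns from the example above
t_1 = ["12:30:32", "12:30:55", "14:31:21", "14:33:24"]
t_2 = ["12:33:12", "12:33:43", "12:34:34", "12:35:34"]

t_1_gaps = gaps_in_seconds(t_1)  # [23.0, 7226.0, 123.0]
t_2_gaps = gaps_in_seconds(t_2)  # [31.0, 51.0, 60.0]
```

The second gap in t_1 is 7226 seconds, which is more than one hour, so t_1 is not continuous, while every gap in t_2 is at most 60 seconds.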

Any ideas for an implementation in either Python or R are very welcome. The data will be imported as a data frame, either a pandas DataFrame or an R data frame.

The result should be a time series object, like xts or ts in R.


Solution

  • Hopefully you can build your time series objects from the tuples this produces:

    import datetime
    
    data = [['12:30:32', '12:33:12', 1],
            ['12:30:55', '12:33:43', 3],
            ['14:31:21', '12:34:34', 2],
            ['14:33:24', '12:35:34', -12]]
    
    def continuous(series, time_format='%H:%M:%S', criteria=3600):
        '''Return True if the time series is continuous.
    
        series -- sequence of time strings, in ascending order
        time_format -- str (default '%H:%M:%S')
        criteria -- largest allowed gap in seconds, int (default 3600)
        '''
        # make datetime objects
        t = [datetime.datetime.strptime(thing, time_format) for thing in series]
        # find the deltas between consecutive observations
        # (the built-in zip replaces Python 2's itertools.izip)
        deltas = (two - one for one, two in zip(t, t[1:]))
        # apply the criteria; .seconds ignores whole days, which is fine
        # for same-day 'HH:MM:SS' times
        return all(item.seconds <= criteria for item in deltas)
    
    # extract the time series data
    one, two, values = zip(*data)
    if continuous(one):
        # make tuples - (t1, (t2, att))
        time_series_data = [(t1, (t2, att)) for t1, t2, att in zip(one, two, values)]
    elif continuous(two):
        # make tuples - (t2, (t1, att))
        time_series_data = [(t2, (t1, att)) for t1, t2, att in zip(one, two, values)]
    else:
        raise ValueError('No Continuous Data')
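
Since the question mentions that the data arrives as a pandas DataFrame, the same idea could be sketched in pandas roughly as follows. This is not part of the answer above; the function name `build_time_series` and the `max_gap` parameter are hypothetical, and the sketch assumes the columns are named t_1, t_2 and att, hold 'HH:MM:SS' strings from a single day, and are already sorted.

```python
import pandas as pd

def build_time_series(df, max_gap=pd.Timedelta(hours=1)):
    """Index df by the first of t_1, t_2 whose consecutive gaps all stay below max_gap.

    Returns the re-indexed DataFrame, or None if neither column qualifies.
    """
    for column in ("t_1", "t_2"):
        # to_timedelta parses 'HH:MM:SS' strings as offsets from midnight
        times = pd.to_timedelta(df[column])
        # diff() leaves a missing value in the first row, so drop it
        gaps = times.diff().dropna()
        if (gaps < max_gap).all():
            return df.assign(**{column: times}).set_index(column)
    # rule 3: report a message and build nothing
    print("Neither t_1 nor t_2 is continuous; nothing built.")
    return None

df = pd.DataFrame({
    "t_1": ["12:30:32", "12:30:55", "14:31:21", "14:33:24"],
    "t_2": ["12:33:12", "12:33:43", "12:34:34", "12:35:34"],
    "att": [1, 3, 2, -12],
})
series = build_time_series(df)
```

For the example data, t_1 has a gap of about two hours, so the result is indexed by t_2, with t_1 and att left as value columns. The index holds timedeltas rather than full timestamps because the data carries no date.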