r, time-series, interpolation, xts

How can I interpolate my time series to have the same size (for clustering purpose)?


I have a list (df_list) of 40 time series (class "xts" "zoo") of different lengths, and I would like to interpolate them against the longest series so that they all have the same length. The series begin and end on different dates/times, so values are missing at the extremes (before the beginning and after the end). Can I pad with NA at the start and/or the end?

Start must be at: 2012-12-01 12:52:00
End must be at: 2012-12-19 12:56:00


Date: POSIXct
Num_tweets: number of tweets at a certain Date/time 

Below is the structure of two of the time series in the list:

1. 
Date, Num_tweets
2012-12-01 12:52:00, 3
2012-12-01 12:53:00, 3
2012-12-01 12:54:00, 1
2012-12-01 12:55:00, 2
2012-12-01 12:56:00, 3
2012-12-01 12:57:00, 0
2012-12-01 12:58:00, 0
2012-12-01 12:59:00, 3
2012-12-01 13:00:00, 2
2012-12-01 13:01:00, 0
2012-12-01 13:02:00, 1


2. 
Date, Num_tweets
2012-12-01 13:52:00, 3
2012-12-01 13:53:00, 3
2012-12-01 13:54:00, 1
2012-12-01 13:55:00, 2
2012-12-01 13:56:00, 3
2012-12-01 13:57:00, 0
2012-12-01 13:58:00, 0
2012-12-01 13:59:00, 3
2012-12-01 13:00:00, 2
2012-12-01 13:01:00, 0
2012-12-01 13:02:00, 1
2012-12-01 13:03:00, 0
2012-12-01 13:04:00, 3
2012-12-01 13:05:00, 2
2012-12-01 13:06:00, 0
2012-12-01 13:07:00, 1

What I tried:

series <- reinterpolate(df_list, new.length = max(lengths(df_list)))

This error came up:

Error in stats::approx(x, method = "linear", n = new.length) : 
  need at least two non-NA values to interpolate

How can I solve this problem? Thank you in advance!


Solution

  • merge.xts() aligns series on the union of their timestamps by default, so merge(df1, df2), where df1 and df2 are the first and second time series in your list df_list, does this automatically.

    NA appears wherever one of the series has no observation at a given time.
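
    To give every series in the list the same length, one option is to merge each of them with an empty xts object that only carries the desired index. The following is a minimal sketch, not a definitive implementation. It assumes one-minute spacing and UTC timestamps; s1, s2, grid, empty and aligned are hypothetical names, and the short series only stand in for elements of df_list.

```r
library(xts)

# Hypothetical stand-ins for two elements of df_list (first values from the question).
s1 <- xts(c(3, 3, 1, 2, 3),
          order.by = as.POSIXct("2012-12-01 12:52:00", tz = "UTC") + 60 * 0:4)
s2 <- xts(c(3, 3, 1, 2),
          order.by = as.POSIXct("2012-12-01 13:52:00", tz = "UTC") + 60 * 0:3)
df_list <- list(s1, s2)

# Common one-minute grid spanning the required start and end (assumed spacing).
grid <- seq(as.POSIXct("2012-12-01 12:52:00", tz = "UTC"),
            as.POSIXct("2012-12-19 12:56:00", tz = "UTC"),
            by = "1 min")

# Zero-column xts object that holds only the index.
empty <- xts(order.by = grid)

# Outer join (the merge.xts default) of each series with the grid:
# timestamps without an observation are filled with NA.
aligned <- lapply(df_list, function(s) merge(s, empty))

# Every element now has one row per grid timestamp.
sapply(aligned, nrow)
```

    After this step all series share the same index, with NA before the first and after the last observation of each. If gaps inside a series should be filled by linear interpolation while the leading and trailing NA are kept, zoo::na.approx(x, na.rm = FALSE) is one possibility.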