Search code examples
rmaxminderivativetrend

How to find significant local trends or segments in vector R


I have vectors of barometric pressure data, in millibars. It is common practice to display barometric trends over the past 3 hours, as symbols that split the series into two segments. These segments split the data into variations of falling, rising, and steady classifications (nine different classifications total, see https://static1.squarespace.com/static/561db783e4b0383208b79bb3/t/56833dac1115e07a056a8b53/1451441580880/Pressure+Tendency+%2810%29.pdf for more details) enter image description here

Visually, it's easy to pick out these segments. Programmatically, I'm having trouble automatically detecting the beginning and end of statistically significant LOCAL trends in the data. Also, I would prefer to be able to use 3 or more segments if needed, rather than 2, as two segments is often not representative of the past 3 hours (common for pressure to rise, fall, become steady, or some other 3 or greater-segment pattern during this time).

Using this data and the following code, I have created a dataframe that contains the local maximums and minimums on a time resolution as high as 20 minutes. The find_peaks function was taken from: https://stats.stackexchange.com/questions/22974/how-to-find-local-peaks-valleys-in-a-series-of-data. Note that 210 minutes (3.5 hrs) of data is required as local trends are right-sided.

data <- c(966.6003, 966.5772, 966.6102, 966.5933, 966.5768, 966.565, 966.5139, 
966.4826, 966.4626, 966.4595, 966.5143, 966.5732, 966.636, 966.6833, 966.7229, 
966.7517, 966.7523, 966.7955, 966.8255, 966.7964, 966.8406, 966.9269, 966.9753, 
966.9807, 967.001, 967.0519, 967.0389, 967.107, 967.098, 967.0935, 967.0296, 967.0562, 
966.9952, 967.0031, 966.9976, 966.9888, 966.9832, 967.0474, 966.991, 967.0148, 
967.0944, 967.0556, 967.0732, 967.051, 967.1252, 967.0908, 967.0746, 967.0633, 
967.0821, 967.1274, 967.1229, 967.0838, 967.045, 967.0626, 967.0178, 967.024, 967.0806, 
967.0135, 967.0191, 967.0537, 967.0317, 967.0933, 967.0366, 967.0532, 967.0474, 
967.0831, 967.0982, 967.1099, 967.0803, 967.0351, 967.0497, 967.0833, 967.0707, 
967.1096, 967.1709, 967.1645, 967.1979, 967.1406, 967.1523, 967.1288, 967.1345, 
967.1495, 967.1555, 967.1013, 967.0684, 967.1175, 967.1157, 967.0471, 967.0621, 
967.0117, 966.9926, 966.9926, 966.9672, 966.9619, 966.982, 966.9976, 966.9993, 
967.001, 966.9885, 966.9634, 966.9919, 967.0289, 967.0289, 967.037, 967.0547, 967.0158, 
967.0444, 966.9938, 966.9659, 966.9753, 966.9716, 966.9322, 966.9338, 966.8095, 
966.7881, 966.7939, 966.7171, 966.738, 966.7165, 966.7402, 966.7237, 966.6966, 
966.7085, 966.7268, 966.7099, 966.687, 966.6978, 966.6718, 966.6927, 966.7132, 
966.6773, 966.5689, 966.5837, 966.4875, 966.4287, 966.4399, 966.4438, 966.4666, 
966.4139, 966.3791, 966.3496, 966.3203, 966.2857, 966.2732, 966.2787, 966.2605, 
966.2548, 966.2368, 966.2141, 966.2078, 966.1792, 966.1958, 966.1841, 966.1838, 
966.1778, 966.127, 966.1378, 966.1152, 966.1035, 966.126, 966.0864, 966.1371, 966.1143, 
966.0972, 966.1028, 966.1141, 966.1089, 966.1031, 966.1031, 966.1075, 966.051, 
966.1406, 966.1064, 966.1511, 966.1337, 966.1617, 966.1443, 966.1216, 966.0759, 
966.0251, 966.0699, 966.0417, 966.0471, 966.1147, 966.0694, 966.0919, 966.0938, 966.03, 
966.0525, 966.0794, 966.0342, 966.0557, 966.033, 966.014, 966.0384, 966.0267, 965.981, 
965.9422, 965.9099, 965.8817, 965.8833, 965.8833, 965.9128, 965.879, 965.8757, 
965.8965, 965.9114, 965.8699, 965.8426, 965.8389)

plot(data, type = "l")

find_peaks <- function (x, m){
  shape <- diff(sign(diff(x, na.pad = FALSE)))
  pks <- sapply(which(shape < 0), FUN = function(i){
  z <- i - m + 1
  z <- ifelse(z > 0, z, 1)
  w <- i + m + 1
  w <- ifelse(w < length(x), w, length(x))
  if(all(x[c(z : i, (i + 2) : w)] <= x[i + 1])) return(i + 1) else return(numeric(0))
  })
   pks <- unlist(pks)
   pks}

# make data frame of local max and mins (threshold of 21 minutes width)
maximums_df <- data.frame(trend = 'max', index = find_peaks(data, m = 10))
minimums_df <- data.frame(trend = 'min', index = find_peaks(data, m = 10))
maxmin_df <- rbind(maximums_df, minimums_df)[order(rbind(maximums_df, 
             minimums_df)$index),]

I have tried many different ways of automatically defining appropriate segments in the data. There should be 3 in this data set. Plotting the data clearly shows the first one is rising, the second is steady, then the largest, and last, is falling pressure. The threshold for falling or rising is absolute value of > +- 0.1 mb/hr.

Performing a moving linear regression is plausible, but again, defining the beginning/end of a trend is difficult with this as matching the max/min's of the data series is difficult. There should be a way using just the maxmin_df, I'm thinking, but it's escaping me at the moment. Any help or ideas would be wonderful.


Solution

  • You could try segmented regression which estimates breakpoints. You can eyeball starting values psi=c(50, 100, 150).

    > fit <- lm(y ~ x, df)
    > library(segmented)
    > sfit <- segmented::segmented(fit, seg.Z=~x, psi=c(50, 100, 150), 
    +                              control=seg.control(display=FALSE))
    > summary(sfit)
    
        ***Regression Model with Segmented Relationship(s)***
    
    Call: 
    segmented.lm(obj = fit, seg.Z = ~x, psi = c(50, 100, 150), control = seg.control(display = FALSE))
    
    Estimated Break-Point(s):
               Est. St.Err
    psi1.x  29.531  1.120
    psi2.x 104.665  1.156
    psi3.x 149.889  1.647
    
    Coefficients of the linear terms:
                  Estimate Std. Error  t value Pr(>|t|)    
    (Intercept)  9.664e+02  2.175e-02 44434.73   <2e-16 ***
    x            2.216e-02  1.266e-03    17.50   <2e-16 ***
    U1.x        -2.231e-02  1.302e-03   -17.13       NA    
    U2.x        -1.858e-02  7.221e-04   -25.73       NA    
    U3.x         1.367e-02  7.752e-04    17.63       NA    
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    Residual standard error: 0.05705 on 202 degrees of freedom
    Multiple R-Squared: 0.9832,  Adjusted R-squared: 0.9826 
    
    Boot restarting based on 6 samples. Last fit:
    Convergence attained in 3 iterations (rel. change 4.2078e-13)
    

    > plot(sfit, ylim=range(df$y))
    > lines(df$y)
    > legend('topright', legend=c('data', 'segmented fit'), lty=1, col=c(1, 'red'))
    

    enter image description here


    Data:

    > df <- data.frame(y=data, x=seq_along(data))  ## from OP