Tags: r, dataframe, filter, subset

Trying to filter a data frame - looking for set time periods where there is limited variation in another variable


I have a data frame with multiple columns, ordered by a Time column that has one timestamp per second. I want to search it for 1-minute periods in which another variable shows only limited variation.

For example, I want every minute in the data frame in which TWS (true wind speed) varies by no more than 5 knots. These 1-minute periods must not overlap.

Once I have the 1-minute periods, I want to create another data frame in which each minute of data is averaged into a single row.

Here is the head of the data:

        Date                Time     Lat  Lon   AWA  AWS    TWA  TWS  
1 19/10/2018 2019-02-11 12:06:16 35.8952 14.5  -99.7 8.42  -99.7 8.42 
2 19/10/2018 2019-02-11 12:06:17 35.8952 14.5  -99.1 8.24  -99.1 8.24 
3 19/10/2018 2019-02-11 12:06:18 35.8952 14.5  -99.2 7.34  -99.2 7.34 
4 19/10/2018 2019-02-11 12:06:19 35.8952 14.5  -99.6 6.87  -99.6 6.87 
5 19/10/2018 2019-02-11 12:06:20 35.8952 14.5  -101.1 8.85 -101.1 8.85 
6 19/10/2018 2019-02-11 12:06:21 35.8952 14.5  -101.6 9.39 -101.6 9.39 

Solution

    library(dplyr)
    library(lubridate)

    df %>%
      mutate(Date = dmy(Date),                         # Date is stored as day/month/year text
             Time = ymd_hms(Time)) %>%
      group_by(gr = floor_date(Time, "minute")) %>%    # one group per clock minute, so groups never overlap
      filter(max(TWS, na.rm = TRUE) - min(TWS, na.rm = TRUE) <= 5) %>%  # TWS range of at most 5 knots
      summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))        # one averaged row per minute

    # With the Data below, one row is returned: the 12:06 minute (TWS range 8.42 - 7.34 = 1.08 knots).
    # The 12:07 minute is dropped because its TWS range is 16.39 - 6.87 = 9.52 knots.
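If you would rather not depend on dplyr and lubridate, the same idea (clock-minute groups, a TWS range filter, then per-minute means) can be sketched in base R. This is a minimal sketch, not the answer's method: `quiet_minutes` is a hypothetical helper name, and it assumes Time is text in `YYYY-MM-DD HH:MM:SS` form and that every numeric column should be averaged.

```r
# Hypothetical helper: average each clock minute whose range of `var` is at most `max_range`.
# Assumes df$Time is text like "2019-02-11 12:06:16"; only numeric columns are averaged.
quiet_minutes <- function(df, var = "TWS", max_range = 5) {
  time <- as.POSIXct(df$Time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  # Label each row with the start of its minute; these labels never overlap.
  minute_start <- format(time, "%Y-%m-%d %H:%M")
  # Range (max - min) of the chosen variable within each minute.
  rng <- tapply(df[[var]], minute_start, function(x) diff(range(x, na.rm = TRUE)))
  keep <- names(rng)[which(rng <= max_range)]
  rows <- minute_start %in% keep
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  if (!any(rows)) {
    # No quiet minute found: return an empty result with the same columns.
    return(data.frame(minute = character(0), df[0, num_cols, drop = FALSE]))
  }
  # One row per kept minute, with every numeric column averaged.
  aggregate(df[rows, num_cols, drop = FALSE],
            by = list(minute = minute_start[rows]),
            FUN = mean, na.rm = TRUE)
}

# Usage with the Time and TWS values from the Data section
example <- data.frame(
  Time = c("2019-02-11 12:06:16", "2019-02-11 12:06:17", "2019-02-11 12:06:18",
           "2019-02-11 12:07:19", "2019-02-11 12:07:20", "2019-02-11 12:07:21"),
  TWS  = c(8.42, 8.24, 7.34, 6.87, 8.85, 16.39)
)
res <- quiet_minutes(example)
print(res)  # one row, for minute "2019-02-11 12:06"
```

`aggregate()` passes `na.rm = TRUE` through to `mean`, so missing readings inside a kept minute are ignored rather than turning the average into `NA`.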
    

    If "variation" should instead mean the change between consecutive readings (rather than the overall range within the minute), we can use dplyr::lag:

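    ... mutate(flag = TWS - lag(TWS, default = first(TWS))) %>%  # change since the previous reading (0 for the first)
        filter(all(abs(flag) <= 5)) %>%                         # keep groups where no step exceeds 5 knots
        summarise(across(-flag, ~ mean(.x, na.rm = TRUE)))      # one averaged row per group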
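The two criteria are not equivalent. A minimal illustration with made-up values (not taken from the question's data): a wind speed that rises steadily passes the consecutive-step test while failing the range test.

```r
# A steady drift: every one-second step is small, but the overall range is large.
tws <- seq(8, 14, by = 0.5)     # 13 readings rising from 8 to 14 knots
steps <- diff(tws)              # change between consecutive readings, like TWS - lag(TWS)
max(abs(steps))                 # 0.5: passes "no step larger than 5 knots"
diff(range(tws))                # 6: fails "range of at most 5 knots"
```

So pick the filter that matches what "limited variation" means for your analysis: the range check bounds the total spread within a minute, while the lag check only bounds sudden jumps.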
    

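    Data (adapted from the question: rows 4-6 are moved into the 12:07 minute, and row 6 has a TWS of 16.39):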

    df <- read.table(text = "
    Date                Time     Lat  Lon   AWA  AWS    TWA  TWS  
    1 '19/10/2018' '2019-02-11 12:06:16' 35.8952 14.5  -99.7 8.42  -99.7 8.42 
    2 '19/10/2018' '2019-02-11 12:06:17' 35.8952 14.5  -99.1 8.24  -99.1 8.24 
    3 '19/10/2018' '2019-02-11 12:06:18' 35.8952 14.5  -99.2 7.34  -99.2 7.34 
    4 '19/10/2018' '2019-02-11 12:07:19' 35.8952 14.5  -99.6 6.87  -99.6 6.87 
    5 '19/10/2018' '2019-02-11 12:07:20' 35.8952 14.5  -101.1 8.85 -101.1 8.85 
    6 '19/10/2018' '2019-02-11 12:07:21' 35.8952 14.5  -101.6 9.39 -101.6 16.39 
    ", header = TRUE, stringsAsFactors = FALSE)
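Grouping by clock minute only finds quiet periods that start on a whole minute; a calm spell from 12:06:30 to 12:07:30 would be split across two groups. If a 1-minute period may start at any second, one option is a greedy scan that takes the earliest quiet window and then jumps past it so windows never overlap. This is a base-R sketch, not part of the original answer: `find_quiet_windows` and its parameters (`width`, `max_range`, `min_n`) are hypothetical names, and it assumes the timestamps are POSIXct and sorted.

```r
# Hypothetical helper: greedy scan for non-overlapping windows of `width` seconds,
# not tied to clock minutes, in which the range of `x` is at most `max_range`.
# Assumes `time` is POSIXct and sorted ascending; `min_n` is the minimum number of
# readings a window must contain (60 for 1 Hz data and a 60-second window).
find_quiet_windows <- function(time, x, width = 60, max_range = 5, min_n = width) {
  secs <- as.numeric(time)
  n <- length(secs)
  window_id <- rep(NA_integer_, n)
  id <- 0L
  i <- 1L
  while (i <= n) {
    # Extend j to the last row inside [secs[i], secs[i] + width).
    j <- i
    while (j < n && secs[j + 1L] < secs[i] + width) j <- j + 1L
    vals <- x[i:j]
    # A window containing NA counts as not quiet (range is NA, isTRUE gives FALSE).
    if (j - i + 1L >= min_n && isTRUE(diff(range(vals)) <= max_range)) {
      id <- id + 1L
      window_id[i:j] <- id
      i <- j + 1L  # jump past the accepted window so windows never overlap
    } else {
      i <- i + 1L  # otherwise slide the start forward by one reading
    }
  }
  window_id
}

# Small check: 10 one-second readings, 3-second windows, a spike at the 5th reading.
t0 <- as.POSIXct("2019-02-11 12:06:00", tz = "UTC")
w <- find_quiet_windows(t0 + 0:9, c(1, 1, 1, 1, 9, 1, 1, 1, 1, 1),
                        width = 3, max_range = 5, min_n = 3)
print(w)  # 1 1 1 NA NA 2 2 2 NA NA
```

The returned ids can be used as the `by` variable in `aggregate()` to average each window; rows whose id is `NA` are omitted from the result.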