R dplyr identifying a condition sequence in one column and mutating another (or so)


I have eye tracking data in the following form:

    smp    x   y  time dur
1     1  491 798    62   0
2     2  491 798    62   0
3     3  491 798    62   0
4     4  491 798    62   0
5     5  491 798    62   0
6     6  491 798    62   0
7     7  491 798    62   0
8     8  491 798    62   0
9     9  491 798    62   0
10   10  494 798   781 719
11   11  492 794   828  47
12   12  491 787   953 125
13   13  496 625   984  31
14   14  500 535  1046  62
15   15  544 488  1109  63
16   16  567 465  1171  62
17   17  582 453  1234  63

When dur (the final column) is zero, the subject has their eyes closed. However, blinks take a certain amount of time to execute, and this equipment is old, so its sampling/logging rate is not very precise.

I am hoping for a dplyr approach that mutates a blink column: TRUE when a row belongs to a run of 4 or more consecutive zeros in dur, FALSE otherwise.

Expected Output

   smp   x   y time dur blink
1    1 491 798   62   0  TRUE
2    2 491 798   62   0  TRUE
3    3 491 798   62   0  TRUE
4    4 491 798   62   0  TRUE
5    5 491 798   62   0  TRUE
6    6 491 798   62   0  TRUE
7    7 491 798   62   0  TRUE
8    8 491 798   62   0  TRUE
9    9 491 798   62   0  TRUE
10  10 494 798  781 719 FALSE
11  11 492 794  828  47 FALSE
12  12 491 787  953 125 FALSE
13  13 496 625  984  31 FALSE
14  14 500 535 1046  62 FALSE
15  15 544 488 1109  63 FALSE
16  16 567 465 1171  62 FALSE
17  17 582 453 1234  63 FALSE

Reproducible Data

structure(list(smp = 1:17, x = c(491L, 491L, 491L, 491L, 491L, 
491L, 491L, 491L, 491L, 494L, 492L, 491L, 496L, 500L, 544L, 567L, 
582L), y = c(798L, 798L, 798L, 798L, 798L, 798L, 798L, 798L, 
798L, 798L, 794L, 787L, 625L, 535L, 488L, 465L, 453L), time = c(62L, 
62L, 62L, 62L, 62L, 62L, 62L, 62L, 62L, 781L, 828L, 953L, 984L, 
1046L, 1109L, 1171L, 1234L), dur = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 719L, 47L, 125L, 31L, 62L, 63L, 62L, 63L)), .Names = c("smp", 
"x", "y", "time", "dur"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17"))

Note: On the one hand I want to document actual blinks; on the other, I want to preserve the measurement uncertainty introduced by the instrumentation. Additionally, I would like a long_blinks column to check whether the equipment failed to pick up the end of one blink and the start of another because of the low frame rate of the video capture. A long run could also mean that a test subject simply kept their eyes closed for an extended period; either case would be flagged. I will post details for this second case.

As to the second case: the blinks would just be longer, so the solution provided is sufficient if you happen to have "integer" data.
My mistake for not supplying a reproducible data.frame.
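
The long_blinks idea from the note above might be sketched in base R roughly as follows. This is only an illustration: the threshold `long_min` (and its value of 8) is a hypothetical choice that does not come from the post, and the `dur` vector is copied from the sample data.

```r
# Hypothetical threshold: zero-dur runs of at least `long_min` samples count
# as "long". The value 8 is an assumption for illustration only.
long_min <- 8

dur <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 719, 47, 125, 31, 62, 63, 62, 63)

# rle() collapses the vector into runs; rep() expands each run's length
# back out so that every sample carries the length of its own run.
runs    <- rle(dur == 0)
run_len <- rep(runs$lengths, runs$lengths)

blink      <- dur == 0 & run_len >= 4         # run of 4 or more zeros
long_blink <- dur == 0 & run_len >= long_min  # suspiciously long run
```

With the sample data the nine leading zeros form a single run, so both flags come out TRUE for rows 1 to 9 under this assumed threshold.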

Reproducible Data with numeric

structure(list(smp = 1:17, x = c(491, 491, 491, 491, 491, 
491, 491, 491, 491, 494, 492, 491, 496, 500, 544, 567, 
582), y = c(798, 798, 798, 798, 798, 798, 798, 798, 
798, 798, 794, 787, 625, 535, 488, 465, 453), time = c(62, 
62, 62, 62, 62, 62, 62, 62, 62, 781, 828, 953, 984, 
1046, 1109, 1171, 1234), dur = c(0, 0, 0, 0, 0, 0, 
0, 0, 0, 719, 47, 125, 31, 62, 63, 62, 63)), .Names = c("smp", 
"x", "y", "time", "dur"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17"))

So the mutate comparison blink = dur == 0L looks like the wrong test here, as dur holds doubles rather than integers.
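
A quick check may help here. As far as base R's `==` is concerned, an integer literal such as `0L` is coerced to double before the comparison, so the test should behave the same on the numeric column. The vector name `dur_num` below is made up for this sketch.

```r
# Numeric (double) version of the dur column from the second data set.
dur_num <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 719, 47, 125, 31, 62, 63, 62, 63)

is.integer(dur_num)   # FALSE: the column is double, not integer

# 0L is coerced to double before comparing, so both tests agree.
same <- identical(dur_num == 0L, dur_num == 0)
same                  # TRUE
```

If the zeros were the result of floating-point arithmetic rather than logged values, an exact comparison could miss them; in that case a tolerance-based test such as `dplyr::near(dur_num, 0)` might be the safer choice.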


Solution

  • With dplyr

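    library(dplyr)

    # df is built with read.table() further down.
    # Rows are grouped by time stamp: a row is flagged when dur is zero and
    # its time stamp occurs at least 4 times (the closed-eye samples share one time value).
    df %>% group_by(time) %>% mutate(blink = dur == 0L & n() >= 4) %>%
      ungroup() %>% as.data.frame()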
    #    smp   x   y time dur blink
    # 1    1 491 798   62   0  TRUE
    # 2    2 491 798   62   0  TRUE
    # 3    3 491 798   62   0  TRUE
    # 4    4 491 798   62   0  TRUE
    # 5    5 491 798   62   0  TRUE
    # 6    6 491 798   62   0  TRUE
    # 7    7 491 798   62   0  TRUE
    # 8    8 491 798   62   0  TRUE
    # 9    9 491 798   62   0  TRUE
    # 10  10 494 798  781 719 FALSE
    # 11  11 492 794  828  47 FALSE
    # 12  12 491 787  953 125 FALSE
    # 13  13 496 625  984  31 FALSE
    # 14  14 500 535 1046  62 FALSE
    # 15  15 544 488 1109  63 FALSE
    # 16  16 567 465 1171  62 FALSE
    # 17  17 582 453 1234  63 FALSE
    

    Data

    df <- read.table(text="smp    x   y  time dur
    1     1  491 798    62   0
    2     2  491 798    62   0
    3     3  491 798    62   0
    4     4  491 798    62   0
    5     5  491 798    62   0
    6     6  491 798    62   0
    7     7  491 798    62   0
    8     8  491 798    62   0
    9     9  491 798    62   0
    10   10  494 798   781 719
    11   11  492 794   828  47
    12   12  491 787   953 125
    13   13  496 625   984  31
    14   14  500 535  1046  62
    15   15  544 488  1109  63
    16   16  567 465  1171  62
    17   17  582 453  1234  63", header = TRUE)
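
The answer above relies on the closed-eye samples sharing one time stamp. If that cannot be assumed, a variant that groups by consecutive runs of dur == 0 might look like the sketch below. It assumes dplyr 1.1.0 or later for `consecutive_id()`; the helper column `run` and the trimmed two-column `df` are introduced here purely for illustration.

```r
library(dplyr)  # consecutive_id() needs dplyr >= 1.1.0

# Trimmed copy of the sample data: only the columns the sketch needs.
df <- data.frame(
  smp = 1:17,
  dur = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 719, 47, 125, 31, 62, 63, 62, 63)
)

res <- df %>%
  # start a new run id whenever (dur == 0) flips between TRUE and FALSE
  mutate(run = consecutive_id(dur == 0)) %>%
  group_by(run) %>%
  # a blink is a zero-dur run of at least 4 consecutive samples
  mutate(blink = dur == 0 & n() >= 4) %>%
  ungroup() %>%
  select(-run)
```

Because the grouping is built from the dur column itself, this version does not depend on how the logger writes time, and shorter zero runs (fewer than 4 samples) stay FALSE.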