Search code examples
rdplyrdata.table

Determine overlapping time windows


This question is a follow-up on this: Compute mutual gaze from individual gazes.

I'm working with gaze data in Q&A sequences in tradic conversation and want to know when exactly and for how long there is mutual gaze, that is, one person looking at one other person and that other person looking back. The answer to the previous question dealt beautifully with Q&A sequences in which there was only mutual gaze between the speaker and the listeners but it cannot handle a situation where there is also mutual gaze between the two listeners.

The kind of data I have is illustrated here:

   GazeID Sequ Utterance Q_by Answ_by Gaze_by Gaze_to        Role start end Gaze_pair
1:      1   55 where...?    A       B       A       B     Speaker   100 700        AB
2:      2   55 where...?    A       B       B       A    Answerer     0 200        AB
3:      3   55 where...?    A       B       B       C    Answerer   230 500        BC
4:      4   55 where...?    A       B       C       B NonAnswerer   120 620        BC
5:      5   55 where...?    A       B       C       A NonAnswerer   650 700        AC

Here, there is mutual gaze (i) between Speaker and Answerer but also between Answerer and NonAnswerer, such that the desired output is this:

   Sequ    Utterance   MG_start    MG_end    MG_dur   Gaze_pair
1: 55      where...?        100       200       100          AB
2: 55      where...?        230       500       270          BC

I tried to adapt the solution of the prior question, which was built on the data.table function foverlaps, to this scenario but received an error.

NB: There can be, in any one Q&A sequence, multiple mutual gazes and there can be mutual gaze between (i) Speaker_Answerer, (ii) Speaker-NonAnswerer, and (iii) Answerer-NonAnswerer.

Any help with this question is much appreciated.

Reproducible data:

gazes <- structure(list(GazeID = 1:5, Sequ = c(55, 55, 55, 55, 55), Utterance = c("where...?", 
"where...?", "where...?", "where...?", "where...?"), Q_by = c("A", 
"A", "A", "A", "A"), Answ_by = c("B", "B", "B", "B", "B"), Gaze_by = c("A", 
"B", "B", "C", "C"), Gaze_to = c("B", "A", "C", "B", "A"), Role = c("Speaker", 
"Answerer", "Answerer", "NonAnswerer", "NonAnswerer"), start = c(100L, 
0L, 230L, 120L, 650L), end = c(700L, 200L, 500L, 620L, 700L), 
    Gaze_pair = c("AB", "AB", "BC", "BC", "AC")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -5L))

Solution

  • One approach is to split your data into a list of data frames by gaze pair and calculate mutual gazes for each data frame in that list. This uses data.table::foverlaps() for the fast overlap join. As you prefer dplyr it's written in such a way that it plugs in to its grouping functions that take and return a data frame, specifically dplyr::group_modify().

    library(data.table)
    get_mutual_gazes_pair <- function(dt, grp) {
        # split into the pair involved
        person_list <- dt |>
            group_split(Gaze_by, Gaze_to)
    
        # if no pair, no overlaps
        if (length(person_list) == 1) {
            return(data.frame(NULL))
        }
    
        # get the overlaps
        dt1 <- setDT(person_list[[1]]) |> setkey(start, end)
        dt2 <- setDT(person_list[[2]]) |> setkey(start, end)
        overlaps_dt <- foverlaps(dt1, dt2, nomatch = NA)
        overlaps_dt <- overlaps_dt[complete.cases(overlaps_dt)]
    
        # get start, end and duration
        overlaps_dt |> reframe(
            MG_start = pmax(start, i.start),
            MG_end = pmin(end, i.end),
            MG_dur = MG_end - MG_start
        )
    }
    

    Then it's just a case of calculating all the mutual gaze pairs:

    library(dplyr)
    gazes |>
        group_by(Sequ, Utterance, Gaze_pair) |>
        group_modify(get_mutual_gazes_pair)
    
    #    Sequ Utterance Gaze_pair MG_start MG_end MG_dur
    #   <dbl> <chr>     <chr>        <int>  <int>  <int>
    # 1    55 where...? AB             100    200    100
    # 2    55 where...? BC             230    500    270
    

    A note on multiple gazes

    Note the use of pmin() and pmax() rather than min() and max(). In your example data, there is one mutual gaze per utterance. For example, A looks at B from 100 to 700 and B looks at A from 0 to 200, hence the overlap from 100 to 200. But if we imagine that person B looked at person A again from 300 to 400, we'd have two mutual gazes per utterance. This approach will work in that case:

    gazes2 <- gazes |>
        add_row(
            GazeID = 3, Sequ = 55, Utterance = "where...?", Q_by = "A",
            Answ_by = "A", Gaze_by = "B", Gaze_to = "A", Role = "Answerer", start = 300,
            end = 400, Gaze_pair = "AB"
        )
    gazes2 |>
        group_by(Utterance, Gaze_pair) |>
        group_modify(get_mutual_gazes_pair)
    
    #   Utterance Gaze_pair MG_start MG_end MG_dur
    #   <chr>     <chr>        <dbl>  <dbl>  <dbl>
    # 1 where...? AB             100    200    100
    # 2 where...? AB             300    400    100
    # 3 where...? BC             230    500    270
    

    Including sequences with no gazes

    In response to your comment asking how to ensure all Sequences are included, even if they have no mutual gazes, the function does not know whether each Sequ has previously had a mutual gaze. Rather than introducing state, I think the simplest approach is to join afterwards with any sequences that are missing. First let's add a new Sequ, 66, with no mutual gaze:

    gazes3 <- gazes |>
        add_row(
            GazeID = 3, Sequ = 66, Utterance = "where...?", Q_by = "A",
            Answ_by = "A", Gaze_by = "B", Gaze_to = "A", Role = "Answerer", start = 300,
            end = 450, Gaze_pair = "AB"
        )
    

    Then we can just check which Sequences are not included in the output and add them in, with all other values NA:

    gazes3 %>%
        group_by(Sequ, Utterance, Gaze_pair) %>%
        group_modify(get_mutual_gazes_pair) %>%
        # ^^ as before, but below is new
        bind_rows(
            anti_join(distinct(gazes3, Sequ), .)
        ) %>%
        arrange(Sequ)
    
    #    Sequ Utterance Gaze_pair MG_start MG_end MG_dur
    #   <dbl> <chr>     <chr>        <dbl>  <dbl>  <dbl>
    # 1    55 where...? AB             100    200    100
    # 2    55 where...? BC             230    500    270
    # 3    66 NA        NA              NA     NA     NA
    

    Note we need %>% for this rather than |> as we need to use the pipe placeholder . in a place where the base pipe equivalent _ would not be allowed. In any case, this join ensures that there will be a row for all values of Sequ.