Compute mutual gaze from individual gazes

I work with gaze data in three party conversation and want to compute phases of mutual gaze. I know where each speaker (labelled A, B, or C) is looking at any time and when these gazes start and end. The data looks roughly like this (see below for reproducible data):

# A tibble: 10 × 5
   Utterance                  Role      Gaze_pair start   end
   <chr>                      <chr>     <chr>     <int> <int>
 1 what's steam punk exactly? Speaker   AC          299   700
 2 what's steam punk exactly? Recipient AC            0   601
 3 what's steam punk exactly? Recipient BC            0   355
 4 what's steam punk exactly? Recipient BC          411   700
 5 where are you guys from?   Speaker   AC           10   109
 6 where are you guys from?   Speaker   AC          678   750
 7 where are you guys from?   Recipient AC           50   900
 8 where are you guys from?   Speaker   BC          509   568
 9 where are you guys from?   Speaker   BC          800   900
10 where are you guys from?   Recipient BC            0   900

For example, in the first Utterance "what's steam punk exactly?", the Speaker (A) gazes at Recipient (C) from 299 until 700, and that Recipientgazes back at the Speaker from 0 to 601, which results in a mutual gaze from 299 to 601.

The desired output is something like this:

# A tibble: 10 × 5
   Utterance  Role      Gaze_pair start   end  MutG_start  MutG_end
   <chr>      <chr>     <chr>     <int> <int>
 1 what's...? Speaker   AC          299   700  299         601
 3 what's...? Recipient BC            0   355  0           0

 5 where...?  Speaker   AC           10   109  50          109
 6 where...?  Speaker   AC          678   750 678          750
 8 where...?  Speaker   BC          509   568 509          568
 9 where...?  Speaker   BC          800   900 800          900

How can that be computed? (note that there can be more than one mutual gaze between any two speakers during any one Utterance)

Reproducible data:

structure(list(Utterance = c("what's steam punk exactly?", "what's steam punk exactly?", 
"what's steam punk exactly?", "what's steam punk exactly?", "where are you guys from?", 
"where are you guys from?", "where are you guys from?", "where are you guys from?", 
"where are you guys from?", "where are you guys from?"), Role = c("Speaker", 
"Recipient", "Recipient", "Recipient", "Speaker", "Speaker", 
"Recipient", "Speaker", "Speaker", "Recipient"), Gaze_pair = c("AC", 
"AC", "BC", "BC", "AC", "AC", "AC", "BC", "BC", "BC"), start = c(299L, 
0L, 0L, 411L, 10L, 678L, 50L, 509L, 800L, 0L), end = c(700L, 
601L, 355L, 700L, 109L, 750L, 900L, 568L, 900L, 900L)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

Solution

Here is an approach using data.table::foverlaps(), a fast binary-search based overlap join of two data.tables.

First, define a function that takes a data frame of the speaker of an utterance and a corresponding one of the recipient, and returns a data frame of mutual gazes:

library(data.table)
get_mutual_gazes <- function(speaker_dt, recipient_dt) {
    # set keys and do overlapping join
    setkey(setDT(speaker_dt), start, end) # no need for setDT()...
    setkey(setDT(recipient_dt), start, end) # if calling from data.table
    overlaps_dt <- foverlaps(
        speaker_dt,
        recipient_dt,
        nomatch = NA_integer_
    )

    # return 0 and start/end time if no overlaps
    if (nrow(overlaps_dt) == 0) {
        return(data.table(
            start = min(speaker_dt$start, recipient_dt$start, na.rm = TRUE),
            end = min(speaker_dt$end, recipient_dt$end, na.rm = TRUE),
            Mutual_G_start = 0L,
            Mutual_G_end = 0L
        ))
    }

    # get data into the right shape
    overlaps_dt[, Mutual_G_start := pmax(start, i.start)]
    overlaps_dt[, Mutual_G_end := pmin(end, i.end)]
    overlaps_dt[, .(start = i.start, end = i.end, Mutual_G_start, Mutual_G_end)]
}

Then apply this function to each group of utterances and pairs:

`data.table` approach

setDT(gazes)
gazes[,
    get_mutual_gazes(
        .SD[Role == "Speaker"],
        .SD[Role == "Recipient"]
    ),
    by = .(Utterance, Gaze_pair),
    .SDcols = c("start", "end")
]

#                     Utterance Gaze_pair start   end Mutual_G_start Mutual_G_end
#                        <char>    <char> <int> <int>          <int>        <int>
# 1: what's steam punk exactly?        AC   299   700            299          601
# 2: what's steam punk exactly?        BC     0   355              0            0
# 3:   where are you guys from?        AC    10   109             50          109
# 4:   where are you guys from?        AC   678   750            678          750
# 5:   where are you guys from?        BC   509   568            509          568
# 6:   where are you guys from?        BC   800   900            800          900

`dplyr` approach

As you prefer dplyr, I think this is a good time for group_modify():

Use group_modify() when summarize() is too limited, in terms of what you need to do and return for each group. group_modify() is good for "data frame in, data frame out".

library(dplyr)
gazes  |>
    group_by(Utterance, Gaze_pair)   |>
    group_modify( 
        ~ get_mutual_gazes(
            filter(.x, Role == "Speaker"),
            filter(.x, Role == "Recipient")
        )
    )

# # A tibble: 6 × 6
# # Groups:   Utterance, Gaze_pair [4]
#   Utterance                  Gaze_pair start   end Mutual_G_start Mutual_G_end
#   <chr>                      <chr>     <int> <int>          <int>        <int>
# 1 what's steam punk exactly? AC          299   700            299          601
# 2 what's steam punk exactly? BC            0   355              0            0
# 3 where are you guys from?   AC           10   109             50          109
# 4 where are you guys from?   AC          678   750            678          750
# 5 where are you guys from?   BC          509   568            509          568
# 6 where are you guys from?   BC          800   900            800          900

Compute mutual gaze from individual gazes

data.table approach

dplyr approach

`data.table` approach

`dplyr` approach