Search code examples
rdplyr

Compute mutual gaze from individual gazes


I work with gaze data in three party conversation and want to compute phases of mutual gaze. I know where each speaker (labelled A, B, or C) is looking at any time and when these gazes start and end. The data looks roughly like this (see below for reproducible data):

# A tibble: 10 × 5
   Utterance                  Role      Gaze_pair start   end
   <chr>                      <chr>     <chr>     <int> <int>
 1 what's steam punk exactly? Speaker   AC          299   700
 2 what's steam punk exactly? Recipient AC            0   601
 3 what's steam punk exactly? Recipient BC            0   355
 4 what's steam punk exactly? Recipient BC          411   700
 5 where are you guys from?   Speaker   AC           10   109
 6 where are you guys from?   Speaker   AC          678   750
 7 where are you guys from?   Recipient AC           50   900
 8 where are you guys from?   Speaker   BC          509   568
 9 where are you guys from?   Speaker   BC          800   900
10 where are you guys from?   Recipient BC            0   900

For example, in the first Utterance "what's steam punk exactly?", the Speaker (A) gazes at Recipient (C) from 299 until 700, and that Recipientgazes back at the Speaker from 0 to 601, which results in a mutual gaze from 299 to 601.

The desired output is something like this:

# A tibble: 10 × 5
   Utterance  Role      Gaze_pair start   end  MutG_start  MutG_end
   <chr>      <chr>     <chr>     <int> <int>
 1 what's...? Speaker   AC          299   700  299         601
 3 what's...? Recipient BC            0   355  0           0

 5 where...?  Speaker   AC           10   109  50          109
 6 where...?  Speaker   AC          678   750 678          750
 8 where...?  Speaker   BC          509   568 509          568
 9 where...?  Speaker   BC          800   900 800          900
      

How can that be computed? (note that there can be more than one mutual gaze between any two speakers during any one Utterance)

Reproducible data:

structure(list(Utterance = c("what's steam punk exactly?", "what's steam punk exactly?", 
"what's steam punk exactly?", "what's steam punk exactly?", "where are you guys from?", 
"where are you guys from?", "where are you guys from?", "where are you guys from?", 
"where are you guys from?", "where are you guys from?"), Role = c("Speaker", 
"Recipient", "Recipient", "Recipient", "Speaker", "Speaker", 
"Recipient", "Speaker", "Speaker", "Recipient"), Gaze_pair = c("AC", 
"AC", "BC", "BC", "AC", "AC", "AC", "BC", "BC", "BC"), start = c(299L, 
0L, 0L, 411L, 10L, 678L, 50L, 509L, 800L, 0L), end = c(700L, 
601L, 355L, 700L, 109L, 750L, 900L, 568L, 900L, 900L)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

Solution

  • Here is an approach using data.table::foverlaps(), a fast binary-search based overlap join of two data.tables.

    First a function to join each group:

    library(data.table)
    get_mutual_gazes <- function(speaker_dt, recipient_dt) {
        # set keys and do overlapping join
        setkey(setDT(speaker_dt), start, end) # no need for setDT()...
        setkey(setDT(recipient_dt), start, end) # if calling from data.table
        overlaps_dt <- foverlaps(
            speaker_dt,
            recipient_dt,
            nomatch = NA_integer_
        )
    
        # return 0 and start/end time if no overlaps
        if (nrow(overlaps_dt) == 0) {
            return(data.table(
                start = min(speaker_dt$start, recipient_dt$start, na.rm = TRUE),
                end = min(speaker_dt$end, recipient_dt$end, na.rm = TRUE),
                Mutual_G_start = 0L,
                Mutual_G_end = 0L
            ))
        }
    
        # get data into the right shape
        overlaps_dt[, Mutual_G_start := pmax(start, i.start)]
        overlaps_dt[, Mutual_G_end := pmin(end, i.end)]
        overlaps_dt[, .(start = i.start, end = i.end, Mutual_G_start, Mutual_G_end)]
    }
    

    Then simply call it by group:

    data.table approach

    setDT(gazes)
    gazes[,
        get_mutual_gazes(
            .SD[Role == "Speaker"],
            .SD[Role == "Recipient"]
        ),
        by = .(Utterance, Gaze_pair),
        .SDcols = c("start", "end")
    ]
    
    #                     Utterance Gaze_pair start   end Mutual_G_start Mutual_G_end
    #                        <char>    <char> <int> <int>          <int>        <int>
    # 1: what's steam punk exactly?        AC   299   700            299          601
    # 2: what's steam punk exactly?        BC     0   355              0            0
    # 3:   where are you guys from?        AC    10   109             50          109
    # 4:   where are you guys from?        AC   678   750            678          750
    # 5:   where are you guys from?        BC   509   568            509          568
    # 6:   where are you guys from?        BC   800   900            800          900
    

    dplyr approach

    As you prefer dplyr, I think this is a good time for group_modify():

    Use group_modify() when summarize() is too limited, in terms of what you need to do and return for each group. group_modify() is good for "data frame in, data frame out".

    library(dplyr)
    gazes  |>
        group_by(Utterance, Gaze_pair)   |>
        group_modify( 
            ~ get_mutual_gazes(
                filter(.x, Role == "Speaker"),
                filter(.x, Role == "Recipient")
            )
        )
    
    # # A tibble: 6 × 6
    # # Groups:   Utterance, Gaze_pair [4]
    #   Utterance                  Gaze_pair start   end Mutual_G_start Mutual_G_end
    #   <chr>                      <chr>     <int> <int>          <int>        <int>
    # 1 what's steam punk exactly? AC          299   700            299          601
    # 2 what's steam punk exactly? BC            0   355              0            0
    # 3 where are you guys from?   AC           10   109             50          109
    # 4 where are you guys from?   AC          678   750            678          750
    # 5 where are you guys from?   BC          509   568            509          568
    # 6 where are you guys from?   BC          800   900            800          900