I work with gaze data in three party conversation and want to compute phases of mutual gaze. I know where each speaker (labelled A, B, or C) is looking at any time and when these gazes start and end. The data looks roughly like this (see below for reproducible data):
# A tibble: 10 × 5
Utterance Role Gaze_pair start end
<chr> <chr> <chr> <int> <int>
1 what's steam punk exactly? Speaker AC 299 700
2 what's steam punk exactly? Recipient AC 0 601
3 what's steam punk exactly? Recipient BC 0 355
4 what's steam punk exactly? Recipient BC 411 700
5 where are you guys from? Speaker AC 10 109
6 where are you guys from? Speaker AC 678 750
7 where are you guys from? Recipient AC 50 900
8 where are you guys from? Speaker BC 509 568
9 where are you guys from? Speaker BC 800 900
10 where are you guys from? Recipient BC 0 900
For example, in the first Utterance
"what's steam punk exactly?", the Speaker
(A) gazes at Recipient
(C) from 299 until 700, and that Recipient
gazes back at the Speaker
from 0 to 601, which results in a mutual gaze from 299 to 601.
The desired output is something like this:
# A tibble: 10 × 5
Utterance Role Gaze_pair start end MutG_start MutG_end
<chr> <chr> <chr> <int> <int>
1 what's...? Speaker AC 299 700 299 601
3 what's...? Recipient BC 0 355 0 0
5 where...? Speaker AC 10 109 50 109
6 where...? Speaker AC 678 750 678 750
8 where...? Speaker BC 509 568 509 568
9 where...? Speaker BC 800 900 800 900
How can that be computed? (note that there can be more than one mutual gaze between any two speakers during any one Utterance
)
Reproducible data:
structure(list(Utterance = c("what's steam punk exactly?", "what's steam punk exactly?",
"what's steam punk exactly?", "what's steam punk exactly?", "where are you guys from?",
"where are you guys from?", "where are you guys from?", "where are you guys from?",
"where are you guys from?", "where are you guys from?"), Role = c("Speaker",
"Recipient", "Recipient", "Recipient", "Speaker", "Speaker",
"Recipient", "Speaker", "Speaker", "Recipient"), Gaze_pair = c("AC",
"AC", "BC", "BC", "AC", "AC", "AC", "BC", "BC", "BC"), start = c(299L,
0L, 0L, 411L, 10L, 678L, 50L, 509L, 800L, 0L), end = c(700L,
601L, 355L, 700L, 109L, 750L, 900L, 568L, 900L, 900L)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
Here is an approach using data.table::foverlaps()
, a fast binary-search based overlap join of two data.tables
.
First, define a function that takes a data frame of the speaker of an utterance and a corresponding one of the recipient, and returns a data frame of mutual gazes:
library(data.table)
get_mutual_gazes <- function(speaker_dt, recipient_dt) {
# set keys and do overlapping join
setkey(setDT(speaker_dt), start, end) # no need for setDT()...
setkey(setDT(recipient_dt), start, end) # if calling from data.table
overlaps_dt <- foverlaps(
speaker_dt,
recipient_dt,
nomatch = NA_integer_
)
# return 0 and start/end time if no overlaps
if (nrow(overlaps_dt) == 0) {
return(data.table(
start = min(speaker_dt$start, recipient_dt$start, na.rm = TRUE),
end = min(speaker_dt$end, recipient_dt$end, na.rm = TRUE),
Mutual_G_start = 0L,
Mutual_G_end = 0L
))
}
# get data into the right shape
overlaps_dt[, Mutual_G_start := pmax(start, i.start)]
overlaps_dt[, Mutual_G_end := pmin(end, i.end)]
overlaps_dt[, .(start = i.start, end = i.end, Mutual_G_start, Mutual_G_end)]
}
Then apply this function to each group of utterances and pairs:
data.table
approachsetDT(gazes)
gazes[,
get_mutual_gazes(
.SD[Role == "Speaker"],
.SD[Role == "Recipient"]
),
by = .(Utterance, Gaze_pair),
.SDcols = c("start", "end")
]
# Utterance Gaze_pair start end Mutual_G_start Mutual_G_end
# <char> <char> <int> <int> <int> <int>
# 1: what's steam punk exactly? AC 299 700 299 601
# 2: what's steam punk exactly? BC 0 355 0 0
# 3: where are you guys from? AC 10 109 50 109
# 4: where are you guys from? AC 678 750 678 750
# 5: where are you guys from? BC 509 568 509 568
# 6: where are you guys from? BC 800 900 800 900
dplyr
approachAs you prefer dplyr
, I think this is a good time for group_modify()
:
Use
group_modify()
whensummarize()
is too limited, in terms of what you need to do and return for each group.group_modify()
is good for "data frame in, data frame out".
library(dplyr)
gazes |>
group_by(Utterance, Gaze_pair) |>
group_modify(
~ get_mutual_gazes(
filter(.x, Role == "Speaker"),
filter(.x, Role == "Recipient")
)
)
# # A tibble: 6 × 6
# # Groups: Utterance, Gaze_pair [4]
# Utterance Gaze_pair start end Mutual_G_start Mutual_G_end
# <chr> <chr> <int> <int> <int> <int>
# 1 what's steam punk exactly? AC 299 700 299 601
# 2 what's steam punk exactly? BC 0 355 0 0
# 3 where are you guys from? AC 10 109 50 109
# 4 where are you guys from? AC 678 750 678 750
# 5 where are you guys from? BC 509 568 509 568
# 6 where are you guys from? BC 800 900 800 900