This question is a follow-up on this: Compute mutual gaze from individual gazes.
I'm working with gaze data in Q&A sequences in tradic conversation and want to know when exactly and for how long there is mutual gaze, that is, one person looking at one other person and that other person looking back. The answer to the previous question dealt beautifully with Q&A sequences in which there was only mutual gaze between the speaker and the listeners but it cannot handle a situation where there is also mutual gaze between the two listeners.
The kind of data I have is illustrated here:
GazeID Sequ Utterance Q_by Answ_by Gaze_by Gaze_to Role start end Gaze_pair
1: 1 55 where...? A B A B Speaker 100 700 AB
2: 2 55 where...? A B B A Answerer 0 200 AB
3: 3 55 where...? A B B C Answerer 230 500 BC
4: 4 55 where...? A B C B NonAnswerer 120 620 BC
5: 5 55 where...? A B C A NonAnswerer 650 700 AC
Here, there is mutual gaze (i) between Speaker
and Answerer
but also between Answerer
and NonAnswerer
, such that the desired output is this:
Sequ Utterance MG_start MG_end MG_dur Gaze_pair
1: 55 where...? 100 200 100 AB
2: 55 where...? 230 500 270 BC
I tried to adapt the solution of the prior question, which was built on the data.table
function foverlaps
, to this scenario but received an error.
NB: There can be, in any one Q&A sequence, multiple mutual gazes and there can be mutual gaze between (i) Speaker_Answerer
, (ii) Speaker-NonAnswerer
, and (iii) Answerer-NonAnswerer
.
Any help with this question is much appreciated.
Reproducible data:
gazes <- structure(list(GazeID = 1:5, Sequ = c(55, 55, 55, 55, 55), Utterance = c("where...?",
"where...?", "where...?", "where...?", "where...?"), Q_by = c("A",
"A", "A", "A", "A"), Answ_by = c("B", "B", "B", "B", "B"), Gaze_by = c("A",
"B", "B", "C", "C"), Gaze_to = c("B", "A", "C", "B", "A"), Role = c("Speaker",
"Answerer", "Answerer", "NonAnswerer", "NonAnswerer"), start = c(100L,
0L, 230L, 120L, 650L), end = c(700L, 200L, 500L, 620L, 700L),
Gaze_pair = c("AB", "AB", "BC", "BC", "AC")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -5L))
One approach is to split your data into a list of data frames by gaze pair and calculate mutual gazes for each data frame in that list. This uses data.table::foverlaps()
for the fast overlap join. As you prefer dplyr
it's written in such a way that it plugs in to its grouping functions that take and return a data frame, specifically dplyr::group_modify()
.
library(data.table)
get_mutual_gazes_pair <- function(dt, grp) {
# split into the pair involved
person_list <- dt |>
group_split(Gaze_by, Gaze_to)
# if no pair, no overlaps
if (length(person_list) == 1) {
return(data.frame(NULL))
}
# get the overlaps
dt1 <- setDT(person_list[[1]]) |> setkey(start, end)
dt2 <- setDT(person_list[[2]]) |> setkey(start, end)
overlaps_dt <- foverlaps(dt1, dt2, nomatch = NA)
overlaps_dt <- overlaps_dt[complete.cases(overlaps_dt)]
# get start, end and duration
overlaps_dt |> reframe(
MG_start = pmax(start, i.start),
MG_end = pmin(end, i.end),
MG_dur = MG_end - MG_start
)
}
Then it's just a case of calculating all the mutual gaze pairs:
library(dplyr)
gazes |>
group_by(Sequ, Utterance, Gaze_pair) |>
group_modify(get_mutual_gazes_pair)
# Sequ Utterance Gaze_pair MG_start MG_end MG_dur
# <dbl> <chr> <chr> <int> <int> <int>
# 1 55 where...? AB 100 200 100
# 2 55 where...? BC 230 500 270
Note the use of pmin()
and pmax()
rather than min()
and max()
. In your example data, there is one mutual gaze per utterance. For example, A
looks at B
from 100
to 700
and B looks at A
from 0
to 200
, hence the overlap from 100
to 200
. But if we imagine that person B
looked at person A
again from 300
to 400
, we'd have two mutual gazes per utterance. This approach will work in that case:
gazes2 <- gazes |>
add_row(
GazeID = 3, Sequ = 55, Utterance = "where...?", Q_by = "A",
Answ_by = "A", Gaze_by = "B", Gaze_to = "A", Role = "Answerer", start = 300,
end = 400, Gaze_pair = "AB"
)
gazes2 |>
group_by(Utterance, Gaze_pair) |>
group_modify(get_mutual_gazes_pair)
# Utterance Gaze_pair MG_start MG_end MG_dur
# <chr> <chr> <dbl> <dbl> <dbl>
# 1 where...? AB 100 200 100
# 2 where...? AB 300 400 100
# 3 where...? BC 230 500 270
In response to your comment asking how to ensure all Sequ
ences are included, even if they have no mutual gazes, the function does not know whether each Sequ
has previously had a mutual gaze. Rather than introducing state, I think the simplest approach is to join afterwards with any sequences that are missing. First let's add a new Sequ
, 66
, with no mutual gaze:
gazes3 <- gazes |>
add_row(
GazeID = 3, Sequ = 66, Utterance = "where...?", Q_by = "A",
Answ_by = "A", Gaze_by = "B", Gaze_to = "A", Role = "Answerer", start = 300,
end = 450, Gaze_pair = "AB"
)
Then we can just check which Sequ
ences are not included in the output and add them in, with all other values NA
:
gazes3 %>%
group_by(Sequ, Utterance, Gaze_pair) %>%
group_modify(get_mutual_gazes_pair) %>%
# ^^ as before, but below is new
bind_rows(
anti_join(distinct(gazes3, Sequ), .)
) %>%
arrange(Sequ)
# Sequ Utterance Gaze_pair MG_start MG_end MG_dur
# <dbl> <chr> <chr> <dbl> <dbl> <dbl>
# 1 55 where...? AB 100 200 100
# 2 55 where...? BC 230 500 270
# 3 66 NA NA NA NA NA
Note we need %>%
for this rather than |>
as we need to use the pipe placeholder .
in a place where the base pipe equivalent _
would not be allowed. In any case, this join ensures that there will be a row for all values of Sequ
.