I have a dataset that I would like to query and transform into an adjacency matrix using R.
Here is an example dataset:
> track_df
     track  sound start   end
1  track1A    car  1000  2000
2  track1A person  1200  1500
3  track1A    car  1500  1700
4  track1A    dog  2300  3000
5  track1B    cat  5000  8000
6  track1B    car  5500  8500
7  track1B    car  7500 10000
8  track1B person  8000  9000
9  track1C    dog  1300  1600
10 track1C    car  1500  1800
11 track1C person  1700  2000
Each row is a sound recorded on a track, with that sound's start and end times. A track can contain multiple sounds.
Code to produce the example:
track <- c('track1A', 'track1A', 'track1A', 'track1A', 'track1B', 'track1B', 'track1B', 'track1B', 'track1C', 'track1C', 'track1C')
sound <- c('car', 'person', 'car', 'dog', 'cat', 'car', 'car', 'person', 'dog', 'car', 'person')
start <- c(1000, 1200, 1500, 2300, 5000, 5500, 7500, 8000, 1300, 1500, 1700)
end <- c(2000, 1500, 1700, 3000, 8000, 8500, 10000, 9000, 1600, 1800, 2000)
track_df <- data.frame(track, sound, start, end)
Using the dataset above, I need to count the number of times two sounds overlap (based on their start and end times).
Two sounds are considered overlapping if they are on the same track and one of them starts or finishes during the other.
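The rule above is the usual interval-overlap test: two sounds overlap when each one starts before the other ends, which also covers one sound lying entirely inside another. Below is a minimal sketch of that test as an R function (`overlaps()` is a hypothetical name, not an existing function). It assumes strict inequality, so sounds that only touch do not count. That assumption is what reproduces the desired matrix below: counting touching sounds would add person 1200-1500 vs car 1500-1700 (track1A) and cat 5000-8000 vs person 8000-9000 (track1B), giving car/person = 5 and cat/person = 1.

```r
# Hypothetical helper: TRUE when sound A [a_start, a_end] and sound B
# [b_start, b_end] overlap, i.e. each starts before the other ends.
# Strict `<` means sounds that only touch are NOT counted as overlapping.
# Uses `&` rather than `&&`, so it also works element-wise on vectors.
overlaps <- function(a_start, a_end, b_start, b_end) {
  a_start < b_end & b_start < a_end
}

overlaps(1000, 2000, 1200, 1500)  # track1A car vs person: TRUE
overlaps(1200, 1500, 1500, 1700)  # track1A person ends as car starts: FALSE
overlaps(1300, 1600, 1700, 2000)  # track1C dog ends before person starts: FALSE
```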
The desired output would be something like this, which I can then turn into a heatmap or network:
> matrix
       car person dog cat
car      2      4   1   2
person   4      0   0   0
dog      1      0   0   0
cat      2      0   0   0
I am not sure what the best way to approach this is, or how best to transform the initial dataset into something that can easily be iterated over and compared.
Perhaps I could use dplyr, group_by track, and then summarise using a separate function to create the output matrix? I'm not sure I fully understand how summarise works, or whether it would iterate over every combination of sounds within a track.
Any help would be appreciated.
I only have a non-vectorized solution to offer, and it does essentially iterate over every pair of sounds within each track.
track <- c('track1A', 'track1A', 'track1A', 'track1A', 'track1B', 'track1B', 'track1B', 'track1B', 'track1C', 'track1C', 'track1C')
sound <- c('car', 'person', 'car', 'dog', 'cat', 'car', 'car', 'person', 'dog', 'car', 'person')
start <- c(1000, 1200, 1500, 2300, 5000, 5500, 7500, 8000, 1300, 1500, 1700)
end <- c(2000, 1500, 1700, 3000, 8000, 8500, 10000, 9000, 1600, 1800, 2000)
track_df <- data.frame(track, sound, start, end)
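
# Sound names for the matrix; levels() would return NULL here because
# data.frame() keeps strings as character by default since R 4.0.0
sounds <- sort(unique(as.character(track_df$sound)))
m <- matrix(0, length(sounds), length(sounds), dimnames = list(sounds, sounds))

# Compare every pair of sounds within each track
for (trk in split(track_df, track_df$track))
{
  n <- nrow(trk)
  if (n < 2) next  # a track with a single sound has no pairs
  s <- as.character(trk$sound)
  for (i in 1:(n-1)) for (j in (i+1):n)
    # Overlap: each sound starts before the other ends
    if (trk$start[i] < trk$end[j] && trk$start[j] < trk$end[i])
    {
      m[s[i], s[j]] <- m[s[i], s[j]] + 1
      # Mirror the count; skip the diagonal so same-sound pairs count once
      if (s[i] != s[j]) m[s[j], s[i]] <- m[s[j], s[i]] + 1
    }
}
print(m)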
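For comparison, here is a loop-free sketch in base R. It swaps in a different technique from the loop above: a self-join of track_df with itself via merge() on track, which turns every pair of sounds within a track into one row, so the overlap test becomes a single vectorized comparison and table() does the counting. The id column and the names joined, ov and tab are my own additions for illustration. It assumes the data fit in memory: the join has one row per ordered pair within a track, so it grows with the square of the number of sounds per track.

```r
# Example data from the question (stringsAsFactors = FALSE keeps `sound`
# as character in R versions before 4.0.0 as well)
track_df <- data.frame(
  track = c(rep("track1A", 4), rep("track1B", 4), rep("track1C", 3)),
  sound = c("car", "person", "car", "dog", "cat", "car", "car", "person",
            "dog", "car", "person"),
  start = c(1000, 1200, 1500, 2300, 5000, 5500, 7500, 8000, 1300, 1500, 1700),
  end   = c(2000, 1500, 1700, 3000, 8000, 8500, 10000, 9000, 1600, 1800, 2000),
  stringsAsFactors = FALSE
)
track_df$id <- seq_len(nrow(track_df))  # row id, used to keep each pair once

# Self-join on track: one row per (a, b) combination within the same track
joined <- merge(track_df, track_df, by = "track", suffixes = c(".a", ".b"))
joined <- joined[joined$id.a < joined$id.b, ]  # drop self-pairs and mirrored duplicates

# Keep overlapping pairs: each sound starts before the other ends
ov <- joined[joined$start.a < joined$end.b & joined$start.b < joined$end.a, ]

# Cross-tabulate sound names, using the same level order on both axes
sounds <- sort(unique(track_df$sound))
tab <- unclass(table(factor(ov$sound.a, levels = sounds),
                     factor(ov$sound.b, levels = sounds)))

# Make the matrix symmetric: off-diagonal cells count both orientations,
# diagonal cells (same-sound pairs) are counted once
m <- tab + t(tab)
diag(m) <- diag(tab)
dimnames(m) <- list(sounds, sounds)
print(m)
```

With the example data this should give the same counts as the loop (for instance car/person = 4 and car/car = 2), with rows and columns in alphabetical order.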