
R: How do I transform a data frame into an adjacency matrix if values meet a certain condition?


I have a dataset that I would like to query and transform into an adjacency matrix using R.

An example dataset is as follows:

> track_df
     track  sound start   end
1  track1A    car  1000  2000
2  track1A person  1200  1500
3  track1A    car  1500  1700
4  track1A    dog  2300  3000
5  track1B    cat  5000  8000
6  track1B    car  5500  8500
7  track1B    car  7500 10000
8  track1B person  8000  9000
9  track1C    dog  1300  1600
10 track1C    car  1500  1800
11 track1C person  1700  2000

The example shows sounds recorded on a track with the start and end times of each sound. Tracks contain multiple sounds.

Code to produce the example:

> track <- c('track1A', 'track1A', 'track1A', 'track1A', 'track1B', 'track1B', 'track1B', 'track1B', 'track1C', 'track1C', 'track1C')
> sound <- c('car', 'person', 'car', 'dog', 'cat', 'car', 'car', 'person', 'dog', 'car', 'person')
> start <- c(1000, 1200, 1500, 2300, 5000, 5500, 7500, 8000, 1300, 1500, 1700)
> end <- c(2000, 1500, 1700, 3000, 8000, 8500, 10000, 9000, 1600, 1800, 2000)
> track_df <- data.frame(track, sound, start, end)

Using the dataset above, I need to count how many times each pair of sounds overlaps within the same track (based on their start and end times).

If a sound starts or finishes during another sound within a track, the two are considered overlapping.
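The condition "starts or finishes during another sound" (which also covers one sound lying entirely inside another) reduces to a single test: two intervals overlap exactly when each one starts before the other ends. Below is a minimal sketch with a hypothetical helper `overlaps()`. It uses strict `<`, so two sounds that merely touch (one ends exactly when the other starts) are not counted; that is an assumption, since the question does not say how to treat touching sounds.

```r
# Hypothetical helper: TRUE when intervals [s1, e1] and [s2, e2] overlap.
# Strict inequalities: intervals that only touch (e1 == s2) do NOT count.
# Works element-wise on vectors because it uses `&` rather than `&&`.
overlaps <- function(s1, e1, s2, e2) {
  s1 < e2 & s2 < e1
}

overlaps(1200, 1500, 1000, 2000)  # TRUE: person lies inside car (track1A)
overlaps(1500, 1800, 1300, 1600)  # TRUE: car starts during dog (track1C)
overlaps(1200, 1500, 1500, 1700)  # FALSE: person ends exactly when car starts
```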

The desired output is a symmetric count matrix like the one below, which I can then turn into a heatmap or network:

> matrix
       car person dog cat
car      2      4   1   2
person   4      0   0   0
dog      1      0   0   0
cat      2      0   0   0

I am not sure what the best approach is, or how best to transform the initial dataset into something that can easily be iterated over and compared.

Perhaps I could use dplyr to group_by() track and then summarise() with a separate function that builds the output matrix? I'm not sure I fully understand how summarise() works, or whether it would iterate over every combination of sounds within a track.
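For reference, the "group by track, then compare every pair" idea can be expressed without an explicit loop. The sketch below swaps in a base-R self-join with `merge()` on `track` instead of dplyr: the join yields every pair of rows from the same track, the overlap test filters them, and `table()` counts the sound pairs. The helper column `id` and the names `pairs`, `snds` and `counts` are illustrative, and strict `<` (touching sounds do not overlap) is assumed.

```r
# Example data from the question, with the start/end values from the printed table
track_df <- data.frame(
  track = c("track1A", "track1A", "track1A", "track1A", "track1B", "track1B",
            "track1B", "track1B", "track1C", "track1C", "track1C"),
  sound = c("car", "person", "car", "dog", "cat", "car", "car", "person",
            "dog", "car", "person"),
  start = c(1000, 1200, 1500, 2300, 5000, 5500, 7500, 8000, 1300, 1500, 1700),
  end   = c(2000, 1500, 1700, 3000, 8000, 8500, 10000, 9000, 1600, 1800, 2000),
  stringsAsFactors = FALSE
)

# Row id so that each unordered pair of sounds is kept exactly once
track_df$id <- seq_len(nrow(track_df))

# Self-join on track: every ordered pair of rows (including a row with itself)
pairs <- merge(track_df, track_df, by = "track", suffixes = c(".a", ".b"))

# Keep unordered pairs (id.a < id.b) whose intervals overlap
pairs <- pairs[pairs$id.a < pairs$id.b &
                 pairs$start.a < pairs$end.b &
                 pairs$start.b < pairs$end.a, ]

# Cross-tabulate sound pairs over a fixed, sorted set of sound names
snds <- sort(unique(track_df$sound))
counts <- table(factor(pairs$sound.a, levels = snds),
                factor(pairs$sound.b, levels = snds))

# Symmetrise: add the transpose, then undo the doubling on the diagonal
m <- matrix(counts, length(snds), dimnames = list(snds, snds))
m <- m + t(m)
diag(m) <- diag(m) / 2
print(m)
```

With strict overlaps this should reproduce the desired counts (car/car 2, car/person 4, car/dog 1, car/cat 2), with rows and columns in alphabetical order. The same self-join could be written in dplyr with `inner_join()` on `track` followed by `filter()` and `count()`. Note that the join builds k² rows for a track with k sounds; for very long tracks, a dedicated overlap join such as `data.table::foverlaps()` may scale better.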

Any help would be appreciated.


Solution

  • I only have a non-vectorized solution to offer; it does essentially iterate over every pair of sounds within each track.

    track <- c('track1A', 'track1A', 'track1A', 'track1A', 'track1B', 'track1B', 'track1B', 'track1B', 'track1C', 'track1C', 'track1C')
    sound <- c('car', 'person', 'car', 'dog', 'cat', 'car', 'car', 'person', 'dog', 'car', 'person')
    start <- c(1000, 1200, 1500, 2300, 5000, 5500, 7500, 8000, 1300, 1500, 1700)
    end <- c(2000, 1500, 1700, 3000, 8000, 8500, 10000, 9000, 1600, 1800, 2000)
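    # stringsAsFactors = TRUE keeps levels() working in R >= 4.0,
    # where data.frame() no longer converts strings to factors by default
    track_df <- data.frame(track, sound, start, end, stringsAsFactors = TRUE)
    sounds <- levels(track_df$sound)
    m <- matrix(0, length(sounds), length(sounds), dimnames = list(sounds, sounds))
    for (trk in split(track_df, track_df$track))
    {
        n <- nrow(trk)
        # visit every unordered pair (i, j) of sounds in this track once
        for (i in seq_len(n - 1)) for (j in (i + 1):n)
        {
            a <- as.character(trk$sound[i])
            b <- as.character(trk$sound[j])
            # two sounds overlap if each starts before the other ends
            if (trk$start[i] < trk$end[j] && trk$start[j] < trk$end[i])
                m[b, a] <- m[a, b] <- m[a, b] + 1
        }
    }
    print(m)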
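For the network use mentioned in the question, many graph tools take a weighted edge list rather than a matrix. A minimal base-R sketch of that conversion, using the desired output matrix from the question typed in by hand (the names `edges`, `from`, `to` and `weight` are illustrative): keeping only the upper triangle, diagonal included, ensures each symmetric pair appears once.

```r
# Desired output matrix from the question, entered by hand
sounds <- c("car", "person", "dog", "cat")
m <- matrix(c(2, 4, 1, 2,
              4, 0, 0, 0,
              1, 0, 0, 0,
              2, 0, 0, 0),
            nrow = 4, byrow = TRUE, dimnames = list(sounds, sounds))

# Non-zero cells in the upper triangle (incl. diagonal): one row per pair
idx <- which(upper.tri(m, diag = TRUE) & m > 0, arr.ind = TRUE)

# Weighted edge list: one row per overlapping sound pair
edges <- data.frame(from   = rownames(m)[idx[, "row"]],
                    to     = colnames(m)[idx[, "col"]],
                    weight = m[idx])
print(edges)
```

For a quick heatmap instead, base R's `heatmap(m, Rowv = NA, Colv = NA, scale = "none")` draws the same matrix without reordering rows or columns.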