Tags: r, nearest-neighbor, pearson-correlation

Is there a way in R to create a matrix of nearest neighbours and variable values?


I have data which looks like this:

   identity  growth x-pos y-pos
1:     Z      0.1   0.5   0.7
2:     B      0.1   0.1   0.0
3:     C      0.2   4.6   2.5
4:     D      0.3   5.6   5.0
5:     A      0.4   0.2   1.0
6:     P      0.1   0.4   2.0

I would like to test whether growth values are correlated between each object (identified by a unique identity) and its n nearest neighbours. Basically, I want to create a matrix that identifies the 5 nearest neighbours of each unique identity row, based on the locations given by x-pos and y-pos, and then compute correlations between the growth value of an object (e.g. Z) and the growth values of its 1st, 2nd, 3rd, 4th and 5th nearest neighbours.

I tried building a Euclidean distance matrix and then computing a measure of autocorrelation with the ADE package, but I was wondering whether there is a simpler way to construct such a matrix.


Solution

  • perform correlations between the growth value of object (e.g. Z) and the growth value of the 1st, 2nd, 3rd, 4th and 5th nearest neighbour of Z

    You can't compute a correlation between two single values: a correlation needs several paired observations, and each object has only one growth value.

    The closest things I can think of are computing the correlation between your points and the average of their neighbors, or doing a pairwise test to compare them. Either would be computed across all "objects" together, not as one correlation per object (since there is only 1 point per object).

  • create a matrix which identifies the 5 nearest neighbours for each unique identity row based on the locations denoted by x-pos and y-pos

    # read in data (tribble() comes from the tibble package)
    library(tibble)
    df <- tribble(
      ~identity,  ~growth, ~`x-pos`, ~`y-pos`,
           "Z",      0.1,   0.5,   0.7,
           "B",      0.1,   0.1,   0.0,
           "C",      0.2,   4.6,   2.5,
           "D",      0.3,   5.6,   5.0,
           "A",      0.4,   0.2,   1.0,
           "P",      0.1,   0.4,   2.0)
    
    # here with 3 neighbors since we have only 6 points
    n_neighbors <- 3
    
    # make matrix of coordinates
    mat <- as.matrix(df[,3:4])
    rownames(mat) <- df$identity
    
    # compute Euclidean distances
    dmat <- as.matrix(dist(mat))
    
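    # find neighbors (by name): sort each row by distance, keep the closest
    # n_neighbors + 1 names, then drop the first one (the point itself, at distance 0)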
    nei_mat <- apply(dmat, 1,
                     function(crow) {names(sort(crow))[seq_len(n_neighbors+1)]})[-1,]
    
    # match names to initial data frame to make matrix of growth
    ref_growth_mat <- matrix(df$growth, dimnames=list(df$identity))
    growth_mat <- matrix(ref_growth_mat[nei_mat,], nrow = n_neighbors)
    colnames(growth_mat) <- df$identity
    
    # done: one column per object, row k holds the growth of its k-th nearest neighbor
    growth_mat
    #>        Z   B   C   D   A   P
    #> [1,] 0.4 0.1 0.3 0.2 0.1 0.4
    #> [2,] 0.1 0.4 0.1 0.1 0.1 0.1
    #> [3,] 0.1 0.1 0.1 0.1 0.1 0.1
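
The correlations mentioned at the start of this answer (computed across all objects together) could then be sketched as follows. This is only a minimal, self-contained illustration in base R, not a definitive method. The names `x_pos`, `y_pos`, `nei_idx`, `mean_nei_growth`, `cor_mean` and `cor_by_rank` are made up for this example, and the coordinate columns are renamed so they are syntactic R names.

```r
# Same data as above, rebuilt with base R (no packages needed)
df <- data.frame(
  identity = c("Z", "B", "C", "D", "A", "P"),
  growth   = c(0.1, 0.1, 0.2, 0.3, 0.4, 0.1),
  x_pos    = c(0.5, 0.1, 4.6, 5.6, 0.2, 0.4),
  y_pos    = c(0.7, 0.0, 2.5, 5.0, 1.0, 2.0)
)
n_neighbors <- 3

# Euclidean distance matrix between all objects
coords <- as.matrix(df[, c("x_pos", "y_pos")])
rownames(coords) <- df$identity
dmat <- as.matrix(dist(coords))

# For each object: row indices of the other objects, ordered by distance,
# keeping the n_neighbors closest (the object itself is excluded explicitly)
nei_idx <- sapply(seq_len(nrow(dmat)), function(i) {
  ord <- order(dmat[i, ])
  ord[ord != i][seq_len(n_neighbors)]
})

# n_neighbors x n_objects matrix: row k = growth of the k-th nearest neighbor
growth_mat <- matrix(df$growth[nei_idx], nrow = n_neighbors,
                     dimnames = list(NULL, df$identity))

# Option 1: correlation between each object's growth and the mean growth
# of its neighbors (one correlation over all objects)
mean_nei_growth <- colMeans(growth_mat)
cor_mean <- cor(df$growth, mean_nei_growth)

# Option 2: one correlation per neighbor rank k, again over all objects.
# A rank whose neighbor growth values are all identical has no variance,
# so its correlation is undefined and is returned as NA here.
cor_by_rank <- apply(growth_mat, 1, function(g) {
  if (length(unique(g)) < 2) NA_real_ else cor(df$growth, g)
})

cor_mean
cor_by_rank
```

With only 6 points these numbers mean very little, and in this toy data the 3rd nearest neighbor always has growth 0.1, so the correlation for that rank is undefined (`NA`). On the real data, `n_neighbors` can be set to 5 as in the question.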