
Subset a distance matrix in R by values


I have a very large distance matrix (3678 x 3678), currently stored as a data frame. The columns are named "1", "2", "3" and so on, and so are the rows. I need to find the values that are below 26 and different from 0, and collect the results in a second data frame with two columns: the first with the index and the second with the value. For example:

            value
318-516   22.70601
... 

where 318 is the row index and 516 is the column index.
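A small reproducible stand-in for the data described above can help when testing approaches. This is a sketch with made-up values (the points 0, 10, 20 and 50 are arbitrary), not the real 3678 x 3678 matrix: dist() produces a symmetric distance matrix whose row and column names default to "1", "2", ..., matching the layout in the question, and dput() prints text that recreates an object, which is the usual way to share a few lines of data in a question.

```r
# Hypothetical stand-in for the large distance data frame in the question:
# 4 points on a line, so every distance is easy to check by hand.
m <- as.matrix(dist(c(0, 10, 20, 50)))  # dimnames default to "1", "2", "3", "4"
d <- as.data.frame(m)                    # data frame with rows/columns "1", "2", ...

print(d)

# dput() prints code that rebuilds the object; for the real data, a small
# corner such as dput(d[1:5, 1:5]) is enough to paste into a question.
dput(d[1:3, 1:3])
```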


Solution

  • OK, I'll try to recreate your situation (note: if you can, it's always helpful to include a few lines of your data using dput()).

    You should be able to do this with filter() and a few other simple tidyverse commands. If you don't know what they do, run the pipeline step by step: select everything up to, but not including, a %>% and run it to see the intermediate result:

    library(tidyverse)
    library(tidylog) # gives you additional output on what each command does
    # Creating some data that looks similar: a 5 x 5 matrix of values around 26
    data <- matrix(rnorm(25, mean = 26), ncol = 5)
    colnames(data) <- 1:5  # columns named "1", "2", ... like yours
    data <- as_tibble(data)
    
    data %>% 
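      # row_number() assumes the rows are still in their original order "1", "2", ...
      mutate(row = row_number()) %>% 
      # reshape to long format: one line per (row, column, value);
      # names_prefix strips a leading "V" if your columns are named V1, V2, ...
      pivot_longer(-row, names_to = "column", values_to = "values", names_prefix = "V") %>% 
      # depending on what your column names look like, you might need to use a separate() command first
      # distances are never negative, so "> 0" is the same as "different from 0" here
      filter(values > 0 & values < 26) %>% 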
      
      # if you want, you can create an index column as well
      mutate(index = paste0(row, "-", column)) %>% 
      
      # then drop the helper columns row and column
      select(-row,-column) %>% 
      # move index to the front
      relocate(index)
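A base R alternative, sketched under the assumption that the data frame can be converted to a numeric matrix (as a distance matrix normally can): which(..., arr.ind = TRUE) returns the row and column positions of the matching cells directly, without reshaping everything to long format. Because a real distance matrix is symmetric, the pipeline above would report each qualifying pair twice (for example both 318-516 and 516-318); adding upper.tri(m) to the condition keeps only cells with row < column. The 4-point example data are made up for illustration, and the names m, hits and result are hypothetical.

```r
# Hypothetical small example with the question's layout: a symmetric distance
# matrix stored as a data frame with rows/columns named "1", "2", ...
d <- as.data.frame(as.matrix(dist(c(0, 10, 20, 50))))

m <- as.matrix(d)  # which(arr.ind = TRUE) needs a matrix, not a data frame

# Keep values strictly between 0 and 26; upper.tri() keeps only row < column,
# so each pair of the symmetric matrix is reported once, not twice.
hits <- which(m > 0 & m < 26 & upper.tri(m), arr.ind = TRUE)

# Two columns, as asked: "row-column" index and the distance itself.
# sprintf() returns character(0) when there are no hits, so this also
# works for an empty result.
result <- data.frame(
  index = sprintf("%s-%s", rownames(m)[hits[, "row"]], colnames(m)[hits[, "col"]]),
  value = m[hits]
)
print(result)
```

On the full data the condition is evaluated over all ~13.5 million cells in one vectorized pass. If you prefer to stay with the tidyverse pipeline, adding filter(row < as.integer(column)) after the existing filter() drops the mirrored duplicates in the same way.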