Search code examples
rdataframefilterdatasetdata-cleaning

How to combine row and column name of df when value in target cell of df is not NA


I have a df like this:

df1 <- data.frame(c(3,NA,5), c(NA, NA, 3), c(3, 2, NA))
names(df1) <- c('number1', 'number2', 'number3')
rownames(df1) <- c('siteA', 'siteB', 'siteC')

I want to make a new df where row and column name are combined from the values that are not NA. The resulting df should look like this:

df2 <- data.frame('number' = c('number1', 'number1', 'number2', 'number3', 'number3'), 
                  'site' = c('siteA', 'siteC', 'siteC', 'siteA', 'siteB'))

How do I do this as efficiently as possible? (my real df is massive)


Solution

  • In a matrix or table each axis can have a name which is a character string labelling that axis. Convert df1 to a matrix m and add those.

    Then convert that to long form using as.data.frame.table and remove the rows with NA's. as.data.frame.table will add a third column named Freq which we don't need so select out the first 2 columns only and since the question put number first reverse the order of the two columns extracted.

     m <- as.matrix(df1)
     names(dimnames(m)) <- c("site", "number")
    
     m |>
      as.data.frame.table() |>
      na.omit() |>
      subset(select = 2:1)
    ##    number  site
    ## 1 number1 siteA
    ## 3 number1 siteC
    ## 6 number2 siteC
    ## 7 number3 siteA
    ## 8 number3 siteB
    

    m looks like this. Note the dimnames.

    m
    ##        number
    ## site    number1 number2 number3
    ##   siteA       3      NA       3
    ##   siteB      NA      NA       2
    ##   siteC       5       3      NA