I have a df like this:
df1 <- data.frame(c(3,NA,5), c(NA, NA, 3), c(3, 2, NA))
names(df1) <- c('number1', 'number2', 'number3')
rownames(df1) <- c('siteA', 'siteB', 'siteC')
I want to make a new df where row and column name are combined from the values that are not NA. The resulting df should look like this:
df2 <- data.frame('number' = c('number1', 'number1', 'number2', 'number3', 'number3'),
'site' = c('siteA', 'siteC', 'siteC', 'siteA', 'siteB'))
How do I do this as efficiently as possible? (my real df is massive)
In a matrix or table each axis can have a name which is a character string labelling that axis. Convert df1 to a matrix m and add those.
Then convert that to long form using as.data.frame.table
and remove the rows with NA's. as.data.frame.table
will add a third column named Freq
which we don't need so select out the first 2 columns only and since the question put number first reverse the order of the two columns extracted.
m <- as.matrix(df1)
names(dimnames(m)) <- c("site", "number")
m |>
as.data.frame.table() |>
na.omit() |>
subset(select = 2:1)
## number site
## 1 number1 siteA
## 3 number1 siteC
## 6 number2 siteC
## 7 number3 siteA
## 8 number3 siteB
m
looks like this. Note the dimnames.
m
## number
## site number1 number2 number3
## siteA 3 NA 3
## siteB NA NA 2
## siteC 5 3 NA