Search code examples
rdplyrmatchrbindmutate

How to match column names across two dataframes and add a new variable?


data <- rbind(c(1,2,3,4,5,6),
              c(1,2,3,4,5,6),
              c(1,2,3,4,5,6))

colnames(data) <- c("A", "B", "C", "D", "E", "F")

category_vector <- cbind(c("A", "B", "C", "D", "E"),
                         c("Cat1", "Cat2", "Cat3", "Cat2", "Cat5"))
colnames(category_vector) <- c("Name", "Category")

My goal is to match column names in data, to the names in category_vector and then assign each column of data the corresponding category name in Row 1.

My desired output is as follows:

output <- rbind(c("Cat1", "Cat2", "Cat3", "Cat2", "Cat5", "NA"),
                c(1,2,3,4,5,6),
                c(1,2,3,4,5,6),
                c(1,2,3,4,5,6))
colnames(output) <- colnames(data)

I know, that I can use match(names(x), names(y)) to show the intersect, but I am not sure how to get to the desired output.


Solution

  • Here is one example:

    library(dplyr)
    
    data <- rbind(c(1,2,3,4,5,6),
                 c(1,2,3,4,5,6),
                 c(1,2,3,4,5,6))
    
    colnames(data) <- c("A", "B", "C", "D", "E", "F")
    
    category_vector <- cbind(c("A", "B", "C", "D", "E"),
                             c("Cat1", "Cat2", "Cat3", "Cat2", "Cat5"))
    
    #colnames(category_vector) <- c("Name", "Category")
    
    # turn everything in data into a character
    data <- data |> 
        as.data.frame() |> 
        mutate(across(everything(), as.character))
    
    category_vector <- as.data.frame(t(category_vector))
    
    names(category_vector) <- category_vector[1, ]
    
    category_vector <- category_vector[-1, ]
    
    category_vector |> 
        bind_rows(data)
    
         A    B    C    D    E    F
    1 Cat1 Cat2 Cat3 Cat2 Cat5 <NA>
    2    1    2    3    4    5    6
    3    1    2    3    4    5    6
    4    1    2    3    4    5    6