Most efficient way to write a mapping function given a large CSV of recoded data


Imagine I have a data frame loaded from a large CSV someone has given me, containing a mapping/recode of data that I want to apply to other datasets. Here's a small reproducible example of what might be in the CSV:

library(wakefield)
csv_mapping <- data.frame(
  from = as.character(name(30)),
  to = as.character(likert_7(30))  
)

What is the quickest way to create a mapping function from this dataframe in a way that is independent of the csv data source? I would usually do it by running:

dput(csv_mapping$from)
dput(csv_mapping$to)

in my console and then I would copy and paste the vectors into a function and use plyr::mapvalues() as follows:

mapping_fn <- function(x) {

  fromvec <- c("Kameira", "Sanavi", "Avangelene", "Maryonna", "Wyvonna", "Enam", 
               "Yain", "Tyonna", "Shekira", "Eleanna", "Azriela", "Saajida", 
               "Chantee", "Julieanne", "Genisha", "Delesha", "Macenzi", "Alyasia", 
               "Latonga", "Josuhe", "Arter", "Stone", "Ramaj", "Lilinoe", "Zacharie", 
               "Joshuamichael", "Desseray", "Colorado", "Jaidn", "Verline")

  tovec <- c("Agree", "Somewhat Disagree", "Agree", "Agree", "Neutral", 
          "Somewhat Disagree", "Neutral", "Strongly Agree", "Somewhat Disagree", 
          "Disagree", "Strongly Disagree", "Disagree", "Somewhat Agree", 
          "Strongly Disagree", "Strongly Disagree", "Somewhat Agree", "Strongly Agree", 
          "Somewhat Agree", "Disagree", "Disagree", "Strongly Agree", "Strongly Disagree", 
          "Disagree", "Somewhat Agree", "Strongly Disagree", "Strongly Disagree", 
          "Neutral", "Somewhat Agree", "Agree", "Disagree")

  plyr::mapvalues(x, from = fromvec, to = tovec, warn_missing = FALSE)

}

Is there a cleverer or quicker way to do this without using mapvalues, given that plyr is considered retired now?


Solution

  • A simple solution is to use recode() from the dplyr package:

    level_key <- setNames(csv_mapping$to, csv_mapping$from)
    dplyr::recode(csv_mapping$from, !!!level_key)
    

    Here level_key is a named vector of key-value pairs: the names are the old values (from) and the elements are the new values (to). The !!! operator (unquote-splice) then passes each pair to recode() as a separate named argument.


    Example

    library(wakefield)
    set.seed(42)
    csv_mapping <- data.frame(
      from = as.character(name(5)),
      to = as.character(likert_7(5))  
    )
    csv_mapping
    
    #       from                to
    # 1 Merrissa Strongly Disagree
    # 2  Lilbert           Neutral
    # 3  Rudelle    Strongly Agree
    # 4  Kaymani Somewhat Disagree
    # 5   Kenadi          Disagree
    
    level_key <- setNames(csv_mapping$to, csv_mapping$from)
    dplyr::recode(csv_mapping$from, !!!level_key)
    # [1] "Strongly Disagree" "Neutral"           "Strongly Agree"    "Somewhat Disagree" "Disagree"
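
To get a reusable function that no longer depends on the CSV once it has been built, the same named-vector idea can be wrapped in a closure. The following is only a base R sketch, not part of the original answer: make_mapper is a hypothetical helper name, the small mapping data frame is made-up sample data, and leaving unmatched values unchanged is an assumption chosen to mirror what plyr::mapvalues() does.

```r
# Hypothetical helper (not from dplyr or plyr): builds a recoding function
# from two parallel vectors of old and new values.
make_mapper <- function(from, to) {
  # Assumption: each old value appears once, so the lookup is unambiguous.
  stopifnot(length(from) == length(to), !anyDuplicated(from))
  lookup <- setNames(as.character(to), as.character(from))

  # The returned closure keeps its own copy of `lookup`.
  function(x) {
    x <- as.character(x)
    out <- unname(lookup[x])
    # Values with no entry in the lookup come back as NA;
    # keep the original value for those instead.
    unmatched <- is.na(out) & !is.na(x)
    out[unmatched] <- x[unmatched]
    out
  }
}

# Made-up sample mapping standing in for the data read from the CSV.
mapping <- data.frame(
  from = c("Merrissa", "Lilbert", "Rudelle"),
  to = c("Strongly Disagree", "Neutral", "Strongly Agree"),
  stringsAsFactors = FALSE
)

mapping_fn <- make_mapper(mapping$from, mapping$to)
result <- mapping_fn(c("Rudelle", "Unknown", "Merrissa", NA))
print(result)
```

Because the closure captures the lookup vector, mapping_fn can be applied to any other dataset without pasting dput() output into the function body. Values missing from the mapping ("Unknown" here) pass through untouched, and NA stays NA.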