Search code examples
rjoinz-indexlookup

In R I have a column that contains the index (or column name) I want to pull from in a new column. How do I do this?


I basically want to do a lookup or similar to an INDEX/MATCH in Excel but in R. My data contains a column which refers to the row number I want from the lookup table and a column which refers to the column number that I want from the lookup table. Here's an example of my data:

ID Desired Row Desired Column
1 4 1
2 2 3
3 5 2

and then the lookup table goes:

name city food
john boston pizza
mike new york pasta
sarah chicago hoagie
sam los angeles sushi
kelly dallas ice cream

I want to create a new column that returns row 4, col 1 = sam; row 2 col 3 = pasta; row 5 col 2 = dallas.

If it would be easier I also have a field that contains that exact column name that I want to look in and have the lookup table joined in. so for example

ID Desired Row Desired Column Desired Output
1 this row name john
2 this row food pasta
3 this row city chicago

I've tried using df[row,column] where I've used variables as the row and column and instead of giving me one new column it gives me a column for every possible outcome

I've also tried using paste() to reference a specific column using text, and adding a variable at the end which would get me to the correct column (makes sense in my exact context where the columns names are numbered).


Solution

  • There may be more elegant/faster ways, but in base R, you could index using sapply or a for loop. I am also including an approach using these data based off of @Onyambu's comment (which is more elegant).

    If you have these data:

    lu <- data.frame(id = 1:3, 
                     row = c(4,2,5), 
                     col = c(1,3,2))
    
    df <- read.table(text = "name   city    food
    john    boston  pizza
    mike    new_york    pasta
    sarah   chicago hoagie
    sam los_angeles sushi
    kelly   dallas  ice_cream", h = TRUE)
    

    You can do:

    want <- data.frame(id = lu$id)
    
    for(i in seq_len(nrow(want))){
      want[i,"desired_value"] <- df[lu$row[i], lu$col[i]]
    }
    
    # or
    
    want$desired_value <- sapply(seq_len(nrow(want)), function(i) df[lu$row[i], lu$col[i]])
    
    # Or per Onyambu's comment:
    want$desired_value <- df[as.matrix(lu[,c("row", "col")])]
    

    All return the following output:

    #> want
    #  id desired_value
    #1  1           sam
    #2  2         pasta
    #3  3        dallas