Search code examples
rdataframematchoffset

R: How to offset and match within a dataframe?


I would like to use something similar to the OFFSET and MATCH functions of Excel, here is an example data set: data=

Which Test?|Test1   |Test2  |Test3  |RESULT
Test1      |TRUE    |80%    |0      |
Test2      |FALSE   |25%    |0      |
Test1      |TRUE    |16%    |0      |
Test3      |FALSE   |12%    |1      |

Result column should read:

Which Test?|Test1   |Test2  |Test3  |RESULT
Test1      |TRUE    |80%    |0      |TRUE
Test2      |FALSE   |25%    |0      |25%
Test1      |TRUE    |16%    |0      |TRUE
Test3      |FALSE   |12%    |1      |1

In the final RESULT column I would like the test result of searching the Which test? column. In this example the RESULT column could return, for example, numbers or strings. In the Excel formula would be:

=OFFSET($A$1, ROW()-1,MATCH(A2,$B$1:$D$1,0))

I have tried to list the Tests using sapply so far and return this to another function such as which(colnames... and this is where I am stuck.


Solution

  • I'll go with sapply:

    data <- read.table(text="Which Test?|Test1   |Test2  |Test3  |RESULT
    Test1      |TRUE    |80%    |0      |
    Test2      |FALSE   |25%    |0      |
    Test1      |TRUE    |16%    |0      |
    Test3      |FALSE   |12%    |1      |", 
     header=T, 
     sep="|",
     stringsAsFactors=F,
     strip.white=T)
    
    data$RESULT <- sapply( 1:nrow(data), function(x) { data[x,data[x,1]] })
    

    For each row, get the target column data[x,1] (the inner access), and for this column get the row value data[x,...].

    Output:

    > data
      Which.Test. Test1 Test2 Test3 RESULT Result
    1       Test1  TRUE   80%     0     NA   TRUE
    2       Test2 FALSE   25%     0     NA    25%
    3       Test1  TRUE   16%     0     NA   TRUE
    4       Test3 FALSE   12%     1     NA      1
    

    With two vars the function in the sapply would be:

    function(x) {
     tcol <- data[x,1] # First column value of row x
     data[x,tcol]) # Get the value at row x and column tcol
    }
    

    An approach using Map/mapply would be to provide the 'i' (seq(nrow(data))), 'j' (match(data$Which.Test., names(data))) row/column index and use [ to extract the elements from the 'data'. We wrap with list so that the 'data' remains as a single data.frame and will recycle through the lengths of 'i', 'j'.

     mapply(`[`, list(data), seq(nrow(data)), match(data$Which.Test., names(data) ) )
     #[1] "TRUE" "25%"  "TRUE" "1"   
    

    Though, a possible vectorized approach would be just

    data[cbind(1:nrow(data), match(data$Which.Test., names(data)))]
    ## [1] " TRUE" "25%"   " TRUE" "1"  
    

    This is matching the values in Which.Test. against the column names of data and returning the index of the matched column. Then, we subset these columns per each row by combining it with 1:nrow(data) using cbind.

    More detailed explanation of @DavidArenburg solution above (as I had to spend some time to understand it fully):

    The subset operator accepts a matrix so we do:

    1. 1:nrow(data) easy it gives a vector [1] 1 2 3 4 corresponding to the number of rows in our dataset
    2. match(data$Which.Test., names(data))) giving the index of each matching test [1] 1 2 3 4
    3. cbind(..,..)bind our two preceding point to build a matrix:

           [,1] [,2]
      [1,]    1    2
      [2,]    2    3
      [3,]    3    2
      [4,]    4    4
      

      We see this matrix match for each row the column we wish to take the value of. So when giving this matrix as the selector of our dataset we get the correct results. Then we can assign it to a new variable or to a new column of the df.