
Delete a row and recalculate R^2


I have coded the following in R: the user chooses a file that contains 2 columns (V1 and V2) and a varying number of rows, anywhere from 10 to 1000 depending on the input file. The script calculates the rsq of the relationship between the two variables.

I want to code the following: the code should loop through all rows, omitting one row at a time and calculating the new rsq with that row missing. For example:

There are 10 rows of data and the total rsq = 0.97.

Step 1: The 1st row is removed and the rsq is calculated again, this time for 9 rows, giving rsq = 0.98.
Step 2: The 1st row is re-added, the 2nd row is removed, and the rsq is calculated again.
Step 3: The 2nd row is re-added, the 3rd row is removed, and the rsq is calculated again.

After each iteration, the "new rsq" should be placed in a new column next to the row that was removed.

Can anyone advise how to do this? I have this coded in Excel and it works well, but it is cumbersome and therefore not ideal.


Solution

  • Do you want to do something like this?

    # Make some sample data
    set.seed(1095)
    data <- data.frame( V1 = 1:10 , V2 = sample.int( 5 , 10 , replace = TRUE ) )
    
    # Use sapply to get r2, leaving out one row at a time
    # (data[-x, ] drops row x; squaring the correlation gives r2)
    r2 <- sapply( seq_len(nrow(data)) , function(x) ( cor( data[-x,1] , data[-x,2] ) )^2 )
    # Combine into a data frame
    newdata <- cbind( data , r2 )
    newdata
    #      V1 V2        r2
    #   1   1  5 0.2526316
    #   2   2  3 0.4657601
    #   3   3  5 0.3204721
    #   4   4  5 0.3691612
    #   5   5  1 0.5405405
    #   6   6  3 0.3769480
    #   7   7  3 0.3840426
    #   8   8  2 0.3409425
    #   9   9  1 0.2725806
    #   10 10  3 0.4986702
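
If the rsq in the original script comes from a fitted model rather than from cor(), the same leave-one-out idea can be sketched with lm(). This is only a sketch under assumptions: the helper name loo_rsq and the column name r2_without_row are hypothetical, the columns are assumed to be named V1 (predictor) and V2 (response), and the sample data are made up for illustration.

```r
# Hypothetical helper: R^2 of V2 ~ V1 with each row left out in turn.
# Assumes df has numeric columns named V1 and V2.
loo_rsq <- function(df) {
  vapply(seq_len(nrow(df)), function(i) {
    fit <- lm(V2 ~ V1, data = df[-i, , drop = FALSE])  # refit without row i
    summary(fit)$r.squared
  }, numeric(1))
}

# Made-up sample data, for illustration only
set.seed(1)
df <- data.frame(V1 = 1:10, V2 = 2 * (1:10) + rnorm(10))

# New column: the R^2 obtained when that row is removed
df$r2_without_row <- loo_rsq(df)
df
```

For a simple linear regression with an intercept, the model R^2 equals the squared Pearson correlation, so this should give the same values as the cor()-based version above; the lm() form is mainly useful if the model later gains more predictors. For real input, something like df <- read.table(file.choose()) could replace the sample data, since read.table names the columns V1 and V2 when the file has no header row.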