Search code examples
rrowsum

Choose rows in which the absolute value of subtraction is less a specified value


Let's say I have this dataframe:

ID   X1           X2
1    1            2
2    2            1
3    3            1
4    4            1
5    5            5
6    6            20
7    7            20
8    9            20
9    10           20


dataset <- structure(list(ID = 1:9, X1 = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 9L, 
                                           10L), X2 = c(2L, 1L, 1L, 1L, 5L, 20L, 20L, 20L, 20L)),
                     class = "data.frame", row.names = c(NA, 
                                                         -9L))

And I want to select rows in which the absolute value of the subtraction of rows are more or equal to 2 (based on columns X1 and X2).

For example, row 4 value is 4-1, which is 3 and should be selected.

Row 9 value is 10-20, which is -10. Absolute value is 10 and should be selected.

In this case it would be rows 3, 4, 6, 7, 8 and 9

I tried:

dataset2 = dataset[,abs(dataset- c(dataset[,2])) > 2] 

But I get an error.

The operation:

abs(dataset- c(dataset[,2])) > 2

Does give me rows that the sum are more than 2, but the result only works for my second column and does not select properly


Solution

  • We can get the difference between the 'X1', 'X2' columns, create a logical expression in subset to subset the rows

    subset(dataset, abs(X1 - X2) >= 2)
    #  ID X1 X2
    #3  3  3  1
    #4  4  4  1
    #6  6  6 20
    #7  7  7 20
    #8  8  9 20
    #9  9 10 20
    

    Or using index

    subset(dataset, abs(dataset[[2]] - dataset[[3]]) >= 2)
    

    data

    dataset <- structure(list(ID = 1:9, X1 = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 9L, 
    10L), X2 = c(2L, 1L, 1L, 1L, 5L, 20L, 20L, 20L, 20L)),
    class = "data.frame", row.names = c(NA, 
    -9L))