r dataframe dplyr subset

Complementary subset of a dataframe regarding columns


I want to create a data frame that contains every column of the whole data frame (here called example) except the columns already taken into a first subset (here called data1).

I am looking for a function that takes the whole data frame and the first subset as input and returns a second data frame, which is the complementary subset of the first one.

Here is an example:

set.seed(10022023)

A = runif(5, min = 0, max = 50)
B = runif(5, min = 0, max = 50)
C = runif(5, min = 0, max = 50)
D = runif(5, min = 0, max = 50)
E = runif(5, min = 0, max = 50)

example = data.frame(A,B,C,D,E)

sum_of_example <- apply(example, 2, sum)
data1 <- example[,which(sum_of_example < 110)]



data2 <- setdiff(data1, example)
Error in `setdiff()`:
! `x` and `y` are not compatible.
✖ Different number of columns: 2 vs 5.

I thought the setdiff function was what I wanted, but apparently it is not.


Solution

  • You have the arguments the wrong way round for base::setdiff. In any case, the error you are getting arises because you have dplyr loaded, so the call goes to dplyr::setdiff, which requires both data frames to have the same number of columns. You can call the base version explicitly instead:

    base::setdiff(example, data1)
    #>            A         D         E
    #> 1 30.3949746 22.338276 45.205095
    #> 2  0.7545649 32.507727 16.596549
    #> 3 49.4556825 39.636089 32.164900
    #> 4 21.2279878 12.875228 31.125093
    #> 5 18.3556967  9.626054  1.783301
    

    Or you can subset by column names:

    example[setdiff(names(example), names(data1))]
    #>            A         D         E
    #> 1 30.3949746 22.338276 45.205095
    #> 2  0.7545649 32.507727 16.596549
    #> 3 49.4556825 39.636089 32.164900
    #> 4 21.2279878 12.875228 31.125093
    #> 5 18.3556967  9.626054  1.783301
    

    Or invert your selection criteria:

    example[,-which(sum_of_example < 110)]
    #>            A         D         E
    #> 1 30.3949746 22.338276 45.205095
    #> 2  0.7545649 32.507727 16.596549
    #> 3 49.4556825 39.636089 32.164900
    #> 4 21.2279878 12.875228 31.125093
    #> 5 18.3556967  9.626054  1.783301
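
Since the question asks for a function, the name-based approach above can be wrapped in a small helper. This is only a sketch: `complement_cols` is a hypothetical name, not part of base R or dplyr, and the toy data frame and the threshold of 20 are made up for illustration. The last lines also show one caveat of the `-which(...)` form: when no column matches the condition, `which()` returns an empty vector and the negative index then selects zero columns, whereas negating the logical vector with `!` keeps all of them.

```r
# Hypothetical helper (not from the original answer): return the columns of
# `full` whose names do not appear in `part`.
complement_cols <- function(full, part) {
  stopifnot(is.data.frame(full), is.data.frame(part))
  keep <- setdiff(names(full), names(part))  # character vectors, so base and dplyr setdiff agree
  full[, keep, drop = FALSE]                 # drop = FALSE keeps a data frame even for one column
}

# Small deterministic stand-in for the question's data (illustrative values only).
example <- data.frame(A = 1:3, B = 4:6, C = 7:9, D = 10:12, E = 13:15)

col_sums <- colSums(example)   # A = 6, B = 15, C = 24, D = 33, E = 42
is_small <- col_sums < 20      # TRUE for A and B only

data1 <- example[, is_small, drop = FALSE]
data2 <- complement_cols(example, data1)
names(data2)                   # "C" "D" "E"

# Same result by negating the logical index directly.
data2_alt <- example[, !is_small, drop = FALSE]

# Edge case: a condition that matches no column at all.
none <- col_sums < 0
ncol(example[, -which(none), drop = FALSE])  # 0: the empty negative index selects nothing
ncol(example[, !none, drop = FALSE])         # 5: logical negation keeps every column
```

Matching on names assumes the column names are unique and that data1 was built from example without renaming; if that does not hold, keeping the logical selection vector around and negating it is the more direct route.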