Complementary column subset of a data frame

I want to create a data frame containing every column of the full data frame (here called example) except the columns already selected into another data frame (here called data1).

I am looking for a function that takes as input the whole data frame and the first subset, and returns the second data frame, which is the complementary subset of the first one.

Here is an example:

set.seed(10022023)

A = runif(5, min = 0, max = 50)
B = runif(5, min = 0, max = 50)
C = runif(5, min = 0, max = 50)
D = runif(5, min = 0, max = 50)
E = runif(5, min = 0, max = 50)

example = data.frame(A,B,C,D,E)

sum_of_example <- apply(example, 2, sum)
data1 <- example[,which(sum_of_example < 110)]



data2 <- setdiff(data1, example)
Error in `setdiff()`:
! `x` and `y` are not compatible.
✖ Different number of columns: 2 vs 5.

I thought setdiff() was what I wanted, but apparently not.

Solution

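You have the arguments the wrong way round for base::setdiff, which returns the elements of its first argument that are not in its second, so the full data frame has to come first. In any case, the error you are getting is because you have dplyr loaded, so setdiff() resolves to dplyr::setdiff, which compares rows and requires both data frames to have the same columns. You can call the base version explicitly instead: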

base::setdiff(example, data1)
#>            A         D         E
#> 1 30.3949746 22.338276 45.205095
#> 2  0.7545649 32.507727 16.596549
#> 3 49.4556825 39.636089 32.164900
#> 4 21.2279878 12.875228 31.125093
#> 5 18.3556967  9.626054  1.783301
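If it is unclear which setdiff() a bare call will use, base R's find() lists every attached package that defines that name, in search-path order; the first entry is the one that gets called. A quick check (the exact output depends on which packages are attached in your session):

```r
# Which attached packages define `setdiff`? The first one listed is the
# one a plain setdiff() call will find.
find("setdiff")
# With dplyr attached this should start with "package:dplyr";
# without it, only "package:base" is listed.
```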

Or you can subset by column names:

example[setdiff(names(example), names(data1))]
#>            A         D         E
#> 1 30.3949746 22.338276 45.205095
#> 2  0.7545649 32.507727 16.596549
#> 3 49.4556825 39.636089 32.164900
#> 4 21.2279878 12.875228 31.125093
#> 5 18.3556967  9.626054  1.783301
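The name-based approach can be wrapped into the kind of function the question asks for. This is a minimal sketch, assuming the subset keeps the original column names; the helper name complement_cols and the toy data are hypothetical. Indexing with drop = FALSE keeps the result a data frame even when only one column remains.

```r
# Hypothetical helper: return the columns of `full` that are not in `part`,
# matching columns by name.
complement_cols <- function(full, part) {
  keep <- setdiff(names(full), names(part))
  # drop = FALSE keeps a data frame even if exactly one column is left
  full[, keep, drop = FALSE]
}

# Small made-up example
full <- data.frame(A = 1:3, B = 4:6, C = 7:9)
part <- full[, c("B", "C"), drop = FALSE]

complement_cols(full, part)
#>   A
#> 1 1
#> 2 2
#> 3 3
```

With the question's objects, complement_cols(example, data1) should return the same A, D and E columns shown above.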

Or invert your selection criterion:

example[,-which(sum_of_example < 110)]
#>            A         D         E
#> 1 30.3949746 22.338276 45.205095
#> 2  0.7545649 32.507727 16.596549
#> 3 49.4556825 39.636089 32.164900
#> 4 21.2279878 12.875228 31.125093
#> 5 18.3556967  9.626054  1.783301
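One edge case of the -which() form: if no column satisfies the condition, which() returns integer(0), and -integer(0) is still an empty index, so the result has no columns instead of all of them. Negating a logical vector avoids this. A small sketch with made-up data in which no column sum is below the threshold:

```r
# Made-up data: column sums are A = 200 and B = 170, none below 110
df <- data.frame(A = c(100, 100), B = c(80, 90))
sums <- colSums(df)

# -which() with no matches selects zero columns
ncol(df[, -which(sums < 110), drop = FALSE])  # 0

# Logical negation keeps every column, as intended
ncol(df[, !(sums < 110), drop = FALSE])       # 2
```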