I want to create a data frame that contains every column of the whole data frame (here called example) except the columns already taken in a subset of it (here called data1).
I am looking for a function that takes the whole data frame and the first subset as input and returns a second data frame, which is the complement of the first one.
Here is an example:
set.seed(10022023)
A = runif(5, min = 0, max = 50)
B = runif(5, min = 0, max = 50)
C = runif(5, min = 0, max = 50)
D = runif(5, min = 0, max = 50)
E = runif(5, min = 0, max = 50)
example = data.frame(A,B,C,D,E)
sum_of_example <- apply(example, 2, sum)
data1 <- example[,which(sum_of_example < 110)]
data2 <- setdiff(data1, example)
Error in `setdiff()`:
! `x` and `y` are not compatible.
✖ Different number of columns: 2 vs 5.
I thought setdiff was the function I wanted, but apparently not.
You have the arguments the wrong way round for base::setdiff, but in any case the error you are getting is because you have dplyr loaded and are therefore calling dplyr::setdiff, which expects both data frames to have the same columns. Therefore you can do:
base::setdiff(example, data1)
#> A D E
#> 1 30.3949746 22.338276 45.205095
#> 2 0.7545649 32.507727 16.596549
#> 3 49.4556825 39.636089 32.164900
#> 4 21.2279878 12.875228 31.125093
#> 5 18.3556967 9.626054 1.783301
Or you may subset by names:
example[setdiff(names(example), names(data1))]
#> A D E
#> 1 30.3949746 22.338276 45.205095
#> 2 0.7545649 32.507727 16.596549
#> 3 49.4556825 39.636089 32.164900
#> 4 21.2279878 12.875228 31.125093
#> 5 18.3556967 9.626054 1.783301
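Since the question asks for a function, the name-based approach could be wrapped up roughly as below. This is only a sketch: complement_cols is a hypothetical name, not an existing base R or dplyr function, and it assumes the subset kept the original column names. The small data frame is made up for illustration.

```r
# Hypothetical helper (not part of base R or dplyr):
# returns the columns of `full` whose names do not appear in `subset`.
complement_cols <- function(full, subset) {
  # base::setdiff is called explicitly so a loaded dplyr cannot mask it
  keep <- base::setdiff(names(full), names(subset))
  # drop = FALSE keeps the result a data frame even if one column remains
  full[, keep, drop = FALSE]
}

# Small made-up data, just to show the call
example <- data.frame(A = 1:3, B = 4:6, C = 7:9)
data1 <- example[, c("B", "C"), drop = FALSE]
data2 <- complement_cols(example, data1)
names(data2)
#> [1] "A"
```

Matching on names rather than on column contents means two columns with identical values are still treated as distinct.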
Or invert your selection criteria:
example[,-which(sum_of_example < 110)]
#> A D E
#> 1 30.3949746 22.338276 45.205095
#> 2 0.7545649 32.507727 16.596549
#> 3 49.4556825 39.636089 32.164900
#> 4 21.2279878 12.875228 31.125093
#> 5 18.3556967 9.626054 1.783301
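One caveat with the -which(...) form: if no column meets the condition, which() returns an empty vector, and negating it selects no columns at all rather than all of them. Negating the logical condition directly avoids this. A small sketch with made-up data (the names col_sums, dropped_all and kept_all are only for illustration):

```r
# Made-up data: no column sum is below 0, so nothing should be excluded
example <- data.frame(A = c(10, 20), B = c(1, 2), C = c(30, 40))
col_sums <- colSums(example)

# which() finds no match and returns integer(0); -integer(0) is still
# integer(0), so this selects zero columns instead of keeping all three
dropped_all <- example[, -which(col_sums < 0), drop = FALSE]
ncol(dropped_all)
#> [1] 0

# Negating the logical vector keeps every column that fails the condition
kept_all <- example[, !(col_sums < 0), drop = FALSE]
ncol(kept_all)
#> [1] 3
```

With the data from the question both forms give the same result, because at least one column sum is below 110.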