r, dataframe, data-cleaning

Remove columns based on specific criteria from multiple data frames with same column structures


I have 4 data frames for each of 4 different data groups (16 data frames in total). Within a data group, the data frames share the same column structure, with columns a, b, c, d, etc. (hundreds of columns), but the values differ between data frames. The only things that are the same within each "data group" are the number of variables and the column names (to some degree; the names follow no pattern, since they are item names, not a, b, c, etc.).

For example:

dat1 = data.frame(x = c(0.1,0.2,0.3,0.4,0.5),
                  y = c(0.6,0.7,0.8,0.9,0.10), 
                  z = c(0.12,0.13,0.14,0.15,0.16))    

which produces

   x   y    z
1 0.1 0.6 0.12
2 0.2 0.7 0.13
3 0.3 0.8 0.14
4 0.4 0.9 0.15
5 0.5 0.1 0.16

and second data frame

dat2 = data.frame(x = c(1,2,3,4,5), y = c(6,7,8,9,10), z = c(12,13,14,15,16))

  x  y  z
1 1  6 12
2 2  7 13
3 3  8 14
4 4  9 15
5 5 10 16

I want to do my data cleaning in dat1 based on certain criteria, so that if column x is removed from dat1, column x is also removed from dat2. One such criterion could be

dat1[,tail(dat1, n = 1) < 0.2] 

   y    z
1 0.6 0.12
2 0.7 0.13
3 0.8 0.14
4 0.9 0.15
5 0.1 0.16

such that dat2 also automatically drops column x.

   y  z
1  6 12
2  7 13
3  8 14
4  9 15
5 10 16

Is there a way to do this? I have searched Stack Overflow, but I couldn't find anything useful. Thanks.


Solution

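  • Something like this?
    This assumes dat1 has already been filtered (for example with dat1 <- dat1[, tail(dat1, n = 1) < 0.2]). With the data you posted, it works as expected.

    cols.to.remove <- function(DF1, DF2) {
        # Columns of DF1 that are no longer present in DF2
        d <- setdiff(names(DF1), names(DF2))
        # TRUE for every column of DF1 that should be dropped
        names(DF1) %in% d
    }

    # Keep only the columns of dat2 that are not flagged for removal
    dat2 <- dat2[!cols.to.remove(dat2, dat1)]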
    dat2
    #   y  z
    #1  6 12
    #2  7 13
    #3  8 14
    #4  9 15
    #5 10 16
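
    Since the question mentions several data frames per group, the same idea could be applied to all of them at once. The following is only a sketch under some assumptions: the data frames of one group are collected in a list (the name `group` is made up for this example), dat1 serves as the reference, and columns are selected by name so that a different column order would not matter.

```r
# Example data from the question
dat1 <- data.frame(x = c(0.1, 0.2, 0.3, 0.4, 0.5),
                   y = c(0.6, 0.7, 0.8, 0.9, 0.10),
                   z = c(0.12, 0.13, 0.14, 0.15, 0.16))
dat2 <- data.frame(x = c(1, 2, 3, 4, 5),
                   y = c(6, 7, 8, 9, 10),
                   z = c(12, 13, 14, 15, 16))

# Hypothetical container holding every data frame of one data group
group <- list(dat1 = dat1, dat2 = dat2)

# Criterion from the question: keep columns whose last value in dat1 is below 0.2
last.row <- unlist(dat1[nrow(dat1), ])
keep <- names(dat1)[last.row < 0.2]

# Apply the same column selection to every data frame in the group;
# drop = FALSE keeps the result a data frame even if only one column survives
group <- lapply(group, function(DF) DF[, keep, drop = FALSE])

names(group$dat1)
names(group$dat2)
```

    Computing `keep` once from the reference data frame and reusing it avoids comparing column names afterwards, and it also behaves sensibly when no column is removed.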