I have 4 data frames
named w, x, y, z
each with 3 columns and identical column names. I now execute an operation that removes rows until the column named Type
is identical over all four data frames.
To achieve this I am using a while
loop with the following code:
list_df <- list(z, w, x, y)
tmp <- lapply(list_df, `[[`, 'Type')
i <- as.integer(as.logical(all(sapply(tmp, function(x) all(x == tmp[[1]])))))
while (i == 0) {
z <- z[(z$Type %in% x$Type),]
y <- y[(y$Type %in% x$Type),]
w <- w[(w$Type %in% x$Type),]
z <- z[(z$Type %in% w$Type),]
y <- y[(y$Type %in% w$Type),]
x <- x[(x$Type %in% w$Type),]
z <- z[(z$Type %in% y$Type),]
x <- x[(x$Type %in% y$Type),]
w <- w[(w$Type %in% y$Type),]
x <- x[(x$Type %in% z$Type),]
w <- w[(w$Type %in% z$Type),]
y <- y[(y$Type %in% z$Type),]
list_df <- list(z, w, x, y)
tmp <- lapply(list_df, `[[`, 'Type')
i <- as.integer(as.logical(all(sapply(tmp, function(x) all(x == tmp[[1]])))))
}
In this code, a list is created for the column Type
of every data frame. Then the value i
tests for identicality and produces 0 if false and 1 if true. The while loop
then performs the deletion of rows not included in every data frame and only stops until i
becomes 1.
This code works, but applying it to bigger data can result in a long time for the code to go through. Does anybody have an idea on how to simplify this execution?
For reproducible example:
w <- structure(list(Type = c("26809D", "28503C", "360254", "69298N",
"32708V", "680681", "329909", "696978", "32993F", "867609", "51206K",
"130747"), X1980 = c(NA, NA, NA, 271835, NA, NA, NA, NA, NA,
NA, NA, NA), X1981 = c(NA, NA, NA, 290314, NA, NA, NA, NA, NA,
NA, NA, NA)), row.names = c("2", "4", "7", "8", "10", "11", "13",
"16", "17", "21", "22", "23"), class = "data.frame")
x <- structure(list(Type = c("26809D", "28503C", "360254", "69298N",
"32708V", "680681", "329909"), X1980 = c(NA, NA, NA, 1026815,
NA, NA, NA), X1981 = c(NA, NA, NA, 826849, NA, NA, NA)), row.names = c("2",
"4", "7", "8", "10", "11", "13"), class = "data.frame")
y <- structure(list(Type = c("26809D", "28503C", "360254", "69298N",
"32708V"), X1980 = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_), X1981 = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_)), row.names = c("2", "4", "7", "8", "10"), class = "data.frame")
z <- structure(list(Type = c("26809D", "28503C", "360254", "69298N",
"32708V", "680681", "329909", "696978", "32993F", "867609", "51206K",
"130747", "50610H"), X1980 = c(NA, NA, NA, 0.264736101439889,
NA, NA, NA, NA, NA, NA, NA, NA, NA), X1981 = c(NA, NA, NA, 0.351108848169376,
NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c("2", "4",
"7", "8", "10", "11", "13", "16", "17", "21", "22", "23", "24"
), class = "data.frame")
We assume that the question is how to get the values of Type
that are common to 4 data frames each of which has a Type
column containing unique values.
Form a list L
of the data frames, extract the Type
column using lapply
and [
and iterate merge
over that using Reduce
:
L <- list(w, x, y, z)
L.Type <- lapply(L, "[", TRUE, "Type", drop = FALSE) # list of DFs w only Type col
Reduce(merge, L.Type)$Type
## [1] "26809D" "28503C" "32708V" "360254" "69298N"
or replace last line with this giving the same result except for order:
Reduce(intersect, L.Type)$Type
## [1] "26809D" "28503C" "360254" "69298N" "32708V"
Another approach which is a bit tedious but does reduce the calulation to one line is to manually iterate intersect
:
intersect(w$Type, intersect(x$Type, intersect(y$Type, z$Type)))
## [1] "26809D" "28503C" "360254" "69298N" "32708V"
The example data is not very good to illustrate this because every data frame has the same values of Type so let us create another example. BOD
is a built-in data frame has 6 rows. We assign it to X
and rename the columns so that the first one has the name Type
. Then for i
equals 1, 2, 3, 4 we remove the i-th row giving 4 data frames with 5 rows each and 2 values of Type
common to all 4. The result correctly shows that 5 and 7 are the only common Type
values.
# set up input L, a list of 4 data frames
X <- BOD
names(X) <- c("Type", "X")
L <- lapply(1:4, function(i) X[-i, ])
L.Type <- lapply(L, "[", TRUE, "Type", drop = FALSE)
Reduce(merge, L.Type)$Type
## [1] 5 7