I have a dataframe with empty and NA values, like that:
> DF_datos
V1 V2 V3 V4 V5 V6
3 aaa aaa NA
4 bb bb 70 80 NA
5 cc cc 80 80 NA
6 ddd ddd NA
I would like to delete all columns that contains an empty or NA value on its second or third row, without a loop over the dataframe. With subset function or something similar... The result I want would be:
> DF_datos
V1 V2 V3 V4
3 aaa aaa
4 bb bb 70 80
5 cc cc 80 80
6 ddd ddd
We can use tidyverse
with select
from dplyr
. In select
, specify the logical expression in where
to check if there are any
non-NA (!is.na(.)
) and any
non-blank (nzchar
) elements in the column to be selected
library(dplyr)
DF_datos %>%
select(where(~ any(!is.na(.))&any(nzchar(.))))
-output
# V1 V2 V3 V4
#3 aaa aaa
#4 bb bb 70 80
#5 cc cc 80 80
#6 ddd ddd
If we need to only check the 2nd or 3rd row
DF_datos %>%
slice(2:3) %>%
select(where(~ any(!is.na(.))&any(nzchar(.)))) %>%
names %>%
select(DF_datos, .)
Or with Filter
from base R
(R 4.1.0
) using the same logic
Filter(\(x) any(!is.na(x)) & any(nzchar(x)), DF_datos)
-output
# V1 V2 V3 V4
#3 aaa aaa
#4 bb bb 70 80
#5 cc cc 80 80
#6 ddd ddd
Or for rows 2 and 3
Filter(\(x) any(!is.na(x)) & any(nzchar(x)), DF_datos[2:3,]) |>
names() |>
{\(x) subset(DF_datos, select = x)}()
Or use sum
instead of any
by checking if the sum
of the compound logical expression is greater than 0
Filter(\(x) sum(!is.na(x) & nzchar(x)) > 0, DF_datos)
In earlier R
versions use
Filter(function(x) any(!is.na(x)) & any(nzchar(x)), DF_datos)
NOTE: All of the above options are efficient as this loops over the columns and is memory efficient as this won't apply the expression on the whole dataset
Based on the comments, the OP wanted to delete columns if there are any NA or blank in rows 2 or 3.
DF_datos$V6 <- c(NA, NA, 80, NA)
DF_datos %>%
slice(2:3) %>%
select(where(~ all(!is.na(.)) & all(nzchar(.)))) %>% names %>%
select(DF_datos, .)
-output
V1 V2 V3 V4
3 aaa aaa
4 bb bb 70 80
5 cc cc 80 80
6 ddd ddd
Or using Filter
Filter(\(x) all(!is.na(x)) & all(nzchar(x)), DF_datos[2:3,]) |>
names() |>
{\(x) subset(DF_datos, select = x)}()
# V1 V2 V3 V4
#3 aaa aaa
#4 bb bb 70 80
#5 cc cc 80 80
#6 ddd ddd
DF_datos <- structure(list(V1 = c("aaa", "bb", "cc", "ddd"), V2 = c("aaa",
"bb", "cc", "ddd"), V3 = c("", "70", "80", ""), V4 = c("", "80",
"80", ""), V5 = c("", "", "", ""), V6 = c(NA, NA, NA, NA)), row.names = c("3",
"4", "5", "6"), class = "data.frame")