I am trying to make a simple line of code to detect where there are incorrect entries in a dataframe. Consider the following example:
author val1 val2 val3 val4
A 1 B 1 NA
A NA NA NA NA
NA 2 B NA B
NA NA NA NA B
NA NA NA NA NA
A 2 A NA B
A row always needs to have the author filled in, but this is sometimes forgotten. Also, sometimes row 2 has the author filled in, but by accident the rest of the data is entered on row 3.
What I want is to filter for rows that have NA for author, and then keep only those rows that have a data entry in any of the other columns. So my expected output for the above example would be:
author val1 val2 val3 val4
NA 2 B NA B
NA NA NA NA B
Filtering for the rows with NA for author is easy, but I can't figure out what to do next. My code so far:
df %>%
  filter(
    is.na(author)
  ) %>%
  filter(
    across(
      .cols = everything(),
      .fns = ~ !is.na(.x)
    )
  )
I have the feeling I am pretty close, but after a few hours of trying and searching on Stack Overflow, my code still returns empty data frames. I would prefer a solution in tidyverse syntax, but any help is much appreciated.
My code is not very efficient, but it seems to work.
library(stringr)
library(rebus)      # ANY_CHAR, DGT and or()
library(tidyverse)
library(magrittr)   # extract()

df <- tibble(author = c('A', 'A', NA, NA, NA, 'A'),
             val1 = c(1, NA, 2, NA, NA, 2),
             val2 = c('B', NA, 'B', NA, NA, 'A'),
             val3 = c(1, NA, NA, NA, NA, NA),
             val4 = c(NA, NA, 'B', 'B', NA, 'B'))

# Keep only the rows where the author is missing
df_na <- filter(df, is.na(author))

# map() applies str_which() to each column; it returns the row positions
# whose value contains at least one character (NA never matches)
index <- map(df_na, ~ str_which(.x, pattern = rebus::or(ANY_CHAR, DGT))) %>%
  keep(~ length(.x) != 0) %>%  # drop columns that are all NA
  unlist() %>%
  unique()

# Called as magrittr::extract because tidyr also exports an extract()
df_na %>% magrittr::extract(index, )
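
As for why the attempt in the question returns an empty data frame: inside filter(), across() only keeps rows where every selected column passes the test. Because everything() also selects author, which is NA in all remaining rows, no row can pass. A more compact tidyverse version might look like the sketch below. It assumes dplyr 1.0.4 or later (for if_any()) and uses -author to leave the author column out of the second check; the name result is only there for illustration.

```r
library(dplyr)

# Same example data as in the question
df <- tibble(
  author = c("A", "A", NA, NA, NA, "A"),
  val1   = c(1, NA, 2, NA, NA, 2),
  val2   = c("B", NA, "B", NA, NA, "A"),
  val3   = c(1, NA, NA, NA, NA, NA),
  val4   = c(NA, NA, "B", "B", NA, "B")
)

result <- df %>%
  filter(
    is.na(author),                  # author is missing ...
    if_any(-author, ~ !is.na(.x))   # ... but at least one other column is filled in
  )

result
```

On the example data this should keep the two rows from the expected output and drop the row that is NA everywhere.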