I have a list of files that I want to import as tibbles, then mutate a new column onto each tibble based on an if_else statement for a threshold. I didn't want to use a loop, so I have been trying to make this work by using map() wrapped around read_csv %>% mutate(), with no success. If possible, I would like to keep the pipes and/or do both the read_csv and mutate steps within the same command.
Here are the two solutions I have tried:
This loop works, but I want to make it better.
filenames <- list.files(path = 'data-cleaned', full.names = TRUE)
for (x in 1:length(filenames)) {
  file <- read_csv(filenames[[x]]) %>%
    mutate(output_col = if_else(target_col >= threshold, "Positive", "Negative"))
  filename <- filenames[[x]]
  write_csv(file, filename)
}
This is my attempted improvement, but I have been getting errors every time I try to change it.
filenames <- list.files(path = 'data-cleaned', full.names = TRUE)
files <- filenames %>%
  map(read_csv %>%
        ~ .x %>% mutate(output_col = if_else(x$target_col >= threshold, "Positive", "Negative"))
  )
Error in `as_mapper()`:
! Can't convert `.f`, a two-sided formula, to a function.
In both of these cases, I am starting from the list of file names, but here is what I would expect the tibbles (simplified) to look like after being read in:
example1 <- tibble(id = 1:50,
target_col = sample(1:2000, 50),
other_col = sample(1:200, 50))
id target_col other_col
<int> <int> <int>
1 1 860 118
2 2 582 120
3 3 1514 190
4 4 1266 14
5 5 1921 137
6 6 804 1
7 7 222 139
8 8 1748 26
9 9 472 107
10 10 1013 189
And this is what I would expect each tibble to look like after the mutation (except the Positive or Negative would be assigned based on if the target_col met the numeric threshold criteria):
example2 <- tibble(id = 1:50,
                   target_col = sample(1:2000, 50),
                   other_col = sample(1:200, 50),
                   output_col = sample(rep(c("Positive", "Negative"), each = 25)))
id target_col other_col output_col
<int> <int> <int> <chr>
1 1 1886 174 Negative
2 2 1794 55 Negative
3 3 472 178 Positive
4 4 1583 195 Negative
5 5 1037 110 Positive
6 6 1652 158 Negative
7 7 152 69 Negative
8 8 1396 113 Positive
9 9 1016 150 Positive
10 10 1922 157 Negative
I also want to mention I have tried looking at these existing questions and didn't find them helpful:
How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs
How to mutate multiple columns with dynamic variable using purrr:map function?
R using dplyr::mutate() within purrr::map without duplicating rows
Iterating over listed data frames within a piped purrr anonymous function call
Using purrr to iterate over two lists and then pipe into dplyr::filter across a list of data frames
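Your use of the formula syntax is incorrect: a purrr lambda formula has to start with ~, but you wrote read_csv %>% ~ .x %>% ..., which puts read_csv on the left-hand side of the ~ and produces the two-sided formula the error complains about. In addition, x$target_col refers to an x that isn't defined anywhere; inside mutate() you can refer to target_col directly.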
The following is correct:
files <- filenames %>%
  map(~ read_csv(.x) %>%
        mutate(output_col = if_else(target_col >= threshold, "Positive", "Negative")))
purrr only recommends the formula syntax if you require backward compatibility with older versions of R, so I would personally use an anonymous function here:
files <- filenames %>% map(\(x) read_csv(x) %>% ...)
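For completeness, here is a minimal, self-contained sketch of the anonymous-function version that also writes each tibble back to its file, as your loop does. The temporary folder, the two demo CSV files and the threshold value of 1000 are assumptions made up for illustration (your question doesn't state the threshold), and the \(x) shorthand needs R 4.1 or later.

```r
library(readr)
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical setup: a temporary folder with two small CSV files,
# standing in for the 'data-cleaned' directory from the question.
data_dir <- file.path(tempdir(), "data-cleaned-demo")
dir.create(data_dir, showWarnings = FALSE)
write_csv(tibble(id = 1:3, target_col = c(100, 1500, 1000)),
          file.path(data_dir, "a.csv"))
write_csv(tibble(id = 1:3, target_col = c(2000, 5, 999)),
          file.path(data_dir, "b.csv"))

threshold <- 1000  # assumed value; not given in the question

filenames <- list.files(path = data_dir, full.names = TRUE)

# Read each file and add the classification column in a single map() call.
# set_names() names each element by its path so it can be reused below.
files <- filenames %>%
  set_names() %>%
  map(\(x) read_csv(x, show_col_types = FALSE) %>%
        mutate(output_col = if_else(target_col >= threshold,
                                    "Positive", "Negative")))

# Write each tibble back to the path it was read from.
iwalk(files, \(df, path) write_csv(df, path))
```

Naming the list with set_names() is optional if you only need the tibbles in memory; it is used here so that iwalk() can pair each tibble with its original path without a separate index.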