I have a large data frame (6 million rows, 20 columns) where data in one column corresponds to data in another column. I created a key that I now want to use to fix rows that have the wrong value. As a small example:
key = data.frame(animal = c('dog', 'cat', 'bird'),
sound = c('bark', 'meow', 'chirp'))
The data frame looks like this (minus the other columns of data):
df = data.frame(id = c(1, 2, 3, 4),
animal = c('dog', 'cat', 'bird', 'cat'),
sound = c('meow', 'bark', 'chirp', 'chirp'))
I swear I have done this before but can't remember my solution. Any ideas?
Using dplyr
. If you want to fix sound according to animal,
library(dplyr)
df <- df %>%
mutate(sound = sapply(animal, function(x){key %>% filter(animal==x) %>% pull(sound)}))
should do the trick. If you want to fix animal according to sound:
df <- df %>%
mutate(animal = sapply(sound, function(x){key %>% filter(sound==x) %>% pull(animal)}))