How do I use recode() in order to "clean/strip" certain parts of a column in my data frame? The original data frame looks like this:
df <- data.frame(duration = c("concentration, up to 2 minutes", "concentration, up to 4 minutes", "up to 6 hours"), name = c("Earth", "Water", "Fire"))
The improved version looks like this:
df <- data.frame(duration = c("2 minutes", "4 minutes", "6 hours"), name = c("Earth", "Water", "Fire"))
So, I should delete "concentration," and "up to", or replace them with an empty string, using the recode() function.
Below are two solutions: one with dplyr::recode() and one with stringr::str_remove().
My advice, though, is to learn the latter too. recode() only swaps whole values, whereas regular expressions give you much more powerful ways of transforming your strings.
dplyr::recode()
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <- data.frame(duration = c("concentration, up to 2 minutes",
                              "concentration, up to 4 minutes",
                              "up to 6 hours"),
                 name = c("Earth", "Water", "Fire"))
# recode() matches whole values, so each old value is listed with its replacement
df$duration <- recode(df$duration,
                      "concentration, up to 2 minutes" = "2 minutes",
                      "concentration, up to 4 minutes" = "4 minutes",
                      "up to 6 hours" = "6 hours")
df
#>    duration  name
#> 1 2 minutes Earth
#> 2 4 minutes Water
#> 3   6 hours  Fire
Created on 2020-05-04 by the reprex package (v0.3.0)
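If the list of old and new values grows, it may be easier to keep the pairs in a named vector and splice it into recode() with `!!!`. The following is only a sketch: the `lookup` vector and the extra "up to 8 hours" value are made up for illustration. It also shows that values without an entry in the lookup are left untouched.

```r
library(dplyr)

# Hypothetical lookup table: names are the old values, values are the replacements.
lookup <- c(
  "concentration, up to 2 minutes" = "2 minutes",
  "concentration, up to 4 minutes" = "4 minutes",
  "up to 6 hours"                  = "6 hours"
)

duration <- c("concentration, up to 2 minutes",
              "concentration, up to 4 minutes",
              "up to 6 hours",
              "up to 8 hours")  # made-up value with no entry in `lookup`

# `!!!` splices the named vector into recode() as old = new pairs.
cleaned <- recode(duration, !!!lookup)
cleaned
#> [1] "2 minutes"     "4 minutes"     "6 hours"       "up to 8 hours"
```

Because the last value comes back unchanged, every new variant needs its own entry, which is the main reason a pattern-based approach tends to scale better.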
stringr::str_remove()
library(stringr)
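df <- data.frame(duration = c("concentration, up to 2 minutes",
                              "concentration, up to 4 minutes",
                              "up to 6 hours"),
                 name = c("Earth", "Water", "Fire"))
# remove everything before the first digit; the lazy `.*?` stops at the
# first digit, so multi-digit numbers such as "10" stay intact
df$duration <- str_remove(df$duration, "^.*?(?=\\d)")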
df
#>    duration  name
#> 1 2 minutes Earth
#> 2 4 minutes Water
#> 3   6 hours  Fire
Created on 2020-05-04 by the reprex package (v0.3.0)
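One thing to watch with a digit lookahead: a greedy `.*` runs as far as it can, so on a multi-digit number it stops before the last digit rather than the first. The sketch below uses a made-up "10 minutes" value (not in the original data) to compare a greedy pattern with two alternatives, `^\\D*` (strip leading non-digits) and str_remove_all() on the two literal phrases from the question.

```r
library(stringr)

# Made-up values, including a multi-digit number that is not in the original data.
duration <- c("concentration, up to 2 minutes",
              "concentration, up to 10 minutes",
              "up to 6 hours")

# Greedy `.*` followed by a digit lookahead: stops before the LAST digit.
greedy <- str_remove(duration, "^.*(?=\\d)")
greedy
#> [1] "2 minutes" "0 minutes" "6 hours"

# `\\D*` matches non-digit characters only: stops before the FIRST digit.
non_digits <- str_remove(duration, "^\\D*")
non_digits
#> [1] "2 minutes"  "10 minutes" "6 hours"

# Alternative: delete the two literal phrases wherever they occur.
phrases <- str_remove_all(duration, "concentration, |up to ")
phrases
#> [1] "2 minutes"  "10 minutes" "6 hours"
```

The literal-phrase version is the closest to what the question asks for; the `\\D*` version is shorter but assumes the part worth keeping always starts with a digit.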