My data is in the format below (the code for data input is at the very end, below the question).
#> df
#> id amount description
#> 1 10 electricity
#> 2 100 rent
#> 3 4 fees
I would like to be able to classify the transactions (rows), based on whether certain strings are in the description.
So for example:
library(tidyverse)
df <- df %>%
  mutate(category = ifelse(str_detect(description, "elec"), "bills", description))
which gives:
#> id amount description category
#> 1 1 10 electricity bills
#> 2 2 100 rent rent
#> 3 3 4 fees fees
I'd like to be able to define a vector of keywords and the associated categories, as below:
keywords <- c(electric = "bills",
              rent = "bills",
              fees = "misc")
What is the next step to create the category column with the correct labels?
Desired Output:
#> id amount description category
#> 1 1 10 electricity bills
#> 2 2 100 rent bills
#> 3 3 4 fees misc
I've tried map2_df, but I must be doing something wrong, because the code below creates three versions of the df stacked on top of each other:
categorise_transactions <- function(keyword, category) {
  df <- df %>%
    mutate(category = ifelse(str_detect(description, keyword), category, description))
}
library(purrr)
map2_df(names(keywords), keywords, categorise_transactions)
Code for data input:
df <- data.frame(
  stringsAsFactors = FALSE,
  id = c(1L, 2L, 3L),
  amount = c(10L, 100L, 4L),
  description = c("electricity", "rent", "fees")
)
df
str_replace_all almost gives what you need:
library(dplyr)
library(stringr)
str_replace_all(df$description, keywords)
#[1] "billsity" "bills" "misc"
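The stray "ity" is left over because str_replace_all replaces only the matched substring, not the whole description. One possible workaround, sketched here on the assumption that each description contains at most one keyword, is to wrap every keyword in .* so that a match covers the entire string. The name whole_string_patterns is only illustrative. Descriptions that contain no keyword are returned unchanged, and the replacements are applied one after another, so a category label that itself contains a later keyword would be replaced again.

```r
library(stringr)

description <- c("electricity", "rent", "fees")
keywords <- c(electric = "bills", rent = "bills", fees = "misc")

# Wrap each keyword in ".*" so that a match consumes the whole description
# (whole_string_patterns is an illustrative name, not part of stringr)
whole_string_patterns <- setNames(keywords, paste0(".*", names(keywords), ".*"))

categories <- str_replace_all(description, whole_string_patterns)
categories
#[1] "bills" "bills" "misc"
```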
However, as suggested by @Russ Thomas, case_when gives exactly what you need.
library(dplyr)
library(stringr)
df %>%
  mutate(category = case_when(str_detect(description, 'electric') ~ 'bills',
                              str_detect(description, 'rent') ~ 'bills',
                              str_detect(description, 'fees') ~ 'misc'))
# id amount description category
#1 1 10 electricity bills
#2 2 100 rent bills
#3 3 4 fees misc
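If you would rather drive the classification from the keywords vector than write one str_detect line per keyword, something like the following might work. It is only a sketch: lookup_category is a hypothetical helper (not part of any package), the first matching keyword wins, keywords are treated as regular expressions, and descriptions with no match get NA.

```r
library(dplyr)
library(stringr)
library(purrr)

df <- data.frame(
  stringsAsFactors = FALSE,
  id = c(1L, 2L, 3L),
  amount = c(10L, 100L, 4L),
  description = c("electricity", "rent", "fees")
)

keywords <- c(electric = "bills", rent = "bills", fees = "misc")

# Hypothetical helper: return the category of the first keyword found
# in a single description, or NA if no keyword matches
lookup_category <- function(description, keywords) {
  hits <- which(str_detect(description, names(keywords)))
  if (length(hits) == 0) {
    return(NA_character_)
  }
  unname(keywords[hits[1]])
}

# Apply the helper to every row; map_chr guarantees a character column
result <- df %>%
  mutate(category = map_chr(description, ~ lookup_category(.x, keywords)))

result
# id amount description category
#1 1 10 electricity bills
#2 2 100 rent bills
#3 3 4 fees misc
```

Adding a new category then only means adding an entry to keywords. The order of the vector decides which keyword wins when a description matches more than one.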