I have a data frame with two columns: the first holds a keyword and the second holds its associated category.
keywords <- c("keyword1", "keyword2", "keyword3")
categories <- c("category1", "category2", "category3")
lookup_table <- data.frame(keywords, categories)
Each time I get a new label, I would like to check whether it contains one of the keywords and, if so, attach the corresponding category.
In the example below, the first row would get the value 'category1' in a new column:
new_labels <- c("keyword1 qefjhqek", "hfaef", "fihiz")
Help much appreciated!
Here, just use str_extract to pull the matching keyword out of each label, then join the result to the reference table.
keywords <- c("keyword1", "keyword2", "keyword3")
categories <- c("category1", "category2", "category3")
lookup_table <- data.frame(keywords, categories)
new_labels <- c("keyword1 qefjhqek", "hfaef", "fihiz")
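library(data.table)  # only needed for the commented-out data.table alternatives
library(tidyverse)

# Reference table: one row per keyword with its category.
# tibble() rather than data.table(), following AntoniosK's suggestion
# to prefer dplyr-like functions.
ref_tbl <- tibble(
  keywords = keywords,
  categories = categories
)

# as_tibble() rather than as.data.table(), for the same reason.
as_tibble(new_labels) %>%
  mutate(
    # Build the regular expression "keyword1|keyword2|keyword3" and extract
    # the first keyword found in each label (NA when none matches).
    # Alternative pattern from the original: 'keyword[:digit:]'
    ref_key = str_extract(value, str_flatten(keywords, "|"))
  ) %>%
  # Attach the category of the extracted keyword.
  left_join(ref_tbl, by = c("ref_key" = "keywords"))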
#> # A tibble: 3 x 3
#>   value             ref_key  categories
#>   <chr>             <chr>    <chr>
#> 1 keyword1 qefjhqek keyword1 category1
#> 2 hfaef             <NA>     <NA>
#> 3 fihiz             <NA>     <NA>
Created on 2018-11-10 by the reprex package (v0.2.1)
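For anyone who would rather not load any packages, the same extract-then-look-up idea can be sketched in base R. This is only a minimal sketch under a few assumptions: the names pattern, hits and result are made up for illustration, the keywords contain no regex metacharacters (otherwise they would need escaping), and only the first keyword found in each label is used.

```r
keywords <- c("keyword1", "keyword2", "keyword3")
categories <- c("category1", "category2", "category3")
new_labels <- c("keyword1 qefjhqek", "hfaef", "fihiz")

# Build one alternation pattern: "keyword1|keyword2|keyword3"
pattern <- paste(keywords, collapse = "|")

# regexpr() gives the position of the first match in each label, or -1 if none
hits <- regexpr(pattern, new_labels)

# Start with NA everywhere and fill in only the labels that matched;
# regmatches() returns the matched substrings for those labels only
ref_key <- rep(NA_character_, length(new_labels))
ref_key[hits != -1] <- regmatches(new_labels, hits)

# match() finds the position of each extracted keyword in the lookup vector,
# which is then used to pick the category (NA stays NA)
result <- data.frame(
  value = new_labels,
  ref_key = ref_key,
  categories = categories[match(ref_key, keywords)],
  stringsAsFactors = FALSE
)
```

Here match() plays the role of left_join: labels without a keyword keep NA in both new columns.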
Following @AntoniosK's question, I compared data.table and tibble. The comparison showed a significant difference in favor of tibble over data.table.