i have a following example:
dat <- read.table(text="index string
1 'I have first and second'
2 'I have first, first'
3 'I have second and first and thirdeen'", header=TRUE)
toMatch <- c('first', 'second', 'third')
dat$count <- stri_count_regex(dat$string, paste0('\\b',toMatch,'\\b', collapse="|"))
dat
index string count
1 1 I have first and second 2
2 2 I have first, first 2
3 3 I have second and first and thirdeen 2
I want to add to the dataframe a column count, which will tell me how many UNIQUE words does each row have. The desired output would in this case be
index string count
1 1 I have first and second 2
2 2 I have first, first 1
3 3 I have second and first and thirdeen 2
Could you please give me a hint how to modify the original formula? Thank you very much
With base R you could do the following:
sapply(dat$string, function(x)
{sum(sapply(toMatch, function(y) {grepl(paste0('\\b', y, '\\b'), x)}))})
which returns
[1] 2 1 2
Hope this helps!