I am trying to use the agrep function for fuzzy matching. I have a data frame in which one column contains the audience response, and another data frame in which segments and subsegments are listed. The audience column contains words that are the names of subsegments. For example:
pattern$audience
[1] "(Deleted) Semasio » DE: Intent » Christmas Shopping"
[2] "(Old) AddThis - UK » Auto » General » Auto Enthusiasts"
[3] "(Old) AddThis - UK » Auto » General » Auto Intenders"
[4] "(Old) AddThis - UK » Financial » Social » Financial Shoppers"
[5] "(Old) AddThis - UK » Food » Social"
[6] "(Old) AddThis - UK » Health » Social » Health Influencers"
Similarly, I have another data frame called x that contains the segment and subsegment:
x$segment   x$subsegment
Shopping    Financial shoppers
Travel      Travel Europe
Shopping    Christmas shopping
I want to write a function that does the fuzzy matching between pattern$audience and x$subsegment and returns the matching subsegment for each audience response in a new column, pattern$subseg.
The resulting data set I need should be like this:
pattern$audience x$segment x$subsegment
[1] "(Deleted) Semasio » DE: Intent » Christmas C" Shopping Christmas shopping
[2] "(Old) AddThis - UK » Auto » General » Auto Enthusiasts"
[3] "(Old) AddThis - UK » Auto » General » Auto Intenders"
[4] "(Old) AddThis - UK » Financial » Social » Financial Shoppers" Shopping Financial shoppers
[5] "(Old) AddThis - UK » Food » Social"
[6] "(Old) AddThis - UK » Health » Social » Health Influencers"
Here's the code that I tried to write, but it does not return the desired output:
x <- rename(x, c("Segment" = "segment", "Sub Segment" = "subseg"))
names(x)
y <- as.data.frame(x$subseg)
y <- rename(y, c("x$subseg" = "subseg"))
n.match <- function(pattern, x, ...) {
  for (i in 1:nrow(pattern)) {
    x <- (agrep(y,pattern$audience[i],
                ignore.case=TRUE, value = TRUE))
    x <- paste0(x,"")
    pattern$subseg[i] <- x
  }
  head(pattern)
}
Can someone please help me correct my mistake? I would really appreciate an answer. Many thanks.
We could try this:
pattern <- c("(Deleted) Semasio » DE: Intent » Christmas C",
"(Old) AddThis - UK » Auto » General » Auto Enthusiasts",
"(Old) AddThis - UK » Auto » General » Auto Intenders",
"(Old) AddThis - UK » Financial » Social » Financial Shoppers",
"(Old) AddThis - UK » Food » Social",
"(Old) AddThis - UK » Financial » Social » Financial Shoppers",
"(Old) AddThis - UK » Health » Social » Health Influencers")
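pattern <- data.frame(audience=pattern, stringsAsFactors=FALSE)
# strip.white=TRUE drops the space that follows each comma in the inline CSV
x <- read.csv(text='segment, subsegment
Shopping, Financial shoppers
Travel, Travel Europe
Enthusiasts, Auto Enthusiasts
Shopping, Christmas shopping', stringsAsFactors=FALSE, strip.white=TRUE)
# Vectorize agrep over its 'pattern' argument; SIMPLIFY = FALSE always returns a list
vagrep <- Vectorize(agrep, 'pattern', SIMPLIFY = FALSE)
pattern$subsegment <- ''
# matches[[i]] holds the rows of pattern$audience that fuzzily match x$subsegment[i]
matches <- vagrep(x$subsegment, pattern$audience)
for (i in seq_along(matches)) {
  if (length(matches[[i]]) > 0) pattern$subsegment[matches[[i]]] <- x$subsegment[i]
}
pattern
# audience subsegment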
#1 (Deleted) Semasio » DE: Intent » Christmas C
#2 (Old) AddThis - UK » Auto » General » Auto Enthusiasts Auto Enthusiasts
#3 (Old) AddThis - UK » Auto » General » Auto Intenders
#4 (Old) AddThis - UK » Financial » Social » Financial Shoppers Financial shoppers
#5 (Old) AddThis - UK » Food » Social
#6 (Old) AddThis - UK » Financial » Social » Financial Shoppers Financial shoppers
#7 (Old) AddThis - UK » Health » Social » Health Influencers
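Since the question asks for a function, the same idea could be wrapped up roughly like this. This is only a sketch: match_subsegment is a hypothetical name, it assumes that taking the first matching subsegment is acceptable when several match, and it uses agrepl with ignore.case = TRUE so that "shoppers" and "Shoppers" do not count toward the allowed distance. Note that agrep takes a single string as its pattern, so each subsegment is tested on its own rather than passing the whole column at once.

```r
# Hypothetical helper (not part of the code above): for each audience string,
# return the first subsegment that fuzzily matches it, or "" if none does.
match_subsegment <- function(audience, subsegments, max.distance = 0.1) {
  vapply(audience, function(a) {
    # Test every subsegment, one at a time, as the pattern against this string
    hit <- vapply(subsegments, function(s) {
      agrepl(s, a, max.distance = max.distance, ignore.case = TRUE)
    }, logical(1))
    if (any(hit)) subsegments[which(hit)[1]] else ""
  }, character(1), USE.NAMES = FALSE)
}

# Small example using strings taken from the question
audience <- c("(Deleted) Semasio » DE: Intent » Christmas Shopping",
              "(Old) AddThis - UK » Food » Social",
              "(Old) AddThis - UK » Financial » Social » Financial Shoppers")
subsegments <- c("Financial shoppers", "Travel Europe", "Christmas shopping")
result <- match_subsegment(audience, subsegments)
print(result)
```

With a data frame like the one in the question, this would be called as pattern$subseg <- match_subsegment(pattern$audience, x$subsegment). The max.distance value may need tuning for longer or noisier subsegment names.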