I'm trying to run sentiment analysis on German text in R. However, the output doesn't look promising, because I couldn't find a way to make it work for German.
Would you have any suggestions for me?
```r
# libraries
library(tidyverse)
library(tokenizers)
library(stopwords)
library(sentimentr)

# load data
data <- tribble(
  ~content,
  "Nimmt euch in Acht✌️#tage #periode #blu #hände #rot #blute #wald #fy #viral",
  "ich liebe uns #wortwitze #Periode #Tage #couplegoals",
  "Mit KadeZyklus bei Krämpfen gibt es jetzt endlich ein pflanzliches Helferlein gegen leichte Unterleibskrämpfe!",
  "Es ist wie es ist Jungs"
)

# count freq of words
words_as_tokens <- setNames(lapply(sapply(data$content,
                                          tokenize_words,
                                          stopwords = stopwords(language = "en", source = "smart")),
                                   function(x) as.data.frame(sort(table(x), TRUE), stringsAsFactors = FALSE)),
                            data$content)

# tidyverse's job
stop_german <- data.frame(word = stopwords::stopwords("de"), stringsAsFactors = FALSE)
df <- words_as_tokens %>%
  bind_rows(.id = "content") %>%
  rename(word = x) %>%
  anti_join(stop_german, by = c("word"))

# sentiment
df$sentiment_score <- sapply(df$content, function(x)
  mean(sentiment(x)$sentiment))
```
You have specified the wrong source and the wrong language for the stopwords. The `smart` source does not contain `de` as a language. If you run `stopwords_getsources()`, you get all available sources for `stopwords`. With `stopwords_getlanguages(source = 'snowball')`, you'll see that this source contains `de`.
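You can check this directly with the functions mentioned above. The snippet below is a small sketch that assumes the stopwords and tokenizers packages are installed; the sample sentence is taken from the question's data. It lists the sources, compares the languages offered by `smart` and `snowball`, and tokenizes the same sentence once with each stopword list so you can compare the results.

```r
library(stopwords)
library(tokenizers)

# All stopword sources shipped with the stopwords package
print(stopwords_getsources())

# Languages offered by the "smart" source vs. whether "snowball" offers German
print(stopwords_getlanguages(source = "smart"))
print("de" %in% stopwords_getlanguages(source = "snowball"))

# Tokenize one German sentence with each stopword list.
# tokenize_words() lowercases by default and returns a list (one element per input).
text <- "Es ist wie es ist Jungs"
tokens_en <- tokenize_words(text, stopwords = stopwords(language = "en", source = "smart"))
tokens_de <- tokenize_words(text, stopwords = stopwords(language = "de", source = "snowball"))

print(tokens_en[[1]])  # English list: German function words are kept
print(tokens_de[[1]])  # German list: "es", "ist" and "wie" are removed
```

With the German list only the content word survives, which is what the word-frequency step in the question needs.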
Change the `stopwords()` call in your tokenization step accordingly and the German stopwords will be filtered out:
```r
# count freq of words
words_as_tokens <- setNames(lapply(
  sapply(data$content,
         tokenize_words,
         stopwords = stopwords(language = "de", source = "snowball")
  ),
  function(x) as.data.frame(sort(table(x), TRUE), stringsAsFactors = FALSE)
), data$content)
```
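Note that the stopword fix only changes which words are counted; the sentiment step in the question scores `df$content` (the full texts) with sentimentr's defaults. By default `sentiment()` uses an English polarity lexicon (`polarity_dt = lexicon::hash_sentiment_jockers_rinker`) and English valence shifters (`valence_shifters_dt = lexicon::hash_valence_shifters`), so German words are mostly not recognised. One option is to pass your own German word list through `polarity_dt`, converted with sentimentr's `as_key()`. The sketch below uses a tiny, hypothetical lexicon purely for illustration; for real use you would load a published German lexicon (for example SentiWS from Leipzig University) into the same two-column word/score shape. The valence shifters stay English here, so German negators such as "nicht" are not handled.

```r
library(sentimentr)

# HYPOTHETICAL toy lexicon: words and scores are illustrative only,
# not taken from a published German sentiment dictionary.
german_lexicon <- data.frame(
  x = c("gut", "hass", "liebe", "schlecht", "toll"),
  y = c(0.5, -1, 1, -0.5, 0.8),
  stringsAsFactors = FALSE
)

# as_key() turns a (word, polarity) data frame into the keyed
# data.table that sentiment() expects for polarity_dt
german_key <- as_key(german_lexicon)

texts <- c(
  "ich liebe uns #wortwitze #Periode #Tage #couplegoals",
  "Es ist wie es ist Jungs"
)

# Split into sentences once, then score with the German key
# instead of the default English lexicon
sent <- sentiment(get_sentences(texts), polarity_dt = german_key)

# One average score per input text, similar to the question's mean() per text
scores <- tapply(sent$sentiment, sent$element_id, mean)
print(scores)
```

Only words present in the supplied key contribute to the score, so the quality of the results depends entirely on the German lexicon you plug in.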