Sentiment Analysis in R for German language


I'm trying to run a sentiment analysis on German text in R. However, the output does not look promising, as I could not find a way to make it work for the German language.

Would you have any suggestions for me?

#libraries
library(tidyverse)
library(tokenizers)
library(stopwords)
library(sentimentr)

#load data
data <- tribble(
  ~content, 
  "Nimmt euch in Acht✌️#tage #periode #blu #hände #rot #blute #wald #fy #viral",
  "ich liebe uns #wortwitze #Periode #Tage #couplegoals",
  "Mit KadeZyklus bei Krämpfen gibt es jetzt endlich ein pflanzliches Helferlein gegen leichte Unterleibskrämpfe!",
  "Es ist wie es ist Jungs"
)

# count freq of words
words_as_tokens <- setNames(lapply(sapply(data$content, 
                                          tokenize_words, 
                                          stopwords = stopwords(language = "en", source = "smart")), 
                                   function(x) as.data.frame(sort(table(x), TRUE), stringsAsFactors = F)), data$content) 

# tidyverse's job
stop_german <- data.frame(word = stopwords::stopwords("de"), stringsAsFactors = FALSE)
df <- words_as_tokens %>%
  bind_rows(.id = "content") %>%
  rename(word = x) %>% 
  anti_join(stop_german, by = c("word"))

#sentiment
df$sentiment_score <- sapply(df$content, function(x) 
  mean(sentiment(x)$sentiment))

Solution

  • You have specified both the wrong language and the wrong source for the stopwords. Your call uses language = "en", and the smart source does not provide de (German) at all. Running stopwords_getsources() lists all available stopword sources, and stopwords_getlanguages(source = "snowball") shows that the snowball source includes de.

    Change the stopwords argument accordingly and the German stopwords will be removed during tokenization.

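    # count the frequency of each word per post, dropping German stopwords
    words_as_tokens <- setNames(lapply(
      sapply(data$content,
        tokenize_words,
        # German ("de") stopwords are available from the "snowball" source
        stopwords = stopwords(language = "de", source = "snowball")
      ),
      # tabulate the tokens, sorted by descending frequency
      function(x) as.data.frame(sort(table(x), TRUE), stringsAsFactors = FALSE)
    ), data$content)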
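
To check the source and language combination yourself before tokenizing, something like the following might help. It is only a minimal sketch: the variable names (sources, supports_de, tokens) are made up for illustration, and it assumes the stopwords and tokenizers packages are installed.

```r
library(stopwords)
library(tokenizers)

# List every stopword source shipped with the stopwords package
sources <- stopwords_getsources()

# For each source, check whether German ("de") is among its languages
supports_de <- vapply(
  sources,
  function(src) "de" %in% stopwords_getlanguages(source = src),
  logical(1)
)

# Names of the sources that can be used with language = "de"
names(supports_de)[supports_de]

# Tokenize one of the example posts with the German snowball list;
# tokenize_words() lowercases by default and returns a list
tokens <- tokenize_words(
  "Es ist wie es ist Jungs",
  stopwords = stopwords(language = "de", source = "snowball")
)
tokens[[1]]
```

If snowball shows up in the first result, the corrected call above should drop common German function words such as "ist" from the token list.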