Here is my df:
df <- structure(list(id = 1:50, strain_id = c(6L, 6L, 7L, 12L, 19L,
35L, 81L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L,
100L, 123L, 123L, 123L, 123L, 123L, 123L, 123L, 123L, 123L, 123L,
123L, 202L, 202L, 202L, 202L, 202L, 202L, 202L, 202L, 202L, 202L,
202L, 246L, 246L, 246L, 246L, 246L, 246L, 246L, 246L, 246L, 246L,
246L), name = c("Anorexia and Cachexia", "Autoimmune Diseases and Inflammation",
"Psychiatric Symptoms", "Autoimmune Diseases and Inflammation",
"Pain", "Autoimmune Diseases and Inflammation", "Dependency and Withdrawal",
"Anorexia and Cachexia", "Spasticity", "Movement Disorders",
"Pain", "Glaucoma", "Epilepsy", "Asthma", "Dependency and Withdrawal",
"Psychiatric Symptoms", "Autoimmune Diseases and Inflammation",
"Nausea and Vomiting", "Anorexia and Cachexia", "Spasticity",
"Movement Disorders", "Pain", "Glaucoma", "Epilepsy", "Asthma",
"Dependency and Withdrawal", "Psychiatric Symptoms", "Autoimmune Diseases and Inflammation",
"Nausea and Vomiting", "Anorexia and Cachexia", "Spasticity",
"Movement Disorders", "Pain", "Glaucoma", "Epilepsy", "Asthma",
"Dependency and Withdrawal", "Psychiatric Symptoms", "Autoimmune Diseases and Inflammation",
"Nausea and Vomiting", "Anorexia and Cachexia", "Spasticity",
"Movement Disorders", "Pain", "Glaucoma", "Epilepsy", "Asthma",
"Dependency and Withdrawal", "Psychiatric Symptoms", "Autoimmune Diseases and Inflammation"
), rating = c(4, 4, 5, 5, 4, 5, 5, 5, 4, 5, 5, 4, 4, 3, 5, 5,
5, 3, 3, 5, 5, 4, 3, 4, 4, 4, 3, 4, 3, 3, 2, 3, 4, 4, 3, 2, 5,
3, 3, 3, 3, 4, 4, 3, 5, 3, 1, 3, 4, 3), dose = c(3, 3, 3, 3,
3, 3, 1, 3, 2, 1, 2, 2, 2, 3, 2, 2, 2, 2, 2, 3, 3, 2, 2, 2, 3,
3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 1, 2, 2, 1, 3, 2,
3, 2, 2, 3), info = c("Affects / helps even in small doses very well at / against Anorexia and Cachexia.",
"Affects / helps even in small doses very well at / against Autoimmune Diseases and Inflammation.",
"Affects / helps even in small doses extremly well at / against Psychiatric Symptoms.",
"Affects / helps even in small doses extremly well at / against Autoimmune Diseases and Inflammation.",
"Affects / helps even in small doses very well at / against Pain.",
"Affects / helps even in small doses extremly well at / against Autoimmune Diseases and Inflammation.",
"Affects / helps only in heavy doses extremly well at / against Dependency and Withdrawal.",
"Affects / helps even in small doses extremly well at / against Anorexia and Cachexia.",
"Affects / helps in average doses very well at / against Spasticity.",
"Affects / helps only in heavy doses extremly well at / against Movement Disorders.",
"Affects / helps in average doses extremly well at / against Pain.",
"Affects / helps in average doses very well at / against Glaucoma.",
"Affects / helps in average doses very well at / against Epilepsy.",
"Affects / helps even in small doses well at / against Asthma.",
"Affects / helps in average doses extremly well at / against Dependency and Withdrawal.",
"Affects / helps in average doses extremly well at / against Psychiatric Symptoms.",
"Affects / helps in average doses extremly well at / against Autoimmune Diseases and Inflammation.",
"Affects / helps in average doses well at / against Nausea and Vomiting.",
"Affects / helps in average doses well at / against Anorexia and Cachexia.",
"Affects / helps even in small doses extremly well at / against Spasticity.",
"Affects / helps even in small doses extremly well at / against Movement Disorders.",
"Affects / helps in average doses very well at / against Pain.",
"Affects / helps in average doses well at / against Glaucoma.",
"Affects / helps in average doses very well at / against Epilepsy.",
"Affects / helps even in small doses very well at / against Asthma.",
"Affects / helps even in small doses very well at / against Dependency and Withdrawal.",
"Affects / helps in average doses well at / against Psychiatric Symptoms.",
"Affects / helps in average doses very well at / against Autoimmune Diseases and Inflammation.",
"Affects / helps in average doses well at / against Nausea and Vomiting.",
"Affects / helps in average doses well at / against Anorexia and Cachexia.",
"Affects / helps in average doses low at / against Spasticity.",
"Affects / helps in average doses well at / against Movement Disorders.",
"Affects / helps in average doses very well at / against Pain.",
"Affects / helps in average doses very well at / against Glaucoma.",
"Affects / helps in average doses well at / against Epilepsy.",
"Affects / helps even in small doses low at / against Asthma.",
"Affects / helps in average doses extremly well at / against Dependency and Withdrawal.",
"Affects / helps in average doses well at / against Psychiatric Symptoms.",
"Affects / helps in average doses well at / against Autoimmune Diseases and Inflammation.",
"Affects / helps in average doses well at / against Nausea and Vomiting.",
"Affects / helps only in heavy doses well at / against Anorexia and Cachexia.",
"Affects / helps in average doses very well at / against Spasticity.",
"Affects / helps in average doses very well at / against Movement Disorders.",
"Affects / helps only in heavy doses well at / against Pain.",
"Affects / helps even in small doses extremly well at / against Glaucoma.",
"Affects / helps in average doses well at / against Epilepsy.",
"Affects / helps even in small doses very low at / against Asthma.",
"Affects / helps in average doses well at / against Dependency and Withdrawal.",
"Affects / helps in average doses very well at / against Psychiatric Symptoms.",
"Affects / helps even in small doses well at / against Autoimmune Diseases and Inflammation."
), votes = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L)), row.names = c(NA, 50L), class = "data.frame")
And I need to work on the name column.
library(dplyr)  # provides the %>% pipe

df %>%
  tidytext::unnest_tokens(input = name,
                          output = word,
                          token = "words",
                          format = "text",
                          drop = TRUE,
                          to_lower = TRUE) %>%
  # stop words become empty strings here, hence the filter below
  dplyr::mutate(word = sapply(word, tm::removePunctuation, ucp = TRUE),
                word = tm::removeWords(word, tm::stopwords("en")),
                word = tm::stripWhitespace(word)) %>%
  dplyr::filter(!word == "")
Please advise which function or setting I should use to remove rows with blank values without the explicit dplyr::filter(!word == "") step.
In other words, I want my code to filter out rows with empty values in specific columns automatically, through a setting or function.
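I can recreate your outcome with only functions from tidytext and dplyr.
The functions from tm are not needed, because unnest_tokens from tidytext already takes care of punctuation and whitespace removal (unless specified otherwise). You can then use dplyr's anti_join with the stop_words data set from tidytext to remove the unwanted stop words. Because anti_join drops the matching rows entirely instead of replacing the words with empty strings, no blank values are left to filter out.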
df %>%
  tidytext::unnest_tokens(input = name,
                          output = word,
                          token = "words",
                          format = "text",
                          drop = TRUE,
                          to_lower = TRUE) %>%
  # drop every row whose word appears in the stop word list
  dplyr::anti_join(tidytext::stop_words, by = "word")
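One caveat, as a hedged sketch: tidytext::stop_words combines several lexicons, so it can remove more words than tm::stopwords("en") does. If you want to stay close to the tm list, you could restrict the join to the "snowball" lexicon. This assumes the tm English list and the "snowball" lexicon are close matches, which I have not checked word for word. The toy data frame below is made up for illustration and only mimics the name column of df.

```r
library(dplyr)
library(tidytext)

# Toy data standing in for the `name` column of df (illustrative only)
toy <- data.frame(id = 1:3,
                  name = c("Anorexia and Cachexia", "Pain", "Nausea and Vomiting"),
                  stringsAsFactors = FALSE)

# Assumption: the "snowball" lexicon is the closest match to tm::stopwords("en")
snowball_stops <- stop_words %>%
  filter(lexicon == "snowball")

result <- toy %>%
  unnest_tokens(output = word, input = name) %>%  # lowercases and strips punctuation
  anti_join(snowball_stops, by = "word")          # removes stop word rows outright

result
```

No row with an empty word is ever created here, so there is nothing left to filter afterwards.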