ID What color is this item? What color is this item?_2 What is the shape of this item? What is the shape of this item?_2 size
55 red blue circle triangle small
83 blue yellow circle NA large
78 red yellow square circle large
43 green NA square circle small
29 yellow green circle triangle medium
I would like to get a frequency table like this:
Variable Level Freq Percent
color blue 2 22.22
red 2 22.22
yellow 3 33.33
green 2 22.22
total 9 100.00
shape circle 5 50.0
triangle 3 30.0
square 2 20.0
total 10 100.0
size small 2 33.3
medium 2 33.3
large 2 33.3
total 6 100.0
But I am having trouble matching the names of my columns when I try to convert to long since they are long strings. From a previous question, I know I can do something like:
options(digits = 3)
df1 <- df2 %>%
pivot_longer(
-ID,
names_to = "Question",
values_to = "Response"
) %>%
mutate(Question = str_extract(Question, '')) %>%
group_by(Question, Response) %>%
count(Response, name = "Freq") %>%
na.omit() %>%
group_by(Question) %>%
mutate(Percent = Freq/sum(Freq)*100) %>%
group_split() %>%
adorn_totals() %>%
bind_rows() %>%
mutate(Response = ifelse(Response == last(Response), last(Question), Response)) %>%
mutate(Question = ifelse(duplicated(Question) |
Question == "Total", NA, Question))
But I'm having trouble finding the right regular expression to put in the line:
mutate(Question = str_extract(Question, '')) %>%
If anyone knows another way to do this that would be great as well!
If the intention is to extract a custom list of words, we could paste the elements together to create a single string and use that as regex
in str_extract
library(dplyr)
library(tidyr)
library(janitor)
library(stringr)
library(flextable)
pat_words <- c("color", "shape", "size")
out <- df %>%
pivot_longer(
-ID,
names_to = "Question",
values_to = "Response"
) %>% mutate(Question = str_extract(Question, str_c(pat_words, collapse="|"))) %>% group_by(Question, Response) %>%
count(Response, name = "Freq") %>%
na.omit() %>%
group_by(Question) %>%
mutate(Percent = round(Freq/sum(Freq)*100, 2)) %>%
group_split() %>%
adorn_totals() %>%
bind_rows() %>%
mutate(Response = ifelse(Response == last(Response), last(Question), Response)) %>%
mutate(Question = ifelse(duplicated(Question) |
Question == "Total", NA, Question)) %>%
as.data.frame
flextable(out)
-output
df <- structure(list(ID = c(55L, 83L, 78L, 43L, 29L), `What color is this item?` = c("red",
"blue", "red", "green", "yellow"), `What color is this item?_2` = c("blue",
"yellow", "yellow", NA, "green"), `What is the shape of this item?` = c("circle",
"circle", "square", "square", "circle"), `What is the shape of this item?_2` = c("triangle",
NA, "circle", "circle", "triangle"), size = c("small", "large",
"large", "small", "medium")), class = "data.frame", row.names = c(NA,
-5L))