In short, I kind of want to do the reverse process of model.matrix()
. I have the following data obtained from a survey.
Question 1: pick one from 'Cat', 'Dog' or 'Sheep'
Question 2: pick a random number
Question 3: pick one from 'Big' or 'Small'
The results are put in a dataframe in the following way:
id ans_1 ans_2 ans_3 num size_1 size_2
1 Cat 0 0 0.76 0 Small
2 0 Dog 0 0.44 0 Small
3 0 0 Sheep 0.52 Big 0
4 0 0 Sheep 0.52 Big 0
5 0 Dog 0 0.59 0 Small
6 Cat 0 0 0.97 0 Small
7 0 Dog 0 0.5 0 Small
8 0 Dog 0 0.19 0 Small
9 0 0 Sheep 0.01 Big 0
10 Cat 0 0 0.24 0 Small
What I want to do is to combine the answers of every question in a single column. In this example, ans_1, ans_2 and ans_3 are from the same question and size_1 and size_2 as well. Combining them, the results would look like:
id ans num size
1 Cat 0.76 Small
2 Dog 0.44 Small
3 Sheep 0.52 Big
4 Sheep 0.52 Big
5 Dog 0.59 Small
6 Cat 0.97 Small
7 Dog 0.5 Small
8 Dog 0.19 Small
9 Sheep 0.01 Big
10 Cat 0.24 Small
I have multiple datasets, each of which contains around 100 columns. This makes it too much work to do by hand. Note that there are also columns like 'num', which stand by themselves. The answers to the same question are always right next to each other like in this example.
Thank you!
Data
id = 1:10
ans_1 = c('Cat', 0, 0, 0, 0, 'Cat', 0, 0, 0, 'Cat')
ans_2 = c(0, 'Dog', 0, 0, 'Dog', 0, 'Dog', 'Dog', 0, 0)
ans_3 = c(0, 0, 'Sheep', 'Sheep', 0, 0, 0, 0, 'Sheep', 0)
num = round(runif(10),2)
size_1 = c(0, 0, 'Big', 'Big', 0, 0, 0, 0, 'Big', 0)
size_2 = c('Small', 'Small', 0, 0, 'Small', 'Small', 'Small', 'Small', 0,
'Small')
data <- noquote(cbind(id, ans_1, ans_2, ans_3, num, size_1, size_2))
You might like to use unite()
from tidyr, and then mutate_if()
from dplyr:
library(dplyr)
library(tidyr)
library(stringr)
data <- data_frame(id, ans_1, ans_2, ans_3, num, size_1, size_2)
data %>%
unite(ans, ans_1, ans_2, ans_3) %>%
unite(size, size_1, size_2) %>%
mutate_if(is.character, str_extract, "[a-zA-Z]+")
#> # A tibble: 10 × 4
#> id ans num size
#> <int> <chr> <dbl> <chr>
#> 1 1 Cat 0.92 Small
#> 2 2 Dog 0.79 Small
#> 3 3 Sheep 0.44 Big
#> 4 4 Sheep 0.67 Big
#> 5 5 Dog 0.00 Small
#> 6 6 Cat 0.61 Small
#> 7 7 Dog 0.67 Small
#> 8 8 Dog 0.95 Small
#> 9 9 Sheep 0.18 Big
#> 10 10 Cat 0.76 Small