I have survey data and I need to collapse this dataset to a single row per respondent, but I am having problems using spread and group_by.
My dataset (data) has the following format:
country | date_ | user_id | int_id | user_name | ext_name | q_order | questions | answers
--- | --- | --- | --- | --- | --- | --- | --- | ---
AR | 2019 | AR-100 | XP200 | jhon foo | damian, khon | 1 | Question1 … | yes
AR | 2019 | AR-100 | XP200 | jhon foo | damian, khon | 2 | Question2 … | 0
AR | 2019 | AR-100 | XP200 | jhon foo | damian, khon | 3 | Question3 … | no apply
AR | 2019 | AR-100 | XP200 | jhon foo | damian, khon | 4 | Question4 … | 0
AR | 2019 | AR-100 | XP200 | jhon foo | damian, khon | 5 | Question5 … | 0
AR | 2019 | AR-100 | XP200 | jhon foo | damian, khon | 6 | Question6 … | yes
US | 2018 | US-100 | PP300 | Peter fields | jhon voigh | 1 | Question1 … | no
US | 2018 | US-100 | PP300 | Peter fields | jhon voigh | 2 | Question2 … | 0
US | 2018 | US-100 | PP300 | Peter fields | jhon voigh | 3 | Question3 … | yes apply
US | 2018 | US-100 | PP300 | Peter fields | jhon voigh | 4 | Question4 … | 0
US | 2018 | US-100 | PP300 | Peter fields | jhon voigh | 5 | Question5 … | 0
US | 2018 | US-100 | PP300 | Peter fields | jhon voigh | 6 | Question6 … | no
I tried grouping the dataset, but I always get 14 rows instead of 2.
Code:
data %>%
  group_by(country   = .$country,
           date_     = .$date_,
           medic_id  = .$user_id,
           user_id   = .$int_id,
           user_name = .$user_name,
           ext_name  = .$ext_name,
           q_order   = .$q_order) %>%
  spread(questions, answers)
The code above runs out of memory.
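A likely reason for the extra rows: spread() treats every column other than the key (questions) and the value (answers) as an identifier, so it returns one row per distinct combination of those columns. Because q_order is different on every row of a respondent, keeping it means nothing can be collapsed. The base-R sketch below uses a hypothetical toy data frame df (not the real data) and simply counts those distinct combinations with and without q_order:

```r
# Hypothetical toy version of the survey data: one row per answer (long format)
df <- data.frame(
  user_id   = c("AR-100", "AR-100", "AR-100", "US-100", "US-100", "US-100"),
  q_order   = c(1, 2, 3, 1, 2, 3),
  questions = c("Question1", "Question2", "Question3",
                "Question1", "Question2", "Question3"),
  answers   = c("yes", "0", "no apply", "no", "0", "yes apply"),
  stringsAsFactors = FALSE
)

# Id columns INCLUDING q_order: every row is a distinct combination
ids_with_order <- unique(df[, c("user_id", "q_order")])

# Id columns WITHOUT q_order: rows collapse to one per respondent
ids_without_order <- unique(df[, "user_id", drop = FALSE])

nrow(ids_with_order)     # 6
nrow(ids_without_order)  # 2
```

In other words, the number of output rows is decided by which columns are left as identifiers, not by group_by().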
I also tried dcast:
data %>%
  select(-q_order) %>%
  dcast(... ~ questions, value.var = "answers")
and got the following:
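Country.Code | Created.Date | user_id | int_id | user_name | ext_name | Question1 … | Question2 … | Question3 … | Question4 … | Question5 … | Question6 …
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
AR | 3/28/2019 | AR-100 | XP200 | jhon foo | damian, khon | 1 | 2 | 0 | 1 | 1 | 1
US | 4/28/2019 | US-100 | PP300 | Peter fields | jhon voigh | 0 | 1 | 1 | 2 | 1 | 2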
but what I need is:
Country.Code | Created.Date | user_id | int_id | user_name | ext_name | Question1 … | Question2 … | Question3 … | Question4 … | Question5 … | Question6 …
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
AR | 3/28/2019 | AR-100 | XP200 | jhon foo | damian, khon | yes | 0 | no apply | 0 | 0 | yes
US | 4/28/2019 | US-100 | PP300 | Peter fields | jhon voigh | no | 0 | yes apply | 0 | 0 | no
Why does dcast convert all the values of the answers variable to numbers? (I even tried with var.values='answers'.)
My question is very similar to this linked question, but I can't make that solution work: it always runs out of memory or produces numbers instead of the values of the answers variable.
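The numbers are most likely counts rather than converted answers. When the id columns plus questions do not identify exactly one row, reshape2's dcast has to aggregate, and without a fun.aggregate it defaults to length (printing "Aggregation function missing: defaulting to length"), so each cell holds how many answers fell into it and empty cells get 0. The sketch below reproduces that counting with base R's tapply on a hypothetical toy data frame df; it illustrates the effect and is not how dcast works internally:

```r
# Hypothetical toy data: AR-100 answered Question1 twice,
# US-100 never answered Question2
df <- data.frame(
  user_id   = c("AR-100", "AR-100", "AR-100", "US-100"),
  questions = c("Question1", "Question1", "Question2", "Question1"),
  answers   = c("yes", "no", "0", "no"),
  stringsAsFactors = FALSE
)

# TRUE if some (user_id, questions) pair has more than one answer,
# i.e. a long-to-wide cast cannot place a single value in that cell
has_dupes <- anyDuplicated(df[, c("user_id", "questions")]) > 0

# Count answers per cell, as a length-based aggregation would
counts <- tapply(df$answers, list(df$user_id, df$questions), length)
counts[is.na(counts)] <- 0  # cells with no answers become 0
counts
```

If has_dupes is TRUE on the real data, deciding how duplicated answers should be combined (or removing them first) is what lets the cast return the answers themselves.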
I finally found the answer!
The problem (I'm new to R) was that I wanted to spread the values of the answers column across columns, but these values are characters, and most of the solutions I found work with numeric values instead of characters.
On the other hand, my solution works great with reshape on a small example (5 rows), but on my real (small to medium) dataset it runs out of memory and never finishes.
For example, the following code never finishes (and yes, I also tried grouping, as I said):
b <- reshape(data = a %>% select(-q_order),
             direction = "wide",
             idvar = c("Country.Code", "Created.Date", "user_id", "int_id",
                       "user_name", "ext_name"),
             timevar = "questions")
This solution runs in 2 seconds:
b <- dcast(a,
           Country.Code + Created.Date + user_id + int_id + user_name + ext_name ~ questions,
           fun.aggregate = toString, value.var = "answers")
Final result:
Country.Code | Created.Date | user_id | int_id | user_name | ext_name | Question1 … | Question2 … | Question3 … | Question4 … | Question5 … | Question6 …
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
AR | 3/28/2019 | AR-100 | XP200 | jhon foo | damian, khon | yes | 0 | no apply | 0 | 0 | yes
US | 4/28/2019 | US-100 | PP300 | Peter fields | jhon voigh | no | 0 | yes apply | 0 | 0 | no
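The dcast call works because fun.aggregate = toString joins all answers that land in the same cell into one comma-separated string (toString(c("yes", "no")) is "yes, no"). Character answers are therefore kept, and a duplicated cell shows up as a combined string instead of being counted. As a sketch only, here is the same idea in base R with tapply, on a hypothetical toy data frame df (an alternative for comparison, not the code used above):

```r
# Hypothetical toy data: AR-100 answered Question1 twice,
# US-100 never answered Question2
df <- data.frame(
  user_id   = c("AR-100", "AR-100", "AR-100", "US-100"),
  questions = c("Question1", "Question1", "Question2", "Question1"),
  answers   = c("yes", "no", "0", "no"),
  stringsAsFactors = FALSE
)

# One cell per (user_id, question); toString() joins every answer in a cell
# with ", ", so character values are preserved
wide <- tapply(df$answers, list(df$user_id, df$questions), toString)

# Back to a data frame: one row per respondent, one column per question.
# Cells with no answer at all are NA here.
out <- data.frame(user_id = rownames(wide), wide,
                  row.names = NULL, stringsAsFactors = FALSE)
out
```

Looking for cells that contain ", " is a quick way to spot respondents who answered the same question more than once, assuming no single answer itself contains a comma.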