I have no doubt this question has been asked before, but I cannot for the life of me figure out how to word it in a way that I can find the response.
I have the following data coming in from a .csv:
1 Q1. Do you run on trails? NA NA
2 YES 97.17% 2507
3 NO 2.83% 73
4 Q2. Do you participate in organized trail work, maintenance, building, or cleaning up trails (ie: plogging)? NA NA
5 YES 49.88% 1283
6 NO 50.12% 1289
The questions and possible responses aren't all the same, so the workflow I imagine is:
Ideally, the end result would be:
Q1... YES 10% 435
Q1... NO 90% 783
Q2... YES 10% 435
Q2... NO 90% 783
Sorry, I had to edit; I finally got it.
Save your sheet as a CSV using , as the separator and ' as the string delimiter.
Then run this code.
Please let me know about any concerns or doubts. Notice that I read the file as text using readLines()
and then split each line on the comma, except for the question, where I use the string delimiter instead. It is dirty, but it works.
Best
JA
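library(data.table)
library(stringr)
# read the file as plain text, one element per line
dat <- readLines("~/Documents/test.SO/test1.csv")
# indices of the question lines; [0-9]+ also matches Q10. and above
qlines <- grep("Q[0-9]+\\.", dat)
all.questions <- list()
i <- 1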
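The setup can be tried without a file on disk. This is only a sketch: the sample lines below are an assumption about how the sheet looks once saved with , as the separator and ' as the string delimiter, and starts and ends are hypothetical helper names that do not appear in the answer's code.

```r
# Assumed file contents (',' as separator, ' as string delimiter)
csv_text <- c(
  "'Q1. Do you run on trails?',,",
  "YES,97.17%,2507",
  "NO,2.83%,73",
  "'Q2. Do you participate in organized trail work, maintenance, building, or cleaning up trails (ie: plogging)?',,",
  "YES,49.88%,1283",
  "NO,50.12%,1289"
)

# readLines() also accepts a connection, so a text connection stands in for the file
con <- textConnection(csv_text)
dat <- readLines(con)
close(con)

# indices of the question lines
qlines <- grep("Q[0-9]+\\.", dat)

# hypothetical helpers: first and last answer line of each question
starts <- qlines + 1
ends <- c(qlines[-1] - 1, length(dat))
```

Each question's answers then sit in dat[starts[k]:ends[k]], which is the same range the if/else in the loop works out one question at a time.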
Now here is the sweet stuff, step by step:
dat[q]
is the question line: we loop q over qlines, the lines we already know are questions. From this line to the next question, all lines are answers, unless this is the last question, in which case all lines from here to the end of the file are answers; that's why the if is there. The sub just extracts the text between the string delimiters (') you used when saving.
unlist(str_split(dat[a], ","))
breaks the answer line into a character vector at each ",", which is the field delimiter. The result holds the pieces of information in a known order. From here,
ans.dat[1]
is the answer itself, the next element is the percent, and so on. With
percent <- ans.dat[2]
and the similar assignments we slowly extract the information from the text line into variables, so that at the end we can build a table with the elements arranged as we like. The internal loop exhausts the answers for the current question; the external loop exhausts the questions in the text.
Side note: you can remove any remaining trailing commas by adding a second sub:
question <- sub("[ ,]+$", "", question)
after the internal loop closes.
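The individual steps can be checked on single lines before running the whole loop. A minimal sketch, assuming the sample lines below (they are made up for illustration) and using base strsplit() in place of stringr's str_split() so that nothing needs to be installed:

```r
# Assumed sample lines: one quoted question, one unquoted question, one answer
quoted_line   <- "'Q1. Do you run on trails?',,"
unquoted_line <- "Q3. What is your gender,,"
answer_line   <- "YES,97.17%,2507"

# S1: keep only the text between the single quotes
q1 <- sub(".*'([^']*)'.*", "\\1", quoted_line)

# Without quotes the pattern does not match, so sub() returns the line unchanged
q3 <- sub(".*'([^']*)'.*", "\\1", unquoted_line)

# Second sub: strip any run of trailing spaces or commas
q3_clean <- sub("[ ,]+$", "", q3)

# S2: split the answer line on the field delimiter
ans.dat <- unlist(strsplit(answer_line, ",", fixed = TRUE))
ans        <- ans.dat[1]
percent    <- ans.dat[2]
responders <- ans.dat[3]
```

The unquoted case is probably why some questions in the printed result still end in ",,": when the quotes are missing from the file, the first sub has nothing to extract.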
for(q in qlines){
  question <- sub(".*'([^']*)'.*", "\\1", dat[q]) #S1
  # answers run up to the line before the next question,
  # or to the end of the file for the last question
  if(which(q==qlines) == length(qlines)){
    ans.lines <- (q+1):length(dat)
  }else{
    ans.lines <- (q+1) : (qlines[which(qlines==q)+1] - 1)
  }
  all.answers <- data.table()
  for(a in ans.lines){
    ans.dat <- unlist(str_split(dat[a], ",")) #S2
    ans <- ans.dat[1]
    percent <- ans.dat[2]
    responders <- ans.dat[3]
    ans.row <- data.table("ans"=ans, "percent"=percent, "responders"=responders) #S3
    all.answers <- rbind(all.answers, ans.row)
  }
  # attach the question text to every answer row and store the table
  all.questions[[i]] <- question.table <- cbind(question, all.answers)
  i <- i+1
}
all.questions
[[1]]
question ans percent responders
1: Q1. Do you run on trails? ,, YES 50 100
2: Q1. Do you run on trails? ,, NO 50 100
[[2]]
question ans percent responders
1: Q2. Do you participate in organized trail work, maintenance, building, or cleaning up trails (ie: plogging)? YES 50 100
2: Q2. Do you participate in organized trail work, maintenance, building, or cleaning up trails (ie: plogging)? NO 50 100
[[3]]
question ans percent responders
1: Q3. What is your gender,, MALE 50 100
2: Q3. What is your gender,, FEMALE 49 99
3: Q3. What is your gender,, OTHER 1 1
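The result above is a list with one table per question, while the question asks for a single table. A minimal sketch of the last step, assuming a list shaped like all.questions; plain data.frames stand in for the data.tables here so the snippet runs without extra packages, and combined is a hypothetical name:

```r
# Stand-in for the list built by the loop (values taken from the question's data)
all.questions <- list(
  data.frame(question = "Q1. Do you run on trails?",
             ans = c("YES", "NO"),
             percent = c("97.17%", "2.83%"),
             responders = c("2507", "73"),
             stringsAsFactors = FALSE),
  data.frame(question = "Q2. Do you participate in organized trail work, maintenance, building, or cleaning up trails (ie: plogging)?",
             ans = c("YES", "NO"),
             percent = c("49.88%", "50.12%"),
             responders = c("1283", "1289"),
             stringsAsFactors = FALSE)
)

# Stack the per-question tables into one long table
combined <- do.call(rbind, all.questions)

# The fields were read as text, so convert them if numbers are needed
combined$percent <- as.numeric(sub("%", "", combined$percent, fixed = TRUE))
combined$responders <- as.integer(combined$responders)
```

With data.table loaded, rbindlist(all.questions) should do the same stacking in one call.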