I have conducted a large survey (consisting of 42 subsurveys for different treatments) and have trouble getting my data in shape.
I have ~ 16 000 answers, each answer (i.e. what is an alternative use for a newspaper) is a cell in a data frame. These answers are in the form of Data 1 (below).
Depending on how many times an answer is given, it is worth 0-6 points (the more points, the less people have thought of it, the more creative the answer). This list is identical in form to Data 2 (listed below).
Now I want to sum each row of each of the 42 surveys (=participant) according to the breakdown in Data 2. This score should be an extra column in the dataframe called "score".
Simple example:
Participant 1 answers: "schuhe", "basteln", ... => score = 1 + 0 points = 1
Participant 2 answers: "brennmaterial", "schiff", ... => score = 1 + 1 points = 2
So the code should do:
Data 1: The answers given in 1 of 42 surveys (snippet):
structure(list(id = c("1", "2", "3", "4", "7"), kreazeitung_SQ001 = c("fensterglasersatz",
"dämmmaterial", "klopapier", NA, NA), kreazeitung_SQ002 = c("einwickeln",
"brennmaterial", "feueranzünder", "putzlappen", "schlagen"),
kreazeitung_SQ003 = c("mülleimer", "flieger", "brennmaterial",
"brennmaterial", "abdecken"), kreazeitung_SQ004 = c("schuhe",
"regenschirm", "basteln", "pappmaschee", "unterlage")), class = "data.frame", row.names = c(NA,
-5L))
Data 2: Here are the points that each answer is worth, e.g. if in data 1 the answer is "hut", the column "points" in this data tell me that it is worth 0 points, if it is "schuhe", it should be scored as 1.
structure(list(Var1 = c("basteln", "einwickeln", "abdecken",
"falten", "schlagen", "feueranzünder", "hut", "unterlage", "collage",
"fliegenklatsche", "geschenkpapier", "pappmaschee", "zerreißen",
"brennmaterial", "schiff", "schuhe"), Freq = c(57L, 55L, 46L,
45L, 43L, 42L, 42L, 42L, 41L, 41L, 41L, 41L, 40L, 39L, 39L, 39L
), points = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1)), row.names = c(9L,
30L, 1L, 42L, 151L, 47L, 81L, 192L, 20L, 53L, 67L, 126L, 211L,
16L, 150L, 156L), class = "data.frame")
I am happy about all solutions, either base or tidyverse. Unfortunately, this code complexity is over my head, so I would be thrilled about any help!! Thank you!
if I've understood, then the values in d2$Var1
match up to all the column Q's in d1
. If so I think this will work, where d1 is data 1 and d2 is data2
# using data.table package for operations
library(data.table)
d1 <- as.data.table(d1)
d2 <- as.data.table(d2)
# convert from wide format to long
d1_long <- melt(d1, id.vars = "id")
# then can use merge operations to pull the points across
d1_long <- merge(d1_long, d2, by.x="value", by.y="Var1", all.x=TRUE)
# lots of missing values in the e.g., so filled with 0
d1_long[is.na(points), points := 0]
# aggregate the scores, by id
scores <- d1_long[, .(score=sum(points)), by=id]
# add them back in to the original data, sort=FALSE preserves order
d1 <- merge(d1, scores, by="id", sort=FALSE)
d1
id kreazeitung_SQ001 kreazeitung_SQ002 kreazeitung_SQ003
1: 1 fensterglasersatz einwickeln mülleimer
2: 2 dämmmaterial brennmaterial flieger
3: 3 klopapier feueranzünder brennmaterial
4: 4 <NA> putzlappen brennmaterial
5: 7 <NA> schlagen abdecken
kreazeitung_SQ004 score
1: schuhe 1
2: regenschirm 1
3: basteln 1
4: pappmaschee 1
5: unterlage 0
# to convert back to data.frame
d1.df <- as.data.frame(d1)