I have data which currently looks like this:
subID | Trialtype | Freq |
---|---|---|
100 | WPC | 10 |
100 | BPC | 20 |
100 | BPT | 15 |
100 | WPT | 16 |
101 | WPC | 9 |
101 | BPC | 7 |
101 | WPT | 10 |
In the table above, subID is a subject ID number, Trialtype is categorical with 4 levels (WPC, BPC, BPT, and WPT), and Freq is the frequency of the trial type. During data entry, if a subject had 0 of a trial type, that category was left out. In the table, subject 101 has no row for category BPT, because their trial count for BPT was 0. This is causing problems in some analyses I am running, so I need a version of the dataframe where, instead of the row being missing, it exists with a Freq of 0, as below:
subID | Trialtype | Freq |
---|---|---|
100 | WPC | 10 |
100 | BPC | 20 |
100 | BPT | 15 |
100 | WPT | 16 |
101 | WPC | 9 |
101 | BPC | 7 |
101 | BPT | 0 |
101 | WPT | 10 |
I am attempting a for-loop to accomplish this, but I am stuck on how to add these rows to the dataframe. So far I have:
for (myperson in unique(data$subID)) {
  # Create a list of all trial types
  trials <- c("WPC", "BPC", "BPT", "WPT")
  # Does this person have all trial types?
  person_trial_list <- unique(data$Trialtype[data$subID == myperson])
  # Trial types this person is missing
  missing_trials <- setdiff(trials, person_trial_list)
  # How to act on this information?
}
I've seen quite a few threads on dropping factor levels/rows, but none on adding them like this in a way that I understood well enough to implement. Does anyone have a solution? I am open to tidyverse/dplyr options as well.
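For reference, the loop approach can be finished in base R along these lines. This is a minimal sketch, assuming the dataframe is called `data` with the three columns shown above; the sample data is rebuilt here so the snippet runs on its own, and `present` and `missing_trials` are illustrative names, not part of the original code.

```r
# Sample data from the question (subject 101 has no BPT row)
data <- data.frame(
  subID     = c(100, 100, 100, 100, 101, 101, 101),
  Trialtype = c("WPC", "BPC", "BPT", "WPT", "WPC", "BPC", "WPT"),
  Freq      = c(10, 20, 15, 16, 9, 7, 10),
  stringsAsFactors = FALSE
)

# Full set of trial types every subject should have
trials <- c("WPC", "BPC", "BPT", "WPT")

for (myperson in unique(data$subID)) {
  # Trial types this subject is missing
  present <- data$Trialtype[data$subID == myperson]
  missing_trials <- setdiff(trials, present)

  # Append one zero-frequency row per missing trial type
  if (length(missing_trials) > 0) {
    data <- rbind(
      data,
      data.frame(subID = myperson, Trialtype = missing_trials, Freq = 0,
                 stringsAsFactors = FALSE)
    )
  }
}

# Restore a stable order: by subject, then by trial type as listed in `trials`
data <- data[order(data$subID, match(data$Trialtype, trials)), ]
rownames(data) <- NULL
```

Growing a dataframe with `rbind()` inside a loop gets slow for large inputs, so this is mainly useful for seeing what the loop has to do.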
Suggested by user akrun in a comment, this worked well:
library(tidyr)
complete(data, subID, Trialtype, fill = list(Freq = 0))
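A self-contained version of the same call on the sample data is sketched below. Two things here are additions rather than part of akrun's suggestion: the result is assigned to a new object (the name `filled` is just for illustration, since `complete()` returns a new dataframe rather than modifying `data`), and the trial types are passed explicitly. With a bare `Trialtype`, `complete()` only uses values that occur somewhere in the data, so a trial type that every subject is missing would not be added.

```r
library(tidyr)

# Sample data from the question (subject 101 has no BPT row)
data <- data.frame(
  subID     = c(100, 100, 100, 100, 101, 101, 101),
  Trialtype = c("WPC", "BPC", "BPT", "WPT", "WPC", "BPC", "WPT"),
  Freq      = c(10, 20, 15, 16, 9, 7, 10),
  stringsAsFactors = FALSE
)

# Full set of trial types, supplied explicitly (an assumption about the design)
trials <- c("WPC", "BPC", "BPT", "WPT")

# Every subID x Trialtype combination; new rows get Freq = 0
filled <- complete(
  data,
  subID,
  Trialtype = trials,
  fill = list(Freq = 0)
)
```

If all four trial types are guaranteed to appear for at least one subject, the shorter call above gives the same set of rows.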