r, for-loop, data-manipulation

Add in a level of a factor where it is missing


I have data which currently looks like this:

subID Trialtype Freq
100 WPC 10
100 BPC 20
100 BPT 15
100 WPT 16
101 WPC 9
101 BPC 7
101 WPT 10

In the above table, subID is a subject ID number, Trialtype is categorical with 4 levels (WPC, BPC, BPT, and WPT), and Freq is the frequency of the trial type. During data entry, if a subject had 0 of a trial type, the category was left out. Referencing the table, sub 101 is completely missing a row for category BPT, as their trial count for BPT was 0. This is causing issues with some analyses I am running, and I now need a version of the dataframe where, rather than missing rows, the row exists with a Freq of 0, as below:

subID Trialtype Freq
100 WPC 10
100 BPC 20
100 BPT 15
100 WPT 16
101 WPC 9
101 BPC 7
101 BPT 0
101 WPT 10

I am attempting a for-loop to accomplish this, but I am stuck on how to add these rows to the dataframe. So far I have:

for (myperson in unique(data$subID)){

  #Create a list of all trial types
  trials=c("WPC", "BPC", "BPT", "WPT")
  
  #Does this person have all trial types?
  person_trial_list=unique(data$Trialtype[data$subID==myperson])
  
  #Which trial types is this person missing?
  missing_trials=setdiff(trials, person_trial_list)

  #How to act on this information?
  
}

I've seen quite a few threads on dropping factor levels/rows, but none on adding them like this in a way that I was able to understand well enough to implement. Does anyone have a solution? I am open to tidyverse/dplyr options that may work as well.


Solution

  • Suggested by user akrun in a comment, this worked well:

    library(tidyr)
    
    complete(data, subID, Trialtype, fill = list(Freq = 0))
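For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same complete() call. The factor() step and the name `completed` are my own additions, not part of the original suggestion. As far as I know, complete() only expands over values that actually occur in a character column, whereas for a factor it uses every declared level. Declaring the levels therefore covers the case where a trial type happens to be missing for every subject.

```r
library(tidyr)

# Sample data from the question: subject 101 has no BPT row.
data <- data.frame(
  subID     = c(100, 100, 100, 100, 101, 101, 101),
  Trialtype = c("WPC", "BPC", "BPT", "WPT", "WPC", "BPC", "WPT"),
  Freq      = c(10, 20, 15, 16, 9, 7, 10)
)

# Assumption (not in the original answer): declare all four levels up front,
# so a trial type absent for every subject would still get its rows.
data$Trialtype <- factor(data$Trialtype, levels = c("WPC", "BPC", "BPT", "WPT"))

# One row per subID x Trialtype combination; rows that had to be
# created get Freq = 0 instead of NA.
completed <- complete(data, subID, Trialtype, fill = list(Freq = 0))
print(completed)
```

Without the fill argument the new rows would have Freq set to NA, so the fill = list(Freq = 0) part is what produces the zeros asked for in the question.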