Search code examples
rdataframelevels

Converting counts of levels to separate variables, including 0s for missing counts


I have a dataframe (table), that includes frequency counts (Freq) of 2 levels (F, I) of a categorical variable (Fert).

table[1:10]

    FemID Sperm  Week Fert Freq
1:   269  High    1    F    4
2:   269  High    1    I    5
3:   273  High    1    F    6
4:   274  High    1    I    1
5:   275  High    1    I    1
6:   276  High    1    I    1
7:   278   Low    1    I    1
8:   280   Low    1    I    1
9:   281   Low    1    I    1
10:   282   Low    1    I    5

I would like to convert this to a dataframe in which the two levels of Fert (I and F) are separate variables for each value of FemID, with 0 for missing counts of a level, like so:

    FemID Sperm  Week Fert Infert
1:   269  High    1    4    5
2:   273  High    1    6    0
3:   274  High    1    1    0
4:   275  High    1    1    0
5:   276  High    1    1    0

Thoughts or suggestions? I feel like a loop is required, but I'm not sure how to go about setting it up for this. Perhaps there are two parts, one that creates the two new variables and one that fills in the 0's?


Solution

  • You can use spread in tidyr for this:

    > library(tidyr)
    > df %>% spread(Fert,Freq)
      FemID Sperm Week  F  I
    1   269  High    1  4  5
    2   273  High    1  6 NA
    3   274  High    1 NA  1
    4   275  High    1 NA  1
    5   276  High    1 NA  1
    6   278   Low    1 NA  1
    7   280   Low    1 NA  1
    8   281   Low    1 NA  1
    9   282   Low    1 NA  5
    

    You can also adjust the variable names:

    > df %>% spread(Fert,Freq) %>% 
          setNames(c("FemID","Sperm","Week","Fert","Infert"))
      FemID Sperm Week Fert Infert
    1   269  High    1    4      5
    2   273  High    1    6     NA
    3   274  High    1   NA      1
    4   275  High    1   NA      1
    .... the rest is truncated
    

    And can filter by NAs:

    > df %>% spread(Fert,Freq) %>% 
        setNames(c("FemID","Sperm","Week","Fert","Infert")) %>% 
        filter(!is.na(Fert))
      FemID Sperm Week Fert Infert
    1   269  High    1    4      5
    2   273  High    1    6     NA