I have a dataframe (table), that includes frequency counts (Freq) of 2 levels (F, I) of a categorical variable (Fert).
table[1:10]
FemID Sperm Week Fert Freq
1: 269 High 1 F 4
2: 269 High 1 I 5
3: 273 High 1 F 6
4: 274 High 1 I 1
5: 275 High 1 I 1
6: 276 High 1 I 1
7: 278 Low 1 I 1
8: 280 Low 1 I 1
9: 281 Low 1 I 1
10: 282 Low 1 I 5
I would like to convert this to a dataframe in which the two levels of Fert (I and F) are separate variables for each value of FemID, with 0 for missing counts of a level, like so:
FemID Sperm Week Fert Infert
1: 269 High 1 4 5
2: 273 High 1 6 0
3: 274 High 1 1 0
4: 275 High 1 1 0
5: 276 High 1 1 0
Thoughts or suggestions? I feel like a loop is required, but I'm not sure how to go about setting it up for this. Perhaps there are two parts, one that creates the two new variables and one that fills in the 0's?
You can use spread
in tidyr
for this:
> library(tidyr)
> df %>% spread(Fert,Freq)
FemID Sperm Week F I
1 269 High 1 4 5
2 273 High 1 6 NA
3 274 High 1 NA 1
4 275 High 1 NA 1
5 276 High 1 NA 1
6 278 Low 1 NA 1
7 280 Low 1 NA 1
8 281 Low 1 NA 1
9 282 Low 1 NA 5
You can also adjust the variable names:
> df %>% spread(Fert,Freq) %>%
setNames(c("FemID","Sperm","Week","Fert","Infert"))
FemID Sperm Week Fert Infert
1 269 High 1 4 5
2 273 High 1 6 NA
3 274 High 1 NA 1
4 275 High 1 NA 1
.... the rest is truncated
And can filter by NAs:
> df %>% spread(Fert,Freq) %>%
setNames(c("FemID","Sperm","Week","Fert","Infert")) %>%
filter(!is.na(Fert))
FemID Sperm Week Fert Infert
1 269 High 1 4 5
2 273 High 1 6 NA