I have a dataset with multiple rows of data for each subject. This dataset is very long, and each person has dozens of rows of each session they are in. I want to collapse this dataset into a more manageable format. The current dataset looks something like this:
name | session | personal vote | party vote | Voted with party? |
---|---|---|---|---|
John | 114 | yea | nay | no |
John | 114 | yea | yea | yes |
John | 114 | nay | nay | yes |
John | 115 | yea | yea | yes |
John | 116 | yea | nay | no |
John | 116 | yea | yea | yes |
Tim | 114 | nay | yea | no |
Tim | 115 | nay | yea | no |
Tim | 115 | nay | nay | yes |
Tim | 116 | yea | nay | no |
I want to use this data to create a new dataframe where each person gets one row for each session. In this new dataset I want to create two new variables. The first, votecount
, is just a raw count of the number of votes that person had in that session. The second new variable, party loyalty
, should indicate what proportion of the person's votes were aligned with the how the party voted. It would look something like this:
name | session | votecount | party loyalty |
---|---|---|---|
John | 114 | 3 | .66 |
John | 115 | 1 | 1 |
John | 116 | 2 | .5 |
Tim | 114 | 1 | 0 |
Tim | 115 | 2 | .5 |
Tim | 116 | 1 | 0 |
Any help would be greatly appreciated!
With dplyr
, use n
to count the number of votes, and mean
to get the loyalty measures, grouped by name and session:
library(dplyr)
df |>
summarise(votecount = n(),
party_loyalty = mean(`Voted with party?` == "yes"),
.by = c(name, session))
# name session votecount party_loyalty
# 1 John 114 3 0.6666667
# 2 John 115 1 1.0000000
# 3 John 116 2 0.5000000
# 4 Tim 114 1 0.0000000
# 5 Tim 115 2 0.5000000
# 6 Tim 116 1 0.0000000
reproducible data:
df <- read.table(h=T,text="name session 'personal vote' 'party vote' 'Voted with party?'
John 114 yea nay no
John 114 yea yea yes
John 114 nay nay yes
John 115 yea yea yes
John 116 yea nay no
John 116 yea yea yes
Tim 114 nay yea no
Tim 115 nay yea no
Tim 115 nay nay yes
Tim 116 yea nay no", check.names = FALSE)