Search code examples
rdataframesummarydata-wrangling

How to collapse dataset and create new variables?


I have a dataset with multiple rows of data for each subject. This dataset is very long, and each person has dozens of rows of each session they are in. I want to collapse this dataset into a more manageable format. The current dataset looks something like this:

name session personal vote party vote Voted with party?
John 114 yea nay no
John 114 yea yea yes
John 114 nay nay yes
John 115 yea yea yes
John 116 yea nay no
John 116 yea yea yes
Tim 114 nay yea no
Tim 115 nay yea no
Tim 115 nay nay yes
Tim 116 yea nay no

I want to use this data to create a new dataframe where each person gets one row for each session. In this new dataset I want to create two new variables. The first, votecount, is just a raw count of the number of votes that person had in that session. The second new variable, party loyalty, should indicate what proportion of the person's votes were aligned with the how the party voted. It would look something like this:

name session votecount party loyalty
John 114 3 .66
John 115 1 1
John 116 2 .5
Tim 114 1 0
Tim 115 2 .5
Tim 116 1 0

Any help would be greatly appreciated!


Solution

  • With dplyr, use n to count the number of votes, and mean to get the loyalty measures, grouped by name and session:

    library(dplyr)
    df |> 
      summarise(votecount = n(),
                party_loyalty = mean(`Voted with party?` == "yes"), 
                .by = c(name, session))
    
    #   name session votecount party_loyalty
    # 1 John     114         3     0.6666667
    # 2 John     115         1     1.0000000
    # 3 John     116         2     0.5000000
    # 4  Tim     114         1     0.0000000
    # 5  Tim     115         2     0.5000000
    # 6  Tim     116         1     0.0000000
    

    reproducible data:

    df <- read.table(h=T,text="name session 'personal vote' 'party vote'    'Voted with party?'
    John    114 yea nay no
    John    114 yea yea yes
    John    114 nay nay yes
    John    115 yea yea yes
    John    116 yea nay no
    John    116 yea yea yes
    Tim 114 nay yea no
    Tim 115 nay yea no
    Tim 115 nay nay yes
    Tim 116 yea nay no", check.names = FALSE)