Search code examples
rif-statementconditional-statementsmutate

Using group_by() to collapse a dataset in R based on conditions


I am trying to collapse a dataset based on conditions and groupings from another dataset. My current dataframe looks like this

For every 'RollNo' in every 'congress' I want a new variable indicating if the two senators in the same state voted together (1,0) and are in the same party (1,0)

congress sen RollNo state Vote Party
106 Jay 1 Ark 1 Rep
106 Mary 1 Ark 1 Dem
106 Bill 2 Ten 2 Dem
106 Kevin 2 Ten 1 Dem
108 Sue 1 Ore 2 Rep
108 Sally 1 Ore 2 Rep
108 Lisa 3 SDak 1 Rep
108 Penny 3 SDak 2 Rep
109 Jay 1 Mich 1 Dem
109 Mary 1 Mich 9 Rep
109 Rudy 5 Cal 1 Dem
109 Niles 5 Cal 1 Dem

The new dataframe should look like this:

congress RollNo state Pair_Vote Pair_Party
106 1 Ark 1 0
106 2 Ten 0 1
108 1 Ore 1 1
108 3 SDak 0 1
109 1 Mich 0 0
109 5 Cal 1 1

I have tried the code below, tweaked it several times. My my dataset returns with the same observation and two new columns empty vectors for my new variables.

library(dplyr)
dataframe['Pair_Vote'] <- NA
dataframe['Pair_Party'] <- NA
newdata <- dataframe %>% group_by(congress, RollNo, state) %>% 
  mutate(Pair_Vote - case_when(any(Vote == Vote) ~ 1, FALSE ~ 0))

I'm at a loss.


Solution

  • Use mutate when you want to add columns to a data frame as-is, use summarize when you want the result to have one row per group. Your output has one row per group, so we will use summarize.

    And vote == vote won't do much useful, let's use n_distinct to count distinct values.

    dataframe %>% 
      group_by(congress, RollNo, state) %>% 
      summarize(
        Pair_Vote = ifelse(n_distinct(Vote) == 1, 1, 0),
        Pair_Party = ifelse(n_distinct(Party) == 1, 1, 0)
      )