Search code examples
rpivotpivot-tabledata-manipulationspread

Combining Rows in R with Pivot or Spread?


Here, I am manipulating election data, and the current data is in the following format. Both a visual and coded example are included (while visual is a bit condensed). Moreover, values have been edited from their originals.

# Representative Example
library(tidyverse)
test.df <- tibble(yr=rep(1956),mn=rep(11),
             sub=rep("Alabama"),
             unit_type=rep("County"),
             unit_name=c("Autauga","Baldwin","Barbour"),
             TotalVotes=c(1000,2000,3000),
             RepVotes=c(500,1000,1500),
             RepCandidate=rep("Eisenhower"),
             DemVotes=c(500,1000,1500),
             DemCandidate=rep("Stevenson"),
             ThirdVotes=c(0,0,0),
             ThirdCandidate=rep("Uncommitted"),
             RepVotesTotalPerc=rep(50.00),
             DemVotesTotalPerc=rep(50.00),
             ThirdVotesTotalPerc=rep(0.00)
             )
----------------------------------------------------------------------------------------------------
yr   | mn  | sub  | unit_type | unit_name | TotalVotes | RepVotes |  RepCan  | DemVotes | DemCan 
----------------------------------------------------------------------------------------------------
1956   11   Alabama  County    Autauga        1000        500      EisenHower   500     Stevenson
----------------------------------------------------------------------------------------------------
1956   11   Alabama  County    Baldwin        2000       1000      EisenHower   1000    Stevenson
----------------------------------------------------------------------------------------------------
1956   11   Alabama  County    Barbour        3000       2000      EisenHower   2000    Stevenson
----------------------------------------------------------------------------------------------------

I am trying to get a table that looks like the following:

----------------------------------------------------------------------------------------------------
yr   | mn  | sub  | unit_type | unit_name |   pty_n   |   can    |   TotalVotes   | CanVotes
----------------------------------------------------------------------------------------------------
1956   11   Alabama  County     Autauga    Republican   Eisenhower     1000          500 
----------------------------------------------------------------------------------------------------
1956   11   Alabama  County     Autauga    Democrat     Stevenson      1000          500 
----------------------------------------------------------------------------------------------------
1956   11   Alabama  County     Autauga    Independent  Uncommitted    1000            0 
----------------------------------------------------------------------------------------------------
# and etc. for other counties in example (Baldwin, Barbour, etc)

As you can see, I pretty much want three observations per county, where candidates are all in one column, as well as their respective votes in another (CanVotes, or the like).

I have tried using things like pivot_longer() or spread(), but I am having a hard time visualizing these in code. Any help here would be greatly appreciated in sort of reorienting my data to get a candidate column, but also moving the rest of the data with it!


Solution

  • Here is a solution that first uses pivot_longer to bring the Votes into a long format. Then I use mutate with case_when to substitute the former column names with the actual candidate names and delete the single candidate columns:

    long_table <- pivot_longer(test.df,
                               cols = c(RepVotes, DemVotes, ThirdVotes),
                               names_to = "pty_n",
                               values_to = "CanVotes") %>% 
      mutate(can = case_when(
        pty_n == "RepVotes" ~ RepCandidate,
        pty_n == "DemVotes" ~ DemCandidate,
        pty_n == "ThirdVotes" ~ ThirdCandidate
      ),
      pty_n = case_when(
        pty_n == "RepVotes" ~ "Republican",
        pty_n == "DemVotes" ~ "Democrat",
        pty_n == "ThirdVotes" ~ "Independent"
      )) %>% 
      select(-c(RepCandidate, DemCandidate, ThirdCandidate))
    # A tibble: 9 x 12
         yr    mn sub     unit_type unit_name TotalVotes RepVotesTotalPerc DemVotesTotalPerc ThirdVotesTotalPe~ pty_n      CanVotes can       
      <dbl> <dbl> <chr>   <chr>     <chr>          <dbl>             <dbl>             <dbl>              <dbl> <chr>         <dbl> <chr>     
    1  1956    11 Alabama County    Autauga         1000                50                50                  0 Republican      500 Eisenhower
    2  1956    11 Alabama County    Autauga         1000                50                50                  0 Democrat        500 Stevenson 
    3  1956    11 Alabama County    Autauga         1000                50                50                  0 Independe~        0 Uncommitt~
    4  1956    11 Alabama County    Baldwin         2000                50                50                  0 Republican     1000 Eisenhower
    5  1956    11 Alabama County    Baldwin         2000                50                50                  0 Democrat       1000 Stevenson 
    6  1956    11 Alabama County    Baldwin         2000                50                50                  0 Independe~        0 Uncommitt~
    7  1956    11 Alabama County    Barbour         3000                50                50                  0 Republican     1500 Eisenhower
    8  1956    11 Alabama County    Barbour         3000                50                50                  0 Democrat       1500 Stevenson 
    9  1956    11 Alabama County    Barbour         3000                50                50                  0 Independe~        0 Uncommitt~
    

    I tried to build a custom spec, but it seems that the names have to be derived from the column names and can't be directly conditional on other columns.