Here, I am manipulating election data, and the current data is in the following format. Both a visual and coded example are included (while visual is a bit condensed). Moreover, values have been edited from their originals.
# Representative Example
library(tidyverse)
test.df <- tibble(yr=rep(1956),mn=rep(11),
sub=rep("Alabama"),
unit_type=rep("County"),
unit_name=c("Autauga","Baldwin","Barbour"),
TotalVotes=c(1000,2000,3000),
RepVotes=c(500,1000,1500),
RepCandidate=rep("Eisenhower"),
DemVotes=c(500,1000,1500),
DemCandidate=rep("Stevenson"),
ThirdVotes=c(0,0,0),
ThirdCandidate=rep("Uncommitted"),
RepVotesTotalPerc=rep(50.00),
DemVotesTotalPerc=rep(50.00),
ThirdVotesTotalPerc=rep(0.00)
)
----------------------------------------------------------------------------------------------------
yr | mn | sub | unit_type | unit_name | TotalVotes | RepVotes | RepCan | DemVotes | DemCan
----------------------------------------------------------------------------------------------------
1956 11 Alabama County Autauga 1000 500 EisenHower 500 Stevenson
----------------------------------------------------------------------------------------------------
1956 11 Alabama County Baldwin 2000 1000 EisenHower 1000 Stevenson
----------------------------------------------------------------------------------------------------
1956 11 Alabama County Barbour 3000 2000 EisenHower 2000 Stevenson
----------------------------------------------------------------------------------------------------
I am trying to get a table that looks like the following:
----------------------------------------------------------------------------------------------------
yr | mn | sub | unit_type | unit_name | pty_n | can | TotalVotes | CanVotes
----------------------------------------------------------------------------------------------------
1956 11 Alabama County Autauga Republican Eisenhower 1000 500
----------------------------------------------------------------------------------------------------
1956 11 Alabama County Autauga Democrat Stevenson 1000 500
----------------------------------------------------------------------------------------------------
1956 11 Alabama County Autauga Independent Uncommitted 1000 0
----------------------------------------------------------------------------------------------------
# and etc. for other counties in example (Baldwin, Barbour, etc)
As you can see, I pretty much want three observations per county, where candidates are all in one column, as well as their respective votes in another (CanVotes
, or the like).
I have tried using things like pivot_longer()
or spread()
, but I am having a hard time visualizing these in code. Any help here would be greatly appreciated in sort of reorienting my data to get a candidate column, but also moving the rest of the data with it!
Here is a solution that first uses pivot_longer
to bring the Votes into a long format. Then I use mutate
with case_when
to substitute the former column names with the actual candidate names and delete the single candidate columns:
long_table <- pivot_longer(test.df,
cols = c(RepVotes, DemVotes, ThirdVotes),
names_to = "pty_n",
values_to = "CanVotes") %>%
mutate(can = case_when(
pty_n == "RepVotes" ~ RepCandidate,
pty_n == "DemVotes" ~ DemCandidate,
pty_n == "ThirdVotes" ~ ThirdCandidate
),
pty_n = case_when(
pty_n == "RepVotes" ~ "Republican",
pty_n == "DemVotes" ~ "Democrat",
pty_n == "ThirdVotes" ~ "Independent"
)) %>%
select(-c(RepCandidate, DemCandidate, ThirdCandidate))
# A tibble: 9 x 12
yr mn sub unit_type unit_name TotalVotes RepVotesTotalPerc DemVotesTotalPerc ThirdVotesTotalPe~ pty_n CanVotes can
<dbl> <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
1 1956 11 Alabama County Autauga 1000 50 50 0 Republican 500 Eisenhower
2 1956 11 Alabama County Autauga 1000 50 50 0 Democrat 500 Stevenson
3 1956 11 Alabama County Autauga 1000 50 50 0 Independe~ 0 Uncommitt~
4 1956 11 Alabama County Baldwin 2000 50 50 0 Republican 1000 Eisenhower
5 1956 11 Alabama County Baldwin 2000 50 50 0 Democrat 1000 Stevenson
6 1956 11 Alabama County Baldwin 2000 50 50 0 Independe~ 0 Uncommitt~
7 1956 11 Alabama County Barbour 3000 50 50 0 Republican 1500 Eisenhower
8 1956 11 Alabama County Barbour 3000 50 50 0 Democrat 1500 Stevenson
9 1956 11 Alabama County Barbour 3000 50 50 0 Independe~ 0 Uncommitt~
I tried to build a custom spec
, but it seems that the names have to be derived from the column names and can't be directly conditional on other columns.