Say that there is descriptive data on candidates across election years, districts (or states), and parties. The data are currently disaggregated at the 'sub-district' level (say, voting precincts).
Currently, when I try to aggregate the data to the district level, the various methods return inaccurate counts. The aggregation does not take into account that each candidate appears in the data multiple times per year and per district (once per precinct). What I need is a count of the number of times a particular party appears in a particular district, regardless of the repeated/duplicated information at the precinct level. In other words, I need a result that shows the party count for each district-year dyad, counting each unique candidate-year dyad only once. (Note: a candidate may be repeated across election years and/or districts, but with a different party; e.g., Henry Clay in 1836 and 1840.)
My question is: How do I aggregate data to obtain a count of a factor (party) at each level of another factor (district) by grouping two other factors (year and candidate-name [ID])?
year<-rbind("1836", "1836", "1836", "1836",
"1840", "1840", "1840", "1840",
"1844", "1844", "1844", "1844",
"1848", "1848", "1848", "1848")
candidate<-rbind("Henry Clay", "Henry Clay",
"Daniel Webster",
"Daniel Webster", "Henry Clay",
"Henry Clay", "Daniel Webster",
"Daniel Webster",
"Millard Fillmore",
"Millard Fillmore",
"Martin Van Buren",
"Martin Van Buren",
"Millard Fillmore",
"Millard Fillmore",
"Martin Van Buren",
"Martin Van Buren")
party<-rbind("Democratic-Republican",
"Democratic-Republican", "Whig",
"Whig", "National Republican",
"National Republican", "Whig",
"Whig", "Know-Nothing",
"Know-Nothing", "Democrat",
"Democrat", "Know-Nothing",
"Know-Nothing", "Democrat",
"Democrat")
district<-rbind("Alaska", "Alaska", "Vermont",
"Vermont", "Alaska", "Alaska",
"Vermont", "Vermont", "Alaska",
"Alaska", "Vermont", "Vermont",
"Alaska", "Alaska", "Vermont",
"Vermont")
precinct<-rbind("Pre1", "Pre2", "Pre1", "Pre2",
"Pre1", "Pre2", "Pre1", "Pre2",
"Pre1", "Pre2", "Pre1", "Pre2",
"Pre1", "Pre2", "Pre1", "Pre2")
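# The columns are auto-named V1..V5:
# V1 = year, V2 = candidate, V3 = party, V4 = district, V5 = precinct
sample<-as.data.frame(cbind(year, candidate, party, district,
precinct))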
Examples of Different Methods of Aggregating Data:
party.counts1<-data.frame(table(sample$V3, sample$V1, sample$V4))
Attempt 2a (below) is close to the result I need, but the counts it returns do not identify the factor level (party), and it still over-counts the party-district data because a party-candidate pair appears once per precinct in a given year.
party.counts2<-aggregate(sample$V3, by=list(sample$V4, sample$V1), FUN=length)
party.counts2a<-aggregate(sample$V3~sample$V1:sample$V4:sample$V2, data=sample, FUN=length)
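To make the over-counting concrete, here is a minimal sketch. It rebuilds the same sample data in a more compact form (an equivalent construction, not the code above), then compares a plain row count per party-district-year with a count of distinct candidates in the same groups. The object names `naive` and `distinct` are introduced here for illustration only.

```r
# Compact rebuild of the sample data (same values and V1..V5 names as above)
sample <- data.frame(
  V1 = rep(c("1836", "1840", "1844", "1848"), each = 4),            # year
  V2 = rep(c("Henry Clay", "Daniel Webster",
             "Henry Clay", "Daniel Webster",
             "Millard Fillmore", "Martin Van Buren",
             "Millard Fillmore", "Martin Van Buren"), each = 2),    # candidate
  V3 = rep(c("Democratic-Republican", "Whig",
             "National Republican", "Whig",
             "Know-Nothing", "Democrat",
             "Know-Nothing", "Democrat"), each = 2),                # party
  V4 = rep(rep(c("Alaska", "Vermont"), each = 2), times = 4),       # district
  V5 = rep(c("Pre1", "Pre2"), times = 8),                           # precinct
  stringsAsFactors = FALSE
)

groups <- sample[c("V3", "V4", "V1")]  # party, district, year

# Counts rows, so every precinct-level repeat is counted again
naive <- aggregate(sample["V2"], by = groups, FUN = length)

# Counts distinct candidates, so precinct-level repeats collapse
distinct <- aggregate(sample["V2"], by = groups,
                      FUN = function(x) length(unique(x)))

print(naive)
print(distinct)
```

With this sample, the row count is inflated by the two precincts in each group, while the distinct-candidate count is not.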
The reshape example shows the same problem as the aggregate attempt 2a above.
library(reshape2)
mdata <- melt(sample, id.vars=c("V1", "V2", "V4", "V5"), measure.vars=c("V3"))
party.counts3<-dcast(mdata, value~V1:V2:V4, length)
Again, my question is: How do I aggregate data to obtain a count of a factor (party) at each level of another factor (district) by grouping two other factors (year and candidate-name [ID])?
So far, the following works, but it is not very tidy. For instance, the count variable it constructs is mislabeled in the final object: it takes the name of the variable omitted from the aggregation formula (here, V2). Also, the result is stored in a separate object (party.counts) rather than merged with the original data (the object named sample, above).
cross.tab<-unique(sample[c("V3", "V4", "V1", "V2")])
party.counts<-aggregate(. ~ V3:V4:V1, cross.tab, length)
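One possible way to address both points is sketched below, assuming the count wanted is the number of distinct candidates per party-district-year. It de-duplicates first, aggregates with the non-formula interface of `aggregate`, renames the count column, and merges it back onto the precinct-level rows. The column name `n.candidates` and the objects `dedup` and `merged` are hypothetical names chosen for this example, and the data are rebuilt compactly so the block runs on its own.

```r
# Compact rebuild of the sample data (same values and V1..V5 names as above)
sample <- data.frame(
  V1 = rep(c("1836", "1840", "1844", "1848"), each = 4),            # year
  V2 = rep(c("Henry Clay", "Daniel Webster",
             "Henry Clay", "Daniel Webster",
             "Millard Fillmore", "Martin Van Buren",
             "Millard Fillmore", "Martin Van Buren"), each = 2),    # candidate
  V3 = rep(c("Democratic-Republican", "Whig",
             "National Republican", "Whig",
             "Know-Nothing", "Democrat",
             "Know-Nothing", "Democrat"), each = 2),                # party
  V4 = rep(rep(c("Alaska", "Vermont"), each = 2), times = 4),       # district
  V5 = rep(c("Pre1", "Pre2"), times = 8),                           # precinct
  stringsAsFactors = FALSE
)

# Step 1: drop the precinct column and the repeats it causes
dedup <- unique(sample[c("V1", "V2", "V3", "V4")])

# Step 2: count candidates per party-district-year
party.counts <- aggregate(dedup["V2"],
                          by = dedup[c("V3", "V4", "V1")],
                          FUN = length)

# Step 3: give the count column a descriptive name (hypothetical)
names(party.counts)[names(party.counts) == "V2"] <- "n.candidates"

# Step 4: attach the counts to every original precinct-level row
merged <- merge(sample, party.counts,
                by = c("V1", "V3", "V4"), all.x = TRUE)

print(merged)
```

Note that `merge` sorts the result by the key columns, so the row order of `merged` may differ from that of `sample`. The grouping columns are passed as plain vectors of names, so the same pattern should carry over to other groupings by changing those vectors.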
Any assistance or advice on making this more general and/or vectorized, and on incorporating the counts into the original data structure, is appreciated.