I have a dataset that looks like:
partyid             coninc
Ind,Near Dem         25926
Not Str Democrat     33333
Not Str Democrat     41667
Strong Democrat      69444
Ind,Near Dem         60185
Ind,Near Dem         50926
Ind,Near Dem         18519
Strong Democrat       3704
Strong Democrat      25926
Strong Democrat      18519
Not Str Republican   18519
Strong Democrat      18519
Not Str Democrat     18519
What I want to do is format the dataset into something like this:
partyid             0-50,000  50,000-100,000  100,000-150,000  >150,000
Strong Democrat     2344      3423            4342             54
Not Str Democrat    2643      934             ..
Ind, Near Dem       7656      343             ..
Ind, Near Rep       7655      833             ..
Not Str Republican  2443      343
Strong Republican   3444      773
That is, the rows should follow the levels of the partyid variable, and the columns should hold the count of observations falling in each range of the coninc variable.
A dput of my data:
structure(list(partyid = structure(c(3L, 2L, 2L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 5L, 1L, 2L, 1L, 1L, 4L, 4L, 3L, 4L, 3L), .Label = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem", "Ind,Near Rep", "Not Str Republican", "Strong Republican"), class = "factor"), coninc = c(25926L, 33333L, 41667L, 69444L, 60185L, 50926L, 18519L, 3704L, 25926L, 18519L, 18519L, 18519L, 18519L, 25926L, 18519L, 33333L, 25926L, 60185L, 69444L, 50926L)), .Names = c("partyid", "coninc"), row.names = c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L), class = "data.frame")
You can do that quite easily with the plyr package (as your sample data are a bit hard to read, I deleted the commas and spaces in partyid):
# creating sample data
dat <- structure(list(partyid = structure(c(3L, 2L, 2L, 1L, 3L, 3L, 3L, 1L, 1L, 1L, 5L, 1L, 2L, 1L, 1L, 4L, 4L, 3L, 4L, 3L), .Label = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem", "Ind,Near Rep", "Not Str Republican", "Strong Republican"), class = "factor"), coninc = c(25926L, 33333L, 41667L, 69444L, 60185L, 50926L, 18519L, 3704L, 25926L, 18519L, 18519L, 18519L, 18519L, 25926L, 18519L, 33333L, 25926L, 60185L, 69444L, 50926L)), .Names = c("partyid", "coninc"), row.names = c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L), class = "data.frame")
# summarising the data with plyr
library(plyr)
# for each partyid, count the rows whose coninc falls in each income range
dat2 <- ddply(dat, .(partyid), summarise,
              zero    = sum(coninc <= 50000),
              fifty   = sum(coninc > 50000 & coninc <= 100000),
              hundred = sum(coninc > 100000 & coninc <= 150000),
              hfifty  = sum(coninc > 150000))
This results in the following dat2 (shown here as dput output):
dat2 <- structure(list(partyid = structure(1:5, .Label = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem", "Ind,Near Rep", "Not Str Republican", "Strong Republican"), class = "factor"), zero = c(6L, 3L, 2L, 2L, 1L), fifty = c(1L, 0L, 4L, 1L, 0L), hundred = c(0L, 0L, 0L, 0L, 0L), hfifty = c(0L, 0L, 0L, 0L, 0L)), .Names = c("partyid", "zero", "fifty", "hundred", "hfifty"), row.names = c(NA, -5L), class = "data.frame")
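If you would rather not load a package, the same counts can probably be obtained in base R by binning coninc with cut() and cross-tabulating with table(). This is only a sketch, not the method used above: the small data frame, the breaks and labels vectors, and the range column are all made up for illustration, and the bin edges are assumed to match the conditions in the ddply call (upper bounds inclusive).

```r
# Hypothetical mini data set in the same shape as dat (not the real data)
dat <- data.frame(
  partyid = factor(
    c("Strong Democrat", "Ind,Near Dem", "Ind,Near Dem", "Not Str Democrat"),
    levels = c("Strong Democrat", "Not Str Democrat", "Ind,Near Dem",
               "Strong Republican")
  ),
  coninc = c(69444L, 25926L, 60185L, 33333L)
)

# Assumed bin edges and labels; intervals are right-closed, e.g. (50000, 100000]
breaks <- c(0, 50000, 100000, 150000, Inf)
labels <- c("0-50,000", "50,000-100,000", "100,000-150,000", ">150,000")

# Assign each income to a range; include.lowest = TRUE keeps a value of 0
dat$range <- cut(dat$coninc, breaks = breaks, labels = labels,
                 include.lowest = TRUE)

# Rows follow the factor levels of partyid, columns follow the ranges
tab <- table(dat$partyid, dat$range)

# Optional: turn the contingency table into a wide data frame
wide <- as.data.frame.matrix(tab)
print(wide)
```

One difference worth knowing: table() keeps factor levels that have no observations (Strong Republican gets a row of zeros here), whereas ddply drops them by default (.drop = TRUE).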