I have df1
that looks like this:
state | zip code | type |
---|---|---|
NC | 00001 | person |
NC | 00001 | property |
NC | 00000 | person |
NC | 00001 | property |
And I want to create df2
where each zip code is a single observation, with columns counting the number of times each 'type' appears across each zip code from df1
:
zip code | person | property |
---|---|---|
00000 | 250 | 45 |
00001 | 115 | 150 |
I created df2
so far which is a list of all unique zip codes, and tried the following method, which resulted in the column 'person' being added but every value is the same number, being the sum of each row where "person" appears across the entire dataset of df1
.
df2 <- mutate(df2, person = sum(df1$type == "person"))
The end goal is to use a for loop to create multiple additional columns for each "type". I tried the previous method as well as the following method:
for(i in nrow(df2)){
df2 <- mutate(df2, person = sum(df1[i,"type"] == "person"))
}
But now, I get the column added with zeroes for every row.
Am I going about this the right way? How can I fix this?
You can use dplyr::count
followed by tidyr::pivot_wider
:
library(dplyr)
library(tidyr)
df %>%
group_by(zip_code, type) %>%
count() %>%
pivot_wider(names_from = type, values_from = n)
Output:
zip_code person property
<int> <int> <int>
1 12345 1 2
2 54321 1 NA
Data (note I changed your zip codes just for ease of identification)
df <- read.table(text = "state zip_code type
NC 12345 person
NC 12345 property
NC 54321 person
NC 12345 property", h = TRUE)