Search code examples
rmutate

Mutate one df based on the sum of a column in another df matching criteria


I have df1 that looks like this:

state zip code type
NC 00001 person
NC 00001 property
NC 00000 person
NC 00001 property

And I want to create df2 where each zip code is a single observation, with columns counting the number of times each 'type' appears across each zip code from df1:

zip code person property
00000 250 45
00001 115 150

I created df2 so far which is a list of all unique zip codes, and tried the following method, which resulted in the column 'person' being added but every value is the same number, being the sum of each row where "person" appears across the entire dataset of df1.

df2 <- mutate(df2, person = sum(df1$type == "person"))

The end goal is to use a for loop to create multiple additional columns for each "type". I tried the previous method as well as the following method:

for(i in nrow(df2)){
df2 <- mutate(df2, person = sum(df1[i,"type"] == "person"))
}

But now, I get the column added with zeroes for every row.

Am I going about this the right way? How can I fix this?


Solution

  • You can use dplyr::count followed by tidyr::pivot_wider:

    library(dplyr)
    library(tidyr)
    
    df %>% 
      group_by(zip_code, type) %>%
      count() %>%
      pivot_wider(names_from = type, values_from = n)
    

    Output:

      zip_code person property
         <int>  <int>    <int>
    1    12345      1        2
    2    54321      1       NA
    

    Data (note I changed your zip codes just for ease of identification)

    df <- read.table(text = "state  zip_code    type
    NC  12345   person
    NC  12345   property
    NC  54321   person
    NC  12345   property", h = TRUE)