Search code examples
rstringdplyrmergecharacter

I need help merging two rows based on certain string character, the string is complaint


I am trying to calculate the fraction of the construction noise per zip code across NY city. The data is from NYC 311.

I am using dplyr and have grouped the data per zip. However, I am finding difficulties merging the row for the complain column, I have to merge the data as per the string "construction" it appear anywhere meaning middle, front or end.

My solution, this is just the beginning

comp_types <- df %>% select(complaint_type,descriptor,incident_zip) %>% 
    group_by(incident_zip) 

can you help me merge the row if unique value in descriptor contains any construction value.


Solution

  • Can you clarify what you mean by "merging"? I don't think you actually want to merge because you only have one dataframe. The term "merging" is used to describe the joining of two dataframes.

    See ?base::merge:

    Merge two data frames by common columns or row names, or do other versions of database join operations.

    If I understand correctly, you want to look into the descriptor variable and see if it contains the string "construction" anywhere in the cell, so you can determine if the person's complaint was construction-related; same for "music". I don't believe you need to use complaint_type since complaint_type never contains the string "construction" or "music"; only descriptor does.

    You can use a combination of ifelse and grepl to create a new variable that indicates whether the complaint was construction-related, music-related, or other.

    library(tidyverse)
    library(janitor)
    url <- "https://data.cityofnewyork.us/api/views/p5f6-bkga/rows.csv"
    df <- read.csv(url, nrows = 10000) %>%
      clean_names() %>%
      select(complaint_type, descriptor, incident_zip)
    
    comp_types <- df %>% 
      select(complaint_type, descriptor, incident_zip) %>% 
      group_by(incident_zip) 
    head(comp_types)
    #> # A tibble: 6 × 3
    #> # Groups:   incident_zip [6]
    #>   complaint_type      descriptor                                   incident_zip
    #>   <chr>               <chr>                                               <int>
    #> 1 Noise - Residential Banging/Pounding                                    11364
    #> 2 Noise - Residential Loud Music/Party                                    11222
    #> 3 Noise - Residential Banging/Pounding                                    10033
    #> 4 Noise - Residential Loud Music/Party                                    11208
    #> 5 Noise - Residential Loud Music/Party                                    10037
    #> 6 Noise               Noise: Construction Before/After Hours (NM1)        11238
    
    table(df$complaint_type)
    #> 
    #>                    Noise       Noise - Commercial       Noise - Helicopter 
    #>                      555                      591                      145 
    #> Noise - House of Worship             Noise - Park      Noise - Residential 
    #>                       20                       72                     5675 
    #>  Noise - Street/Sidewalk          Noise - Vehicle 
    #>                     2040                      902
    
    df <- df %>%
      mutate(descriptor_misc = ifelse(grepl("Construction", descriptor), "Construction", 
                                ifelse(grepl("Music", descriptor), "Music", "Other")))
    
    df %>%
      group_by(descriptor_misc) %>%
      count()
    #> # A tibble: 3 × 2
    #> # Groups:   descriptor_misc [3]
    #>   descriptor_misc     n
    #>   <chr>           <int>
    #> 1 Construction      328
    #> 2 Music            6354
    #> 3 Other            3318
    
    head(df)
    #>        complaint_type                                   descriptor incident_zip
    #> 1 Noise - Residential                             Banging/Pounding        11364
    #> 2 Noise - Residential                             Loud Music/Party        11222
    #> 3 Noise - Residential                             Banging/Pounding        10033
    #> 4 Noise - Residential                             Loud Music/Party        11208
    #> 5 Noise - Residential                             Loud Music/Party        10037
    #> 6               Noise Noise: Construction Before/After Hours (NM1)        11238
    #>   descriptor_misc
    #> 1           Other
    #> 2           Music
    #> 3           Other
    #> 4           Music
    #> 5           Music
    #> 6    Construction