
How to search for county names in a description column with multiple strings - R


I have a donation dataset with a field in it called "Description", where the donor described what they gave their gift for. This field has multiple words or strings in it (sometimes a full sentence), and several rows list specific counties where they wanted their donation to be designated.

I would like to identify which rows in this field contain a county name and flag them in a new field. I have a dataframe with the county names from the two states I need, but I'm struggling to work out which code would let me use the county field in that dataframe as the basis for finding county names within the Description field.

I'm still at a low level in R but I'll try to give some sample code. I have over 1000 rows so it will take too long for me to search for specific counties in a string - it will be more helpful to use a list of counties as my basis for searching.

library(tibble)

df <- tibble(
  `Donor Type` = c("Single Donation", "Grant", "Recurring Donation"),
  Amount = c("10", "50", "100"),
  Description = c("This is for Person County", "Books for Beaufort County", "Brews for Books")
)

  `Donor Type`       Amount Description              
  <chr>              <chr>  <chr>                    
1 Single Donation    10     This is for Person County
2 Grant              50     Books for Beaufort County
3 Recurring Donation 100    Brews for Books

I have a dataframe with county names in two states (named Carolina.Counties below). What code should I use to add a column to my donor dataframe indicating which descriptions were limited to a specific county? I've been playing around with the following, but I'm not getting the right results.

Df <- 
  apply(Df, 1, function(x) 
    ifelse(any(Df$Description %in% Carolina.Counties$county), 'yes','no'))

Solution

  • %in% looks for an exact match of the whole string, so a description that merely contains a county name never matches. You need a regex match instead, which grepl provides.

    df$result <- ifelse(grepl(paste0(Carolina.Counties$county, collapse = '|'), 
                        df$Description), 'Yes', 'No')
    

    paste0(Carolina.Counties$county, collapse = '|') creates a single regex pattern that matches any of the counties. grepl looks for this pattern in each Description value, and ifelse assigns "Yes" where it is found and "No" otherwise.
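
    If it would also help to know which county was matched, the same pattern can be reused with regexpr and regmatches. The following is only a sketch under a few assumptions: the three-county Carolina.Counties table is hypothetical stand-in data, the county names contain no regex metacharacters, and the extra county column is an illustrative name that is not part of the original question. The \\b word boundaries are an addition so that a county name only matches as a whole word.

```r
# Hypothetical sample data; the real Carolina.Counties would hold every county.
df <- data.frame(
  Description = c("This is for Person County",
                  "Books for Beaufort County",
                  "Brews for Books"),
  stringsAsFactors = FALSE
)
Carolina.Counties <- data.frame(county = c("Person", "Beaufort", "Wake"),
                                stringsAsFactors = FALSE)

# Build one alternation pattern and wrap it in word boundaries,
# so each county name only matches as a whole word.
pattern <- paste0("\\b(", paste0(Carolina.Counties$county, collapse = "|"), ")\\b")

# TRUE for every description that contains at least one county name.
matched <- grepl(pattern, df$Description, perl = TRUE)
df$result <- ifelse(matched, "Yes", "No")

# regexpr() locates the first match in each row; regmatches() returns
# the matched text only for the rows that actually matched.
m <- regexpr(pattern, df$Description, perl = TRUE)
df$county <- NA_character_
df$county[matched] <- regmatches(df$Description, m)

print(df)
```

    The match is case-sensitive, and only the first county found in each description is recorded. A name that is also an ordinary word, such as "Person", may still produce false positives when that word happens to be capitalized.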