r string summary

summarize from string matches


I have this df column:

df <- data.frame(Strings = c("ñlas onepojasd", "onenañdsl", "ñelrtwofkld", "asdthreeasp", "asdfetwoasd", "fouroqwke","okasdtwo", "acmofour", "porefour", "okstwo"))
> df
          Strings
1  ñlas onepojasd
2       onenañdsl
3     ñelrtwofkld
4     asdthreeasp
5     asdfetwoasd
6       fouroqwke
7        okasdtwo
8        acmofour
9        porefour
10         okstwo

I know that each value in df$Strings contains one of the words one, two, three or four, and that it contains exactly ONE of those words. So I can detect each word with:

library(stringr)

str_detect(df$Strings, "one")
str_detect(df$Strings, "two")
str_detect(df$Strings, "three")
str_detect(df$Strings, "four")

However, I'm stuck here, as I'm trying to build this table:

Homes  Quantity Percent
  One         2     0.2
  Two         4     0.4
Three         1     0.1
 Four         3     0.3
Total        10       1

Solution

  • With tidyverse and janitor you can do:

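    library(tidyverse)
    library(janitor)

    df %>%
      # extract the matched word and store the total number of rows
      mutate(Homes = str_extract(Strings, "one|two|three|four"),
             n = n()) %>%
      group_by(Homes) %>%
      # count the rows per word and divide by the total
      summarise(Quantity = length(Homes),
                Percent = first(length(Homes)/n)) %>%
      adorn_totals("row")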
    
     Homes Quantity Percent
      four        3     0.3
       one        2     0.2
     three        1     0.1
       two        4     0.4
     Total       10     1.0
    

    Or with just tidyverse:

    library(tidyverse)

    df %>%
      mutate(Homes = str_extract(Strings, "one|two|three|four"),
             n = n()) %>%
      group_by(Homes) %>%
      summarise(Quantity = length(Homes),
                Percent = first(length(Homes)/n)) %>%
      # append the "Total" row by hand instead of using janitor
      rbind(., data.frame(Homes = "Total", Quantity = sum(.$Quantity),
                          Percent = sum(.$Percent)))
    

    In both cases, the code first extracts the matched word and records the total number of rows. Second, it groups by the matched word. Third, it computes the number of cases per word and each word's share of all cases. Finally, it adds a "Total" row.
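
The four steps above can also be sketched in base R, without tidyverse or janitor. This is a minimal sketch and not part of the original answer. It assumes, as the question states, that every string contains exactly one of the four words; regmatches() silently drops strings with no match, so the counts would be off otherwise. The names pattern, homes, counts and result are illustrative.

```r
# Same example data as in the question.
df <- data.frame(Strings = c("ñlas onepojasd", "onenañdsl", "ñelrtwofkld",
                             "asdthreeasp", "asdfetwoasd", "fouroqwke",
                             "okasdtwo", "acmofour", "porefour", "okstwo"))

# Step 1: extract the first matching word from each string.
# Assumption: every string matches, so homes has one entry per row.
pattern <- "one|two|three|four"
homes <- regmatches(df$Strings, regexpr(pattern, df$Strings))

# Steps 2-3: count the cases per word and turn the counts into proportions.
# The factor levels fix the row order to one, two, three, four.
counts <- table(factor(homes, levels = c("one", "two", "three", "four")))
result <- data.frame(Homes = names(counts),
                     Quantity = as.integer(counts),
                     Percent = as.integer(counts) / length(homes))

# Step 4: append the "Total" row.
result <- rbind(result,
                data.frame(Homes = "Total",
                           Quantity = sum(result$Quantity),
                           Percent = sum(result$Percent)))
print(result)
```

Setting the factor levels explicitly keeps the rows in the order requested in the question, whereas the grouped tidyverse output is sorted alphabetically.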