Search code examples
rcountdplyrsqldf

Vlookup and Count String Occurrences in Separate Table R to new Column


I have two data frames. Below are samples but should be easily reproducible for illustration.

df1 <- data.frame(School = c("Omaha South", "Omaha Central", "Grand Island"), 
                  Enrollment = c(2166, 2051, 1982))
df2 <- data.frame('Away Score' = c(25, 57, 76), 
                  'Away Team' = c("Omaha South", "Omaha Central", "Grand Island"),
                  'Away Score' = c(52, 88, 69), 
                  'Away Team' = c("Omaha Central", "Grand Island", "Omaha South"),                 
                  Date = c("1/11/2020", "1/12/2020", "1/13/2020"),
                  Winner = c("Omaha Central", "Grand Island", "Grand Island"),
                  Loser = c("Omaha South", "Omaha Central", "Omaha South"))

My goal is to create a new column in df1 called "Wins" that looks up the school in df1 and then counts the number of times that school is listed in the "Winner" column of df2.

So want df1 to look like this:

df1 <- data.frame(School = c("Omaha South", "Omaha Central", "Grand Island"), 
                  Enrollment = c(2166, 2051, 1982),
                  Wins = c(0, 1, 2))

I've tried a number of solutions to no avail, including sqldf. My latest attempt was the below but it gives me an error saying no applicable method for 'group_by_' applied to an object of class "NULL"

df$Wins %>%
     group_by(df2$Winner) %>%
     mutate(Count=n())

Solution

  • One way using dplyr and joins :

    library(dplyr)
    
    df1 %>%
      left_join(df2, by = c('School' = 'Winner')) %>%
      na.omit() %>%
      count(School, name = "wins") %>%
      right_join(df1) %>%
      mutate(wins = replace(wins, is.na(wins), 0))
    

    Using base R we calculate the frequency of wins using table, convert it into dataframe using stack and then merge to df1.

    merge(df1, stack(table(factor(df2$Winner, levels = df1$School))), 
               by.x = 'School', by.y = "ind")