
Total unique words in a column - R


I am interested in counting the unique words that appear in a column. Rather than getting the unique words per row, as explained in "Count unique/distinct words into a new column", I want a single number that counts all unique entries in the column. In the following example there are 3 unique countries: China, Australia, and Korea.

Is there a short way of getting this count? I am still learning R, so my knowledge is limited.

    Countries
    China Australia
    Australia
    China China
    Korea Korea Korea Korea

Solution

  • We can split the 'Countries' column on spaces with strsplit, flatten the resulting list with unlist, and take the length of the unique words:

    length(unique(unlist(strsplit(df1$Countries, " "))))
    #[1] 3
    

    Or, using the tidyverse, put each word on its own row, drop the duplicate rows, and count what remains:

    library(tidyverse)
    df1 %>% 
        separate_rows(Countries) %>%  # one row per word
        distinct() %>%                # drop duplicate words
        nrow()                        # count the remaining rows
    #[1] 3
    

    Data:

    df1 <- structure(list(Countries = c("China Australia", "Australia", 
     "China China", "Korea Korea Korea Korea")), .Names = "Countries",
      class = "data.frame", row.names = c(NA, -4L))
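
If the real column is less tidy than the sample data, splitting on a single space can leave empty strings that get counted as a "word". The following is only a sketch under that assumption: the vector `countries` and its irregular spacing are made up for illustration, and trimming plus splitting on runs of whitespace is one possible way to guard against this, not part of the original answer.

```r
# Hypothetical input: same countries as df1, but with a doubled space,
# a trailing space, and a missing value (assumed for illustration only)
countries <- c("China  Australia", "Australia", "China China ",
               "Korea Korea Korea Korea", NA)

# Trim leading/trailing whitespace, then split on runs of whitespace
words <- unlist(strsplit(trimws(countries), "\\s+"))

# Drop missing values and any empty strings that survived the split
words <- words[!is.na(words) & nzchar(words)]

# Number of distinct words across the whole column
n_unique <- length(unique(words))
n_unique
```

Calling `unique(words)` on its own shows which words were counted, which is a quick way to spot stray entries caused by inconsistent spelling or capitalisation.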