r  text-mining  tm  word-frequency

Text mining - word frequency from a single column containing list


Here is my dataset:

https://app.box.com/s/yotsy58ud2k9yk7vs7sj8ksc0favhevv

I'm trying to create a frequency table of the tags from a single column with the following structure:

[image]

I tried using qdap for simplicity, but the result is not correct:

library(qdap)
tags_df <- read.csv(file.choose())
freq_terms(tags_df$tags)

Solution

Just improving (creating a data frame and sorting) the solution given by Rui:

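library(tibble)  # provides as_tibble(); as_data_frame() is its deprecated predecessor

# Split each cell on the leading 'c(', commas, double quotes and ')'
sp <- unlist(strsplit(as.character(unlist(tags_df$tags)),'^c\\(|,|"|\\)'))

# Keep only the non-empty, non-NA pieces
inx <- sapply(sp, function(y) nchar(trimws(y)) > 0 & !is.na(y))

# Case-insensitive counts; the count column is named n
data <- as_tibble(table(tolower(sp[inx])))

# Sort by descending count
data <- data[with(data,order(-n)),]

# Keep the 10 most frequent tags (head() is safe if there are fewer than 10)
data <- head(data, 10)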
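
To see why the filtering step is needed, here is a minimal sketch of what the split does to a single cell. The sample value `x` is hypothetical; it assumes each cell holds text that looks like a printed R character vector, which is what the regular expression above implies.

```r
# Hypothetical cell value; the real column is assumed to hold text like this.
x <- 'c("JavaScript", "WebDev")'

# Split on the leading 'c(', on commas, on double quotes, and on ')'.
pieces <- strsplit(x, '^c\\(|,|"|\\)')[[1]]
print(pieces)

# Keep only the pieces that still contain text after trimming whitespace.
tags <- pieces[nchar(trimws(pieces)) > 0]
print(tags)
```

The split leaves empty and whitespace-only strings between the delimiters, so only the pieces that survive the `nchar(trimws(...)) > 0` check are actual tags.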

Solution

  • If all you want or need is a frequency count, you can do without external packages: base R has the function table().

    sp <- unlist(strsplit(as.character(unlist(tags_df$tags)), '^c\\(|,|"|\\)'))
    inx <- sapply(sp, function(y) nchar(trimws(y)) > 0 & !is.na(y))
    table(sp[inx])
    #    Android        CSS3      Design      Hiring  JavaScript      NextJS 
    #          1           1           1           1           4           1 
    #     NodeJS programming Programming     ReactJS     Testing          UI 
    #          1           1           3           3           1           1 
    #         UX   WebDesign      webdev      WebDev 
    #          1           2           1           4
    

    EDIT.

    I have just realized that you have both "programming" and "Programming", and both "webdev" and "WebDev", as tags, so you may want a case-insensitive count. If so, try this instead:

    table(tolower(sp[inx]))
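
If a sorted top-N table is also wanted without external packages, something along these lines might work. This is only a sketch: the `tags` vector is made-up sample data standing in for `sp[inx]`, and the column names `tag` and `n` are my own choice.

```r
# Hypothetical tag vector standing in for sp[inx] from the answer above.
tags <- c("WebDev", "webdev", "Programming", "programming", "JavaScript", "WebDev")

# Fold case, count, and sort so the most frequent tags come first.
counts <- sort(table(tolower(tags)), decreasing = TRUE)

# Convert to a plain data frame and keep at most the top 2 rows.
top <- head(as.data.frame(counts, stringsAsFactors = FALSE), 2)
names(top) <- c("tag", "n")
print(top)
```

Using head() rather than indexing with 1:2 avoids NA rows when there are fewer distinct tags than rows requested.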