How to convert word frequencies into text using R?


I have a data frame like this (an ID column plus the frequencies of the words A, B, C, D and E):

ID A B C D E    
1  5 3 2 1 0  
2  3 2 2 1 0  
3  4 2 1 1 1

I want to convert this data frame into a text-based document like the one below (one row per ID, with the words A to E repeated according to their frequencies in a single column). Then I plan to use the LDA algorithm to identify hot topics for each ID.

ID                     Text
1   "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
2   "A" "A" "A" "B" "B" "C" "C" "D"
3   "A" "A" "A" "A" "B" "B" "C" "D" "E"
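
If you want to try the answers below, here is one way to build the sample data in R. The name df1 matches the answers; the integer column types are an assumption, since the question only shows the printed table. The last line shows the core idea for a single row: repeat each column name by its count.

```r
# Sample data from the question: one row per ID, one count column per word
df1 <- data.frame(
  ID = 1:3,
  A  = c(5L, 3L, 4L),
  B  = c(3L, 2L, 2L),
  C  = c(2L, 2L, 1L),
  D  = c(1L, 1L, 1L),
  E  = c(0L, 0L, 1L)
)

# Core idea for one row: rep() repeats each column name by the matching count
rep(names(df1)[-1], unlist(df1[1, -1]))
#[1] "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
```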

Solution

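  • We can use data.table. as.data.table() works on a copy, whereas setDT(df1) would turn df1 itself into a data.table by reference, after which the base R df1[-1] below would drop the first row instead of the ID column:

    library(data.table)
    # Group by ID; .SD holds that ID's count columns, and each name is repeated by its count
    DT <- as.data.table(df1)[, .(Text = list(rep(names(.SD), unlist(.SD)))), by = ID]
    DT$Text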
    #[[1]]
    #[1] "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
    
    #[[2]]
    #[1] "A" "A" "A" "B" "B" "C" "C" "D"
    
    #[[3]]
    #[1] "A" "A" "A" "A" "B" "B" "C" "D" "E"
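
    The layout in the question has a single Text column of words rather than a list column. A minimal sketch, assuming space-separated words are acceptable for the later text-mining step, collapses each ID's words with paste(..., collapse = " ") inside the same grouped call (DT2 is just an illustrative name):

```r
library(data.table)

df1 <- data.frame(ID = 1:3,
                  A = c(5, 3, 4), B = c(3, 2, 2), C = c(2, 2, 1),
                  D = c(1, 1, 1), E = c(0, 0, 1))

# as.data.table() copies, so df1 itself stays a plain data.frame
DT2 <- as.data.table(df1)[,
  # .SD is the count columns for this ID; build one space-separated string
  .(Text = paste(rep(names(.SD), unlist(.SD)), collapse = " ")),
  by = ID]
DT2
```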
    

    Or, in base R, use split():

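    # Split the count columns by ID, then repeat each column name by its count
    lst <- lapply(split(df1[-1], df1$ID),
                  function(counts) rep(names(df1)[-1], unlist(counts)))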
    lst
    #$`1`
    #[1] "A" "A" "A" "A" "A" "B" "B" "B" "C" "C" "D"
    
    #$`2`
    #[1] "A" "A" "A" "B" "B" "C" "C" "D"
    
    #$`3`
    #[1] "A" "A" "A" "A" "B" "B" "C" "D" "E"
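
    To get the two-column ID/Text layout from the question in base R, a sketch (the name out is illustrative) collapses each element of lst with vapply() and paste(); it assumes the IDs are integers, because split() turns them into the character names of lst:

```r
df1 <- data.frame(ID = 1:3,
                  A = c(5, 3, 4), B = c(3, 2, 2), C = c(2, 2, 1),
                  D = c(1, 1, 1), E = c(0, 0, 1))

# One character vector of words per ID, named by ID
lst <- lapply(split(df1[-1], df1$ID),
              function(counts) rep(names(df1)[-1], unlist(counts)))

# Collapse each vector into one space-separated string per ID
out <- data.frame(
  ID   = as.integer(names(lst)),
  Text = vapply(lst, paste, character(1), collapse = " ", USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
out
```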
    

    To write 'lst' to a CSV file, one option is to pad every vector with NA up to the length of the longest one and then bind them into a matrix, since a data.frame or matrix needs components of equal length:

    res <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
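
    `length<-` is the function behind length(x) <- n; lengthening a vector this way fills the new slots with NA, which is what pads the shorter vectors before rbind. A small illustration with made-up vectors:

```r
x <- c("A", "B")
length(x) <- 4        # same as x <- `length<-`(x, 4)
x                     # x is now "A" "B" NA NA

# The same padding applied to a toy list, then bound row-wise
lst <- list(`1` = c("A", "A", "B"), `2` = "C")
padded <- lapply(lst, `length<-`, max(lengths(lst)))
res <- do.call(rbind, padded)   # list names become the row names
res
```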
    

    Or use the convenience function stri_list2matrix() from stringi, which pads the shorter vectors with NA:

    library(stringi)
    res <- stri_list2matrix(lst, byrow=TRUE)
    

    and then write the result with write.csv:

    write.csv(res, "yourdata.csv", quote=FALSE, row.names = FALSE)
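
    Note that neither version of res writes the IDs to the file (the rbind version keeps them only as row names, which row.names = FALSE drops). A sketch that keeps an explicit ID column and writes missing cells as empty strings via write.csv's na argument is below; the word1, word2, ... column names are hypothetical, and it writes to tempfile() and reads the file back only to check the result:

```r
lst <- list(`1` = c("A", "A", "B"), `2` = c("C", "D"), `3` = "E")

# Pad with NA to equal length, then bind one row per ID
res <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
colnames(res) <- paste0("word", seq_len(ncol(res)))  # hypothetical names

# Keep the IDs as a real column instead of relying on row names
out <- data.frame(ID = names(lst), res, stringsAsFactors = FALSE)

path <- tempfile(fileext = ".csv")
write.csv(out, path, quote = FALSE, row.names = FALSE, na = "")

back <- read.csv(path, stringsAsFactors = FALSE)
back
```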