Finding how many times the words in one array occur in another array in R?


Hi, I have two arrays: 'topWords' of length N (unique words), and 'observedWords' with length < N (repetitions of words).

I'd like an array of counts 'countArray' of length N containing the number of times each of the N words in 'topWords' occurs in the array 'observedWords'. What is an efficient way to do this in R?


Solution

  • Here's a simple example using match and tabulate. match gives the position of each observed word in topWords, and tabulate counts how many times each position occurs, so words that never appear get 0.

    > topWords <- paste(LETTERS, letters, sep = "")
    > topWords
    ##  [1] "Aa" "Bb" "Cc" "Dd" "Ee" "Ff" "Gg" "Hh" "Ii" "Jj" "Kk" "Ll" "Mm" "Nn" "Oo"
    ## [16] "Pp" "Qq" "Rr" "Ss" "Tt" "Uu" "Vv" "Ww" "Xx" "Yy" "Zz"
    > observedWords <- c("Bb", rep("Mm", 2), rep("Pp", 3))
    > observedWords
    ## [1] "Bb" "Mm" "Mm" "Pp" "Pp" "Pp"
    > # positions of the observed words in topWords, tallied into N bins
    > countArray <- tabulate(match(observedWords, topWords), nbins = length(topWords))
    > countArray
    ## [1] 0 1 0 0 0 0 0 0 0 0 0 0 2 0 0 3 0 0 0 0 0 0 0 0 0 0
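
  • If named counts are more convenient, one possible sketch is to tabulate a factor whose levels are the top words. The helper name count_words and the out-of-vocabulary word "Zz9" are made up for illustration and are not part of the question. The input is deliberately unordered to check that the counts do not depend on the order in which words are observed.

```r
# Hypothetical helper (not from the question): count how often each entry
# of `vocab` occurs in `observed`, keeping the order of `vocab`.
count_words <- function(vocab, observed) {
  # factor() with explicit levels keeps every vocab word, even unseen ones;
  # observed words that are not in vocab become NA, which table() drops.
  counts <- table(factor(observed, levels = vocab))
  # Convert the 1-d table into a plain named integer vector.
  setNames(as.vector(counts), names(counts))
}

topWords <- paste(LETTERS, letters, sep = "")
# Unordered observations, including one word ("Zz9") that is not in topWords.
observedWords <- c("Pp", "Bb", "Pp", "Mm", "Zz9", "Pp", "Mm")
countArray <- count_words(topWords, observedWords)

# Look up a few words by name.
countArray[c("Bb", "Mm", "Pp", "Aa")]
```

    The result has one entry per element of topWords, so length(countArray) equals N, and words outside topWords are ignored rather than raising an error.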