
Compute the Euclidean distance using word counts


Consider the following two sentences.

Sentence 1: The quick brown fox jumps over the lazy dog.

Sentence 2: A quick brown dog outpaces a quick fox.

Compute the Euclidean distance between the two sentences using their word counts.
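As a reference point for checking any implementation: the Euclidean distance between two count vectors x and y over a shared vocabulary is the square root of the summed squared differences. Assuming words are lowercased, punctuation is dropped, and every word (including "a") is counted, the vocabulary in alphabetical order is a, brown, dog, fox, jumps, lazy, outpaces, over, quick, the, and the arithmetic works out as follows.

```latex
\begin{aligned}
x &= (0, 1, 1, 1, 1, 1, 0, 1, 1, 2) \quad \text{(sentence 1)}\\
y &= (2, 1, 1, 1, 0, 0, 1, 0, 2, 0) \quad \text{(sentence 2)}\\
d(x, y) &= \sqrt{\sum_{i} (x_i - y_i)^2}\\
&= \sqrt{(-2)^2 + 0^2 + 0^2 + 0^2 + 1^2 + 1^2 + (-1)^2 + 1^2 + (-1)^2 + 2^2}\\
&= \sqrt{13} \approx 3.606
\end{aligned}
```

If short words such as "a" are excluded, its term (-2)^2 = 4 drops out and the distance becomes sqrt(9) = 3.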


Solution

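  • You can use the tm package to build a document-term matrix of word counts (one row per sentence) and then compute the Euclidean distance between its rows. Because dist() measures distances between rows, pass the matrix as is: transposing it with t(dtm) gives distances between terms rather than between the two sentences. By default DocumentTermMatrix() lowercases the text and drops words shorter than three characters, so "A"/"a" is not counted.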

    > library(tm)
    > s1 <- "The quick brown fox jumps over the lazy dog"
    > s2 <- "A quick brown dog outpaces a quick fox"
    > 
    > # One document per sentence: rows of dtm are sentences, columns are terms
    > VS <- VectorSource(c(s1, s2))
    > corp <- Corpus(VS)
    > dtm <- DocumentTermMatrix(corp)
    > # dist() compares rows, so this is the distance between the two sentences
    > d <- dist(as.matrix(dtm), method = "euclidean")
    > d
      1
    2 3
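To see what the document-term matrix is doing, the same distance can be computed in base R without tm. This is a sketch under stated assumptions: word_counts() and count_distance() are hypothetical helper names introduced here, tokens are taken to be runs of letters after lowercasing, and the min_length argument only imitates the lower bound of tm's wordLengths option.

```r
# Hypothetical helper (not a tm function): lowercase a sentence, split it
# into runs of letters so punctuation is dropped, discard tokens shorter
# than `min_length` characters, and return a named numeric vector of counts.
word_counts <- function(s, min_length = 1) {
  tokens <- unlist(strsplit(tolower(s), "[^a-z]+"))
  tokens <- tokens[nchar(tokens) >= min_length]
  tab <- table(tokens)
  setNames(as.numeric(tab), names(tab))
}

# Hypothetical helper: Euclidean distance between two sentences' count
# vectors over the union of their vocabularies (a missing word counts as 0).
count_distance <- function(a, b, min_length = 1) {
  ca <- word_counts(a, min_length)
  cb <- word_counts(b, min_length)
  vocab <- union(names(ca), names(cb))
  va <- ca[vocab]
  vb <- cb[vocab]
  va[is.na(va)] <- 0
  vb[is.na(vb)] <- 0
  sqrt(sum((va - vb)^2))
}

s1 <- "The quick brown fox jumps over the lazy dog."
s2 <- "A quick brown dog outpaces a quick fox."

count_distance(s1, s2)                  # every word counted, including "a"
# [1] 3.605551
count_distance(s1, s2, min_length = 3)  # mimics tm's default minimum word length of 3
# [1] 3
```

Setting min_length = 3 reproduces the tm result. If you want tm itself to keep short words, DocumentTermMatrix() accepts control = list(wordLengths = c(1, Inf)).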