stanford-nlp information-retrieval tf-idf

Vector Space Model - how is the query vector [0, 0.707, 0.707] calculated?


I'm reading the book "Introduction to Information Retrieval" (Christopher Manning) and I'm stuck on chapter 6, where it introduces the query "jealous gossip" and states that the associated unit vector is [0, 0.707, 0.707] ( https://nlp.stanford.edu/IR-book/html/htmledition/queries-as-vectors-1.html ), considering the terms affection, jealous and gossip.

I tried to calculate it by computing the tf-idf, assuming that:

- Tf is equal to 1 for jealous and gossip.
- Idf is always equal to 0 if we calculate it as log(N/df) with N=1 (I have only one query, and it is my document) and df=1 for jealous and gossip, so log(1)=0.

Since the idf is 0, the tf-idf is 0 as well. So I decided to compute every weight of the query vector as the raw tf divided by the Euclidean length. In this case the Euclidean length is sqrt(1+1)=1. I can't work out the formula that gives [0, 0.707, 0.707] as the query vector. Can someone help me?


Solution

  • I haven't worked through the problem, but I think the issue might be that sqrt(1+1) is sqrt(2), not 1, so when you normalize, each of the 1s becomes 1/sqrt(2) ≈ 0.707.
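
To make the normalization concrete, here is a minimal sketch in Python. It assumes, as the question does, that the query weights are just the raw term frequencies [0, 1, 1] (first term absent from the query, then jealous, then gossip) and that the vector is divided by its Euclidean length. The helper name `unit_vector` is made up for this illustration and is not taken from the book.

```python
import math

def unit_vector(weights):
    """Divide each weight by the Euclidean (L2) length of the vector."""
    length = math.sqrt(sum(w * w for w in weights))
    if length == 0:
        # An all-zero vector has no direction; return it unchanged.
        return list(weights)
    return [w / length for w in weights]

# Assumed raw term frequencies for the query "jealous gossip":
# first term does not occur in the query, jealous = 1, gossip = 1.
query_tf = [0, 1, 1]

# Euclidean length is sqrt(0 + 1 + 1) = sqrt(2), not 1.
query_unit = unit_vector(query_tf)

print([round(w, 3) for w in query_unit])  # [0.0, 0.707, 0.707]
```

Each nonzero weight ends up as 1/sqrt(2), and the squared components of the result sum to 1, which is what makes it a unit vector.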