information-retrieval, tf-idf, data-retrieval

What does "IDF is just dependent on the term" mean?


Could someone explain "Tf is dependent on term and document" and "IDF is just dependent on the term" with an example?


Solution

  • Suppose that we have these two documents:

    d_1: "Tf is dependent on term and document"
    d_2: "IDF is just dependent on the term"
    

    The count of terms in each document is as follows:

    d_1: 
    {Tf: 1, is: 1, dependent: 1, on: 1, term: 1, and: 1, document: 1}
    d_2:
    {IDF: 1, is: 1, just: 1, dependent: 1, on: 1, the: 1, term: 1}
    

    The term frequencies (i.e., the number of times term t appears in document d divided by the total number of terms in that document) for the term "on" are:

    tf(on, d_1) = 1 / 7
    tf(on, d_2) = 1 / 7
    

    To calculate the term frequency of a term, you must specify which document you are talking about. tf(on, d_1) = 1/7 tells you that 1/7 of all words in d_1 are "on".
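The term-frequency calculation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `tf`, the `docs` dictionary, and the whitespace, case-sensitive tokenization are choices made here, not part of any particular library.

```python
from collections import Counter

# The two example documents from the answer.
docs = {
    "d_1": "Tf is dependent on term and document",
    "d_2": "IDF is just dependent on the term",
}

def tf(term, doc_text):
    """Share of the tokens in doc_text that are equal to term."""
    # Assumption: tokens are split on whitespace and compared case-sensitively.
    tokens = doc_text.split()
    counts = Counter(tokens)
    return counts[term] / len(tokens)

# tf needs both a term and a document.
tf_on_d1 = tf("on", docs["d_1"])      # 1 of 7 tokens
tf_on_d2 = tf("on", docs["d_2"])      # 1 of 7 tokens
tf_just_d1 = tf("just", docs["d_1"])  # "just" does not occur in d_1
tf_just_d2 = tf("just", docs["d_2"])  # 1 of 7 tokens

print(tf_on_d1, tf_on_d2, tf_just_d1, tf_just_d2)
```

Note that the same term, "just", gets a different value depending on which document is passed in. That is what "tf is dependent on term and document" means.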

    The inverse document frequency (the logarithm of the ratio of the total number of documents to the number of documents that include the word "on") is:

    idf(on) = log(2/2) = 0
    

    As you can see, idf takes no document argument: within this corpus of two documents it is a single value per term, the same no matter which document you are looking at. It is a measure of how rare a term is across a set of documents. idf(on) = 0 tells you that "on" is not special at all, because it appears in every document.
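As a rough sketch of the idf calculation, the snippet below computes idf over the same two documents. The helper name `idf`, the `docs` dictionary, the whitespace tokenization, and the use of the natural logarithm are assumptions made for illustration. Real libraries often add smoothing terms, so their values can differ.

```python
import math

# The two example documents from the answer.
docs = {
    "d_1": "Tf is dependent on term and document",
    "d_2": "IDF is just dependent on the term",
}

def idf(term, corpus):
    """log(total documents / documents containing term), with no smoothing."""
    n_docs = len(corpus)
    # Assumption: whitespace tokenization, case-sensitive matching.
    n_containing = sum(1 for text in corpus.values() if term in text.split())
    # Raises ZeroDivisionError for a term that occurs in no document.
    return math.log(n_docs / n_containing)

# idf takes a term and the whole corpus, never a single document.
idf_on = idf("on", docs)      # log(2/2): "on" is in both documents
idf_just = idf("just", docs)  # log(2/1): "just" is only in d_2

print(idf_on, idf_just)
```

The function signature makes the point of the question visible: `idf` receives the term and the whole corpus, so its result cannot vary from one document to another. A term that appears in only one of the two documents, such as "just", gets a higher value than "on".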