Please tell me how to overcome the problem of negative weighting in IDF. Can someone give a small example?
IDF is defined as N/n(t), where n(t) is the number of documents in which the term t occurs and N is the total number of documents in the collection. Often a log() is applied to this fraction.
Observe that the fraction N/n(t) is always >= 1, because n(t) can never exceed N. For a word that appears in every document (the English word "the" is a likely case), the fraction is exactly 1. Even if a log is applied to the fraction, the value is always >= 0. (Recall that the log function increases monotonically from -inf to +inf, with log(x) < 0 if 0 < x < 1, log(1) = 0, and log(x) > 0 if x > 1.)
So, there's no way in which a standard definition of idf can be negative.
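Since a small example was asked for, here is a minimal sketch in Python of the plain N/n(t) definition above, with and without the log. The three-document toy collection and the function name `idf` are made up for illustration, and the natural log is an assumption (any base gives the same sign).

```python
import math

def idf(term, docs, use_log=True):
    """Plain IDF: N / n(t), optionally wrapped in a natural log."""
    n_docs = len(docs)                            # N
    n_t = sum(1 for doc in docs if term in doc)   # n(t)
    if n_t == 0:
        # N / n(t) is undefined for a term that never occurs
        raise ValueError(f"term {term!r} does not occur in the collection")
    ratio = n_docs / n_t
    return math.log(ratio) if use_log else ratio

# Hypothetical toy collection: each document is a set of words.
docs = [
    {"the", "cat", "sat"},
    {"the", "dog", "barked"},
    {"the", "cat", "purred"},
]

for term in ("the", "cat", "dog"):
    print(term, idf(term, docs, use_log=False), round(idf(term, docs), 3))
# the 1.0 0.0
# cat 1.5 0.405
# dog 3.0 1.099
```

The term "the" occurs in all three documents, so it gets the smallest possible value: 1 without the log and 0 with it. None of the values is negative.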