database, search, full-text-search, search-engine

Where can I get "idf" coefficients for words?


I want to calculate tf-idf weights. To find idf I need a large database of varied documents, from which I would then have to build another table with the columns (word, count). So my question is: where can I find an up-to-date database of "idf" values (or counts) for words? Many search engines use such a database, so perhaps one can be found on the Internet for different languages? I don't want to build this database myself.


Solution

  • idf stands for Inverse Document Frequency. In other words, how often the term occurs (strictly, the number of documents that contain it) goes in the denominator. So what you want are word frequency tables, which can serve as a rough stand-in for those counts. Wiktionary:Frequency lists should get you started. Keep in mind that these lists treat inflected forms of a word as the same word, e.g. be, is, am, are, ...
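
To make the "frequency goes in the denominator" idea concrete, here is a minimal sketch of turning a (word, count) table into idf-style weights. It assumes the common form idf(t) = log(N / n_t), where n_t is the count for term t and N is the total. With true document frequencies, N would be the number of documents. With a plain word frequency list, N is taken here as the sum of all counts, which is only an approximation. The function name `idf_from_counts` and the sample counts are hypothetical, made up for illustration, and not taken from any real list.

```python
import math


def idf_from_counts(counts, total=None):
    """Approximate idf weights from a word -> count table.

    counts: dict mapping each word to its count (document count if
            available, otherwise a raw corpus frequency as a stand-in).
    total:  the N in log(N / count); defaults to the sum of all counts,
            which is an assumption suited to plain frequency lists.
    """
    if total is None:
        total = sum(counts.values())
    # Skip zero counts to avoid division by zero.
    return {
        word: math.log(total / count)
        for word, count in counts.items()
        if count > 0
    }


# Hypothetical counts, for illustration only.
counts = {"the": 1000, "search": 50, "idf": 2}
weights = idf_from_counts(counts)

# Rarer words get larger weights.
for word in sorted(weights, key=weights.get, reverse=True):
    print(word, round(weights[word], 3))
```

If the list you download has already merged inflected forms, apply the same normalization (for example, lemmatization) to your own documents before looking words up, otherwise many lookups will miss.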