
Why doesn't Sphinx have BM25 with field weights?


The formula for Sphinx's default ranker, SPH_RANK_PROXIMITY_BM25, looks like this:

SPH_RANK_PROXIMITY_BM25 = sum(lcs*user_weight)*1000+bm25

The Longest Common Subsequence (lcs) is computed for each field separately and then multiplied by that field's user_weight. However, bm25 is a single document-wide value that does not take per-field weights into account. Why is that?
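To make the shape of the formula concrete, here is a minimal Python sketch of how the per-field and document-wide parts combine. It is not Sphinx's implementation. Assumptions: lcs is approximated as the longest run of consecutive query words that also appears consecutively in the field; the bm25 value is passed in as an already computed integer in the 0..999 range; and fields without an explicit weight get a user weight of 1. The names `field_lcs` and `proximity_bm25` are hypothetical.

```python
def field_lcs(query: list[str], field: list[str]) -> int:
    """Simplified lcs: longest run of consecutive query words that also
    occurs consecutively in the field (an approximation, not Sphinx's code)."""
    best = 0
    # prev[j] = length of the common run ending at the previous query word and field[j-1]
    prev = [0] * (len(field) + 1)
    for i in range(1, len(query) + 1):
        cur = [0] * (len(field) + 1)
        for j in range(1, len(field) + 1):
            if query[i - 1] == field[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def proximity_bm25(query: list[str], fields: dict[str, list[str]],
                   user_weights: dict[str, int], bm25: int) -> int:
    """sum(lcs * user_weight) * 1000 + bm25, with bm25 assumed precomputed."""
    if not 0 <= bm25 <= 999:
        raise ValueError("bm25 is assumed to be an integer in 0..999")
    # Per-field part: each field's lcs is scaled by that field's weight (default 1).
    phrase_part = sum(field_lcs(query, words) * user_weights.get(name, 1)
                      for name, words in fields.items())
    # Document-wide part: one bm25 value, with no knowledge of fields.
    return phrase_part * 1000 + bm25


score = proximity_bm25(
    ["hello", "world"],
    {"title": ["hello", "world"], "body": ["hello", "big", "world"]},
    {"title": 10, "body": 1},
    bm25=500,
)
print(score)  # 21500
```

Here the title matches the whole query (lcs 2) and the body matches only single words (lcs 1), so the weighted sum is 2*10 + 1*1 = 21 and the result is 21*1000 + 500 = 21500. Under the 0..999 assumption, bm25 can never outweigh one extra unit of weighted lcs, so it mostly acts as a tie-breaker, and only the lcs part sees the field weights.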


Solution

  • Mainly because it's faster, and in many cases the ranking quality is good enough. If you need per-field weighting, use a custom expression ranker (SPH_RANK_EXPR) with the BM25F() function. Document length is also not taken into account by default; that requires index_field_lengths=1 at indexing time.
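For comparison, here is a minimal sketch of textbook BM25F, which folds field weights into the term frequency before saturation, so field weights affect the score directly. It is a generic formulation, not necessarily what Sphinx's BM25F() computes; the IDF formula, the single shared `b` parameter, and the names `idf` and `bm25f` are assumptions for illustration. Length normalization needs per-field lengths, which is why it depends on something like index_field_lengths=1; with `b=0` the length term drops out entirely.

```python
import math


def idf(n_docs: int, doc_freq: int) -> float:
    # Non-negative BM25-style IDF (an assumption; Sphinx's own IDF differs in detail).
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25f(query: list[str], doc: dict[str, list[str]],
          field_weights: dict[str, float], avg_lens: dict[str, float],
          doc_freqs: dict[str, int], n_docs: int,
          k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook BM25F: weighted, length-normalized tf summed over fields,
    then saturated once per term."""
    score = 0.0
    for term in query:
        tf = 0.0
        for field, words in doc.items():
            raw = words.count(term)
            if raw == 0:
                continue
            # Per-field length normalization; needs the field's length.
            norm = 1 - b + b * len(words) / avg_lens[field]
            tf += field_weights.get(field, 1.0) * raw / norm
        if tf > 0:
            score += idf(n_docs, doc_freqs[term]) * tf / (k1 + tf)
    return score


weights = {"title": 3.0, "body": 1.0}
avg = {"title": 2.0, "body": 2.0}
dfs = {"hello": 1}
in_title = {"title": ["hello", "there"], "body": ["foo", "bar"]}
in_body = {"title": ["foo", "bar"], "body": ["hello", "there"]}
print(bm25f(["hello"], in_title, weights, avg, dfs, 2) >
      bm25f(["hello"], in_body, weights, avg, dfs, 2))  # True
```

The same single occurrence of "hello" scores higher in the title than in the body because the title's weight of 3 enters the term frequency, which a single document-wide bm25 value cannot do.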