Search code examples
sphinx

Sphinx Mulit-Level Sort with Randomize


Here is my challenge with Sphinx Sort where I have Vendors who pay for premium placement and those who don't:

I already do a multi-level order including the PaidVendorStatus which is either 0 or 1 as:

order by PaidVendorStatus,Weight()

So in essence I end up with multiple sort groups:

  • PaidVendorStatus=1, Weight1
  • ....
  • PaidVendorStatus=1, WeightN
  • PaidVendorStatus=0, Weight1
  • ...
  • PaidVendorStatus=0, WeightN

The problem is I have three goals:

  1. Randomly prioritize each vendor in any given sort group
  2. Have each vendor's 'odds' of being randomly assigned top position be equal regardless of how many records they have returned in the group (so if Vendor A has 50 results and VendorB has 2 results they still both have 50% odds of being randomly assigned any given spot)
  3. Ideally, maintain the same results order in any given search (so that if the user searches again the same order will be displayed

I've tried various solutions:

Select CRC32(Vendor) as RANDOM...Order by PaidVendorStatus,Weight(),RANDOM

which solves 2 and 3 except due to the nature of CRC32 ALWAYS puts the same vendor first (and second, third, etc.) so in essence does not solve the issue at all.

I tried making a sphinx sql_attr_string in my Sphinx Configuration which was a concatenation of Vendor and the record Title (Select... concat(Vendor,Title) as RANDOMIZER..)` and then used that to randomize

Select CRC32(RANDOMIZER) as RANDOM...

which solves 1 and 3 as now the Title field gets thrown in the randomization mis so that the same Vendor does not always get first billing. However, it fails at 2 since in essence I am only sorting by Title and thus Vendor B with two results now has a very low change of being sorted first.

In an ideal world naturally I could just order this way;

Order by PaidVendorStatus,Weight(),RAND(Vendor)

but that is not possible.

Any thoughts on this appreciated. I did btw check out as per Barry Hunter's suggestion this thread on UDF but unless I am not understanding it at all (possible) it does not seem to be the solution for this problem.


Solution

  • Well one idea is:

    SELECT * FROM (
      SELECT *,uniqueserial(vendor_id) AS sorter FROM index WHERE MATCH(...) 
           ORDER BY PaidVendorStatus DESC ,Weight() DESC LIMIT 1000
    ) ORDER BY sorter DESC, WEIGHT() DESC:
    

    This exploits SPhixnes 'multiple sort' function with pysudeo subquery.

    This works wors becasuse the inner query is sorted by PaidVendor first, so their items are fist. Which works to affect the ordr that unqique serial is called in.

    Its NOT really 'randomising' the results as such, seems you jsut randomising them to mix up the vendors (so a single vendor doesnt domninate results. Uniqueserial works by 'spreading' the particular vendors results out - the results will tend to cycle through the vendors.

    This is tricky as it exploits a relative undocumented sphinx feature - subqueries.

    For the UDF see http://svn.geograph.org.uk/svn/modules/trunk/sphinx/