I am trying to use Sphinx to search on multiple fields, essentially to work around the restriction that attributes used for search filtering must be numeric (the database uses a lot of alphanumeric unique IDs as ids instead).
Here's the main query used in the Sphinx config:
sql_query = \
SELECT text_page.id, text_page.document_id, documents.startdate, documents.enddate, documents.long_title, documents.volume,text_page.images_page_id, text_page.text, \
series.name, series.id AS series_id, series.white_label_id AS white_label_id, \
documents.date_created \
FROM text_page \
INNER JOIN documents ON text_page.document_id = documents.id \
INNER JOIN series ON documents.series_id = series.id
text_page.text is the main fulltext field.
I have added this line to the config to try to get this column fulltext indexed as well:
sql_field_string = white_label_id
I then tried to create a query narrowed by white_label_id by running the following query through the PHP Sphinx class.
"@text (search words) @white_label_id (some-uniq-id)"
As I understand it, this should mean that both @text and @white_label_id have to match on the same indexed row for it to be returned as a result.
However, the query never produces any results, and it raises no errors or warnings either.
Any suggestion as to what is going wrong here? Is it because the white_label_id and text fields are on different tables? Is there a solution that avoids restructuring the database to use numeric IDs?
As requested, here is a full config file. Note at present the code is still using the PHP Sphinx Class, rather than SphinxQL via mysqli.
source src2
{
sql_host = localhost
sql_user = username
sql_pass = password
sql_db = databasename
sql_port = 3306 # optional, default is 3306
sql_query_pre = SET NAMES utf8
sql_query = \
SELECT text_page.id, text_page.document_id, documents.startdate, documents.enddate, documents.long_title, documents.volume,text_page.images_page_id, text_page.text, \
series.name, series.id AS series_id, series.white_label_id AS white_label_id, \
documents.date_created \
FROM text_page \
INNER JOIN documents ON text_page.document_id = documents.id \
INNER JOIN series ON documents.series_id = series.id
sql_attr_uint = startdate
sql_attr_uint = enddate
sql_attr_uint = volume
sql_attr_timestamp = date_created
sql_attr_string = long_title
sql_attr_string = name
#sql_attr_string = white_label_id #NB - does not work with nonnumeric ids
sql_attr_string = document_id
sql_attr_string = series_id
sql_field_string = white_label_id #currently appears to do nothing
sql_ranged_throttle = 0
}
source src2throttled : src2
{
sql_ranged_throttle = 100
}
index myindex1
{
source = src2
path = /var/data/mydata1
docinfo = extern
mlock = 0
morphology = none
min_word_len = 1
charset_type = utf-8
html_strip = 0
}
index myindex1stemmed : myindex1
{
path = /var/data/mydata1stemmed
morphology = stem_en
index_exact_words = 1
}
It eventually turned out that there is a much better way to work around Sphinx's 'numeric only' rule for attributes.
The answer is to create a numeric hash of the text-based uniq_id columns, which can then be declared as sql_attr_uint and used as a filter to narrow searches.
For example, the SQL query in the original post becomes:
sql_query = \
SELECT text_page.id, text_page.document_id, documents.startdate, documents.enddate, documents.long_title, documents.volume,text_page.images_page_id, text_page.text, \
series.name, series.id AS series_id, CRC32(series.white_label_id) AS white_label_id, \
documents.date_created \
FROM text_page \
INNER JOIN documents ON (text_page.document_id = documents.id AND documents.is_active = 1) \
INNER JOIN series ON documents.series_id = series.id
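With the query above, white_label_id is indexed as a number, so the client has to hash the text ID the same way before filtering on it. The following is a minimal sketch in Python rather than the PHP class used in the question. It assumes the IDs are stored as ASCII or UTF-8 bytes, in which case zlib's CRC-32 should give the same unsigned value as MySQL's CRC32(). The helper names white_label_hash and find_collisions are made up for illustration. The second helper is there because a 32-bit hash can map two different IDs to the same number, which is worth checking for against the real set of IDs.

```python
import zlib
from collections import defaultdict


def white_label_hash(uniq_id: str) -> int:
    """Return the unsigned 32-bit CRC-32 of a text ID.

    Assumption: the column holds ASCII/UTF-8 text, so these are the same
    bytes that CRC32(series.white_label_id) hashes on the MySQL side.
    """
    return zlib.crc32(uniq_id.encode("utf-8")) & 0xFFFFFFFF


def find_collisions(uniq_ids):
    """Return {hash: [ids]} for every hash shared by more than one distinct ID."""
    buckets = defaultdict(set)
    for uid in uniq_ids:
        buckets[white_label_hash(uid)].add(uid)
    return {h: sorted(ids) for h, ids in buckets.items() if len(ids) > 1}


if __name__ == "__main__":
    # Placeholder IDs; in practice these would come from series.white_label_id.
    known_ids = ["some-uniq-id", "another-uniq-id"]
    print("filter value:", white_label_hash("some-uniq-id"))
    print("collisions:", find_collisions(known_ids))
```

The integer returned by white_label_hash is what would be passed to the attribute filter (for example SetFilter in the PHP API) once the source declares sql_attr_uint = white_label_id in place of the sql_field_string line.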