Search code examples
mysqlsphinx

Reality check on a SQL query for Sphinx


I'm working through implementing Sphinx as a search server for my website. The website allows teachers to build assessment items -- which I'm looking to index -- that can include:

  • A context (one per item)
  • One or more questions (many per item)
  • Answers to each question (many per question)
  • Categories for each question (many to many)

I'm new to both Sphinx and the direct use of MySQL, since my other searches have been through CakePHP's functions.

In building a view through which the questions in the database can be indexed, I've come up with the following SQL query (it's big, but I'm not going to ask you to debug it):

SELECT
    questions.id,
    CONCAT(
        IFNULL(questions.context_template, ''),
        IFNULL(context_answers_concat.context_answer_templates, ''),
        IFNULL(parts_concat.full_parts, '')
    ) as full_question,
    parts_concat.all_categories
FROM
    questions
LEFT JOIN(
    SELECT
        question_answers.question_id,
        GROUP_CONCAT(
            question_answers.answer_template
        ) AS context_answer_templates
    FROM
        question_answers
    GROUP BY
        question_answers.question_id
) AS context_answers_concat
ON
    context_answers_concat.question_id = questions.id
LEFT JOIN(
    SELECT
        question_parts.question_id,
        GROUP_CONCAT(
            CONCAT(
                question_parts.question_template,
                IFNULL(answers_concat.answer_templates, '')
            )
        ) AS full_parts,
        GROUP_CONCAT(
            categories_concat.part_categories
        ) AS all_categories
    FROM
        question_parts
    LEFT JOIN(
        SELECT
            question_answers.question_part_id,
            GROUP_CONCAT(
                question_answers.answer_template
            ) AS answer_templates
        FROM
            question_answers
        GROUP BY
            question_answers.question_part_id
    ) AS answers_concat
ON
    answers_concat.question_part_id = question_parts.id
LEFT JOIN(
    SELECT
        question_categories.question_part_id,
        GROUP_CONCAT(DISTINCT categories.type) AS part_categories
    FROM
        question_categories,
        categories
    WHERE
        question_categories.category_id = categories.id
    GROUP BY
        question_categories.question_part_id
) AS categories_concat
ON
    categories_concat.question_part_id = question_parts.id
GROUP BY
    question_parts.question_id
) AS parts_concat
ON
    parts_concat.question_id = questions.id

I've run the query in phpMyAdmin, and it generates the view I expected -- an id column, a column containing the text of the questions and answers, and a column for the categories.

My question is, because it's so big, is this a reasonable query to use for indexing something like this in Sphinx? If not, should I be using other parts of Sphinx to do some of this work? Or different SQL functions?


Solution

  • As long as you're good with the query's performance and the query returns what you need I don't see any problem. I saw queries even bigger and Sphinx / Manticore worked fine with them.

    Alternatively you can use real-time indexes, in that case you would need to insert data to the index rather than taking it from mysql using indexer.

    If you want to stick with plain indexes and using indexer think about I'd also recommend you think about splitting your index into main and delta parts so you can rebuild the delta part more frequently. Unless your index is small and takes insignificant time to rebuild, then you don't need to worry about that.