Search code examples
solrcassandradatastax-enterprisedatastax-startup

Sort solr response by value in subdocument collection


I'm using DSE solr to index a cassandra table that contains a collection of UDTs. I want to be able to sort search results based on a value inside those UDTs.

Given a simplistic example table...

create type test_score (
  test_name text,
  percentile double,
  score int,
  description text
);

create table students (
  id int,
  name text,
  test_scores set<frozen<test_score>>,
  ...
);

... and assuming I'm auto-generating the solr schema via dsetool, I want to be able to write a solr query that finds students who have taken a test (by a specific test_name), and sort them by that test's score (or percentile, or whatever).


Solution

  • Unfortunately you can't sort by UDT fields.

    However, I'm not sure what the value of a UDT is here. Perhaps I don't know enough about your use case. Another issue I see is that each partition key is a student id, so you can only store one test result per student. A better approach might be to use test id as a clustering column so you can store all the test results for a student in a single partition. Something like this:

    CREATE TABLE students (
    id int,
    student_name text,
    test_name text,
    score int,
    percentile double,
    description text,
    PRIMARY KEY (id, student_name, test_name)
    );
    

    Student name is kind of redundant (it should be the same for every row in each partition), but it doesn't have to be a clustering column.

    Then you can sort on any field like so:

    SELECT * FROM students WHERE solr_query='{"q":"test_name:Biology", "sort":"percentile desc"}' LIMIT 10;

    I've used the JSON syntax described here: https://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/srch/srchJSON.html