solr cassandra datastax-enterprise datastax-startup

Sort solr response by value in subdocument collection

I'm using DSE solr to index a cassandra table that contains a collection of UDTs. I want to be able to sort search results based on a value inside those UDTs.

Given a simplistic example table...

create type test_score (
  test_name text,
  percentile double,
  score int,
  description text
);

create table students (
  id int,
  name text,
  test_scores set<frozen<test_score>>,
  ...
);

... and assuming I'm auto-generating the solr schema via dsetool, I want to be able to write a solr query that finds students who have taken a test (by a specific test_name), and sort them by that test's score (or percentile, or whatever).

Solution

Unfortunately you can't sort by UDT fields.

However, I'm not sure what the value of a UDT is here. Perhaps I don't know enough about your use case. Another issue I see is that each partition key is a student id, so you can only store one test result per student. A better approach might be to use test id as a clustering column so you can store all the test results for a student in a single partition. Something like this:

CREATE TABLE students (
id int,
student_name text,
test_name text,
score int,
percentile double,
description text,
PRIMARY KEY (id, student_name, test_name)
);

Student name is kind of redundant (it should be the same for every row in each partition), but it doesn't have to be a clustering column.

Then you can sort on any field like so:

SELECT * FROM students WHERE solr_query='{"q":"test_name:Biology", "sort":"percentile desc"}' LIMIT 10;

I've used the JSON syntax described here: https://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/srch/srchJSON.html