solr cassandra datastax-enterprise

Partially indexing a Cassandra table with Solr


One of the tables in our Cassandra (DSE 4.7) cluster contains south of 15 billion records. With the number of servers we have, it would be impossible to index them all with Solr.

So, is it possible to index only part of the data (a sample), or to start indexing and then "pause" it after, say, 500 million records?

I assume the other option would be to dump 500 million records, reload them into another "temp" table, and index that?

The point is that I would like to start indexing and be able to search right away, and then, as we grow and add more servers, index more and pause again.

Is that even possible?

Thanks!


Solution

  • There is no way to index only a subset of a table's rows. I agree that a parallel table (probably with TTL) is likely your best bet.

    Here are some (pretty effective) tactics to minimize the size of your DSE Search index. You can probably shrink it by ~50% if you're not using features like highlighting (term...) or boosts (omitNorms):

    • set termVectors="false"

    • set termPositions="false"

    • set termOffsets="false"

    • set omitNorms="true"

    • Only index fields you intend to search
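As a rough illustration of the settings listed above, a schema.xml fragment might look like the sketch below. The field names (`id`, `title`) and the field type definitions are hypothetical and not taken from the question; only the attribute names come from the list. It also assumes that columns you never search can simply be left out of the schema, which is worth verifying against your DSE version.

```xml
<!-- Hypothetical fragment: field and type names are illustrative only -->
<types>
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="text" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
    </analyzer>
  </fieldType>
</types>

<fields>
  <!-- Key field, kept as a plain string -->
  <field name="id" type="string" indexed="true" stored="true"/>

  <!-- A searchable text field with term vectors and norms turned off.
       This gives up highlighting based on term vectors and
       index-time boosts / length normalization for this field. -->
  <field name="title" type="text" indexed="true" stored="true"
         termVectors="false" termPositions="false" termOffsets="false"
         omitNorms="true"/>

  <!-- Assumption: columns that are never searched are not declared here -->
</fields>
```

Changing these attributes on an existing core generally requires reloading the schema and reindexing before the index size reflects them.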