
Solr Grouping and Unique values


I am trying to find a way to grab unique values based on a group. The idea is to group by an ID and then return that group's values.

Query params: fl=valueIwant+myID&group=true&group.field=myID&q=*:*

 "grouped": {
     "myID": {
         "matches": 7520236,
         "groups": [{
                 "groupValue": "123456",
                 "doclist": {
                     "numFound": 6583,
                     "start": 0,
                     "docs": [{
                         "myID": 123456,
                         "valueIwant": "Hello World"
                     }]
                 }
             }
         ]
     }
 }

This is fine, but what I want to do is select valueIwant in a distinct way. Raising group.limit returns more docs per group, but their values won't be unique. Is there a way to make group.limit return only docs with unique fl values? The group in the example above has 6583 matches, so I would have to raise the limit to 6583 and then whittle the results down to the unique values myself. This gets compounded when I have 700 unique IDs to group by across a total of 44M documents.
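A minimal sketch of the client-side workaround described above, assuming Node.js 10+ (for the global `URLSearchParams`) and a response that has already been parsed from JSON. `buildGroupQuery` and `uniqueDocsPerGroup` are hypothetical helper names; only the field names come from the question.

```javascript
// Hypothetical helper: builds the group=true query string from the question,
// with group.limit raised so each group returns more of its docs.
function buildGroupQuery(limit) {
  return new URLSearchParams({
    q: '*:*',                     // match all documents
    fl: 'valueIwant,myID',        // fields returned for each doc
    group: 'true',
    'group.field': 'myID',        // one group per distinct myID
    'group.limit': String(limit), // docs returned per group (Solr's default is 1)
  }).toString();
}

// Hypothetical helper: keeps the first doc seen for each distinct
// valueField value inside every group of a parsed grouped response.
function uniqueDocsPerGroup(response, groupField, valueField) {
  const result = {};
  for (const group of response.grouped[groupField].groups) {
    const seen = new Set();
    result[group.groupValue] = group.doclist.docs.filter((doc) => {
      if (seen.has(doc[valueField])) return false; // duplicate value, drop it
      seen.add(doc[valueField]);
      return true;
    });
  }
  return result;
}

// Trimmed-down response shaped like the group.limit=3 example.
const sample = {
  grouped: {
    myID: {
      matches: 3,
      groups: [{
        groupValue: '123456',
        doclist: {
          numFound: 3,
          start: 0,
          docs: [
            { myID: 123456, valueIwant: 'Hello World' },
            { myID: 123456, valueIwant: 'Hello World' },
            { myID: 123456, valueIwant: 'Hello World123456' },
          ],
        },
      }],
    },
  },
};

const unique = uniqueDocsPerGroup(sample, 'myID', 'valueIwant');
console.log(buildGroupQuery(6583));
console.log(unique['123456'].map((doc) => doc.valueIwant));
// → [ 'Hello World', 'Hello World123456' ]
```

Even with duplicates removed, every doc in each group still has to be fetched and transferred, which is exactly the cost that grows with 700 IDs across 44M documents.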

For example, if I send

fl=valueIwant+myID&group.limit=3&group=true&group.field=myID&q=*:*

 "grouped": {
     "myID": {
         "matches": 7520236,
         "groups": [{
                 "groupValue": "123456",
                 "doclist": {
                     "numFound": 6583,
                     "start": 0,
                     "docs": [{
                         "myID": 123456,
                         "valueIwant": "Hello World"
                     },
                     {
                         "myID": 123456,
                         "valueIwant": "Hello World"
                     },
                     {
                         "myID": 123456,
                         "valueIwant": "Hello World123456"
                     }]
                 }
             }
         ]
     }
 }

What I want is for the docs to be unique on valueIwant, like so:

 "grouped": {
     "myID": {
         "matches": 7520236,
         "groups": [{
                 "groupValue": "123456",
                 "doclist": {
                     "numFound": 6583,
                     "start": 0,
                     "docs": [{
                         "myID": 123456,
                         "valueIwant": "Hello World"
                     },
                     {
                         "myID": 123456,
                         "valueIwant": "Hello Planet"
                     },
                     {
                         "myID": 123456,
                         "valueIwant": "Hello World123456"
                     }]
                 }
             }
         ]
     }
 }

Is there a way to do this? I was looking at functions but couldn't find anything that did what I needed.

Thanks,

-Peddler


Solution

  • I was able to do this with facet pivots. The hardest part after that is parsing the response, since it comes back as a very deeply nested object. You can see my first solution in "Dynamically traversing a deep nested object and accumulating results", along with my question about making it more 'dynamic'.

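    {
      facet: true,
      'facet.mincount': 1,     // leave out values with no matching docs
      'facet.sort': 'index',   // order values by term, not by count
      'facet.limit': 5,        // max values returned at each pivot level
      // comma-separated fields, outermost level first
      'facet.pivot': 'pivotvalue0,pivotvalue1,pivotvalue2,pivotvalue3'
    }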
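A sketch of the facet pivot approach applied to the two fields from the question, assuming Node.js 10+ and an already-parsed JSON response. It asks Solr for a `myID,valueIwant` pivot with `rows=0` (facet counts only, no documents) and `facet.limit=-1` (no cap per level, so every ID and every value comes back), then recursively flattens the nested `facet_pivot` entries, each shaped `{ field, value, count, pivot }`, into one row per leaf. The recursion works for any depth, including a four-level pivot like the one above. `buildPivotQuery` and `flattenPivot` are hypothetical names, and the counts in the sample response are invented. It also assumes valueIwant is an untokenized (string) field; faceting on a tokenized text field returns individual terms rather than whole values.

```javascript
// Hypothetical helper: pivot facet request on myID, then valueIwant.
function buildPivotQuery() {
  return new URLSearchParams({
    q: '*:*',
    rows: '0',                        // only the facet counts are needed
    facet: 'true',
    'facet.mincount': '1',
    'facet.limit': '-1',              // -1 = no limit on values per level
    'facet.pivot': 'myID,valueIwant',
  }).toString();
}

// Hypothetical helper: walks pivot entries recursively, carrying the
// field/value pairs seen so far, and emits one flat row per leaf entry.
function flattenPivot(entries, prefix = {}, rows = []) {
  for (const entry of entries) {
    const row = { ...prefix, [entry.field]: entry.value };
    if (Array.isArray(entry.pivot) && entry.pivot.length > 0) {
      flattenPivot(entry.pivot, row, rows); // descend one pivot level
    } else {
      rows.push({ ...row, count: entry.count });
    }
  }
  return rows;
}

// Sample response shaped like Solr's facet_pivot output (counts invented).
const pivotSample = {
  facet_counts: {
    facet_pivot: {
      'myID,valueIwant': [
        {
          field: 'myID', value: 123456, count: 30,
          pivot: [
            { field: 'valueIwant', value: 'Hello Planet', count: 10 },
            { field: 'valueIwant', value: 'Hello World', count: 20 },
          ],
        },
      ],
    },
  },
};

const rows = flattenPivot(pivotSample.facet_counts.facet_pivot['myID,valueIwant']);
console.log(buildPivotQuery());
console.log(rows);
// rows: [ { myID: 123456, valueIwant: 'Hello Planet', count: 10 },
//         { myID: 123456, valueIwant: 'Hello World', count: 20 } ]
```

Carrying the accumulated `{ field: value }` pairs down the recursion means the same function handles any number of pivot fields without hard-coding the nesting depth.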