I have grouped my Solr documents by the field family.
The Solr query for getting the first 20 groups is as follows:
/select?q=*:*&group=true&group.field=family&group.ngroups=true&start=0&group.limit=1
The result of this query is 20 groups, as follows (only the first group is shown):
responseHeader: {
zkConnected: true,
status: 0,
QTime: 1260,
params: {
q: "*:*",
group.limit: "1",
start: "0",
group.ngroups: "true",
group.field: "family",
group: "true"
}
},
grouped: {
family: {
matches: 464779,
ngroups: 396324,
groups: [
{
groupValue: "__fam__ME.EA.HE.728928",
doclist: {
numFound: 1,
start: 0,
maxScore: 1,
docs: [
{
sku: "ME.EA.HE.728928",
title: "Rexton Pocket Family Hearing Instrument Fusion",
family: "__fam__ME.EA.HE.728928",
brand: "Rexton",
brandId: "6739",
inStock: false,
bulkDiscount: false,
quoteOnly: false,
cats: [
"Hearing Machine & Components",
"Health & Personal Care",
"Medical Supplies & Equipment"
],
leafCatIds: [
"6038"
],
parentCatIds: [
"6259",
"4913"
],
Type__attr__: "Pocket Family",
Type of Products__attr__: "Hearing Instrument",
price: 3790,
discount: 40,
createdAt: "2016-02-18T04:51:36Z",
moq: 1,
offerPrice: 2255,
suggestKeywords: [
"Rexton",
"Pocket Family",
"Rexton Pocket Family"
],
suggestPayload: "6038,Hearing Machine & Components",
_version_: 1548082328946868200
}
]
}
},
The thing to notice in this result is the value of ngroups, which is 396324.
But when I want to get the data of the last page, I send this query to Solr:
/select?q=*:*&group=true&group.field=family&group.ngroups=true&start=396320&group.limit=1
{
responseHeader: {
zkConnected: true,
status: 0,
QTime: 5238,
params: {
q: "*:*",
group.limit: "1",
start: "396320",
group.ngroups: "true",
group.field: "family",
group: "true"
}
},
grouped: {
family: {
matches: 464779,
ngroups: 396324,
groups: [ ]
}
}
}
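I get 0 results when I set start to 396320, although there should be 4 groups on that page (396324 - 396320, one document each since group.limit=1). The actual number of groups is 386887, so start=396320 already points past the last group. Why is ngroups incorrect?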
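The paging arithmetic behind the question can be checked with a short sketch. It assumes a page size of 20 (rows=20, inferred from "first 20 groups"; the original queries do not set rows), and the helper names last_page_start and grouped_query are hypothetical. It shows that the last page computed from the reported ngroups starts at offset 396320 and should hold 4 groups, while the true count of 386887 means that offset is already beyond the last real group, which is why the result is empty.

```python
from urllib.parse import urlencode

# Assumed page size: the question mentions "first 20 groups"; rows is not in the original queries.
ROWS = 20


def last_page_start(ngroups: int, rows: int = ROWS) -> int:
    """0-based offset of the first group on the last page."""
    if ngroups <= 0:
        return 0
    return ((ngroups - 1) // rows) * rows


def grouped_query(start: int, rows: int = ROWS) -> str:
    """Query string for one page of groups: the question's params plus the assumed rows."""
    params = {
        "q": "*:*",
        "group": "true",
        "group.field": "family",
        "group.ngroups": "true",
        "group.limit": 1,
        "start": start,
        "rows": rows,
    }
    return "/select?" + urlencode(params)


reported = 396324  # ngroups reported by SolrCloud
actual = 386887    # true number of distinct families, per the question

start = last_page_start(reported)
print(start, reported - start)  # 396320 4
print(start >= actual)          # True: the offset lies past the last real group
print(grouped_query(start))
```

With the true count, the last page would instead start at last_page_start(386887), i.e. offset 386880.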
By the way, this issue does not occur on the local Solr server I have set up; it only shows up in SolrCloud on the test environment.
This is a known issue with grouping across distributed nodes (which is what happens in SolrCloud mode):
Grouping is supported for distributed searches, with some caveats:
- Currently group.func is not supported in any distributed searches.
- group.ngroups and group.facet require that all documents in each group must be co-located on the same shard in order for accurate counts to be returned. Document routing via composite keys can be a useful solution in many situations.
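Why the count comes out too high can be illustrated with a small simulation. Roughly speaking, in a distributed request each shard reports how many distinct groups it holds and those numbers are added together, so a family whose documents are spread over several shards is counted once per shard. That fits ngroups (396324) being higher than the real count (386887). This is a sketch under stated assumptions: the data is made up, and shard_for uses crc32 modulo as a stand-in hash, not the MurmurHash3-based hash ranges Solr's compositeId router actually uses.

```python
import zlib
from collections import defaultdict


def shard_for(route_key: str, num_shards: int) -> int:
    # Illustrative stand-in only: Solr's compositeId router uses MurmurHash3
    # and hash ranges per shard; crc32 modulo is used here for simplicity.
    return zlib.crc32(route_key.encode("utf-8")) % num_shards


def summed_ngroups(docs, num_shards: int, route_by_family: bool) -> int:
    """Each shard counts the distinct families it holds; the per-shard
    counts are then added together, mimicking a distributed group.ngroups."""
    families_on_shard = defaultdict(set)
    for sku, family in docs:
        key = family if route_by_family else sku
        families_on_shard[shard_for(key, num_shards)].add(family)
    return sum(len(fams) for fams in families_on_shard.values())


# Hypothetical data: 50 families with 3 SKUs each.
docs = [(f"SKU.{f}.{i}", f"__fam__F{f}") for f in range(50) for i in range(3)]
true_groups = len({family for _, family in docs})

print(true_groups)                                     # 50
print(summed_ngroups(docs, 4, route_by_family=True))   # 50: each family lives on one shard
print(summed_ngroups(docs, 4, route_by_family=False))  # larger whenever a family is split across shards
```

Routing by SKU alone places a family's documents wherever their individual hashes fall, so the summed count can only be equal to or larger than the true number of families; routing by family makes it exact.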
The most direct solution is to use family as part of the routing key, so that all documents with the same family value end up on the same shard. Since the number of distinct family values seems very high compared to the number of shards, this should still give you a good distribution of documents across the nodes.
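With the default compositeId router, the shard key is the part of the document ID before a "!", so prefixing each ID with the family value co-locates a family on one shard. The sketch below assumes the uniqueKey field is named id (the question does not show which field is the uniqueKey) and uses hypothetical helper names routed_id and to_update_doc. Existing documents would need to be reindexed with the new IDs. An alternative worth checking in the Reference Guide is creating the collection with router.field=family, which routes on that field's value instead of an ID prefix.

```python
import json


def routed_id(family: str, sku: str) -> str:
    """compositeId-style ID: the part before '!' is the shard key, so every
    document with the same family hashes to the same shard."""
    if "!" in family:
        # '!' is the compositeId separator; a family containing it would change the routing key.
        raise ValueError(f"family value contains '!': {family!r}")
    return f"{family}!{sku}"


def to_update_doc(doc: dict) -> dict:
    """Return a copy of the document with a family-prefixed id
    (assumes the uniqueKey field is called 'id')."""
    routed = dict(doc)
    routed["id"] = routed_id(doc["family"], doc["sku"])
    return routed


doc = {"sku": "ME.EA.HE.728928", "family": "__fam__ME.EA.HE.728928", "brand": "Rexton"}
# JSON body that could be posted to the collection's /update handler.
print(json.dumps([to_update_doc(doc)]))
```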
Depending on what you're actually trying to do, there might be other solutions as well; for example, if you just want a count, a JSON facet might be a good option.
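A count of distinct families via the JSON Facet API might look like the request built below. It is a sketch, not a tested recipe for this collection: the facet names distinct_families and approx_families and the helper family_count_params are hypothetical. The Reference Guide describes unique() as possibly returning an estimate rather than an exact count once there are many values, and hll() as an explicit HyperLogLog estimate, so check the guide for your Solr version before relying on either for paging.

```python
import json
from urllib.parse import urlencode


def family_count_params() -> dict:
    """Request params for counting distinct 'family' values with the JSON Facet API."""
    facet = {
        "distinct_families": "unique(family)",  # distinct-value count
        "approx_families": "hll(family)",       # HyperLogLog cardinality estimate
    }
    # rows=0: only the facet result is needed, not the documents.
    return {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}


query = "/select?" + urlencode(family_count_params())
print(query)
```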