Search code examples
solrlucenerequestgroupingsolr5

Solr request: SQL-like JOIN, GROUP BY, SUM(), WHERE SUM()


I'm new to Solr and I have the following problem:

I have those documents:

category:contract:

{
    "contract_id_s": "contract-ENG-00001",
    "title_s": "contract title",
    "ref_easy_s": "REFAAA",
    "commitment_id_s": "ENG-00001",
},

category:commitment:

{
    "commitment_id_s": "ENG-00001",
    "title_s": "commitment title",
    "status_s": "Validated",
    "date_changed_status_s": "2015-09-30",
    "date_status_initiated_s": "2015-09-27",
    "date_status_confirmed_s": "2015-09-28",
    "date_status_validated_s": "2015-09-30",
},

category:commitment AND sub_category_s:commitment_project:

{
    "id": "ENG-00001_AAA",
    "commitment_id_s": "ENG-00001",
    "project_id_s": "AAA",
    "project_name_s": "project name",
    "project_amount_asked_s": "2000",
    "project_amount_validated_s": "2100"
},
{
    "id": "ENG-00001_AAA2",
    "commitment_id_s": "ENG-00001",
    "project_id_s": "AAA",
    "project_name_s": "project name",
    "project_amount_asked_s": "1000",
    "project_amount_validated_s": "1200"
},

For each commitment, there could be a contract. For each commitment, there could be some payments.

Here is what I want to do: - by default, only select commitment that have at least : . one sub_category_s:commitment_project with a project_amount_validated_s value. . one contract. - if filtered on amounts, only select in this list, commitments with the SUM of project_amount_validated_s > amount_min AND < amount_max.

I don't know what is the best practice in terms of performance? - Requesting the ids of the commitments then requesting the details for them? - Is there a way to JOIN the contract informations in this request? - Or the best practice is to request each document one by one?

The problem is that I don't want to request useless data (performance, bandwidth).


Solution

  • Ok, I've found a solution by using !join.

    For instance, in PHP:

    [
        'q' => "{!join from=id to=service_id score=none}uri:\\$serviceUri* AND -deleted:true",
        'fq' => "{!cache=false}category:monthly_volume AND type:\"$type\" AND timestamp:[$strDateStart TO $strDateEnd]",
        'alt' => 'json',
        'max-results' => 1000,
        'sort' => 'timestamp ASC',
        'statsFields' => 'stats.field=value&stats.facet=timestamp',
    ]
    

    Or with URL request:

    http://localhost:8983/solr/fluks-admin/select?q={!join+from=id+to=sector_id+score=none}{!join+from=uri+to=service+score=none}uri:/test-en/service-en*+AND+-deleted:true&fq={!cache=false}category:indicator+AND+timestamp:[201608+TO+201610]+AND+type:("-3"+OR+2+OR+3)+AND+-deleted:true&wt=json&indent=true&json.facet={sum_timestamp:{terms:{limit:-1, field:timestamp, facet:{sum_type:{terms:{limit:-1, field:type, facet:{sum_vol_value:"sum(vol_value)"}}}}}}}