Search code examples
sparqlwikidata

Applying group_concat in wikidata queries for participants


I just learned about group_concat and order by, group by, and now I'm trying to apply them to query murder cases. I.E., to this one below to get all participants and targets.

SELECT DISTINCT ?incident ?label ?participant ?participantLabel ?target ?targetLabel
WHERE {
?incident wdt:P31 wd:Q132821.
?incident rdfs:label ?label.
optional{?incident wdt:P710 ?participant.}
optional{?incident wdt:P533 ?target. } }

And I tried to apply group_concat and group by, order by. (Didn't do anything on target below because even this only for participants doesn't work):

SELECT DISTINCT ?incident ?label ?target ?targetLabel (group_concat(?participantLabel; separator=";") as ?participant)

WHERE {
?incident wdt:P31 wd:Q132821.
?incident rdfs:label ?label.
optional{?incident wdt:P710 ?participant.}
optional{?incident wdt:P533 ?target. }}
GROUP BY ?participant ?participantLabel
ORDER BY ?participantLabel

And I am told Query is malformed: Bad aggregate.

Is it because not every case has participants? How can I tackle this?


Solution

  • You need to read the full error message from wikidata. The key lines are here --

    java.util.concurrent.ExecutionException: org.openrdf.query.MalformedQueryException: Bad aggregate
    ...
    Caused by: org.openrdf.query.MalformedQueryException: Bad aggregate
    ...
    Caused by: com.bigdata.rdf.sail.sparql.ast.VisitorException: Bad aggregate
    ...
    Caused by: java.lang.IllegalArgumentException: Non-aggregate variable in select expression: incident
    

    Basically, all non-aggregate variables in your SELECT must also be in your GROUP BY. With some other tweaks that I think will benefit you, your query becomes something like this --

    SELECT DISTINCT ?incident 
                    ?incidentLabel 
                    ?target
                    ?targetLabel
                    ( GROUP_CONCAT ( DISTINCT ?participantLabel; separator="; " ) AS ?participants )
    WHERE
      {
                   ?incident     wdt:P31     wd:Q132821 .
                   ?incident     rdfs:label  ?incidentLabel .
                   FILTER ( LANGMATCHES ( LANG ( ?incidentLabel ), "en" ) ) 
        OPTIONAL { ?incident     wdt:P710    ?participant .
                   ?participant  rdfs:label  ?participantLabel 
                   FILTER ( LANGMATCHES ( LANG ( ?participantLabel ), "en" ) ) 
                 }
        OPTIONAL { ?incident     wdt:P533    ?target . 
                   ?target       rdfs:label  ?targetLabel 
                   FILTER ( LANGMATCHES ( LANG ( ?targetLabel ), "en" ) ) 
                 }
      }
    GROUP BY ?incident ?incidentLabel ?target ?targetLabel
    ORDER BY ?incidentLabel ?targetLabel 
    

    I cannot explain the duplicate rows that appear in the result set (scroll down to "1991 Vic bombing"). These should have been eliminated by either or both the SELECT DISTINCT and the GROUP BY.