Prevent GROUP_CONCAT from returning empty string when grouped results are null in SPARQL


I have a SPARQL query that returns what I want, save for one thing. When I use GROUP_CONCAT, I get back an empty string in the result. I would like it to be null in the result set when the values being grouped are null. As the example below shows, my ?team results come back as "" instead of null, the way ?end does. In the rows with the empty string, the ?person values are actually null. Is there a way to get ?team to return null as well?

SPARQL Query:

SELECT ?event ?start ?end ?team {
    SELECT ?event ?start ?end (GROUP_CONCAT(DISTINCT ?person;SEPARATOR=",") AS ?team) {
        ?event a cls:Event ;
            prop:startDate ?start .

        OPTIONAL {
            ?event prop:endDate ?end .
            ?event prop:teamMember ?person .
        }

        FILTER (?start >= "2020-05-25" && ?start < "2020-08-31")
    } GROUP BY ?event ?start ?end
} ORDER BY ?start

Results:

| event       | start      | end        | team                                                         |
|-------------|------------|------------|--------------------------------------------------------------|
| event:Test1 | 2020-05-27 |            | ""                                                           |
| event:Test3 | 2020-05-28 | 2020-05-29 | "http://foo.bar/person/smith,http://foo.bar/person/williams" |
| event:Test2 | 2020-05-29 |            | ""                                                           |

Solution

  • I am afraid the SPARQL specification (see its definition of GROUP_CONCAT) confirms that what you observe is the correct behaviour.

    To see why this is the case, imagine that instead of a GROUP_CONCAT you were doing a COUNT. Then, if a team has no members, you would want to see 0, not null. In the same way, concatenating zero values gives the empty string.

    In order to get what you want, I'd try this as a first iteration:

    SELECT ?event ?start ?end ?team {
        {
            SELECT ?event ?start ?end (GROUP_CONCAT(DISTINCT ?person; SEPARATOR=",") AS ?team_temp) {
                ?event a cls:Event ;
                    prop:startDate ?start .

                # To match ?end and ?person optionally AND independently, we need
                # two OPTIONAL blocks here instead of one large one.
                OPTIONAL { ?event prop:endDate ?end }
                OPTIONAL { ?event prop:teamMember ?person }

                FILTER (?start >= "2020-05-25" && ?start < "2020-08-31")
            } GROUP BY ?event ?start ?end
        }

        # The BIND comes after the subquery so that ?team_temp is already bound.
        # If ?team_temp = "", there is no team: ?unboundVariable is never bound,
        # so evaluating it raises an error and ?team is left unbound (null).
        # Otherwise there is a team, and ?team_temp is used as ?team.
        BIND(IF(?team_temp = "", ?unboundVariable, ?team_temp) AS ?team)
    } ORDER BY ?start
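
    To try the idea without touching your data, here is a minimal, self-contained sketch that uses inline VALUES instead of a real graph. The ex: prefix, the two events, and the two people are made up for illustration, and ?unboundVariable is just any variable name not used elsewhere in the query. It puts COUNT next to GROUP_CONCAT to show the analogy from above, then applies the same BIND/IF conversion.

    ```sparql
    PREFIX ex: <http://example.org/>

    SELECT ?event ?memberCount ?team {
        {
            SELECT ?event
                   (COUNT(?person) AS ?memberCount)
                   (GROUP_CONCAT(DISTINCT ?person; SEPARATOR=",") AS ?team_temp) {
                # Hypothetical inline data: two events ...
                VALUES ?event { ex:Test1 ex:Test3 }

                # ... of which only ex:Test3 has team members.
                OPTIONAL {
                    VALUES (?event ?person) {
                        (ex:Test3 ex:smith)
                        (ex:Test3 ex:williams)
                    }
                }
            } GROUP BY ?event
        }

        # Turn the empty string into an unbound value; keep everything else.
        BIND(IF(?team_temp = "", ?unboundVariable, ?team_temp) AS ?team)
    } ORDER BY ?event
    ```

    On an engine that behaves as in your results, I would expect ex:Test1 to come back with a ?memberCount of 0 and no value for ?team, and ex:Test3 with a ?memberCount of 2 and both IRIs joined by a comma (the order inside the concatenated string is not guaranteed).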