Query on Wikibase:label REGEX & STRLEN


I'm new to SPARQL and Wikidata, and I'm trying to query the following:

  • Male singers (artists/performers)
  • Who are alive
  • Given name = 6 characters
  • Given name does NOT contain "e", "i", or "u"

I'm having trouble with the Filters on Given names (I believe it's because they reference the "SERVICE wikibase:label"). I've tried using HAVING to no avail. Is the correct action to nest the query and filter on that or are there more elegant ways?

#-- Male artists
SELECT DISTINCT ?m ?givennameLabel (STRLEN(?givennameLabel)AS ?Namechars)
 ?mLabel ?plLabel WHERE {
 ?m wdt:P31 wd:Q5.
 ?m wdt:P21 wd:Q6581097.
 ?m wdt:P735 ?givenname.
 ?m wdt:P27 ?pl.
 ?m (wdt:P106/wdt:P279*) wd:Q483501.
 OPTIONAL { ?m wdt:P175 ?performer. }
 OPTIONAL {?m wdt:P570 ?d } 
 FILTER (!bound(?d))
 SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
FILTER (!regex(?givennameLabel,"(e|i|u)")). #no records
FILTER (STRLEN(?givennameLabel) = 6)        #no records
} #HAVING (!regex(?givennameLabel,"(e|i|u)")) #returns "Bad aggregate"
LIMIT 50 

I think something like this should work, but it doesn't:

BIND(STRLEN(str(?givennameLabel)) as ?NL)
FILTER (?NL = 6) 

Solution

  • OK, I think I found the problem: you call a "magic" SERVICE that creates the variable ?givennameLabel. That seems strange to me, but it may mean the variable is not yet bound at the time of filtering. If you put the whole query in a sub-select, it works:

    #Male artists
    SELECT * WHERE {
    
         {
         SELECT DISTINCT ?m ?givennameLabel (STRLEN(?givennameLabel)AS ?Namechars)
           ?mLabel ?plLabel WHERE {
           ?m wdt:P31 wd:Q5.
           ?m wdt:P21 wd:Q6581097.
           ?m wdt:P735 ?givenname.
           ?m wdt:P27 ?pl.
           ?m (wdt:P106/wdt:P279*) wd:Q483501.
           OPTIONAL { ?m wdt:P175 ?performer. }
           OPTIONAL {?m wdt:P570 ?d } 
           FILTER (!bound(?d))
           SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
        } 
        LIMIT 50
        }
      FILTER (!regex(?givennameLabel,"(e|i|u)")).
      FILTER (STRLEN(?givennameLabel) = 6)       
    } 
    

    Update

    The suggested query considers only 50 resources, and without the LIMIT it times out. As user3240704 mentioned, an alternative solution is to avoid the SERVICE clause and use rdfs:label instead:

    #Male artists
     # Without the label service, ?givennameLabel, ?mLabel and ?plLabel are never bound,
     # so select ?label and ?pl directly.
     SELECT DISTINCT ?m ?label (STRLEN(?label) AS ?Namechars)
                     ?pl
     WHERE {
       ?m wdt:P31 wd:Q5.
       ?m wdt:P21 wd:Q6581097.
       ?m wdt:P735 ?givenname.
       ?m wdt:P27 ?pl.
       ?m (wdt:P106/wdt:P279*) wd:Q483501.
       OPTIONAL { ?m wdt:P175 ?performer. }
       OPTIONAL {?m wdt:P570 ?d } 
       FILTER (!bound(?d))
       ?givenname rdfs:label ?label 
       FILTER(LANG(?label) ="en"). 
       FILTER (!regex(?label,"(e|i|u)")). 
       FILTER (STRLEN(?label) = 6)
    } 
    LIMIT 50
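
    One caveat with the filters above: REGEX is case-sensitive by default, so a name such as "Edmond" would pass the "(e|i|u)" test because its only "e" is a capital. The following is a minimal sketch of the same rdfs:label approach with the standard SPARQL "i" (case-insensitive) flag. It assumes the Wikidata Query Service, where the wd:, wdt: and rdfs: prefixes are predeclared. It also swaps the OPTIONAL / !bound pair for FILTER NOT EXISTS, which is my substitution rather than part of the original answer, and it may still be slow or time out on the public endpoint.

    ```sparql
    # Sketch only: assumes the Wikidata Query Service, where the wd:, wdt: and rdfs: prefixes are predeclared.
    SELECT DISTINCT ?m ?label (STRLEN(?label) AS ?Namechars)
    WHERE {
      ?m wdt:P31 wd:Q5 ;                      # instance of: human
         wdt:P21 wd:Q6581097 ;                # sex or gender: male
         wdt:P735 ?givenname ;                # given name item
         wdt:P106/wdt:P279* wd:Q483501 .      # occupation: artist or a subclass of it
      FILTER NOT EXISTS { ?m wdt:P570 ?d }    # no date of death recorded
      ?givenname rdfs:label ?label .
      FILTER(LANG(?label) = "en")
      FILTER(!REGEX(?label, "[eiu]", "i"))    # "i" flag: reject E/I/U in either case
      FILTER(STRLEN(?label) = 6)
    }
    LIMIT 50
    ```

    Because ?label is bound by an ordinary triple pattern rather than by the label service, the FILTER clauses can see it, which is what the original query was missing.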