duplicates sparql rdf

SPARQL: select item and count occurrences of its label


I have this SPARQL query directed to the Open Research Knowledge Graph (ORKG):

PREFIX orkgr: <http://orkg.org/orkg/resource/>
PREFIX orkgc: <http://orkg.org/orkg/class/>
PREFIX orkgp: <http://orkg.org/orkg/predicate/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?o1Label (COUNT(?o1Label) AS ?o1LabelCount)
WHERE {
  ?o1 a orkgc:Paper.
  ?o1 rdfs:label ?o1Label.
  
  FILTER (strlen(?o1Label) > 1).
}

GROUP BY ?o1Label
ORDER BY DESC(?o1LabelCount)

This returns each label (?o1Label) together with the number of times that label occurs (?o1LabelCount).

How can I extend this query to also include a column for the actual item (?o1)?

Because several items can share a label (when ?o1LabelCount > 1), there should be one row for each of these items, each with the same label and the same label count.


Solution

  • I see two options:

    The first (and probably better) option is to use GROUP_CONCAT to collect the entities into a single field, which is then split again on the application side. It could look like this:

    PREFIX orkgr: <http://orkg.org/orkg/resource/>
    PREFIX orkgc: <http://orkg.org/orkg/class/>
    PREFIX orkgp: <http://orkg.org/orkg/predicate/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    
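    # Standard SPARQL 1.1 form: the separator follows ';' as SEPARATOR="...".
    # STR() turns each item IRI into a string before concatenation.
    SELECT ?o1Label (GROUP_CONCAT(STR(?o1); SEPARATOR="\t") AS ?o1s) (COUNT(?o1Label) AS ?o1LabelCount)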
    WHERE {
      ?o1 a orkgc:Paper.
      ?o1 rdfs:label ?o1Label.
      
      FILTER (strlen(?o1Label) > 1).
    }
    
    GROUP BY ?o1Label
    ORDER BY DESC(?o1LabelCount)
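
As a minimal sketch of the application-side parsing mentioned above, the following Python splits the tab-separated ?o1s field back into one row per item. It assumes the results arrive in the standard SPARQL JSON results shape (a list of bindings with a "value" per variable); the function name `expand_rows`, the `LabelRow` type, and the sample IRIs and labels are hypothetical and only serve as illustration.

```python
from typing import Iterator, NamedTuple


class LabelRow(NamedTuple):
    label: str
    item: str
    label_count: int


def expand_rows(bindings, sep="\t") -> Iterator[LabelRow]:
    """Yield one (label, item, count) row per item in each aggregated binding."""
    for b in bindings:
        label = b["o1Label"]["value"]
        label_count = int(b["o1LabelCount"]["value"])
        # ?o1s holds all item IRIs for this label, joined by the separator
        for item in b["o1s"]["value"].split(sep):
            yield LabelRow(label, item, label_count)


# Hypothetical sample data, shaped like results["results"]["bindings"]
sample_bindings = [
    {
        "o1Label": {"type": "literal", "value": "Paper A"},
        "o1s": {
            "type": "literal",
            "value": "http://orkg.org/orkg/resource/R1\thttp://orkg.org/orkg/resource/R3",
        },
        "o1LabelCount": {"type": "literal", "value": "2"},
    },
    {
        "o1Label": {"type": "literal", "value": "Paper B"},
        "o1s": {"type": "literal", "value": "http://orkg.org/orkg/resource/R2"},
        "o1LabelCount": {"type": "literal", "value": "1"},
    },
]

rows = list(expand_rows(sample_bindings))
for row in rows:
    print(row.label, row.item, row.label_count, sep=" | ")
```

The separator passed to `expand_rows` has to match the one used in GROUP_CONCAT; a tab is a reasonable choice here because it cannot appear inside an IRI.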
    

    An alternative is to use a nested query (subquery), which returns one row per item, as you described:

    PREFIX orkgr: <http://orkg.org/orkg/resource/>
    PREFIX orkgc: <http://orkg.org/orkg/class/>
    PREFIX orkgp: <http://orkg.org/orkg/predicate/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    
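    SELECT ?o1Label ?o1 ?o1LabelCount
    WHERE {
      # Outer query: one row per paper, joined to the count for its label
      ?o1 a orkgc:Paper ;
          rdfs:label ?o1Label .
    
      {
        # Inner query: count how many papers carry each label
        SELECT ?o1Label (COUNT(?o1Label) AS ?o1LabelCount)
        WHERE {
          [
            a orkgc:Paper;
            rdfs:label ?o1Label
          ]
          FILTER (strlen(?o1Label) > 1).
        }
        # The grouping belongs to the aggregating subquery, not the outer query
        GROUP BY ?o1Label
      }
    }
    
    ORDER BY DESC(?o1LabelCount)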
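
To make the intent of the nested query easier to follow, here is the same logic as a small in-memory Python sketch: first count how often each label occurs, then attach that count to every item carrying the label. This is only an illustration of the idea, not how the triple store evaluates the query; the function name `rows_with_label_counts` and the sample item IDs and labels are hypothetical.

```python
from collections import Counter


def rows_with_label_counts(items):
    """items: iterable of (item, label) pairs -> list of (label, item, count) rows."""
    # Mirrors FILTER (strlen(?o1Label) > 1)
    kept = [(item, label) for item, label in items if len(label) > 1]
    # Inner query: occurrences per label (GROUP BY ?o1Label)
    counts = Counter(label for _, label in kept)
    # Outer query: one row per item, joined to its label's count
    rows = [(label, item, counts[label]) for item, label in kept]
    # ORDER BY DESC(?o1LabelCount); Python's sort is stable, so ties keep input order
    rows.sort(key=lambda r: -r[2])
    return rows


# Hypothetical sample data
sample_items = [("R1", "Paper A"), ("R2", "Paper B"), ("R3", "Paper A")]
result = rows_with_label_counts(sample_items)
for label, item, count in result:
    print(label, item, count)
```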