Search code examples
sparqlsemantic-webdbpedia

Calculate variance of wikiPageLength with SPARQL on dbpedia


I wanted to calculate variance (or standard deviation) of wikipedia page length history, for light novel books (Light_novels), with dbpedia (http://live.dbpedia.org/sparql).

The SPARQL is here:

PREFIX dbpedia0: <http://dbpedia.org/ontology/> 
PREFIX term: <http://purl.org/dc/terms/>

SELECT DISTINCT ?item ((SUM((?n - ?mean)*(?n - ?mean)))/(COUNT(?n) - 1) AS ?variance) 
WHERE
{
    
?item term:subject dbc:Light_novels . 

  ?item dbpedia0:wikiPageLength ?n
  {
      SELECT ?item (AVG(?n) AS ?mean) WHERE {
        ?item dbpedia0:wikiPageLength ?n . 
      } 
  }

}        

This gives an error Virtuoso 37000 Error SP031: SPARQL compiler: The name ?n is used in multiple clauses, without any logical connection and I tried modifying the code blindlessly countless times, but ended up no idea how can I achieve that.

I really appreciate your helpful comments. Thanks.


Solution

  • Normally this should work

    PREFIX dbpedia0: <http://dbpedia.org/ontology/> 
    PREFIX term: <http://purl.org/dc/terms/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    
    
    SELECT ?item IF(?count=1,0,(SUM(?x)/(?count-1))) as ?var
    {
    SELECT ?item ?n xsd:integer(?n) as ?ni ?mean xsd:integer(?ni-?mean) as ?nmean ((?nmean)*(?nmean)) as ?x ?count
    {
      ?item dbpedia0:wikiPageLength ?n 
      {
         SELECT ?item AVG(?n2) as ?mean (COUNT(?item)) as $count
         {
           ?item term:subject dbc:Light_novels . 
           ?item dbpedia0:wikiPageLength ?n2
         }
         GROUP BY ?item
      }
    }
    } GROUP BY ?item ?count
    

    but it throws this error

    Internal Optimized compiler error : Bad dfe in sqlo_place_exp in sqldf.c:2491

    So I have modified it to this one:

    PREFIX dbpedia0: <http://dbpedia.org/ontology/> 
    PREFIX term: <http://purl.org/dc/terms/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    
    SELECT ?item IF(?count=1,-1,(?sx/(?count-1))) as ?var
    {
    SELECT ?item SUM(?x) as ?sx ?count
    {
    SELECT ?item ?n xsd:integer(?n) as ?ni ?mean xsd:integer(?ni-?mean) as ?nmean ((?nmean)*(?nmean)) as ?x ?count
    {
      ?item dbpedia0:wikiPageLength ?n 
      {
         SELECT ?item AVG(?n2) as ?mean (COUNT(?item)) as $count
         {
           ?item term:subject dbc:Light_novels . 
           ?item dbpedia0:wikiPageLength ?n2
         }
         GROUP BY ?item
      }
    }
    } GROUP BY ?item ?count
    }
    
    • we need to aggregations a. for calculating the mean and b. to calculate the sum of differences of each n from mean.

    • I needed to convert type to integer the variable xsd:integer(?ni-?mean), otherwise it does not show the result of multiply

    • With running the second inner subquery you can see how the variables are modified