Search code examples
group-bysumsparqlrdftranspose

Display a level three groupby/sum from a SPARQL query as rows and columns and filter on years


I have some RDF data which I simplified with the below table.

Subject StartDate EndDate Location Technology Value
Subject1 1-1-2005 1-1-2025 ZoneA TechnologyA 100
Subject2 1-1-2007 1-1-2025 ZoneB TechnologyA 50
Subject3 1-1-2009 1-1-2025 ZoneA TechnologyB 80
Subject4 1-1-2011 1-1-2025 ZoneB TechnologyB 100
Subject5 1-1-2007 1-1-2025 ZoneA TechnologyA 20
Subject6 1-1-2009 1-1-2025 ZoneB TechnologyA 10
Subject7 1-1-2011 1-1-2025 ZoneA TechnologyB 60
Subject8 1-1-2013 1-1-2025 ZoneB TechnologyB 70

I want to write a SPARQL query that does the following:

  • Groupby the Location, Technology and the Year (in which the Subjects exists), and sum the Value
  • The Year in which the subject exists is determined from the StartDate and EndDate
  • The Location and Technology must be presented as rows and the year as columns.

Expected Output:

Location Technology 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
ZoneA TechnologyA 100 100 120 120 120 120 120 120 120 120 120
ZoneA TechnologyB 0 0 0 0 80 80 140 140 140 140 140
ZoneB TechnologyA 0 0 50 50 60 60 60 60 60 60 60
ZoneB TechnologyB 0 0 0 0 0 0 100 100 170 170 170

I know that SPARQL does not support displaying data directly in a tabular format like a spreadsheet, but I am trying to avoid any necessary post-processing in e.g. python as much as possible. I want to export the expected output above as a csv file.

I really have no idea how to do this with SPARQL. I have done some simple groupby/sum queries and separately also managed to filter on years based on a StartDate and EndDate, but I really don't have a clue how to do the combined set of this.

Can someone please help me out with this one?


Solution

  • This is what you can try :

    Let's take for example year 2005

    • In the SELECT section you should write a SUM clause like this one

      (SUM(?temp2005) AS ?2005)

    • In the WHERE section you should have

    BIND(IF(your condition,?theCapacity, 0) AS ?temp2005)

    ---> So what this is actually doing is binding the variable containing the capacity to an another in a way that we can differentiate them for each year. It will then SUM all the variables associated to ``?temp2005 as a total which is in our case ?2005

    • NOTE! That you will need to create manually a list of years with the VALUES clause in your WHERE section `VALUES (?Year) {(2005) (2006) ... (2015) }

       PREFIX ex: <http://example.com/data#>
       PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
      
       SELECT ?location ?technology
       (SUM(?temp2005) AS ?2005)
       (SUM(?temp2006) AS ?2006)     
       (SUM(?temp2007) AS ?2007)
       (SUM(?temp2008) AS ?2008)
       (SUM(?temp2009) AS ?2009)
       (SUM(?temp2010) AS ?2010) 
       (SUM(?temp2011) AS ?2011)
       (SUM(?temp2012) AS ?2012)
       (SUM(?temp2013) AS ?2013)
       (SUM(?temp2014) AS ?2014)
       (SUM(?temp2015) AS ?2015)
      
      WHERE {
       ?subject a ex:Subject ;
             ex:startDate ?startDate;
             ex:endDate ?endDate;
             ex:location ?location ;
             ex:technology ?technology ;
             ex:value ?value .
      
      
       #Create our Year value   
       BIND(xsd:gYear(?Year) AS ?Year)
       VALUES (?Year) { 
       (2005) (2006) (2007) (2008) (2009) (2010) (2011) (2012) (2013) (2014) (2015)
        }
      
       #EXTRACT the year component from the date and bind them
        BIND(YEAR(?startDate) AS ?startYear)
        BIND(YEAR(?endDate) AS ?endYear)
      
        BIND(IF((?Year = 2005)&&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 
        0) AS ?temp2005) .
        BIND(IF((?Year = 2006) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 
        0) AS ?temp2006) .
        BIND(IF((?Year = 2007) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 
        0) AS ?temp2007) .
        BIND(IF((?Year = 2008) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 
        0) AS ?temp2008) .
        BIND(IF((?Year = 2009) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 
        0) AS ?temp2009) .
        BIND(IF((?Year = 2010) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 
        0) AS ?temp2010) .
        BIND(IF((?Year = 2011) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 
        0) AS ?temp2011) .
        BIND(IF((?Year = 2012) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 
        0) AS ?temp2012) .
        BIND(IF((?Year = 2013) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 
        0) AS ?temp2013) .
        BIND(IF((?Year = 2014) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 
        0) AS ?temp2014) .
        BIND(IF((?Year = 2015) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value, 
        0) AS ?temp2015) .
      
        }
      
      GROUP BY ?location ?technology
      ORDER BY ?location ?technology
      

    This how the ttl with your example looks like :

         @prefix ex: <http://example.com/data#> .
         @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    
         ex:Subject1 a ex:Subject ;
          ex:startDate "2005-01-01"^^xsd:date ;
          ex:endDate "2025-01-01"^^xsd:date ;
          ex:location "ZoneA" ;
          ex:technology "TechnologyA" ;
          ex:value 100 .
    
         ex:Subject2 a ex:Subject ;
          ex:startDate "2007-01-01"^^xsd:date ;
          ex:endDate "2025-01-01"^^xsd:date ;
          ex:location "ZoneB" ;
          ex:technology "TechnologyA" ;
          ex:value 50 .
    
         ex:Subject3 a ex:Subject ;
          ex:startDate "2009-01-01"^^xsd:date ;
          ex:endDate "2025-01-01"^^xsd:date ;
          ex:location "ZoneA" ;
          ex:technology "TechnologyB" ;
          ex:value 80 .
    
         ex:Subject4 a ex:Subject ;
          ex:startDate "2011-01-01"^^xsd:date ;
          ex:endDate "2025-01-01"^^xsd:date ;
          ex:location "ZoneB" ;
          ex:technology "TechnologyB" ;
          ex:value 100 .
    
         ex:Subject5 a ex:Subject ;
          ex:startDate "2007-01-01"^^xsd:date ;
          ex:endDate "2025-01-01"^^xsd:date ;
          ex:location "ZoneA" ;
          ex:technology "TechnologyA" ;
          ex:value 20 .
    
         ex:Subject6 a ex:Subject ;
          ex:startDate "2009-01-01"^^xsd:date ;
          ex:endDate "2025-01-01"^^xsd:date ;
          ex:location "ZoneB" ;
          ex:technology "TechnologyA" ;
          ex:value 10 .
    
         ex:Subject7 a ex:Subject ;
          ex:startDate "2011-01-01"^^xsd:date ;
          ex:endDate "2025-01-01"^^xsd:date ;
          ex:location "ZoneA" ;
          ex:technology "TechnologyB" ;
          ex:value 60 .
    
        ex:Subject8 a ex:Subject ;
         ex:startDate "2013-01-01"^^xsd:date ;
         ex:endDate "2025-01-01"^^xsd:date ;
         ex:location "ZoneB" ;
         ex:technology "TechnologyB" ;
         ex:value 70 . 
         
    

    Result