I have some RDF data which I simplified with the below table.
Subject | StartDate | EndDate | Location | Technology | Value |
---|---|---|---|---|---|
Subject1 | 1-1-2005 | 1-1-2025 | ZoneA | TechnologyA | 100 |
Subject2 | 1-1-2007 | 1-1-2025 | ZoneB | TechnologyA | 50 |
Subject3 | 1-1-2009 | 1-1-2025 | ZoneA | TechnologyB | 80 |
Subject4 | 1-1-2011 | 1-1-2025 | ZoneB | TechnologyB | 100 |
Subject5 | 1-1-2007 | 1-1-2025 | ZoneA | TechnologyA | 20 |
Subject6 | 1-1-2009 | 1-1-2025 | ZoneB | TechnologyA | 10 |
Subject7 | 1-1-2011 | 1-1-2025 | ZoneA | TechnologyB | 60 |
Subject8 | 1-1-2013 | 1-1-2025 | ZoneB | TechnologyB | 70 |
I want to write a SPARQL query that does the following:
Expected Output:
Location | Technology | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ZoneA | TechnologyA | 100 | 100 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 | 120 |
ZoneA | TechnologyB | 0 | 0 | 0 | 0 | 80 | 80 | 140 | 140 | 140 | 140 | 140 |
ZoneB | TechnologyA | 0 | 0 | 50 | 50 | 60 | 60 | 60 | 60 | 60 | 60 | 60 |
ZoneB | TechnologyB | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 100 | 170 | 170 | 170 |
I know that SPARQL does not support displaying data directly in a tabular format like a spreadsheet, but I am trying to avoid any necessary post-processing in e.g. python as much as possible. I want to export the expected output above as a csv file.
I really have no idea how to do this with SPARQL. I have done some simple groupby/sum queries and separately also managed to filter on years based on a StartDate and EndDate, but I really don't have a clue how to do the combined set of this.
Can someone please help me out with this one?
This is what you can try :
Let's take for example year 2005
In the SELECT section you should write a SUM clause like this one
(SUM(?temp2005) AS ?2005)
In the WHERE section you should have
BIND(IF(your condition,?theCapacity, 0) AS ?temp2005)
---> So what this is actually doing is binding the variable containing the capacity to an another in a way that we can differentiate them for each year. It will then SUM all the variables associated to ``?temp2005 as a total which is in our case ?2005
NOTE! That you will need to create manually a list of years with the VALUES clause in your WHERE section `VALUES (?Year) {(2005) (2006) ... (2015) }
PREFIX ex: <http://example.com/data#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?location ?technology
(SUM(?temp2005) AS ?2005)
(SUM(?temp2006) AS ?2006)
(SUM(?temp2007) AS ?2007)
(SUM(?temp2008) AS ?2008)
(SUM(?temp2009) AS ?2009)
(SUM(?temp2010) AS ?2010)
(SUM(?temp2011) AS ?2011)
(SUM(?temp2012) AS ?2012)
(SUM(?temp2013) AS ?2013)
(SUM(?temp2014) AS ?2014)
(SUM(?temp2015) AS ?2015)
WHERE {
?subject a ex:Subject ;
ex:startDate ?startDate;
ex:endDate ?endDate;
ex:location ?location ;
ex:technology ?technology ;
ex:value ?value .
#Create our Year value
BIND(xsd:gYear(?Year) AS ?Year)
VALUES (?Year) {
(2005) (2006) (2007) (2008) (2009) (2010) (2011) (2012) (2013) (2014) (2015)
}
#EXTRACT the year component from the date and bind them
BIND(YEAR(?startDate) AS ?startYear)
BIND(YEAR(?endDate) AS ?endYear)
BIND(IF((?Year = 2005)&&((?Year>=?startYear) && (?Year<=?endYear)), ?value,
0) AS ?temp2005) .
BIND(IF((?Year = 2006) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value,
0) AS ?temp2006) .
BIND(IF((?Year = 2007) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value,
0) AS ?temp2007) .
BIND(IF((?Year = 2008) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value,
0) AS ?temp2008) .
BIND(IF((?Year = 2009) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value,
0) AS ?temp2009) .
BIND(IF((?Year = 2010) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value,
0) AS ?temp2010) .
BIND(IF((?Year = 2011) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value,
0) AS ?temp2011) .
BIND(IF((?Year = 2012) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value,
0) AS ?temp2012) .
BIND(IF((?Year = 2013) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value,
0) AS ?temp2013) .
BIND(IF((?Year = 2014) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value,
0) AS ?temp2014) .
BIND(IF((?Year = 2015) &&((?Year>=?startYear) && (?Year<=?endYear)), ?value,
0) AS ?temp2015) .
}
GROUP BY ?location ?technology
ORDER BY ?location ?technology
This how the ttl with your example looks like :
@prefix ex: <http://example.com/data#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
ex:Subject1 a ex:Subject ;
ex:startDate "2005-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneA" ;
ex:technology "TechnologyA" ;
ex:value 100 .
ex:Subject2 a ex:Subject ;
ex:startDate "2007-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneB" ;
ex:technology "TechnologyA" ;
ex:value 50 .
ex:Subject3 a ex:Subject ;
ex:startDate "2009-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneA" ;
ex:technology "TechnologyB" ;
ex:value 80 .
ex:Subject4 a ex:Subject ;
ex:startDate "2011-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneB" ;
ex:technology "TechnologyB" ;
ex:value 100 .
ex:Subject5 a ex:Subject ;
ex:startDate "2007-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneA" ;
ex:technology "TechnologyA" ;
ex:value 20 .
ex:Subject6 a ex:Subject ;
ex:startDate "2009-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneB" ;
ex:technology "TechnologyA" ;
ex:value 10 .
ex:Subject7 a ex:Subject ;
ex:startDate "2011-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneA" ;
ex:technology "TechnologyB" ;
ex:value 60 .
ex:Subject8 a ex:Subject ;
ex:startDate "2013-01-01"^^xsd:date ;
ex:endDate "2025-01-01"^^xsd:date ;
ex:location "ZoneB" ;
ex:technology "TechnologyB" ;
ex:value 70 .