Validating that every subject has a type of class


I have the following data and shapes graph, both in one Turtle file:

@prefix hr: <http://learningsparql.com/ns/humanResources#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix schema: <http://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

hr:Employee a rdfs:Class .
hr:BadThree rdfs:comment "some comment about missing" .
hr:BadTwo a hr:BadOne .
hr:YetAnother a hr:Another .
hr:YetAnotherName a hr:AnotherName .
hr:Another a hr:Employee .
hr:AnotherName a hr:name .
hr:BadOne a hr:Dangling .
hr:name a rdf:Property .

schema:SchemaShape
    a sh:NodeShape ;
    sh:target [
        a sh:SPARQLTarget ;
        sh:prefixes hr: ;
        sh:select """
            SELECT ?this
            WHERE {
                ?this ?p ?o .
            }
            """ ;
    ] ; 

    sh:property [                
        sh:path rdf:type ;
        sh:nodeKind sh:IRI ;
        sh:hasValue rdfs:Class
    ] ; 
.

Using pySHACL:

import rdflib

from pyshacl import validate

# The data and the shape are both in this one Turtle file.
g = rdflib.Graph().parse( "/Users/jamesh/jigsaw/shacl_work/data_graph.ttl", format = 'turtle' )

# validate() returns a tuple: ( conforms, results graph, results text )
report = validate( g, inference='rdfs', abort_on_error = False, meta_shacl = False, debug = False )
print( report[2] )

What I think should happen is that the SPARQL-based target selects every subject in the data graph, and the shape then verifies that each of them has an rdf:type path with a value of rdfs:Class.

I get the following result:

Validation Report
Conforms: True

The expected validation errors should include only the following subjects:

| <http://learningsparql.com/ns/humanResources#BadOne>         |
| <http://learningsparql.com/ns/humanResources#BadTwo>         |
| <http://learningsparql.com/ns/humanResources#BadThree>       |
| <http://learningsparql.com/ns/humanResources#AnotherName>    |
| <http://learningsparql.com/ns/humanResources#name>           |
| <http://learningsparql.com/ns/humanResources#YetAnotherName> |

Is this possible with SHACL? If so, what should the shape file be?


Solution

  • What follows produces the expected validation errors; however, there are still several things I do not understand.

    1. The sh:prefixes hr: ; line is not needed in the target. It only supplies prefixes for the SPARQL target's SELECT statement itself, and that query uses no prefixed names.

    2. Inference needed to be disabled. With inference='rdfs', extra triples were inserted and then validated as well. In this use case that is not what is wanted: only what is actually in the schema should be validated.

    3. I had also assumed that putting everything into a single graph would not be an issue, apparently based on a misunderstanding of https://github.com/RDFLib/pySHACL/issues/46. The data and the shape are now parsed into separate graphs.

    import rdflib
    from pyshacl import validate

    graph_data = """
    @prefix hr: <http://learningsparql.com/ns/humanResources#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix xml: <http://www.w3.org/XML/1998/namespace> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix schema: <http://schema.org/> .
    @prefix sh: <http://www.w3.org/ns/shacl#> .
    
    hr:Employee a rdfs:Class .
    hr:BadThree rdfs:comment "some comment about missing" .
    hr:BadTwo a hr:BadOne .
    hr:YetAnother a hr:Another .
    hr:YetAnotherName a hr:AnotherName .
    hr:Another a hr:Employee .
    hr:AnotherName a hr:name .
    hr:BadOne a hr:Dangling .
    hr:name a rdf:Property .
    """
    
    shape_data = '''
    @prefix hr: <http://learningsparql.com/ns/humanResources#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix xml: <http://www.w3.org/XML/1998/namespace> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix schema: <http://schema.org/> .
    @prefix sh: <http://www.w3.org/ns/shacl#> .
    
    schema:SchemaShape
        a sh:NodeShape ;
        sh:target [
            a sh:SPARQLTarget ;
            sh:prefixes hr: ;
            sh:select """
                SELECT ?this
                WHERE {
                    ?this ?p ?o .
                }
                """ ;
        ] ; 
    
        sh:property [                
            sh:path ( rdf:type [ sh:zeroOrMorePath rdf:type ] ) ;
            sh:nodeKind sh:IRI ;
            sh:hasValue rdfs:Class
        ] ; 
    .
    '''
    
    data  = rdflib.Graph().parse( data = graph_data, format = 'turtle' )
    shape = rdflib.Graph().parse( data = shape_data, format = 'turtle' )
    
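    # advanced = True enables the SHACL Advanced Features, which sh:SPARQLTarget is part of.
    # No inference argument is passed, so no triples are inferred before validation.
    report = validate( data, shacl_graph=shape, abort_on_error = False, meta_shacl = False, debug = False, advanced = True )
    print( report[2] )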
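
To make the property path in the shape above easier to follow, here is a minimal plain-Python sketch of what `( rdf:type [ sh:zeroOrMorePath rdf:type ] )` combined with `sh:hasValue rdfs:Class` checks: follow rdf:type one or more times and require that rdfs:Class is among the values reached. This is not pySHACL code. The names `TYPE_TRIPLES`, `SUBJECTS`, `types_of`, `path_values` and `failing_subjects` are made up for illustration, and prefixed names are kept as plain strings.

```python
# Plain-Python sketch of the path  rdf:type / rdf:type*  (not pySHACL itself).
# Only the rdf:type triples of the sample data are modelled here.
TYPE_TRIPLES = [
    ("hr:Employee", "rdfs:Class"),
    ("hr:BadTwo", "hr:BadOne"),
    ("hr:YetAnother", "hr:Another"),
    ("hr:YetAnotherName", "hr:AnotherName"),
    ("hr:Another", "hr:Employee"),
    ("hr:AnotherName", "hr:name"),
    ("hr:BadOne", "hr:Dangling"),
    ("hr:name", "rdf:Property"),
]

# hr:BadThree only has an rdfs:comment, so it is a subject without any type.
SUBJECTS = {s for s, _ in TYPE_TRIPLES} | {"hr:BadThree"}


def types_of(node):
    """Direct rdf:type values of a node."""
    return {o for s, o in TYPE_TRIPLES if s == node}


def path_values(node):
    """All nodes reachable from node by one or more rdf:type hops."""
    seen = set()
    frontier = types_of(node)
    while frontier:
        current = frontier.pop()
        if current not in seen:
            seen.add(current)
            frontier |= types_of(current)
    return seen


def failing_subjects():
    """Subjects whose path values do not include rdfs:Class."""
    return sorted(s for s in SUBJECTS if "rdfs:Class" not in path_values(s))


if __name__ == "__main__":
    for subject in failing_subjects():
        print(subject)
```

On the sample data this flags the same six subjects that the question lists as expected errors, while hr:Employee, hr:Another and hr:YetAnother pass because their type chains end at rdfs:Class.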
    

    An alternative using a SPARQL-based constraint would look like:

    graph_data = """
    @prefix hr: <http://learningsparql.com/ns/humanResources#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix xml: <http://www.w3.org/XML/1998/namespace> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix schema: <http://schema.org/> .
    @prefix sh: <http://www.w3.org/ns/shacl#> .
    
    hr:Employee a rdfs:Class .
    hr:BadThree rdfs:comment "some comment about missing" .
    hr:BadTwo a hr:BadOne .
    hr:YetAnother a hr:Another .
    hr:YetAnotherName a hr:AnotherName .
    hr:Another a hr:Employee .
    hr:AnotherName a hr:name .
    hr:BadOne a hr:Dangling .
    hr:name a rdf:Property .
    """
    
    shape_data = '''
    @prefix hr: <http://learningsparql.com/ns/humanResources#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix xml: <http://www.w3.org/XML/1998/namespace> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix schema: <http://schema.org/> .
    @prefix sh: <http://www.w3.org/ns/shacl#> .
    
    schema:SchemaShape
        a sh:NodeShape ;
        sh:target [
            a sh:SPARQLTarget ;
            sh:select """
                SELECT ?this
                WHERE {
                    ?this ?p ?o .
                }
                """ ;
        ] ; 
    
        sh:sparql [ 
            a sh:SPARQLConstraint ; 
            sh:message "Node does not have type rdfs:Class." ; 
            sh:prefixes hr: ; 
            sh:select """ 
                SELECT $this 
                WHERE { 
                    $this rdf:type ?o . 
    
                    FILTER NOT EXISTS {
                        ?o rdf:type* rdfs:Class
                    }
                    FILTER ( strstarts( str( $this ), str( hr: ) ) ) 
                }
                """ ;
        ]
    .
    '''
    
    
    data  = rdflib.Graph().parse( data = graph_data, format = 'turtle' )
    shape = rdflib.Graph().parse( data = shape_data, format = 'turtle' )
    
    # advanced = True is still required because the shape uses sh:SPARQLTarget.
    report = validate( data, shacl_graph=shape, abort_on_error = False, meta_shacl = False, debug = False, advanced = True )
    print( report[2] )
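
One point about the SPARQL constraint above is easier to see in a small sketch. The SELECT starts from `$this rdf:type ?o`, so it can only report subjects that have at least one rdf:type; a subject with no type at all, such as hr:BadThree, never matches that pattern. The plain-Python sketch below mirrors the query logic only, under that reading; it is not pySHACL code, and `TYPE_TRIPLES`, `types_of`, `reaches_class` and `violations` are hypothetical names.

```python
# Plain-Python sketch of the SELECT inside the sh:sparql constraint.
# Only the rdf:type triples of the sample data are modelled here.
TYPE_TRIPLES = [
    ("hr:Employee", "rdfs:Class"),
    ("hr:BadTwo", "hr:BadOne"),
    ("hr:YetAnother", "hr:Another"),
    ("hr:YetAnotherName", "hr:AnotherName"),
    ("hr:Another", "hr:Employee"),
    ("hr:AnotherName", "hr:name"),
    ("hr:BadOne", "hr:Dangling"),
    ("hr:name", "rdf:Property"),
]


def types_of(node):
    """Direct rdf:type values of a node."""
    return [o for s, o in TYPE_TRIPLES if s == node]


def reaches_class(node):
    """Mirror of  ?o rdf:type* rdfs:Class : zero or more rdf:type hops."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current == "rdfs:Class":
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(types_of(current))
    return False


def violations():
    """Subjects in the hr: namespace with a type that never reaches rdfs:Class."""
    # Iterating over the type triples mirrors `$this rdf:type ?o`:
    # a subject with no rdf:type never appears here.
    return sorted({s for s, o in TYPE_TRIPLES
                   if s.startswith("hr:") and not reaches_class(o)})


if __name__ == "__main__":
    for subject in violations():
        print(subject)
```

If the untyped subject should be reported as well, the property-path shape shown earlier is the variant that covers it, since `sh:hasValue` also fails when the path has no values.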