java rdf jena semantic-web owl

How to extract RDF triples from XML file using an existing ontology?


I am trying to extract RDF triples from XML files by using an existing ontology. I am using Java, and can use XPath to extract data from XML and Jena to read and write RDF documents and ontologies. How can I extract the relevant triples from the XML according to the existing ontology?


Solution

  • Forget about XPath for extracting triples; it is much easier and less error-prone with Jena.

    You can use the SimpleSelector class (an implementation of the Selector interface) together with model.listStatements from Jena.

    In this example I am using SimpleSelector to find all the triples with a single property, but you can implement any search you need by customizing the selects method.

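    // Package names assume Apache Jena 3.x or later; older releases use com.hp.hpl.jena.*
    import org.apache.jena.rdf.model.*;
    import org.apache.jena.util.FileManager;

    // Load the RDF document into an in-memory model.
    FileManager fManager = FileManager.get();
    Model model = fManager.loadModel("some_file.rdf");

    // The property (predicate) whose triples we want to find.
    final Property someRelevantProperty =
        model.createProperty("http://your.data.org/ontology/",
                             "someRelevantProperty");

    // Wildcard selector (all nulls), narrowed by overriding selects().
    SimpleSelector selector = new SimpleSelector(null, null, (RDFNode) null) {
        @Override
        public boolean selects(Statement s) {
            return s.getPredicate().equals(someRelevantProperty);
        }
    };

    // Print subject, predicate and object of every matching triple.
    StmtIterator iter = model.listStatements(selector);
    while (iter.hasNext()) {
        Statement stmt = iter.nextStatement();
        System.out.print(stmt.getSubject().toString() + " ");
        System.out.print(stmt.getPredicate().toString() + " ");
        System.out.println(stmt.getObject().toString());
    }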
    

    You'll find more information here.

    If you describe the ontology you are using and the type of search you need in a bit more detail, we might be able to help more.