java scala rdf

Parsing RDF items


I have a couple of lines of (I think) RDF data:

<http://www.test.com/meta#0001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> 
<http://www.test.com/meta#0002> <http://www.test.com/meta#CONCEPT_hasType> "BEAR"^^<http://www.w3.org/2001/XMLSchema#string>

Each line has 3 items in it. I want to pull out the part of each item that comes after the URL prefix (and, for a typed value, the value together with its type). So that would result in:

0001, type, Class
0002, CONCEPT_hasType, (BEAR, string)

Is there a library out there (Java or Scala) that would do this split for me? Or do I just need to shove string.splits and assumptions in my code?


Solution

  • Most RDF libraries will have something to facilitate this. For example, if you parse your RDF data using Eclipse RDF4J's Rio parser, you will get back each line as an org.eclipse.rdf4j.model.Statement, with a subject, predicate and object value. The subject in both your lines will be an org.eclipse.rdf4j.model.IRI, which has a getLocalName() method you can use to get the part behind the last #. See the Javadocs for more details.

    Assuming your data is in N-Triples syntax (which it seems to be, given the example you showed us), here's a simple bit of code that does this and prints the result to STDOUT:

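      import java.io.FileInputStream;
      import java.io.InputStream;
      import org.eclipse.rdf4j.model.IRI;
      import org.eclipse.rdf4j.model.Model;
      import org.eclipse.rdf4j.model.Resource;
      import org.eclipse.rdf4j.model.Statement;
      import org.eclipse.rdf4j.rio.RDFFormat;
      import org.eclipse.rdf4j.rio.Rio;

      // parse the file into a Model object; try-with-resources closes the stream
      try (InputStream in = new FileInputStream("/path/to/rdf-data.nt")) {
          Model model = Rio.parse(in, RDFFormat.NTRIPLES);

          for (Statement st : model) {
              Resource subject = st.getSubject();
              if (subject instanceof IRI) {
                  // IRI subject: print only the local name, e.g. "0001"
                  System.out.print(((IRI) subject).getLocalName());
              } else {
                  // non-IRI subject (e.g. a blank node): print its full string value
                  System.out.print(subject.stringValue());
              }
              // ... etc for predicate and object (the 2nd and 3rd elements in each RDF statement)
          }
      }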
    

    Update: if you don't want to read the data from a file but simply from a String, you can use a java.io.StringReader instead of an InputStream:

     StringReader r = new StringReader("<http://www.test.com/meta#0001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .");
     org.eclipse.rdf4j.model.Model model = Rio.parse(r, RDFFormat.NTRIPLES);
    

    Alternatively, if you don't want to parse the data at all and just want to do String processing, there is an org.eclipse.rdf4j.model.util.URIUtil class, which you can feed a string and which gives you back the index of the local name part:

      String uri = "http://www.test.com/meta#0001";
      String localpart = uri.substring(URIUtil.getLocalNameIndex(uri));  // will be "0001" 
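
    For comparison, here is a rough sketch of what the pure String-processing route could look like with no library at all. It is not RDF4J code and does not claim to reproduce URIUtil's behaviour: the class name TripleLocalNames, its methods, the two regular expressions and the "last #, else last /, else last :" heuristic are all assumptions made up for this illustration. It only copes with lines shaped like the two in the question (IRI subject, IRI predicate, and an IRI or typed-literal object) and ignores blank nodes, language tags and escaped quotes, which a real N-Triples parser handles for you.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of RDF4J.
public class TripleLocalNames {

    // Assumed line shape: <subject IRI> <predicate IRI> object, with an optional trailing "."
    private static final Pattern LINE =
            Pattern.compile("^\\s*<([^>]*)>\\s+<([^>]*)>\\s+(.*?)\\s*\\.?\\s*$");

    // Assumed typed-literal shape: "label"^^<datatype IRI>
    private static final Pattern TYPED_LITERAL =
            Pattern.compile("^\"(.*)\"\\^\\^<([^>]*)>$");

    /** Heuristic split: text after the last '#', else after the last '/', else after the last ':'. */
    public static String localName(String iri) {
        int idx = iri.lastIndexOf('#');
        if (idx < 0) idx = iri.lastIndexOf('/');
        if (idx < 0) idx = iri.lastIndexOf(':');
        if (idx < 0) throw new IllegalArgumentException("No separator found in: " + iri);
        return iri.substring(idx + 1);
    }

    /** Formats the object term: typed literals become "(label, datatype)", IRIs their local name. */
    public static String formatObject(String term) {
        Matcher typed = TYPED_LITERAL.matcher(term);
        if (typed.matches()) {
            return "(" + typed.group(1) + ", " + localName(typed.group(2)) + ")";
        }
        if (term.startsWith("<") && term.endsWith(">")) {
            return localName(term.substring(1, term.length() - 1));
        }
        return term; // anything else (e.g. a plain literal) is returned unchanged
    }

    /** Turns one triple line into "subject, predicate, object" using the helpers above. */
    public static String summarize(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) throw new IllegalArgumentException("Unrecognized line: " + line);
        return localName(m.group(1)) + ", " + localName(m.group(2)) + ", " + formatObject(m.group(3));
    }

    public static void main(String[] args) {
        System.out.println(summarize(
                "<http://www.test.com/meta#0001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> "));
        System.out.println(summarize(
                "<http://www.test.com/meta#0002> <http://www.test.com/meta#CONCEPT_hasType> \"BEAR\"^^<http://www.w3.org/2001/XMLSchema#string>"));
    }
}
```

    Run on the two lines from the question, this should print "0001, type, Class" and "0002, CONCEPT_hasType, (BEAR, string)". The amount of guessing baked into those two patterns is the main argument for letting a parser do the work instead.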
    

    (disclosure: I am on the RDF4J development team)