
Parsing RDF - dotnetrdf c#


This is my first time looking at RDF, and after some attempts I still have no idea how to parse it. I'm looking at some RDF (in Turtle format) used in the AFF4 file system. Here's a portion of it:

<aff4://0295fab8-94b7-4435-bdb3-932cf48e40bd>
    a                          aff4:ImageStream ;
    aff4:chunkSize             "32768"^^xsd:int ;
    aff4:chunksInSegment       "2048"^^xsd:int ;
    aff4:compressionMethod     <http://code.google.com/p/snappy/> ;
    aff4:imageStreamHash       "82798a275176aa141a2993ca8931535b1303545a0954473f5c5e55b4d8d5a8e3ebdb9e9323e5ecfaf65f8d379a8e2b9150750f5cf44851cf4edb6a2e05372f42"^^aff4:SHA512 ;
    aff4:imageStreamIndexHash  "039eb2da046cfb8c8d40e6f9b42aae501fb36f9b09b5f29d660d3637f87c37c98c3ee3b995265adff1d2b971fa795317333bf50200e72fdfe9fa96acdb88b6d0"^^aff4:SHA512 ;
    aff4:size                  "185335808"^^xsd:long ;
    aff4:stored                : ;
    aff4:target                <aff4://92015053-5f7b-4e5a-a1e7-901d8943cf1f> ;
    aff4:version               "1"^^xsd:int .

There's a lot of this in the file, but I have no idea how to access any of it. So far I've cobbled together:

private static void ParseInformationStream(Stream informationStream)
{
    Console.WriteLine("Parsing information.turtle file: ");

    informationStream.Position = 0;

    TurtleParser turtleParser = new TurtleParser();
    Graph graph = new Graph();
    turtleParser.Load(graph, new StreamReader(informationStream));

    foreach (var triple in graph.Triples)
    {
        Console.WriteLine(triple.Subject);
    }
}

This prints out some of the data, but if, for example, I wanted to access aff4:compressionMethod (a node?) specifically, how would I go about doing that? I've been reading about SPARQL, but it seems like overkill for what I need.

Any input or advice would be appreciated.


Solution

  • You can use the methods of the IGraph interface to access the contents of the parsed graph. For example, the following retrieves all image streams (in Turtle, "a" is shorthand for the rdf:type predicate) and prints the compression method for each stream:

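    // Needs: using System.Linq; using VDS.RDF; using VDS.RDF.Parsing;
    // Get the node for rdf:type
    var rdfType = graph.CreateUriNode(new Uri(RdfSpecsHelper.RdfType));
    // Look up the node for the aff4:ImageStream type
    // (GetUriNode returns null if the graph contains no such node)
    var imageStream = graph.GetUriNode("aff4:ImageStream");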
    // Get the node for the aff4:compressionMethod predicate
    var compressionMethod = graph.CreateUriNode("aff4:compressionMethod");
    // Get the streams (the subject of x a aff4:ImageStream in the Turtle)
    var imageStreams = graph.GetTriplesWithPredicateObject(rdfType, imageStream).Select(t => t.Subject);
    foreach (var streamInstance in imageStreams)
    {
        // Get the first compressionMethod value for the stream instance
        var compression = graph.GetTriplesWithSubjectPredicate(streamInstance, compressionMethod)
            .Select(t => t.Object).FirstOrDefault();
        Console.WriteLine("Stream " + streamInstance + " uses compression method " + compression);
    }
    

    For more on accessing nodes and triples in a graph in dotNetRDF, see https://github.com/dotnetrdf/dotnetrdf/wiki/UserGuide-Working-With-Graphs
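
    If you only need one predicate and do not care about the rdf:type, a shorter lookup may be enough. The following is a minimal sketch rather than a definitive implementation: it parses a cut-down copy of the Turtle from the question held in a string, looks up every aff4:compressionMethod triple directly, and reads aff4:chunkSize as an integer. The two @prefix lines (including the aff4 namespace URI) and the class name Aff4TurtleExample are assumptions added to make the sample self-contained; the real information.turtle declares its own prefixes.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using VDS.RDF;
using VDS.RDF.Parsing;

// Hypothetical class name, used only for this illustration.
public static class Aff4TurtleExample
{
    // Assumption: these prefix declarations stand in for the ones in the real file.
    private const string Turtle = @"
@prefix aff4: <http://aff4.org/Schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

<aff4://0295fab8-94b7-4435-bdb3-932cf48e40bd>
    a                       aff4:ImageStream ;
    aff4:chunkSize          ""32768""^^xsd:int ;
    aff4:compressionMethod  <http://code.google.com/p/snappy/> .
";

    public static void Main()
    {
        var graph = new Graph();
        new TurtleParser().Load(graph, new StringReader(Turtle));

        // Prefixed names are resolved using the prefixes read from the Turtle.
        var compressionMethod = graph.CreateUriNode("aff4:compressionMethod");
        var chunkSize = graph.CreateUriNode("aff4:chunkSize");

        // Every triple whose predicate is aff4:compressionMethod, whatever the subject's type.
        foreach (var triple in graph.GetTriplesWithPredicate(compressionMethod))
        {
            Console.WriteLine(triple.Subject + " uses " + triple.Object);

            // Literal objects expose their lexical form through ILiteralNode.Value.
            var sizeNode = graph.GetTriplesWithSubjectPredicate(triple.Subject, chunkSize)
                .Select(t => t.Object)
                .OfType<ILiteralNode>()
                .FirstOrDefault();

            if (sizeNode != null)
            {
                int size = int.Parse(sizeNode.Value, CultureInfo.InvariantCulture);
                Console.WriteLine("Chunk size: " + size);
            }
        }
    }
}
```

    Parsing the literal's lexical form by hand keeps the sketch to the basic node interfaces; when a stream has no aff4:chunkSize value, the size lookup is simply skipped.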