Tags: rdf, sparql, dotnetrdf

dotNetRDF - ResultSetHandler loads the results into memory rather than streaming or yielding them one by one


I'm using the dotNetRDF library to connect to a remote SPARQL endpoint and execute a SPARQL query.

Currently the application is throwing an out-of-memory error. I looked into the dotNetRDF code to find the root cause. It seems to come from holding the entire result set in memory rather than streaming it (after the response has been read from the HttpWebResponse).

After a successful HTTP response is received, as part of parsing (e.g. in SparqlCsvParser), all the results (of type SparqlResult) are added to a List in SparqlResultSet. Could this not lead to an out-of-memory error?

Are there any methods in dotNetRDF that lazily return the results one by one rather than loading everything into memory?


Solution

  • See the documentation on the Handlers API, which is described as follows:

    The Handlers API is a powerful API that permits the stream processing of RDF and SPARQL Results. It can be used in virtually any part of the API that works with RDF or SPARQL results.

    You can take a look at the API docs for ISparqlResultsHandler for the built-in implementations or write your own as needed.

    Note that this doesn't necessarily deliver the results lazily; it simply allows you to control how the parsed results are processed. If you need to process them lazily, you can likely do so with a blocking queue of fixed capacity (though you'll likely need to push the parsing onto a background thread for that to work).
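
The blocking-queue idea can be sketched roughly as follows. This is a minimal illustration, not dotNetRDF code: `LazyResults`, `Stream`, `produce` and `emit` are hypothetical names, and the integer-producing lambda in `Main` merely stands in for the parser. In a real application, `produce` would presumably run the query with a custom ISparqlResultsHandler whose per-result callback calls `emit`. The sketch relies only on `BlockingCollection<T>` from the .NET base class library.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class LazyResults
{
    // Turns a push-style producer into a lazily consumed sequence.
    // 'produce' is a hypothetical stand-in for "run the query with a custom
    // results handler"; it must call 'emit' once per parsed result.
    public static IEnumerable<T> Stream<T>(Action<Action<T>> produce, int capacity)
    {
        // Bounded queue: Add blocks once 'capacity' items are waiting,
        // so the parser cannot run far ahead of the consumer.
        var queue = new BlockingCollection<T>(capacity);
        var cts = new CancellationTokenSource();

        // Parsing runs on a background thread, as the answer suggests.
        var producer = Task.Run(() =>
        {
            try { produce(item => queue.Add(item, cts.Token)); }
            catch (OperationCanceledException) { /* consumer stopped early */ }
            finally { queue.CompleteAdding(); }
        });

        try
        {
            // Blocks until an item is available; ends after CompleteAdding.
            foreach (var item in queue.GetConsumingEnumerable())
                yield return item;

            // Rethrow any exception raised by the producer.
            producer.GetAwaiter().GetResult();
        }
        finally
        {
            // Unblocks a producer stuck in Add if enumeration is abandoned.
            cts.Cancel();
        }
    }

    public static void Main()
    {
        // Fake "parser" that pushes five rows; at most two are buffered.
        var rows = Stream<int>(emit =>
        {
            for (int i = 1; i <= 5; i++) emit(i);
        }, capacity: 2);

        foreach (var row in rows)
            Console.WriteLine(row);
    }
}
```

With this shape, memory use is bounded by the queue capacity plus whatever the parser itself buffers, at the cost of one extra thread per running query.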