xquery, zorba, jsoniq

Zorba with more than just a filesystem


Is it possible to use Zorba (with XQuery/JSONiq) to query documents stored somewhere other than a filesystem? I saw a slide deck from 28msec and others suggesting they had done this, but I couldn't tell how, or whether they used third-party code or something else.

I tried and liked BaseX, but it doesn't support JSONiq and, I believe, doesn't scale out.


Solution

  • Zorba can query not only the local file system but also any documents that are accessible through a REST API.

    First, Zorba provides a few built-in modules to connect to Couchbase, SQL databases, etc.:

    http://www.zorba.io/documentation/latest/modules/connectors

    Second, support for more stores can be implemented using the REST module:

    http://www.zorba.io/documentation/latest/modules/zorba/io/http-client
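
    As a rough illustration of what a single read through the REST module might look like, here is a minimal sketch. The endpoint URL and document id are hypothetical, and it assumes the store returns the document as JSON with a 200 status; check the function names and the shape of the response object against the module documentation linked above for your Zorba version.

    ```jsoniq
    jsoniq version "1.0";

    (: Zorba's JSON-based HTTP client, i.e. the REST module linked above. :)
    import module namespace http = "http://zorba.io/modules/http-client";

    (: Hypothetical store endpoint and document id; replace with your own. :)
    declare variable $base := "http://localhost:8080/mystore";
    declare variable $id := "doc-42";

    (: get-text requests the response body as a plain string. :)
    let $response := http:get-text($base || "/" || $id)
    return
      if ($response.status eq 200)
      then parse-json($response.body.content)  (: the stored document as a JSON object :)
      else $response                            (: hand back the raw response for inspection :)
    ```

    A wrapper module for a given store would mostly consist of functions built around calls like this one, each for a different operation of the store's REST API.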

    For each new document store, it is common to write a user-defined JSONiq module that wraps the REST calls. Its API mimics that of a document store and maps naturally onto the underlying REST API (e.g., connect, get, put, update, delete, ...). The parameters of the module's functions can be JSON objects or XML documents almost identical to those passed as the content/body of the REST requests (e.g., query by example).

    With knowledge of the module syntax, this usually takes about 1,000 lines of code and a few days of work. Since the result is mostly a single module file, it can easily be shared with other users, for example on GitHub. Some such modules may already be available online.

    Zorba also supports the standardized EXPath HTTP client (similar to the other one, but with the request parameters passed as XML instead of JSON). This means that modules written against it to query XML-based document stores should also work with XQuery/JSONiq engines other than Zorba.

    Document stores that do not support REST can also be supported, but this requires C++ coding and is significantly more involved.

    Since you mention scaling out, I should also mention Sparksoniq, which scales out to query JSON data stored on HDFS (tested with up to a few billion objects).

    I hope this helps you further.