I have been checking out GridGain for a while and came across some features regarding GridGain's SQL capabilities, which led me to some questions (that I couldn't find a firm answer in the docs)
From the examples, there is always an explicit data model. I am using Java, so that means there's always a class definition of the model to be queried for. The examples in the API docs: http://atlassian.gridgain.com/wiki/display/GG60/SQL,+Scan,+And+Full+Text+Queries begin by showing how properties much be annotated, which suggests to me an explicit model is always required. Properties of the model can be annotated for SQL querying such as "@GridCacheQuerySqlField". Is an explicit data model always required? Ideally, I would like a way to not have to explicitly state the model, as my use case does change often and has complex relations.
What subset of SQL queries can be performed through GridGain's SQL API? My use cases often require very complex queries. For example, in the docs (same link as above) it states that "Continuous Queries cannot be used with SQL. Only predicate-based queries are supported." where can I find what subset of SQL is supported (and under what conditions, as the example provided does not perform continuous sql queries unless the condition that queries are predicate-based is met)
Thanks in advance for the insight
GridGain has support for non-fixed data model in Enterprise version, namely portable objects. Portable objects allow you to render data model as a map-like nesting structure which allows dynamic structure changes, indexing and portability across different languages (Java, C#, .NET). You can take a look at portable objects in GridGain Enterprise edition examples and read documentation here: http://entdoc.gridgain.org/latest/Portable+Cross+Platform+Objects In open-source version explicit class definition is always required.
The SQL limitations are described in GridCacheQuery javadoc: http://gridgain.com/sdk/6.5.0/javadoc/org/gridgain/grid/cache/query/GridCacheQuery.html
Group by
and sort by
statements are applied separately on each node, so result set will likely be incorrectly grouped or sorted after results from multiple remote nodes are grouped together.
Aggregation functions like sum
, max
, avg
, etc. are also applied on each node. Therefore you will get several results containing aggregated values, one for each node.
Joins will work correctly only if joined objects are stored in collocated mode or at least one side of the join is stored in REPLICATED
cache.