Search code examples
arangodbaql

How to access the data from multiple documents within a collection with self Join in ArangoDB


I have stored the data in ArangoDB 2.7.1 as with collection name DSP:

{"content": "Book.xml", "type": "string", "name": "name", "key": 102}
{"content": "D:/XMLexample/Book.xml", "type": "string", "name": "location", "key": 102}
{"content": "xml", "type": "string", "name": "mime-type", "key": 102}
{"content": 4130, "type": "string", "name": "size", "key": 102}
{"content": "Sun Aug 25 07:53:32 2013", "type": "string", "name": "created_date", "key": 102}
{"content": "Wed Jan 23 09:14:07 2013", "type": "string", "name": "modified_date", "key": 102}
{"content": "catalog", "type": "tag", "name": "root", "key": 102}
{"content": "book", "type": "string", "name": "tag", "key": 103} 
{"content": "bk101", "type": {"py/type": "__builtin__.str"}, "name": "id", "key": 103}
{"content": "Gambardella, Matthew", "type": {"py/type": "__builtin__.str"}, "name": "author", "key": 1031} 
{"content": "XML Developer's Guide", "type": {"py/type": "__builtin__.str"}, "name": "title", "key": 1031}
{"content": "Computer", "type": {"py/type": "__builtin__.str"}, "name": "genre", "key": 1031}
{"content": "44.95", "type": {"py/type": "__builtin__.str"}, "name": "price", "key": 1031}
{"content": "2000-10-01", "type": {"py/type": "__builtin__.str"}, "name": "publish_date", "key": 1031}
{"content": "An in-depth look at creating applications with XML.", "type": {"py/type": "__builtin__.str"}, "name": "description", "key": 1031}

Here, a single set {"content": "Book.xml", "type": "string", "name": "name", "key": 102} represents a single document in collection.

Now, I want to access all the documents with the similar value of the key attribute in multiple documents or the value title and price where genre is computer" for the same key value. I have tried an AQL as FOR p IN DSP filter p.name == "publish_date" AND p.content == "2000-10-01" AND p.name == 'title' return p but this is returning an empty set because it is comparing within the single document, Not in a collection.

Like relational database, there is some kind of self join will required but I don't know how the self join will applied. Please tell me how can I access the all documents with same value of the key attribute where publish_date is "2000-10-01". I expect the result of this query the following documents because corresponding to publish_date with value 2000-10-01 the value of key is 1031:

{"content": "Gambardella, Matthew", "type": {"py/type": "__builtin__.str"}, "name": "author", "key": 1031} 
{"content": "XML Developer's Guide", "type": {"py/type": "__builtin__.str"}, "name": "title", "key": 1031}
{"content": "Computer", "type": {"py/type": "__builtin__.str"}, "name": "genre", "key": 1031}
{"content": "44.95", "type": {"py/type": "__builtin__.str"}, "name": "price", "key": 1031}
{"content": "2000-10-01", "type": {"py/type": "__builtin__.str"}, "name": "publish_date", "key": 1031}
{"content": "An in-depth look at creating applications with XML.", "type": {"py/type": "__builtin__.str"}, "name": "description", "key": 1031}

Solution

  • Assuming that the publish date is stored in attribute name and its value in attribute content, you will first need to find all documents with that combination:

    FOR self IN DSP 
      FILTER self.name == 'publish_date' && self.content == '2000-10-01'
      RETURN self
    

    Now, with these documents being found, you can join them with the DSP collection again, filtering out documents that have the same key value, but excluding the already found documents from the initial FOR:

    FOR self IN DSP 
      FILTER self.name == 'publish_date' && self.content == '2000-10-01'
      FOR other IN DSP 
        FILTER other.key == self.key && other._key != self._key 
        RETURN { self, other }
    

    If you're always filtering on name and content and/or key, it may be sensible to index these attributes. It looks like key deserves an index on its own. A hash index should suffice as key will always be equality-compared. name and content (in this order) can be put into a skiplist index.