Tags: full-text-indexing, arangodb

How to create a fulltext index on array attributes?


I'd like to create a fulltext index on an array attribute without having to redundantly copy all of its strings. I tried defining an index on "attrib[*].string", but this does not work. Am I using the wrong syntax? If not, would it be hard to support such an index scenario? To my naive understanding, there might be no big difference in functionality apart from the reading function used while creating the index ... at least I hope so :-)


Solution

  • Fulltext indexing in ArangoDB 2.5 and earlier supports only a single attribute per fulltext index, and documents whose indexed attribute contains a non-string value are skipped during indexing. This means that neither specifying multiple attribute names when creating the index nor storing an array of string values in the document will work.

    I just added a change to the fulltext feature in devel (the upcoming 2.6 release) that allows indexing the direct sub-attributes of object values, provided the object's member values are strings. Indexing array values is now supported as well, provided the array elements are strings.

    Thus the following will be supported in 2.6:

     var c = db._create("example");
     c.ensureFulltextIndex("translations");
     c.insert({ translations: { en: "fox", de: "Fuchs", fr: "renard", ru: "лиса" } });
     c.insert({ translations: "Fox is the English translation of the German word Fuchs" });
     c.insert({ translations: [ "ArangoDB", "document", "database", "Foxx" ] });
    
     c.fulltext("translations", "лиса").toArray();       // returns only first document
     c.fulltext("translations", "Fox").toArray();        // returns first and second documents
     c.fulltext("translations", "prefix:Fox").toArray(); // returns all three documents
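    To make the rules above concrete, here is a small plain-JavaScript model of which strings the 2.6 index would take from the value of the indexed attribute. It is a hypothetical sketch, not ArangoDB code: the function name `indexableStrings` is made up, and it assumes that non-string array elements and object members are simply skipped (the answer only says the values must be strings).

```javascript
// Hypothetical model of the 2.6 rules described above: which strings would
// be taken from the value of the indexed attribute. This is NOT ArangoDB's
// implementation; it only mirrors the behaviour stated in the answer.
function indexableStrings(value) {
  if (typeof value === "string") {
    // A plain string is indexed, as in 2.5.
    return [value];
  }
  if (Array.isArray(value)) {
    // An array contributes its string elements (assumption: any
    // non-string elements are simply skipped).
    return value.filter((v) => typeof v === "string");
  }
  if (value !== null && typeof value === "object") {
    // An object contributes its direct sub-attribute values that are strings.
    return Object.values(value).filter((v) => typeof v === "string");
  }
  // Numbers, booleans and null contribute nothing.
  return [];
}

// The three example documents from the answer:
console.log(indexableStrings({ en: "fox", de: "Fuchs", fr: "renard", ru: "лиса" }));
console.log(indexableStrings("Fox is the English translation of the German word Fuchs"));
console.log(indexableStrings([ "ArangoDB", "document", "database", "Foxx" ]));

// The layout from the question: the array members are objects, not strings,
// so nothing from attrib[*].string is picked up.
console.log(indexableStrings([ { string: "fox" }, { string: "Fuchs" } ])); // -> []
```

    The last call shows why the layout from the question is still not covered: each array member is an object, so its `string` sub-attribute is never reached.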
    

    This is probably not exactly what was asked for (indexing a sub-attribute of each member of an array attribute), but it is much closer to it than what is possible in 2.5.
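    For the layout in the question, one option with 2.6 would be to copy the `string` sub-attribute of each array member into a flat array of strings, which the 2.6 index can handle. This still duplicates the values, which the question hoped to avoid, but they stay separate array elements instead of one long string. A minimal sketch, assuming documents shaped like `{ attrib: [ { string: "..." }, ... ] }`; the helper `withFlatStrings` and the target attribute `attribStrings` are hypothetical names:

```javascript
// Hypothetical helper: copy the "string" sub-attribute of every member of
// the "attrib" array into a flat array of strings ("attribStrings", a made-up
// name), so that a 2.6-style fulltext index on that attribute can use it.
function withFlatStrings(doc) {
  const members = Array.isArray(doc.attrib) ? doc.attrib : [];
  const strings = members
    .map((member) => (member ? member.string : undefined))
    .filter((s) => typeof s === "string"); // keep only real strings
  // Return a copy so the caller's object is left unchanged.
  return Object.assign({}, doc, { attribStrings: strings });
}

const flatDoc = withFlatStrings({ attrib: [ { string: "fox" }, { string: "Fuchs" } ] });
console.log(flatDoc.attribStrings); // -> ["fox", "Fuchs"]
```

    In arangosh you would then create the index with `c.ensureFulltextIndex("attribStrings")` and insert the documents returned by the helper.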

    In 2.5, the only option is to add a separate attribute to each document that holds all of the to-be-indexed text values concatenated into one string. That way everything is contained in a single text attribute, which is what the 2.5 fulltext index can handle.
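    A minimal sketch of this 2.5 workaround in plain JavaScript, building the concatenated text before the document is inserted. The helper `withConcatenatedText` and the target attribute `attribText` are hypothetical names, and the input is assumed to follow the question's `attrib[*].string` layout:

```javascript
// Hypothetical helper: join the "string" sub-attribute of every member of
// the "attrib" array into one text attribute ("attribText", a made-up name)
// that a single-attribute 2.5 fulltext index can cover.
function withConcatenatedText(doc) {
  const members = Array.isArray(doc.attrib) ? doc.attrib : [];
  const text = members
    .map((member) => (member ? member.string : undefined))
    .filter((s) => typeof s === "string") // keep only real strings
    .join(" "); // space-separated so words from adjacent values stay apart
  // Return a copy so the caller's object is left unchanged.
  return Object.assign({}, doc, { attribText: text });
}

const textDoc = withConcatenatedText({ attrib: [ { string: "fox" }, { string: "Fuchs" } ] });
console.log(textDoc.attribText); // -> "fox Fuchs"
```

    The values are joined with a space so that the last word of one value and the first word of the next are not fused into a single token. The fulltext index would then be created on that one attribute only, e.g. `c.ensureFulltextIndex("attribText")`.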