How do I tokenize data that is replicated from Neo4j to Elasticsearch?


In my Neo4j graph, I only need one specific kind of node to be searchable by users. This node has the label "Synonym" and only one property, "alias".

I am using the GraphAware Neo4j Elasticsearch Integration (Neo4j Module), which replicates the graph to Elasticsearch, i.e. it creates an Elasticsearch index for me. I can then run queries like:

CALL ga.es.queryNode('{\"query\":{\"match\":{\"alias\":\"mySynonym\"}}}')
YIELD node RETURN node

This works, but I would like to use an n-gram tokenizer for my synonyms, i.e. for the "alias" properties. Currently, the query above only returns a result once I type in the full name, i.e. "mySynonym", but not when I type only "myS".

In the module documentation I couldn't find anything about tokenizers, so I tried to update the Elasticsearch index created by the Neo4j module like this:

PUT neo4j-index-node/_settings
{
  "analysis": {
    "analyzer": {
      "my_analyser": {
        "tokenizer": "my_tokenizer"
      }
    },
    "tokenizer": {
      "my_tokenizer": {
        "type": "edge_ngram",
        "min_gram": 2,
        "max_gram": 20,
        "token_chars": [
          "letter",
          "digit",
          "punctuation"
        ]
      }
    }
  }
}

and then:

PUT neo4j-index-node/_mapping/Synonym?update_all_types
{
  "properties": {
    "alias": {
      "type": "text",
      "analyzer": "my_analyser",
      "search_analyzer": "my_analyser"
    }
  }
}

The second command gives me an error:

Mapper for [alias] conflicts with existing mapping in other types:\n[mapper [alias] has different [analyzer]

I read somewhere that it is not possible to change the mapping after the index has been created. But my index is created by the Neo4j module, and I don't know how to specify the tokenizer beforehand.

Any ideas? Thanks!


Solution

  • It's true that you cannot change the analyzer of a field in an existing mapping. Delete the existing indexes, then create an Elasticsearch index template for the Neo4j index first, before the Neo4j data is replicated.

    Templates can be created like this:

    PUT _template/template_1
    {
      "template": "te*",
      "settings": {
        "number_of_shards": 1
      },
      "mappings": {
        "type1": {
          "_source": {
            "enabled": false
          },
          "properties": {
            "host_name": {
              "type": "keyword"
            },
            "created_at": {
              "type": "date",
              "format": "EEE MMM dd HH:mm:ss Z YYYY"
            }
          }
        }
      }
    }
    

    In the template, set the index pattern (the "template" field) so that it matches your Neo4j index. Then add your custom analyzer inside the "settings" section, like this:

    PUT my_index
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "my_custom_analyzer": {
              "type":      "custom",
              "tokenizer": "standard",
              "char_filter": [
                "html_strip"
              ],
              "filter": [
                "lowercase",
                "asciifolding"
              ]
            }
          }
        }
      }
    }
    

    Then start indexing the data. The two requests above are separate examples; you should combine them into a single template request.
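
    As a rough sketch of that combination, the request below puts the analyzer from the question and the "alias" mapping into one template. It makes several assumptions: the template name "neo4j_synonym_template" is made up, the pattern "neo4j-index-*" is guessed from the index name "neo4j-index-node" in the question, the mapping type is taken to be "Synonym" as in the question's _mapping call, and the "template" key is the older pattern syntax used in the example above (newer Elasticsearch versions use "index_patterns" instead). Adjust these to your setup.

    ```json
    # Hypothetical template name and index pattern; not taken from the module documentation.
    # Create this before the module creates the index, so the mapping applies from the start.
    PUT _template/neo4j_synonym_template
    {
      "template": "neo4j-index-*",
      "settings": {
        "analysis": {
          "analyzer": {
            "my_analyser": {
              "type": "custom",
              "tokenizer": "my_tokenizer"
            }
          },
          "tokenizer": {
            "my_tokenizer": {
              "type": "edge_ngram",
              "min_gram": 2,
              "max_gram": 20,
              "token_chars": ["letter", "digit", "punctuation"]
            }
          }
        }
      },
      "mappings": {
        "Synonym": {
          "properties": {
            "alias": {
              "type": "text",
              "analyzer": "my_analyser",
              "search_analyzer": "my_analyser"
            }
          }
        }
      }
    }
    ```

    If other mapping types in the same index also have an "alias" field, they would probably need the same analyzer settings to avoid the conflict error from the question.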