elasticsearch, mapping, search-engine, tokenize, nest

Mapping and indexing a path hierarchy in Elasticsearch NEST to search within directory paths


I need to search for files and folders within specific directories. To do that, Elasticsearch asks us to create an analyzer and set its tokenizer to path_hierarchy:

PUT /fs
{
    "settings": {
        "analysis": {
            "analyzer": {
                "paths": {
                    "tokenizer": "path_hierarchy"
                }
            }
        }
    }
}
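
To see what this analyzer emits, the analyze API can be run against the index. This is a sketch that assumes the Elasticsearch 1.x request syntax, which matches the "string" mappings used here; later versions expect a JSON body instead.

GET /fs/_analyze?analyzer=paths
/clinton/projects/elasticsearch

For this input the path_hierarchy tokenizer produces one token per ancestor directory: /clinton, /clinton/projects and /clinton/projects/elasticsearch.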

Then, create the mapping as illustrated below with two properties: name (holding the name of the file) and path (to store the directory path):

PUT /fs/_mapping/file
{
    "properties": {
        "name": {
            "type": "string",
            "index": "not_analyzed"
        },
        "path": {
            "type": "string",
            "index": "not_analyzed",
            "fields": {
                "tree": {
                    "type": "string",
                    "analyzer": "paths"
                }
            }
        }
    }
}

This requires us to index the path of the directory where the file lives:

PUT /fs/file/1
{
  "name": "README.txt",
  "path": "/clinton/projects/elasticsearch"
}
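
With the document indexed, a search restricted to a directory could look like the sketch below. It assumes the Elasticsearch 1.x "filtered" query, and the directory "/clinton/projects" is only an illustrative value. A term filter is not analyzed, so it is compared as-is against the tokens stored in path.tree.

GET /fs/file/_search
{
    "query": {
        "filtered": {
            "filter": {
                "term": {
                    "path.tree": "/clinton/projects"
                }
            }
        }
    }
}

Because path.tree holds every ancestor of the indexed path, the file stored under /clinton/projects/elasticsearch matches this filter, while the not_analyzed path field itself would only match the full directory string.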

The question: how can I create this mapping with NEST in C#?


Solution

  • The analyzer is created by declaring a custom analyzer and setting its tokenizer to "path_tokenizer", the name under which a PathHierarchyTokenizer is registered on the index:

                    // Requires "using Nest;". These types come from the NEST 1.x API.
                    // Name under which the path_hierarchy tokenizer is registered
                    string pathTokenizerName = "path_tokenizer";
    
                    //the name of the analyzer
                    string pathAnalyzerName = "path";
    
                    PathHierarchyTokenizer pathTokenizer = new PathHierarchyTokenizer();
    
                    AnalyzerBase pathAnalyzer = new CustomAnalyzer
                    {
                        Tokenizer = pathTokenizerName,
                    };
    

    The second step is to create the index with the required analyzer and mapping. In the code below, the property "LogicalPath" holds the directory location of each document:

                    // Create the index
                    elastic.CreateIndex(_indexName, i => i
                        .NumberOfShards(1)
                        .NumberOfReplicas(0)
                        // Add the path tokenizer and analyzer to the index
                        .Analysis(an => an
                            .Tokenizers(tokenizers => tokenizers
                                .Add(pathTokenizerName, pathTokenizer)
                            )
                            .Analyzers(analyzer => analyzer
                                .Add(pathAnalyzerName, pathAnalyzer)
                            )
                        )
                        // Add the mappings
                        .AddMapping<Document>(t => t
                            .MapFromAttributes()
                            .Properties(props => props
                                // Associate the path analyzer with the "LogicalPath" property
                                .String(myPathProperty => myPathProperty
                                    .Name(_t => _t.LogicalPath)
                                    .IndexAnalyzer(pathAnalyzerName)
                                )
                            )
                        )
                    );
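
    To check what was actually created, the index settings and mapping can be fetched over REST. This is only a sketch: "my_index" is a hypothetical placeholder for whatever value _indexName holds.

                    GET /my_index/_settings
                    GET /my_index/_mapping

    If the index was created as intended, the settings should list the "path" analyzer and the "path_tokenizer" tokenizer, and the mapping entry for the LogicalPath property should reference the "path" analyzer.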