For a project that analyzes access logs, I need to get the Path Hierarchy Tokenizer working. The analyzer itself seems to work fine, just not with my indexed data, so I suspect something is wrong with the mapping.
Note: I am working with Elasticsearch 5.6, and upgrading is not an option. I have already made the mistake of using syntax that was not yet available in 5.6, so it is possible that something is wrong with the syntax here as well. I have not been able to spot my mistake, though.
This is part of my custom template:
{
  "template": "beam-*",
  "order": 20,
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "custom_path_tree": {
          "tokenizer": "custom_hierarchy"
        },
        "custom_path_tree_reversed": {
          "tokenizer": "custom_hierarchy_reversed"
        }
      },
      "tokenizer": {
        "custom_hierarchy": {
          "type": "path_hierarchy",
          "delimiter": "/"
        },
        "custom_hierarchy_reversed": {
          "type": "path_hierarchy",
          "delimiter": "/",
          "reverse": "true"
        }
      }
    }
  },
And this is the mapping. The object field contains paths. I want to be able to search object.tree and object.tree_reversed to identify the most visited categories in an online shop.
  "mappings": {
    "logs": {
      "properties": {
        "object": {
          "type": "text",
          "fields": {
            "tree": {
              "type": "text",
              "analyzer": "custom_path_tree"
            },
            "tree_reversed": {
              "type": "text",
              "analyzer": "custom_path_tree_reversed"
            }
          }
        },
When I try this
POST beam-2019.07.02/_analyze
{
  "analyzer": "custom_path_tree",
  "text": "/belletristik/science-fiction/postapokalypse"
}
I get this
{
  "tokens": [
    {
      "token": "/belletristik",
      "start_offset": 0,
      "end_offset": 13,
      "type": "word",
      "position": 0
    },
    {
      "token": "/belletristik/science-fiction",
      "start_offset": 0,
      "end_offset": 29,
      "type": "word",
      "position": 0
    },
    {
      "token": "/belletristik/science-fiction/postapokalypse",
      "start_offset": 0,
      "end_offset": 44,
      "type": "word",
      "position": 0
    }
  ]
}
The analyzer itself seems to be working perfectly fine and is doing what it is supposed to do.
Yet when I try to build a query
GET beam-2019.07.03/_search
{
  "query": {
    "term": {
      "object.tree": "/belletristik/"
    }
  }
}
I get no results, although there should be a few hundred.
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
Maybe my query is wrong. Or something with the mapping doesn't add up?
The term query does not apply an analyzer to the input string at query time, so it looks for the exact term /belletristik/. If you look at the analyzer output, the token it generates is /belletristik, with no slash at the end. The input term therefore does not match any document.
Modify the query as below:
GET beam-2019.07.03/_search
{
  "query": {
    "term": {
      "object.tree": "/belletristik"
    }
  }
}
Alternatively, use a match query if you don't want to change the input term. match applies the analyzer to /belletristik/ as well, so the query is run against the token /belletristik that the analyzer generates from it, and that token matches the documents.
GET beam-2019.07.03/_search
{
  "query": {
    "match": {
      "object.tree": "/belletristik/"
    }
  }
}
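To see the difference between the two queries without a cluster, here is a rough Python sketch. It is not Elasticsearch code: path_hierarchy_tokens, term_query and match_query are hypothetical helpers that only imitate the behaviour described above, and the handling of a trailing delimiter is an assumption of the sketch.

```python
def path_hierarchy_tokens(text, delimiter="/"):
    """Rough stand-in for a path_hierarchy tokenizer: emit every path prefix."""
    tokens = []
    for i, ch in enumerate(text):
        # A delimiter (other than a leading one) closes the prefix before it.
        if ch == delimiter and i > 0:
            tokens.append(text[:i])
    if text:
        tokens.append(text)  # the full input is always the last token
    return tokens


def term_query(indexed_tokens, term):
    """term: the query string is compared as-is, with no analysis."""
    return term in indexed_tokens


def match_query(indexed_tokens, text):
    """match: the query string is analyzed first; any shared token is a hit."""
    return any(tok in indexed_tokens for tok in path_hierarchy_tokens(text))


# Tokens stored for one document, same as the _analyze output in the question.
indexed = path_hierarchy_tokens("/belletristik/science-fiction/postapokalypse")

print(term_query(indexed, "/belletristik/"))   # False: no indexed token ends in "/"
print(term_query(indexed, "/belletristik"))    # True
print(match_query(indexed, "/belletristik/"))  # True
```

The any(...) in match_query mirrors the default OR operator of a match query: one shared token is enough for a hit.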