Recently I have taken an interest in Elasticsearch analyzers. I understand what a token graph, start_offset, end_offset, position and positionLength are.
Index schema
PUT synonym_graph_index
{
  "settings": {
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "synonym_graph_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["synonym_filter"]
        }
      },
      "filter": {
        "synonym_filter": {
          "type": "synonym_graph",
          "synonyms": ["wi fi => wifi,hotspot,fast network"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text_field": {
        "type": "text",
        "analyzer": "synonym_graph_analyzer"
      }
    }
  }
}
I ran the analyzer on a sample text:
POST synonym_graph_index/_analyze
{
  "analyzer": "synonym_graph_analyzer",
  "text": "Airtel wi fi is up and down"
}
Result of analysis
{
  "tokens" : [
    {
      "token" : "Airtel",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "wifi",
      "start_offset" : 7,
      "end_offset" : 12,
      "type" : "SYNONYM",
      "position" : 1,
      "positionLength" : 2
    },
    {
      "token" : "hotspot",
      "start_offset" : 7,
      "end_offset" : 12,
      "type" : "SYNONYM",
      "position" : 1,
      "positionLength" : 2
    },
    {
      "token" : "fast",
      "start_offset" : 7,
      "end_offset" : 12,
      "type" : "SYNONYM",
      "position" : 1
    },
    {
      "token" : "network",
      "start_offset" : 7,
      "end_offset" : 12,
      "type" : "SYNONYM",
      "position" : 2
    },
    {
      "token" : "is",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "up",
      "start_offset" : 16,
      "end_offset" : 18,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "and",
      "start_offset" : 19,
      "end_offset" : 22,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "down",
      "start_offset" : 23,
      "end_offset" : 27,
      "type" : "<ALPHANUM>",
      "position" : 6
    }
  ]
}
To understand it better, I made a table.
Using that table, I also drew the graph.
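For reference, here is a minimal Python sketch of how the token list can be read as a graph: each token becomes an edge that starts at its position and ends at position + positionLength. The token data is copied from the _analyze response; the helper name to_edges is made up for illustration, and treating a missing positionLength as 1 is an assumption based on it being the default value.

```python
# Tokens from the _analyze response, reduced to the fields needed here:
# (token, position, positionLength or None when the field is absent).
tokens = [
    ("Airtel", 0, None),
    ("wifi", 1, 2),
    ("hotspot", 1, 2),
    ("fast", 1, None),
    ("network", 2, None),
    ("is", 3, None),
    ("up", 4, None),
    ("and", 5, None),
    ("down", 6, None),
]


def to_edges(tokens):
    """Turn each token into a graph edge (from_node, to_node, token)."""
    edges = []
    for token, position, position_length in tokens:
        # Assumption: an absent positionLength means the default of 1.
        length = position_length if position_length is not None else 1
        edges.append((position, position + length, token))
    return edges


edges = to_edges(tokens)
for start, end, token in edges:
    print(f"{start} -> {end}: {token}")
```

Read this way, wifi and hotspot each span from node 1 to node 3, while fast and network cover the same span in two steps.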
The network token has changed its position. Did this happen because I used the standard tokenizer and it split fast network? One more thing I would like to know: in some cases positionLength is not mentioned. As far as I understand, positionLength is there to indicate that a token occupies 2 positions.
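One way to check the positions is to walk the graph and list every path through it. The Python sketch below does that under two assumptions: the positions are the ones from the _analyze response, and tokens without a positionLength are given a length of 1. The function name all_paths is hypothetical.

```python
from collections import defaultdict

# (token, position, positionLength), with absent positionLength assumed to be 1.
tokens = [
    ("Airtel", 0, 1),
    ("wifi", 1, 2),
    ("hotspot", 1, 2),
    ("fast", 1, 1),
    ("network", 2, 1),
    ("is", 3, 1),
    ("up", 4, 1),
    ("and", 5, 1),
    ("down", 6, 1),
]


def all_paths(tokens):
    """Return every token sequence that leads from the first to the last node."""
    graph = defaultdict(list)
    for token, position, length in tokens:
        graph[position].append((position + length, token))
    end = max(position + length for _, position, length in tokens)

    paths = []

    def walk(node, words):
        if node == end:
            paths.append(" ".join(words))
            return
        for next_node, token in graph[node]:
            walk(next_node, words + [token])

    walk(0, [])
    return paths


for path in all_paths(tokens):
    print(path)
```

In this sketch, network at position 2 is what lets the two-word synonym fast network rejoin the other alternatives at node 3, the same node that wifi and hotspot reach with positionLength 2.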