
Elasticsearch Filebeat ignores custom index template and overwrites output index's mapping with default filebeat index template


What are you trying to do?

Using Filebeat with a filestream input to read ndjson-formatted JSON files and insert them into my_index in Elasticsearch with no additional keys.


Show me your configs.

elasticsearch.yml

# ---------------------------------- Cluster -----------------------------------
#
cluster.name: masterCluster
#
# ------------------------------------ Node ------------------------------------
#
node.name: masterNode
#
#----------------------- BEGIN SECURITY AUTO CONFIGURATION -----------------------

# Security features
xpack.security.enabled: false
xpack.security.enrollment.enabled: false

xpack.security.http.ssl.enabled: false
xpack.security.transport.ssl.enabled: false

#----------------------- END SECURITY AUTO CONFIGURATION -------------------------

filebeat.yml

# ============================== Filebeat inputs ===============================

filebeat.inputs:

- type: filestream

  enabled: true

  paths:
    - /home/asura/EBK/data/*.json

  parser:
    - ndjson:
        keys_under_root: true
        add_error_key: true

# ======================= Elasticsearch template setting =======================

setup.ilm.enabled: false

setup.template:
  name: "my_index_template"
  pattern: "my_index*"

# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:

  hosts: ["localhost:9200"]
  index: "my_index"


What do my_index and my_index_template look like?

Mapping of my_index in Kibana:

{
  "mappings": {}
}

Preview of my_index_template in Kibana:

{
  "template": {
    "settings": {
      "index": {
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_content"
            }
          }
        }
      }
    },
    "aliases": {},
    "mappings": {}
  }
}

What does your input file look like?

input.json

{"filename" :"16.avi", "frame": 131, "Class":"person", "confidence":32, "Date & Time" :"Thu Oct 3 14:02:41 2019", "Others" :"Blue"}
{"filename" :"16.avi", "frame": 131, "Class":"person", "confidence":36, "Date & Time" :"Thu Oct 3 14:02:41 2019", "Others" :"Grey,Blue"}

I drag and drop the above file into the watched folder and the insertion works.


What does the data look like after inserting into Elasticsearch?

GET request: http://<host>:<my_port>/my_index/_search?filter_path=hits.hits._source

Response:

{
  "hits": {
    "hits": [
      {
        "_source": {
          "@timestamp": "2022-04-21T21:49:04.084Z",
          "log": {
            "offset": 0,
            "file": {
              "path": "/home/asura/EBK/data/input.json"
            }
          },
          "frame": 131,
          "Class": "person",
          "input": {
            "type": "filestream"
          },
          "ecs": {
            "version": "8.0.0"
          },
          "host": {
            "name": "pisacha"
          },
          "agent": {
            "ephemeral_id": "d389a35d-40f7-4680-a485-8e6939d011ab",
            "id": "c6cb1ce5-ff92-499d-9e3c-e79478795fca",
            "name": "pisacha",
            "type": "filebeat",
            "version": "8.1.3"
          },
          "Date & Time": "Thu Oct 3 14:02:41 2019",
          "Others": "Blue",
          "filename": "16.avi",
          "confidence": 32
        }
      },
      {
        "_source": {
          "@timestamp": "2022-04-21T21:49:04.084Z",
          "agent": {
            "type": "filebeat",
            "version": "8.1.3",
            "ephemeral_id": "d389a35d-40f7-4680-a485-8e6939d011ab",
            "id": "c6cb1ce5-ff92-499d-9e3c-e79478795fca",
            "name": "pisacha"
          },
          "Others": "Grey,Blue",
          "filename": "16.avi",
          "input": {
            "type": "filestream"
          },
          "frame": 131,
          "Class": "person",
          "ecs": {
            "version": "8.0.0"
          },
          "host": {
            "name": "pisacha"
          },
          "confidence": 36,
          "log": {
            "offset": 133,
            "file": {
              "path": "/home/asura/EBK/data/input.json"
            }
          },
          "Date & Time": "Thu Oct 3 14:02:41 2019"
        }
      },
      {
        "_source": {
          "@timestamp": "2022-04-21T21:49:04.084Z",
          "input": {
            "type": "filestream"
          },
          "agent": {
            "id": "c6cb1ce5-ff92-499d-9e3c-e79478795fca",
            "name": "pisacha",
            "type": "filebeat",
            "version": "8.1.3",
            "ephemeral_id": "d389a35d-40f7-4680-a485-8e6939d011ab"
          },
          "ecs": {
            "version": "8.0.0"
          },
          "host": {
            "name": "pisacha"
          },
          "message": "",
          "error": {
            "type": "json",
            "message": "Error decoding JSON: EOF"
          }
        }
      }
    ]
  }
}

It didn't use the template that I specified.


And surprisingly:

Preview of my_index in Kibana after Filebeat has inserted the data:

{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "Class": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "Date & Time": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "Others": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "agent": {
        "properties": {
          "ephemeral_id": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "id": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "type": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "version": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "confidence": {
        "type": "long"
      },
      "ecs": {
        "properties": {
          "version": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "error": {
        "properties": {
          "message": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "type": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "filename": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "frame": {
        "type": "long"
      },
      "host": {
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "input": {
        "properties": {
          "type": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "log": {
        "properties": {
          "file": {
            "properties": {
              "path": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          },
          "offset": {
            "type": "long"
          }
        }
      },
      "message": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

The mapping in my_index_template is now huge, tens of thousands of lines long, almost as if it contains every field defined in fields.yml. Filebeat also created a data stream named my_index by default.

Even after setting setup.ilm.enabled: false, the data is still inserted with all the fields from Filebeat's default index template. I have searched and tried everything I could; I need some guidance here from someone who isn't shooting in the dark.

Version used for Elasticsearch, Kibana, and Filebeat: 8.1.3. Please do comment if you need more info. :)

References:

  1. Parsing ndjson: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html#_parsers
  2. For using custom index: https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html#index-option-es
  3. For using custom templates: https://www.elastic.co/guide/en/beats/filebeat/current/configuration-template.html
  4. For filtered response: https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#common-options-response-filtering

Solution

  • TLDR;

    I am not sure there is an option to stop Filebeat from adding those fields.

    But you could add a drop_fields processor to your configuration to remove them.

    # ============================== Filebeat inputs ===============================
    
    filebeat.inputs:
    
    - type: filestream
    
      enabled: true
    
      paths:
        - /home/asura/EBK/data/*.json
    
      parser:
        - ndjson:
            keys_under_root: true
            add_error_key: true
    
    # ======================= Elasticsearch template setting =======================
    
    setup.ilm.enabled: false
    
    setup.template:
      name: "my_index_template"
      pattern: "my_index*"
    
    # ---------------------------- Elasticsearch Output ----------------------------
    output.elasticsearch:
    
      hosts: ["localhost:9200"]
      index: "my_index"
    
    # Processors are declared at the top level, not under the output
    # section (outputs do not support a processors setting).
    processors:
    - drop_fields:
        fields: ["agent", "ecs", "host", ...]
    

    If there were an option to stop Beats from adding these fields in the first place, that would be a better solution; I am just not aware of one.
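
    As a side note on the template half of the question: Filebeat's setup phase is what replaces the template named in setup.template.name with its own fields.yml-based template. A sketch of the relevant settings, assuming the template setup options behave as documented (the JSON file path below is a placeholder, not from the question):

    ```yaml
    # Option A: don't let Filebeat manage templates at all; create and
    # manage my_index_template yourself via the Elasticsearch API.
    setup.template.enabled: false

    # Option B: have Filebeat load your own template from a local JSON
    # file instead of the one it generates from fields.yml
    # (path below is a placeholder).
    #setup.template.json.enabled: true
    #setup.template.json.path: "/path/to/my_index_template.json"
    #setup.template.json.name: "my_index_template"
    ```

    Note this only addresses the template/mapping side; the agent, ecs, etc. fields are added to each event by Filebeat itself, which is what the processors handle.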


    EDITS:

    The complete working solution involves globally declared processors:

    filebeat.inputs:
    - type: filestream
    
      # Input-level processors run during the input stage of the processing pipeline
      processors:
      - drop_fields:
          fields: ["key1","key2"]
    
    # ---------------------------- Global Processors ------------------
    # Global processors for fields that are added later by filebeat
    processors:
    - drop_fields:
        fields: ["agent", "ecs", "input", "log", "host"]
    
    
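    A small refinement, assuming drop_fields' ignore_missing option behaves as documented: by default the processor reports an error for events that lack one of the listed fields, and the flag suppresses that:

    ```yaml
    processors:
    - drop_fields:
        fields: ["agent", "ecs", "input", "log", "host"]
        # Don't raise an error for events that lack one of these fields
        ignore_missing: true
    ```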

    Reference:

    https://discuss.elastic.co/t/filebeat-didnt-drop-some-of-the-fields-like-agent-ecs-etc/243911/2