elasticsearch dns logstash elastic-stack logstash-configuration

How does hit_cache_size in the Logstash dns filter work?

I am using the dns filter in Logstash on a CSV file. The file has two fields: website and a count (the column named n).

Here's the sample content of my csv file:

|website|n|
|www.google.com|n1|
|www.yahoo.com|n2|
|www.bing.com|n3|
|www.stackoverflow.com|n4|
|www.smackcoders.com|n5|
|www.zoho.com|n6|
|www.quora.com|n7|
|www.elastic.co|n8|

Here's my logstash config file:

    input {
        file {
            path => "/home/paulsteven/log_cars/cars_dns.csv"
            start_position => "beginning"
            sincedb_path => "/dev/null"
        }
    }
    filter {
        csv {
            separator => ","
            columns => ["website","n"]
        }
        dns {
            resolve => [ "website" ]
            action => "replace"
            hit_cache_size => 8000
            hit_cache_ttl => 300
            failed_cache_size => 1000
            failed_cache_ttl => 10
        }
    }
    output {
        elasticsearch {
            hosts => "localhost:9200"
            index => "dnsfilter03"
            document_type => "details"
        }
        stdout {}
    }

Here's the sample data passing through logstash:

    {
          "@version" => "1",
              "path" => "/home/paulsteven/log_cars/cars_dns.csv",
           "website" => "104.28.5.86",
                 "n" => "n21",
              "host" => "smackcoders",
           "message" => "www.smackcoders.com,n21",
        "@timestamp" => 2019-04-23T10:41:15.680Z
    }

In this config, I want to understand hit_cache_size. I read the dns filter guide on the Elastic website but could not figure out what the option is for. I added it to my Logstash config, but nothing seemed to change. What job does hit_cache_size do in the dns filter, and can I get an example?

Solution

hit_cache_size enables a cache for the results of successful lookups and sets how many results it can hold. When the filter needs to resolve a host it has already resolved, it reads the answer from the cache, and it only performs a DNS lookup if the host is not cached.

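If every host in your data is unique, there is no reason to use hit_cache_size, since each host appears only once and is never looked up again. The cache only saves lookups; it does not change the contents of the event, which is why adding the option produces no visible difference in the output.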

hit_cache_ttl works together with hit_cache_size: it sets how many seconds a successful result stays in the cache.
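
To make the behaviour described above concrete, here is a minimal Python sketch of a size-bounded cache with a time-to-live. It is not the plugin's code: the names `TTLCache`, `CachingResolver` and `fake_lookup` are hypothetical, the least-recently-used eviction is an assumption about what happens when the cache is full, and `fake_lookup` stands in for a real DNS query so the example runs offline.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Size-bounded cache whose entries expire after `ttl` seconds."""

    def __init__(self, max_size, ttl, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl
        self.clock = clock
        self._entries = OrderedDict()  # key -> (value, expiry time)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # entry is older than the TTL, drop it
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_size:
            # Assumed policy: evict the least recently used entry.
            self._entries.popitem(last=False)


class CachingResolver:
    """Resolves hostnames, consulting the hit cache before doing a lookup."""

    def __init__(self, lookup, hit_cache_size, hit_cache_ttl,
                 clock=time.monotonic):
        self.lookup = lookup
        self.hit_cache = TTLCache(hit_cache_size, hit_cache_ttl, clock)
        self.lookups_performed = 0

    def resolve(self, hostname):
        cached = self.hit_cache.get(hostname)
        if cached is not None:
            return cached  # cache hit: no lookup needed
        self.lookups_performed += 1
        address = self.lookup(hostname)
        if address is not None:
            self.hit_cache.put(hostname, address)  # only successes are stored
        return address


def fake_lookup(hostname):
    # Hypothetical stand-in for a real DNS query (documentation-range address).
    return {"www.example.com": "192.0.2.10"}.get(hostname)


resolver = CachingResolver(fake_lookup, hit_cache_size=8000, hit_cache_ttl=300)
for _ in range(3):
    resolver.resolve("www.example.com")
print(resolver.lookups_performed)  # 1: the second and third calls hit the cache
```

With the sample CSV in the question, every website appears once, so a cache like this would record eight lookups and zero hits. The benefit only shows up when the same host repeats within the TTL.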