Search code examples
elasticsearchlogstashlogstash-jdbc

sql_last_value resets whenever logstash restarts


Whenever I logstash restarts it start fetching database records from scratch, here is my .conf file which I am executing

input {
    jdbc {
        # Postgres jdbc connection string to our database, mydb
        jdbc_connection_string => "jdbc:postgresql://**************:5432/******"
        # The user we wish to execute our statement as
        jdbc_user => "*******"
        jdbc_password => "******"
        # The path to our downloaded jdbc driver
        jdbc_driver_library => "/usr/share/logstash/lib/postgresql-42.2.9.jar"
        # The name of the driver class for Postgresql
        jdbc_driver_class => "org.postgresql.Driver"
        jdbc_paging_enabled => true
        # the column which should be stored to .logstash_jdbc_last_run
        tracking_column => "unix_ts_in_secs"
        use_column_value => true
        tracking_column_type => "numeric"
        schedule => "*/5 * * * * *"
        jdbc_page_size => 500
        # sql_last_value is stored in this file so each pipeline should have different
        # file name so it doesn't get mixed
        last_run_metadata_path => "/etc/logstash/conf.d/lastrun/.logstash_jdbc_last_run_alerts"
        # our query
        statement_filepath => "/etc/logstash/db-queries/alerts-query.sql"
    }
}

filter {
  mutate {
    remove_field => ["unix_ts_in_secs"]
  }
}

output {
    elasticsearch {
        index => "typec"
        document_id => "%{id}"
        hosts => ["************"]
        user => "******"
        password => "******"
    }
}

File: alerts-query.sql

SELECT *
FROM alerts as a
WHERE (
    extract(epoch FROM a."updated_at") > :sql_last_value
    AND a."updated_at" < NOW()
)
ORDER BY a."updated_at" ASC

Is there any way to persist the :sql_last_value and even if logstash restart it starts from the previously stored :sql_last_value instead starting from 0?


Solution

  • The problem is the use_column_value setting.

    Description from the docs:

    When set to true, uses the defined tracking_column value as the :sql_last_value. When set to false, :sql_last_value reflects the last time the query was executed.

    This means that :sql_last_value is not referring to the value in last_run_metadata_path but your tracking_column-value. When you restart Logstash this value is gone/unknown before the first execution. Hence the value defaults to 0 and Logstash fetches all records.

    So if you want that the :sql_last_value survives restarts you should only rely on last_run_metadata_path and remove the use_column_value setting.