
TICKscript never resets Level to OK


I’m writing a TICKscript task that acts on a series of points, each of which has exactly one of two outcomes.

Either the result is pass or “not pass” (usually some variant of exit NUM).

The script I have looks sort of like this:

// RP: autogen
// Monitor the result of updates
// WARNING if the result is anything other than pass
batch
    |query('''SELECT * FROM "mydb"."autogen"."measurement"''')
        .period(25h)
        .every(24h)
        .groupBy('host')
    |alert()
        .id('kapacitor/{{ .TaskName }}/{{ .Group }}')
        .infoReset(lambda: TRUE)
        .warn(lambda: "result" != 'pass')
        .message(
            '{{ index .Tags "host" }}' +
            '{{ if eq .Level "OK" }} is updating again.' +
            '{{ else }}' +
            ' is failing to update.' +
            '{{ end }}'
        )
        .idField('id')
        .levelField('level')
        .messageField('description')
        .stateChangesOnly()
    @alertFilterAdapter()
    @alertFilter()

The script mostly works, but it has a critical problem: it never sets the Level back to OK.

If I write these four points to InfluxDB:

time                host     name                   result
----                ----     ----                   ------
1544079584447374994 fakeS176 /usr/bin/yum update -y pass
1544079584447374994 fakeS177 /usr/bin/yum update -y exit 1
1544129084447375177 fakeS176 /usr/bin/yum update -y exit 1
1544129084447375177 fakeS177 /usr/bin/yum update -y pass

I would expect one warning (fakeS176, whose latest result is exit 1) and one OK (fakeS177, whose latest result is pass). All of the timestamps listed above fall within the 25-hour period.

However, what actually happens is that I get two warnings and no OKs.

Could someone give some advice on how to move forward?
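One way to see what the alert node is actually judging is to log each batch just before it reaches the alert node. The sketch below is a hypothetical debugging copy of the task above: the prefix string 'before-alert' is made up, the message, field settings and UDF calls are left out, and it assumes the same database, measurement and schedule.

```
// Hypothetical debugging copy of the task: same query and schedule,
// plus a log node that writes every batch to the Kapacitor log
// before the alert node evaluates it.
batch
    |query('''SELECT * FROM "mydb"."autogen"."measurement"''')
        .period(25h)
        .every(24h)
        .groupBy('host')
    |log()
        .prefix('before-alert')
    |alert()
        .id('kapacitor/{{ .TaskName }}/{{ .Group }}')
        .warn(lambda: "result" != 'pass')
```

With the four points above and a 25h period grouped by host, each host's logged batch should contain two points: one pass and one exit 1.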


Solution

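  • Update: a coworker told me about a node I had no idea existed. Adding a last() node with .as('result'), and removing the .infoReset() property, seems to do it. For batch data, the alert node evaluates every point in the batch and gives the batch the most severe level it finds. Each host's batch contained one non-pass point, so both hosts always warned. last() reduces each host's batch to its newest point before the alert node sees it.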

    // RP: autogen
    // Monitor the result of updates
    // WARNING if the result is anything other than pass
    batch
        |query('''SELECT * FROM "mydb"."autogen"."measurement"''')
            .period(25h)
            .every(24h)
            .groupBy('host')
        |last('result')
            .as('result')
        |alert()
            .id('kapacitor/{{ .TaskName }}/{{ .Group }}')
            .warn(lambda: "result" != 'pass')
            .message(
                '{{ index .Tags "host" }}' +
                '{{ if eq .Level "OK" }} is updating again.' +
                '{{ else }}' +
                ' is failing to update.' +
                '{{ end }}'
            )
            .idField('id')
            .levelField('level')
            .messageField('description')
            .stateChangesOnly()
        // @name() calls are user-defined functions (UDFs) configured on the Kapacitor server
        @alertFilterAdapter()
        @alertFilter()
    

    Screw this blasted language.
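A possible alternative to the last() node is to let the query itself return only the newest result per host. This is an untested sketch. It assumes "result" is a string field; InfluxQL's last() selector does accept string fields. The message, field settings and UDF calls from the task above would carry over unchanged and are omitted here.

```
// Untested alternative sketch: InfluxQL's last() selector returns only the
// newest "result" per host, so each batch reaching the alert node already
// holds a single point and no last() node is needed in the pipeline.
batch
    |query('''SELECT last("result") AS "result" FROM "mydb"."autogen"."measurement"''')
        .period(25h)
        .every(24h)
        .groupBy('host')
    |alert()
        .id('kapacitor/{{ .TaskName }}/{{ .Group }}')
        .warn(lambda: "result" != 'pass')
        .stateChangesOnly()
```

The difference is where the reduction happens: here InfluxDB does it, while the last() node in the update does it inside Kapacitor. From the alert node's point of view, both should deliver one point per host.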