azure-data-explorer kql

How to predict when a disk runs out of space?


I collect Free disk space metrics at regular intervals and would like to predict when the disk will be full.

I thought I could use series_decompose_forecast.

Here's a sample query:

let DiskSpace = 
range Timestamp from ago(60d) to now() step 1d
| order by Timestamp desc
| serialize rn=row_number() + 10
| extend FreeSpace = case
(
    rn % 5 == 0, rn + 5
    , rn % 3 == 0, rn -4
    , rn % 7 == 0, rn +3
    , rn
)
| project Timestamp, FreeSpace;
DiskSpace
| make-series 
    FreeSpace = max(FreeSpace) default= long(null)
    on Timestamp from ago(60d) to now() step 12h
| extend FreeSpace = series_fill_backward(FreeSpace)
| extend series_decompose_forecast(FreeSpace, 24)
| render timechart 

And the result:

[image: timechart rendered by the query above]

The baseline looks like it could show me when free space will hit zero (or some other threshold), but if I specify a larger Points value, more points are excluded from the learning process (I'm still unsure whether they are excluded from the start or the end of the series).

I don't even care about the whole time series, just the date when the disk runs out of free space. Is this the correct approach?


Solution

  • It seems that series_fit_line() is more than enough in this scenario. Once you have the slope and the intercept (returned as interception), you can calculate any point on the line.

    // Sample data: one point per day, newest first, so FreeSpace shrinks over time
    range Timestamp from now() to ago(60d) step -1d
    | extend rn = row_number() + 10
    | extend FreeSpace = rn + case(rn % 5 == 0, 5, rn % 3 == 0, -4, rn % 7 == 0, 3, 0)
    | make-series FreeSpace = max(FreeSpace) default = long(null) on Timestamp from ago(60d) to now() step 12h
    // Fill the null bins so that series_fit_line gets a gap-free series
    | extend FreeSpace = series_fill_forward(series_fill_backward(FreeSpace))
    | extend (rsquare, slope, variance, rvariance, interception, line_fit) = series_fit_line(FreeSpace)
    | project slope, interception, Timestamp, FreeSpace, line_fit
    // Time at which the fitted line reaches zero
    | extend x_intercept = todatetime(Timestamp[0]) - 12h*(1 + interception / slope)
    | project-reorder x_intercept
    | render timechart with (xcolumn=Timestamp, ycolumns=FreeSpace, line_fit)
    
    x_intercept
    2022-12-06T01:56:54.0389796Z
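The x_intercept expression in the query can be derived as follows. This is a sketch that assumes, as the formula in the query implies, that the fitted line is indexed from 1, so that index x = 1 corresponds to Timestamp[0]. The symbols t_0 (for Timestamp[0]) and \Delta (for the 12h step) are introduced here only for illustration.

```latex
\begin{align*}
y(x) &= \text{slope}\cdot x + \text{interception}
  && \text{fitted line over the bin index } x \\
y(x^{*}) = 0 \;&\Longrightarrow\; x^{*} = -\frac{\text{interception}}{\text{slope}}
  && \text{index where the line reaches zero} \\
t(x) &= t_0 + (x - 1)\,\Delta
  && \text{timestamp of bin } x,\ \Delta = 12\,\text{h} \\
t(x^{*}) &= t_0 - \Delta\left(1 + \frac{\text{interception}}{\text{slope}}\right)
  && \text{the x\_intercept column}
\end{align*}
```

With a negative slope (free space shrinking), x* is positive and the resulting timestamp lies after the start of the series.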

    P.S.

    • No need for serialize after order by.
    • No need for order by if you create the range backwards.
    • Null values in a time series break a lot of functionality (fixed here with the additional series_fill_forward).
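The question also mentions hitting "some other threshold" rather than zero. A minimal sketch of that variation is below. It reuses the sample data and the same indexing convention as the x_intercept formula above; the names threshold and threshold_time are hypothetical, and the value 5.0 is only an example.

```kql
// Hypothetical alert level, in the same units as FreeSpace
let threshold = 5.0;
range Timestamp from now() to ago(60d) step -1d
| extend rn = row_number() + 10
| extend FreeSpace = rn + case(rn % 5 == 0, 5, rn % 3 == 0, -4, rn % 7 == 0, 3, 0)
| make-series FreeSpace = max(FreeSpace) default = long(null) on Timestamp from ago(60d) to now() step 12h
| extend FreeSpace = series_fill_forward(series_fill_backward(FreeSpace))
| extend (rsquare, slope, variance, rvariance, interception, line_fit) = series_fit_line(FreeSpace)
// Solve slope * x + interception = threshold for x, then map the index x to a timestamp
// (assumes the same 1-based index as the x_intercept formula)
| extend threshold_time = todatetime(Timestamp[0]) + 12h * ((threshold - interception) / slope - 1)
| project threshold_time, slope, rsquare
```

With threshold set to 0 this reduces to the x_intercept expression. The result is only meaningful when slope is negative. A slope of zero or above means the fitted line never drops to the threshold, and a low rsquare suggests that a straight line is a poor fit for the data.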