Tags: math, memory, statistics, forecasting

How can I predict memory usage and time taken based on historical values?


This is really a maths problem, I think. I have historical data for some spreadsheet outputs, along with the number of rows and columns in each.

I'd like to use this data to predict the peak memory usage and time taken from the (known) row and column counts.

If no historical data exists, there will be no prediction. With only one or two historical values the prediction will be very inaccurate, but I hope that, given a wide enough variety of historical values, a reasonably accurate prediction can be made.

I've got a table on a jsfiddle. Any help or ideas would be really appreciated. I don't really know where to start on this one.

http://jsfiddle.net/JelbyJohn/kwje9chf/3/

Solution

  • You could fit a linear regression model.

    Since this is a programming site, here is some R code:

    > d <- read.table("data.tsv", sep="\t", header=T)
    > summary(lm(log(Bytes.RAM) ~ log(Rows) + log(Columns), d))
    
    Call:
    lm(formula = log(Bytes.RAM) ~ log(Rows) + log(Columns), data = d)
    
    Residuals:
        Min      1Q  Median      3Q     Max 
    -0.4800 -0.2409 -0.1618  0.1729  0.6827 
    
    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
    (Intercept)  12.42118    0.61820  20.093 8.72e-09 ***
    log(Rows)     0.51032    0.09083   5.618 0.000327 ***
    log(Columns)  0.58200    0.07821   7.441 3.93e-05 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
    
    Residual standard error: 0.4052 on 9 degrees of freedom
    Multiple R-squared: 0.9062, Adjusted R-squared: 0.8853 
    F-statistic: 43.47 on 2 and 9 DF,  p-value: 2.372e-05 
    

    This model explains the data pretty well (the adjusted R² is 0.89) and suggests the following relationship between the size of the spreadsheet and memory usage:

    Bytes.RAM = exp(12.42 + 0.51 * log(Rows) + 0.58 * log(Columns))
    

    A similar model can be used to predict the execution time (the Seconds column). There, the R² is 0.998.
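
To turn the fitted relationship into an actual prediction, the log-scale estimate has to be transformed back with exp(). The following is a minimal sketch rather than a definitive implementation: the helper name predict_bytes and the example sheet size are made up for illustration, and the coefficients are simply copied from the summary output above. Because the model is linear in the logs, it is equivalent to a power law, roughly Bytes.RAM = exp(12.42) * Rows^0.51 * Columns^0.58.

```r
# Coefficients copied from the lm() summary above (log-log model).
coefs <- c(intercept = 12.42118, log_rows = 0.51032, log_cols = 0.58200)

# Hypothetical helper: predicted peak memory (bytes) for a given sheet size.
predict_bytes <- function(rows, columns) {
  log_bytes <- coefs[["intercept"]] +
    coefs[["log_rows"]] * log(rows) +
    coefs[["log_cols"]] * log(columns)
  exp(log_bytes)  # undo the log transform applied to Bytes.RAM
}

# Example with an assumed sheet of 10000 rows and 20 columns.
estimate <- predict_bytes(10000, 20)

# Rough multiplicative error band from the residual standard error (0.4052,
# on the log scale). This assumes roughly normal residuals and ignores the
# uncertainty in the coefficients, so treat it as a guide only.
spread <- exp(2 * 0.4052)
c(lower = estimate / spread, estimate = estimate, upper = estimate * spread)

# With the original data frame available, the same prediction could come
# straight from the fitted model instead of hand-copied coefficients:
#   fit <- lm(log(Bytes.RAM) ~ log(Rows) + log(Columns), d)
#   exp(predict(fit, newdata = data.frame(Rows = 10000, Columns = 20)))
```

The band works out to roughly a factor of two either way, which is worth keeping in mind with only 12 observations. Refitting the model as new measurements arrive should tighten it.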