
Linear Regression and Java Dates


I am trying to find the linear trend line for a set of data. The set contains pairs of dates (x values) and scores (y values). I am using a version of this code as the basis of my algorithm.

The results I am getting are off by a few orders of magnitude. I assume the problem is round-off error or overflow, because I am using Date's getTime method, which returns a huge number of milliseconds. Does anyone have a suggestion on how to minimize the error and compute the correct results?


Solution

  • Maybe it helps to transform the long value that Date returns into something smaller.

    If you do not need millisecond precision, you can divide by 1000. If you do not even need seconds, divide by another 60.

    Also, the value is anchored at January 1st, 1970. If you only deal with recent dates, you can subtract a fixed offset to re-base it at, say, January 1st, 2000.

    The whole idea is to make the differences in the data more significant numerically (percentage-wise), so they are not swamped by a huge common offset.
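    As a rough sketch of this idea (the epoch offset and the day-based unit are choices I made for illustration, not part of the original question), you can rescale each timestamp to days since 2000-01-01 before running an ordinary least-squares fit:

    ```java
    import java.util.Date;

    public class TrendLine {
        // Assumed re-base point: 2000-01-01T00:00:00 UTC, in milliseconds.
        static final long EPOCH_2000_MS = 946684800000L;
        static final double MS_PER_DAY = 24.0 * 60 * 60 * 1000;

        // Least-squares fit of y = slope * x + intercept, where x is
        // days since 2000-01-01 instead of raw epoch milliseconds.
        static double[] fit(Date[] dates, double[] scores) {
            int n = dates.length;
            double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
            for (int i = 0; i < n; i++) {
                // Subtract the offset first, then scale down: the
                // remaining values are small enough that their
                // differences are significant in a double.
                double x = (dates[i].getTime() - EPOCH_2000_MS) / MS_PER_DAY;
                double y = scores[i];
                sumX += x; sumY += y; sumXY += x * y; sumXX += x * x;
            }
            double slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
            double intercept = (sumY - slope * sumX) / n;
            // slope is now in score units per day, intercept at 2000-01-01
            return new double[] { slope, intercept };
        }

        public static void main(String[] args) {
            // Three consecutive days with scores 1, 3, 5.
            Date[] dates = {
                new Date(EPOCH_2000_MS),
                new Date(EPOCH_2000_MS + (long) MS_PER_DAY),
                new Date(EPOCH_2000_MS + 2 * (long) MS_PER_DAY)
            };
            double[] scores = { 1.0, 3.0, 5.0 };
            double[] fitResult = fit(dates, scores);
            // prints slope=2.0 intercept=1.0
            System.out.println("slope=" + fitResult[0]
                    + " intercept=" + fitResult[1]);
        }
    }
    ```

    Note that the slope comes back in sensible units (score per day) rather than score per millisecond, which also makes it easy to sanity-check the result.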