algorithm math statistics normalization trending

How can I normalize trending data?


Say I want to calculate the velocity between two data points, A and A', each having a score and a time published (A' is a later version of A and has a higher score). This would be

[A'(score) - A(score)] / [A'(time published) - A (time published)]

What I want to capture are trends with high velocities. This means I want a score going from 20 to 200 to have a higher weight than one going from 8500 to 9000. So I thought I'd normalize this data by dividing the scores by a baseline.

Ex. if A(score) is 2, and A'(score) is 3, the baseline is 2, so in the formula above,

A'(score) - A(score) would be (3/2 - 2/2)

However, this means that when the numbers are this low, the velocities will be very high, whereas

9000/8500 - 8500/8500

produces a very low velocity. (The time difference is constant in this example only; normally, time differences are variable.)

Is there any way to reduce the impact of low starting scores while still allowing jumps from, say, 20 to 200 to be significant? Thank you.


Solution

  • There are two ways to look at this. Either could give you what you want.

    • My first thought was that your question came very close to providing your answer. You gave yourself an important hint by calling your first calculation your velocity - your rate of change of a score over time. You could then look at its acceleration - your rate of change of the velocity over time. That's:

      (A''(score) - A'(score)) - (A'(score) - A(score))

      Note that I'm not dividing by time, because you say the time difference is constant for each measurement. Dividing would only scale every value by the same constant, which costs extra work and probably doesn't give you any further clarity.

    • More likely, though, it seems you want how significant the change is from one score to the next. I suspect what you want is:

      (A'(score) - A(score)) / A(score)

      This is (a - b) / b, which reduces down to (a/b) - 1. If you don't care about the -1, the simplest way you can see the relevant change in your score is:

      A'(score)/A(score)

    This shows the rate of growth of the score from one step to the next.
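
A minimal Python sketch of the two ideas above, in case it helps to see them with numbers. The function names are made up for illustration, and raising an error on a zero starting score is my own assumption about how to handle that case, not something from the question.

```python
def second_difference(a, a1, a2):
    """Change in the score change across three equally spaced samples:
    (A'' - A') - (A' - A). No division by time, since the spacing is assumed constant."""
    return (a2 - a1) - (a1 - a)


def relative_change(prev, curr):
    """Fractional growth from prev to curr: (curr - prev) / prev."""
    if prev == 0:
        # Assumption: a zero starting score has no meaningful relative change.
        raise ValueError("relative change is undefined for a zero starting score")
    return (curr - prev) / prev


def growth_ratio(prev, curr):
    """The same information without the -1: curr / prev."""
    if prev == 0:
        raise ValueError("growth ratio is undefined for a zero starting score")
    return curr / prev


print(relative_change(20, 200))      # 9.0
print(relative_change(8500, 9000))   # roughly 0.059
print(growth_ratio(20, 200))         # 10.0
```

With the numbers from the question, 20 to 200 scores 9.0 while 8500 to 9000 scores about 0.059, so the smaller series dominates as intended.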


    Edit, after clarification:

    Given your comment, variable time intervals make the logic more complicated, but it's still doable.

    In that case, you do want to calculate velocity, as you were doing:

    V = (A'(score) - A(score)) / (A'(time) - A(time))
    

    But you want to normalize it based on the previous velocity:

    result = V'/V
    

    This then becomes similar to the "acceleration" example - it requires 3 samples to have a good idea of the rate of change of the rate of change. If you spell it out all the way, you get something like:

    result = [(A''(score) - A'(score)) / (A''(time) - A'(time))] / [(A'(score) - A(score)) / (A'(time) - A(time))]
    

    You can do some math to shove these numbers around, but there's really no prettier result than that.
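
As a rough sketch of that last formula in Python: each sample is assumed to be a (score, time) pair, the function names are hypothetical, and raising an error for a zero time gap or a zero previous velocity is my own choice for the cases where the ratio is undefined.

```python
def velocity(earlier, later):
    """Score change per unit time between two (score, time) samples."""
    s0, t0 = earlier
    s1, t1 = later
    if t1 == t0:
        # Assumption: two samples at the same time have no defined velocity.
        raise ValueError("samples must have different timestamps")
    return (s1 - s0) / (t1 - t0)


def velocity_ratio(a, a1, a2):
    """V' / V for three consecutive samples A, A', A''."""
    v = velocity(a, a1)
    v_next = velocity(a1, a2)
    if v == 0:
        # Assumption: a flat previous interval cannot serve as a baseline.
        raise ValueError("previous velocity is zero, ratio is undefined")
    return v_next / v


# Score 20 at t=0, 40 at t=10, 200 at t=15:
# V = 20 / 10 = 2.0, V' = 160 / 5 = 32.0
print(velocity_ratio((20, 0), (40, 10), (200, 15)))  # 16.0
```

A result above 1 means the score is climbing faster than it was over the previous interval, regardless of how unevenly the samples are spaced.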