
Power law curve fitting for social network queries


Twitter recently announced that you can approximate the rank of any given Twitter user with high accuracy by plugging their follower count into the following formula:

exp($a + $b * log(follower_count))

where $a=21 and $b=-1.1

This is obviously a lot more efficient than sorting the entire list of users by follower count just to find the rank of a single user.
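To make the formula concrete, here is a minimal Python sketch that evaluates it. The function name `estimated_rank` is hypothetical, chosen only for illustration; the constants are the ones quoted above and would presumably differ for any other site.

```python
import math

# Constants quoted in the question for Twitter; another site needs its own fit.
A = 21.0
B = -1.1

def estimated_rank(follower_count, a=A, b=B):
    """Approximate a user's rank as exp(a + b * log(follower_count))."""
    if follower_count <= 0:
        raise ValueError("follower_count must be positive, since log(0) is undefined")
    return math.exp(a + b * math.log(follower_count))

rank_1k = estimated_rank(1_000)
rank_1m = estimated_rank(1_000_000)
print(f"~{rank_1k:,.0f} for 1,000 followers, ~{rank_1m:,.0f} for 1,000,000 followers")
```

Because b is negative, a larger follower count yields a smaller rank number, i.e. a position closer to the top of the list.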

If you have a similar data set from a different social site, how could you derive the values of $a and $b that fit that data set? The data is basically a list of frequencies whose distribution is assumed to follow a power law.


Solution

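  • You have the following model:

    y = exp(a + b * log(x))

    Taking the logarithm of both sides gives the equivalent form:

    log(y) = a + b * log(x)

    This is linear in log(x). Therefore, if you take logs of both columns of your data set (here, rank and follower count), you end up with a linear model, and you can use ordinary linear regression to determine the best-fit values of a and b: b is the slope and a is the intercept.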

    However, this all sounds pretty meaningless to me unless the relationship actually holds. Who's to say that user rank on a given networking site follows this sort of power law?
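The log-log regression described in this answer can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not a definitive method: the function name `fit_power_law` is hypothetical, ranks are assigned by sorting follower counts in descending order (ties get consecutive ranks), and the sample data is synthetic, generated from the constants in the question purely to check that the fit recovers them.

```python
import numpy as np

def fit_power_law(follower_counts):
    """Estimate a and b in rank ~ exp(a + b * log(followers)).

    Fits a straight line to (log(followers), log(rank)) by least squares.
    """
    counts = np.asarray(follower_counts, dtype=float)
    counts = counts[counts > 0]           # log is undefined for zero followers
    counts = np.sort(counts)[::-1]        # largest follower count gets rank 1
    ranks = np.arange(1, counts.size + 1)
    # polyfit with degree 1 returns [slope, intercept]
    b, a = np.polyfit(np.log(counts), np.log(ranks), 1)
    return a, b

# Synthetic data (an assumption for illustration): invert the model so that
# the user at rank r has exactly exp((log(r) - 21) / -1.1) followers.
true_ranks = np.arange(1, 1001)
synthetic_counts = np.exp((np.log(true_ranks) - 21.0) / -1.1)

a_hat, b_hat = fit_power_law(synthetic_counts)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

On real data it is worth plotting log(rank) against log(followers) before trusting the fit: if the points do not fall roughly on a straight line, the power-law assumption is doubtful, and least squares in log-log space can still give biased estimates even when they do.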