Twitter recently announced that you can approximate the rank of any given Twitter user with high accuracy by plugging their follower count into the following formula:
exp($a + $b * log(follower_count))
where $a=21 and $b=-1.1
This is obviously a lot more efficient than sorting the entire list of users by follower count just to find one user's rank.
If you have a similar data set from a different social site, how could you derive the values of $a and $b that fit that data set? In other words, given a list of frequencies whose distribution is assumed to follow a power law, how do you estimate the two parameters?
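To make the formula concrete, here is a minimal Python sketch that evaluates it with the quoted constants. The function name `approx_rank` is made up for illustration, and natural logarithms are assumed because the formula uses exp.

```python
import math

A = 21.0   # $a as quoted above
B = -1.1   # $b as quoted above

def approx_rank(follower_count, a=A, b=B):
    """Approximate rank as exp(a + b * log(follower_count))."""
    if follower_count <= 0:
        raise ValueError("follower_count must be positive to take its log")
    return math.exp(a + b * math.log(follower_count))

# More followers should give a smaller (better) rank number.
for followers in (100, 10_000, 1_000_000):
    print(followers, round(approx_rank(followers)))
```

Because $b is negative, the estimated rank falls as the follower count rises, which is what you would expect from a ranking by followers.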
You have the following model:
y = exp(a + b * log(x))
which is equivalent to:
log(y) = a + b * log(x)
Therefore, if you take logs of your data set, you end up with a linear model, so you can then use linear regression to determine the best-fit values of a and b.
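As a rough sketch of that procedure, the following Python fits a and b by ordinary least squares on the logged data, using only the standard library. The function name `fit_power_law` and the sample numbers are illustrative assumptions; the data here is synthetic, generated from the model itself, so the fit simply recovers the constants that produced it.

```python
import math

def fit_power_law(xs, ys):
    """Fit log(y) = a + b * log(x) by least squares; return (a, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    # math.log raises ValueError for non-positive values, so x and y must be > 0.
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx              # slope of the log-log line
    a = mean_y - b * mean_x    # intercept of the log-log line
    return a, b

# Synthetic example: x = follower count, y = rank, built from a = 21, b = -1.1.
followers = [10, 100, 1_000, 10_000, 100_000]
ranks = [math.exp(21.0 - 1.1 * math.log(f)) for f in followers]
a, b = fit_power_law(followers, ranks)
print(a, b)
```

With real data the points will not fall exactly on a line, and a least-squares fit in log space weights relative rather than absolute errors, so it is worth plotting log(y) against log(x) to check that the relationship actually looks linear.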
However, this all sounds pretty meaningless to me. Who's to say that a given networking site determines user rank using this sort of relationship?