I have an `n`-by-1 vector where `n = 20000`. I would like to do a decile ranking of the data in this vector, that is, replace the value of each element with the number (1 to 10) of the decile it falls in.
I am currently doing it this way:
```matlab
deciles = quantile(X, 9);
X = discretize(X, [-inf deciles inf]);
```
where `X` is my array of data. I'm doing this because I want 10 groups of data with the same number of elements in each.
Can you validate this procedure, or let me know if there is a more robust way to do this?
You can verify that your approach is correct by creating simple test data with a known number of elements per group.
```matlab
nGroups = 10;
nPerGroup = 10000;
X = linspace(0, 1, nGroups * nPerGroup);
deciles = quantile(X, nGroups - 1);
labels = discretize(X, [-inf deciles inf]);
counts = arrayfun(@(k) sum(labels == k), 1:nGroups)
% 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000
```
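The check above uses distinct values, so every decile boundary falls between two different numbers. Before relying on the procedure it is worth testing data with many repeated values. The data set below is a made-up assumption for illustration (3000 copies of 1 followed by the distinct integers 2 through 17001, so n = 20000): it puts a block of ties across the first decile boundary. Since `discretize` has to send every copy of a value to the same bin, the groups can no longer all hold 2000 elements.

```matlab
% Hypothetical test data with ties: 3000 copies of 1, then 2..17001 (n = 20000).
X = [ones(1, 3000), 2:17001];
nGroups = 10;

deciles = quantile(X, nGroups - 1);
labels = discretize(X, [-inf deciles inf]);

% Count the elements in each group; the size argument keeps empty groups as 0.
counts = accumarray(labels(:), 1, [nGroups 1])'
% All 3000 tied values share one bin, so the counts cannot all be 2000.
```

If your real data can contain repeated values, inspecting `counts` like this shows whether the edge-based grouping is still balanced.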
Alternatively, you can sort your data and then reshape it so that the number of columns equals the number of groups. This approach relies only on core functions (`sort` and `reshape`):
```matlab
nGroups = 10;
nPerGroup = 10000;
X = linspace(0, 1, nGroups * nPerGroup);
Y = reshape(sort(X), [], nGroups);   % column k holds the k-th tenth of the sorted data
```
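Each column is then a different group. Note that this requires `numel(X)` to be a multiple of `nGroups` (true for `n = 20000`), and it returns the sorted values of each group rather than a decile label for each original element.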
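If you need a decile label for every element of the original vector (as in the question) rather than the grouped values, the sort order can be turned into ranks. This is a different technique from the `quantile`/`discretize` edges, shown here as a minimal sketch: it assumes `X` is a numeric vector (random stand-in data below) and accepts that tied values may be split across neighbouring groups, in whatever order `sort` returns them. In exchange, every group gets exactly `numel(X)/nGroups` elements when that division is exact, and group sizes differ by at most one otherwise.

```matlab
% Sketch: per-element decile labels from ranks (an alternative to quantile edges).
X = rand(20000, 1);              % stand-in for the n-by-1 data (assumption)
nGroups = 10;
n = numel(X);

[~, order] = sort(X);            % order(k) is the index of the k-th smallest value
ranks = zeros(size(X));
ranks(order) = 1:n;              % ranks(i) is the sorted position of X(i)

labels = ceil(nGroups * ranks / n);   % values 1..nGroups, same shape as X

counts = accumarray(labels, 1)'  % group sizes: 2000 each when n = 20000
```

Unlike the edge-based version, this never produces an empty or oversized group, but equal values can receive different labels. Which behaviour is preferable depends on whether tied values must share a decile.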