I have n different rankings of the same items (in my case just 9). Now I'm trying to find a combination using PCA (Principal Component Analysis) in order to improve the accuracy of my ranking. The method should be unsupervised, that is, the new rankings should be generated without looking at the true ranking.
My idea is to try all possible subsets (without repetition, of size at least two) of the 9 rankings and run PCA on each of them. Therefore I will end up with 502 different new rankings (in the case of n = 9). Different subsets give results of different quality.
When I say better results I mean that I do have the true ranking, and when the combination step is finished I compare all the rankings (the combined ones and the original 9) against it.
Does this method make sense?
Your question involves a subset of voting theory, and there are many possible ways to solve this. Some of the techniques are more flexible than others. For example, some can accommodate ordered rankings of variable sizes (imagine one ranking contained only 5 ordered items while the other rankings contained 9), while others cannot; some can assign variable weights to the different reviewers. Netflix, for instance, uses very complex proprietary algorithms to combine multiple users' movie rankings into overall rankings. That being said, your combinatorial PCA approach strikes me as neither computationally efficient nor terribly relevant: if you take information from only a subset of your 9 rankings, you are potentially discarding useful (although perhaps subtle) information.
A simple baseline is the Borda count, in which each item's rank positions are summed (or averaged) across all rankings and the items are reordered by that total. Perhaps the biggest problem with the Borda count is that it does not adequately handle the different standard deviations of two items that may have very similar average rankings. If we constrain ourselves to the subset of problems where all ordered rankings are of the same size and all have equal weight, I can recommend one method that is intuitive and leads to very good results across a wide range of cases: Aggregate Z-score Minimization. This works as follows:
1. For each item, compute the mean μ and (sample) standard deviation σ of its positions across all rankings.
2. For each item and each possible ranking position p, compute the z-score distance |p − μ| / σ.
3. Assign each item to exactly one ranking position such that the sum of the assigned z-score distances is minimized; that assignment is the combined ranking.
Effectively, the ranking problem is converted into a classification problem in which we try to classify each ranking position into the best-fitting sampled distribution of each item, under the constraint that only one ranking position can be assigned to each item's Gaussian distribution. By minimizing the aggregate z-score distance globally, we find the statistically most likely configuration for the "true" ranking.
If you don't want to do the programming to exhaustively evaluate the combinatorial sums of step 3, there is a heuristic method, demonstrated below, that usually gives good results (though it is not guaranteed to find the best solution).
Suppose we have 4 independent rankings of 6 items (A-F). Assume the first item listed in each ranking occupies ranking position #1:
1. A,C,F,E,B,D
2. D,B,C,E,F,A
3. F,A,B,C,D,E
4. E,A,C,B,D,F
Next compute the mean and standard deviation of each item's ranking positions:
A: (#1, #6, #2, #2); μ = 2.75, σ = 2.217
B: (#5, #2, #3, #4); μ = 3.5, σ = 1.291
C: (#2, #3, #4, #3); μ = 3.0, σ = 0.816
D: (#6, #1, #5, #5); μ = 4.25, σ = 2.217
E: (#4, #4, #6, #1); μ = 3.75, σ = 2.062
F: (#3, #5, #1, #6); μ = 3.75, σ = 2.217
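These statistics can be reproduced with a few lines of plain Python (using the sample standard deviation, which is what the σ values above are):

```python
from statistics import mean, stdev

# The four independent rankings; the first item listed is at position #1
rankings = [
    ["A", "C", "F", "E", "B", "D"],
    ["D", "B", "C", "E", "F", "A"],
    ["F", "A", "B", "C", "D", "E"],
    ["E", "A", "C", "B", "D", "F"],
]

# Each item's ranking positions (1-based) across the four rankings
positions = {item: [r.index(item) + 1 for r in rankings] for item in rankings[0]}

# Per-item mean and sample standard deviation of those positions
stats = {item: (mean(p), stdev(p)) for item, p in positions.items()}
```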
We can see from the relatively tight spread of means (2.75 to 4.25) that all of the items are contending for roughly the same middle positions. This is exactly the kind of case where the Borda count tends to perform poorly, because the standard deviations become extra important when the averages are all so close. So next we create the matrix of z-score distances from each item to each possible ranking position:
A: 0.7892, 0.3382, 0.1127, 0.5637, 1.0147, 1.4657
B: 1.9365, 1.1619, 0.3873, 0.3873, 1.1619, 1.9365
C: 2.4495, 1.2247, 0.0000, 1.2247, 2.4495, 3.6742
D: 1.4657, 1.0147, 0.5637, 0.1127, 0.3382, 0.7892
E: 1.3339, 0.8489, 0.3638, 0.1213, 0.6063, 1.0914
F: 1.2402, 0.7892, 0.3382, 0.1127, 0.5637, 1.0147
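The full matrix follows mechanically from the statistics; a short sketch in plain Python (z = |position − μ| / σ):

```python
from statistics import mean, stdev

rankings = [
    ["A", "C", "F", "E", "B", "D"],
    ["D", "B", "C", "E", "F", "A"],
    ["F", "A", "B", "C", "D", "E"],
    ["E", "A", "C", "B", "D", "F"],
]
n = len(rankings[0])

zmatrix = {}
for item in rankings[0]:
    pos = [r.index(item) + 1 for r in rankings]
    mu, sigma = mean(pos), stdev(pos)
    # z-score distance from this item to every possible position 1..n
    zmatrix[item] = [abs(p - mu) / sigma for p in range(1, n + 1)]
```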
It's probably obvious, but if any item had σ = 0, you could assign that item to its exclusive ranking position immediately. Now, if you don't want to exhaustively solve this matrix for the ranking combination with the lowest possible aggregate z-score, you can use the following heuristic. Sum each column, then subtract the column's minimum value from that sum to get a quantity we can call the "savings":
sum: 9.2151, 5.3777, 1.7658, 2.5225, 6.1344, 9.9718
min: 0.7892, 0.3382, 0.0000, 0.1127, 0.3382, 0.7892
savings: 8.4259, 5.0395, 1.7658, 2.4098, 5.7962, 9.1826
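These three rows can be checked mechanically; a sketch using the rounded matrix values from above (any last-digit differences from the figures shown are rounding artifacts):

```python
# Rounded z-score matrix from above (rows: items, columns: positions 1-6)
zmatrix = {
    "A": [0.7892, 0.3382, 0.1127, 0.5637, 1.0147, 1.4657],
    "B": [1.9365, 1.1619, 0.3873, 0.3873, 1.1619, 1.9365],
    "C": [2.4495, 1.2247, 0.0000, 1.2247, 2.4495, 3.6742],
    "D": [1.4657, 1.0147, 0.5637, 0.1127, 0.3382, 0.7892],
    "E": [1.3339, 0.8489, 0.3638, 0.1213, 0.6063, 1.0914],
    "F": [1.2402, 0.7892, 0.3382, 0.1127, 0.5637, 1.0147],
}

columns = list(zip(*zmatrix.values()))
sums = [sum(c) for c in columns]
mins = [min(c) for c in columns]
savings = [s - m for s, m in zip(sums, mins)]
```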
Take the column with the largest "savings" value and assign the item holding the minimum value in that column to that position. In our example, this means we assign item "D" to the 6th position. After doing this, remove "D"'s row and the 6th column (because both have now been assigned), then recompute the sums, minimums, and savings. Again assign the item with the minimum value in the largest-savings column to that position, and continue until all positions are assigned. In this example, the final (heuristic) ranking will be as follows: A, E, C, B, F, D
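If it helps, here is a short Python sketch of that greedy procedure, again using the rounded z-score matrix from above (ties, should any occur, are broken arbitrarily here):

```python
# Rounded z-score matrix (rows: items, columns: positions 1-6)
zmatrix = {
    "A": [0.7892, 0.3382, 0.1127, 0.5637, 1.0147, 1.4657],
    "B": [1.9365, 1.1619, 0.3873, 0.3873, 1.1619, 1.9365],
    "C": [2.4495, 1.2247, 0.0000, 1.2247, 2.4495, 3.6742],
    "D": [1.4657, 1.0147, 0.5637, 0.1127, 0.3382, 0.7892],
    "E": [1.3339, 0.8489, 0.3638, 0.1213, 0.6063, 1.0914],
    "F": [1.2402, 0.7892, 0.3382, 0.1127, 0.5637, 1.0147],
}

def greedy_rank(zmatrix):
    items = dict(zmatrix)                       # remaining item -> z-scores
    open_cols = list(range(len(next(iter(zmatrix.values())))))
    assigned = {}
    while open_cols:
        # savings = column sum minus column min, over remaining items/columns
        best_col = max(
            open_cols,
            key=lambda c: sum(z[c] for z in items.values())
                          - min(z[c] for z in items.values()),
        )
        # assign the item with the minimum z-score in that column
        best_item = min(items, key=lambda i: items[i][best_col])
        assigned[best_col] = best_item
        del items[best_item]
        open_cols.remove(best_col)
    return [assigned[c] for c in sorted(assigned)]

ranking = greedy_rank(zmatrix)
```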
(aggregate z-score: 3.3783). I didn't check my work, but it looks like the exhaustively solved solution of A, F, C, B, E, D
(aggregate z-score: 3.3612) might be 0.5% better than the heuristic solution.
It's worth noting that the naive solution where we just simply ordered the means A, C, B, E, F, D
(aggregate z-score: 3.8754) is substantially less likely (statistically) to be the best ranking.
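For completeness, with only 6! = 720 orderings the exhaustive search is trivial to script, and it doubles as a check on the figures above. (As an aside, step 3 is a linear assignment problem, so for larger problems an exact polynomial-time solver such as SciPy's linear_sum_assignment could replace the brute force.)

```python
from itertools import permutations

# Rounded z-score matrix from the worked example (rows: items, cols: positions 1-6)
zmatrix = {
    "A": [0.7892, 0.3382, 0.1127, 0.5637, 1.0147, 1.4657],
    "B": [1.9365, 1.1619, 0.3873, 0.3873, 1.1619, 1.9365],
    "C": [2.4495, 1.2247, 0.0000, 1.2247, 2.4495, 3.6742],
    "D": [1.4657, 1.0147, 0.5637, 0.1127, 0.3382, 0.7892],
    "E": [1.3339, 0.8489, 0.3638, 0.1213, 0.6063, 1.0914],
    "F": [1.2402, 0.7892, 0.3382, 0.1127, 0.5637, 1.0147],
}

def aggregate_z(order):
    """Sum of z-score distances when order[p] sits at position p+1."""
    return sum(zmatrix[item][pos] for pos, item in enumerate(order))

best = min(permutations(zmatrix), key=aggregate_z)
```

Note that the optimum may not be unique: swapping A and F in the first two positions here gives the same aggregate score, since both pairs of z-scores sum to the same value.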