Matchmaking algorithm effeciency

I'm designing an application that helps a user to create games by forming the best possible teams.

For example, suppose that 30 people are waiting in a poll to play basketball. A game of basketball is played by 10 players (5 vs 5). Out of the 30 people waiting to get picked, the algorithm selects the best possible combination of 5 vs 5 to play against each other.

(To do so, the algorithm looks for a combination of players where there is a maximum probability of draw. This is computed by using two quantifiers characterizing each players: the average skill of the player, and the confidence factor of the guessed rating.)

I want to see how fast is this algorithm.

It needs to go through all possible combination of players. So in this case the total number of combination is C(30,10) right ??

Can we say that the algorithm is O(n!) with n being the total number of players waiting to play ?

Is this problem NP-complete ?? If so, can anyone give me a short way to explain why it is NP-complete (not a full proof, just a valid reasoning will suffice.)

Thank you very much ! :)

Solution

It's not clear how the average skill and the confidence factors should be combined, but most likely, finding a polynomial solution in both nand k (number of players, and number of players per team) would be as hard as proving P = NP.

My intuition is based on the fact that the Subset Sum Problem is NP-Hard, and the current problem appears to be more difficult than it.

In our case, even for a simple approach that would ignore confidence factors, and would just sum up the player skills in order to get the team strength (1), it seems to be NP-Complete.

That is because even if we fix one team, i.e. know the target sum, it's NP hard to determine whether there is another team with the same score (2). Furthermore, answering to the question "is there an equal team?" is easier than answering to "which is the best match for this team?".

Additionally, finding the best match among all pairs of teams is probably harder than finding the best match with one team fixed, thus this would be an intuitive demonstration that the given problem is harder than the subset sum problem.

Remarks:

(1) It's unlikely that taking confidence factors into account in any way would simplify the problem (because in the particular case when they were all equal, we still run into the case of irrelevant confidence factors).

(2) In the subset sum problem no assumption is made about the subset size. However, if there would be a known polynomial time algorithm that would find a subset with sum 0 in polynomial time for any given size, we could apply it for every possible size and this would result in a polynomial algorithm for the arbitrary size (which would prove P = NP).

(3) The value 0 in the sum = 0 constraint is not an essential. It can be any arbitrary value. E.g. because we fixed the size, we can simply subtract from all elements a value equal to targetSum / size, and we would now be searching for a subset with sum 0.

Regarding the other questions:

So in this case the total number of combination is C(30,10) right ?

No, it is actually C(30, 5) * C(25, 5) / 2, because we first need to pick 5 players for the first team, and then to pick, from the rest, 5 players for the second team. The division by two is performed because otherwise each pair would be counted twice and we probably do not want to differentiate between the teams.

Can we say that the algorithm is O(n!) with n being the total number of players waiting to play ?

The complexity of brute force enumeration is O(n^2k / (k! * k!)), where n is the total number of players, and k is the team size. Thus, for small k it is ok. The approach is NP complete if we consider both n and k as variables.

The complexity is like that because the formula for combinations is the following:

C(n, k) = n * (n - 1) * .. * (n - k + 1) / (1 * 2 * .. * k),

and according to the previous point, there are C(n, k) * C(n - k, k) / 2 possibilities.