My goal is to dynamically determine what question should be next in quiz by using statistics of previous answers
So, I have:
I want to somehow combine these paramaters in formula to choose most suitable next question for user
How can I do it?
My idea was to give user a question with median difficulty as first one and then check if user scored less than 50% of maximum, then get questions with 25 percentile difficulty else get 75 percentile. Then repeat this schema on a smaller stint (25-50 percentile or 50-75 percentile and so on)
Let's assume that the player has a fixed function score = f(difficulty)
that gives for each difficulty the expected score percentage. Once we know this function, we can invert it and find the difficulty level that will give us the expected score we want.
However, the function is not known. But we have samples of this function in the form of our previous questions. So, we can fit a function to these samples. If you have knowledge about the form of the dependence, you can include that knowledge in the shape of your fitted function. I will simply assume a truncated linear function:
score = f(difficulty) = max(0, min(m * difficulty + n, 1))
The two parameters that we need to find are m
and n
. If we remove all sample questions where the user scored 100% or 0%, we can ignore the truncation. Then, we have a list of samples that form a linear system of equations:
score1 = m * difficulty1 + n
score2 = m * difficulty2 + n
score3 = m * difficulty3 + n
...
This system will usually not have a solution. So, we can solve for a least-squares solution. To do this, we will incrementally build a 2x2 matrix A
and a 2-dimensional vector b
that represent the system A * x = b
. We will start with the zero matrix and the zero vector. For each question, we will update:
/ A11 A12 \ += / difficulty * difficulty difficulty \
\ A21 A22 / \ difficulty 1 /
/ b1 \ += / difficulty * score \
\ b2 / \ score /
Once we have added at least two questions, we can solve:
m = (A12 * b2 - A22 * b1) / (A12 * A12 - A11 * A22)
n = (A12 * b1 - A11 * b2) / (A12 * A12 - A11 * A22)
And we can find the difficulty for an expected score of P
as:
difficulty = (P - n) / m
Let's do an example. The following table contains a few questions and the state of the function after adding the question.
diff score | A11 A12 A22 b1 b2 | m n
--------------+----------------------------+-------------
70 0.3 | 4900 70 1 21 0.3 |
50 0.4 | 7400 120 2 41 0.7 | -0.005 0.65
40 0.5 | 9000 160 3 61 1.2 | -0.006 0.74
35 0.7 | 10225 195 4 85.5 1.9 | -0.010 0.96
Here is the fitted function and the sample questions:
And if we want to find the difficulty for an expected score of e.g. 75%, we get:
difficulty(0.75) = 21.009