Search code examples
algorithmmathstatisticsdata-analysisanalysis

Formula to get next question in quiz basing on previous statistics


My goal is to dynamically determine what question should be next in quiz by using statistics of previous answers

So, I have:

  • Question with difficulty field (1-100)
  • Maximum score you can get in question (let it be 256)
  • Score user have reached in question (x out of max)

I want to somehow combine these paramaters in formula to choose most suitable next question for user

How can I do it?

My idea was to give user a question with median difficulty as first one and then check if user scored less than 50% of maximum, then get questions with 25 percentile difficulty else get 75 percentile. Then repeat this schema on a smaller stint (25-50 percentile or 50-75 percentile and so on)


Solution

  • Let's assume that the player has a fixed function score = f(difficulty) that gives for each difficulty the expected score percentage. Once we know this function, we can invert it and find the difficulty level that will give us the expected score we want.

    However, the function is not known. But we have samples of this function in the form of our previous questions. So, we can fit a function to these samples. If you have knowledge about the form of the dependence, you can include that knowledge in the shape of your fitted function. I will simply assume a truncated linear function:

    score = f(difficulty) = max(0, min(m * difficulty + n, 1))
    

    The two parameters that we need to find are m and n. If we remove all sample questions where the user scored 100% or 0%, we can ignore the truncation. Then, we have a list of samples that form a linear system of equations:

    score1 = m * difficulty1 + n
    score2 = m * difficulty2 + n
    score3 = m * difficulty3 + n
    ...
    

    This system will usually not have a solution. So, we can solve for a least-squares solution. To do this, we will incrementally build a 2x2 matrix A and a 2-dimensional vector b that represent the system A * x = b. We will start with the zero matrix and the zero vector. For each question, we will update:

    / A11  A12 \  +=  / difficulty * difficulty    difficulty \
    \ A21  A22 /      \       difficulty               1      /
    
    / b1 \ += / difficulty * score \
    \ b2 /    \        score       /
    

    Once we have added at least two questions, we can solve:

    m = (A12 * b2 - A22 * b1) / (A12 * A12 - A11 * A22)
    n = (A12 * b1 - A11 * b2) / (A12 * A12 - A11 * A22)
    

    And we can find the difficulty for an expected score of P as:

    difficulty = (P - n) / m
    

    Let's do an example. The following table contains a few questions and the state of the function after adding the question.

    diff   score  |   A11  A12  A22  b1     b2 |     m     n 
    --------------+----------------------------+-------------
    70      0.3   |  4900   70    1  21    0.3 | 
    50      0.4   |  7400  120    2  41    0.7 | -0.005  0.65
    40      0.5   |  9000  160    3  61    1.2 | -0.006  0.74
    35      0.7   | 10225  195    4  85.5  1.9 | -0.010  0.96
    

    Here is the fitted function and the sample questions:

    Fitted Function

    And if we want to find the difficulty for an expected score of e.g. 75%, we get:

    difficulty(0.75) = 21.009