Determine the "difficulty" of quiz with multiple weights?

Im trying to determine the "difficultly" of a quiz object.

My ultimate goal is to be able to create a "difficulty score" (DS) for any quiz. This would allow me to compare one quiz to another accurately, despite being made up of different questions/answers.

When creating my quiz object, I assign each question a "difficulty index" (DI), which is number on a scale from 1-15.

15 = most difficult
1 = least difficult

Now a strait forward way to measure this "difficulty score" could be to add up each question's "difficulty index" then divide by maximum possible "difficulty index" for the quiz. ( ex. 16/30 = 53.3% Difficulty )

However, I also have multiple "weighting" properties associated to each question. These weights are again one a scale of 1-5.

5 = most impact
1 = least impact

The reason I have (2) instead of the more common (1) is so I can accommodate a scenario as follows...

If presenting the student with a very difficult question (DI=15) and the student answers "incorrect", don't have it hurt their score so much BUT if they get it "correct" have it improve their score greatly. I call these my "positive" (PW) and "negative" (NW) weights.

Quiz Example A:
Question 1: DI = 1 | PW = 3 | NW = 3
Question 2: DI = 1 | PW = 3 | NW = 3
Question 3: DI = 1 | PW = 3 | NW = 3
Question 4: DI = 15 | PW = 5 | NW = 1

Quiz Example B:
Question 1: DI = 1 | PW = 3 | NW = 3
Question 2: DI = 1 | PW = 3 | NW = 3
Question 3: DI = 1 | PW = 3 | NW = 3
Question 4: DI = 15 | PW = 1 | NW = 5

Technically the above two quizzes are very similar BUT Quiz B should be more "difficult" because the hardest question will have the greatest impact on your score if you get it wrong.

My question now becomes how can I accurately determine the "difficulty score" when considering the complex weighting system?

Any help is greatly appreciated!

Solution

The challenge of course is to determine the difficulty score for each single question.

I suggest the following model:

Hardness (H): Define a hard question such that chances of answering it correctly are lower. The hardest question is such that (1) the chance of answering it correctly are equal to random choice (because it is inherently very hard), and (2) it has the largest number of possible answers. We'll define such question as (H = 15). On the other end of the scale, we'll define (H = 0) for a question where the chances of answering it correctly are 100% (because it is trivial) (I know - such question will never appear). Now - define the hardness of each question by subjective extrapolation (remember that one can always guess between the given options). For example, if a (H = 15) question has 4 answers, and another question with similar inherent hardness has 2 answers - it would be (H = 7.5). Another example: If you believe that an average student has 62.5% of answering a question correctly - it would also be a (H = 7.5) question (this is because a H = 15 has 25% of correct answer, while H = 0 has 100%. The average is 62.5%)
Effect (E): Now, we'll measure the effect of PW and NW. For questions with 50% chance of answering correctly - the effect is E = 0.5*PW - 0.5*NW. For questions with 25% chance of answering correctly - the effect is E = 0.25*PW - 0.75*NW. For trivial question NW doesn't matter so the effect is E = PW.
Difficulty (DI): The last step is to integrate the hardness and the effect - and call it difficulty. I suggest DI = H - c*E, where c is some positive constant. You may want to normalize again.

Edit: Alternatively, you may try the following formula: DI = H * (1 - c*E), where the effect magnitude is not absolute, but relative to the question's hardness.

Clarification:

The teacher needs to estimate only one parameter about each question: What is the probability that an average student would answer this question correctly. His estimation, e, will be in the range [1/k, 1], where k is the number of answers.

The hardness, H, is a linear function of e such that 1/k is mapped to 15 and 1 is mapped to 0. The function is: H = 15 * k / (k-1) * (1-e)

The effect E depends on e, PW and NW. The formula is E = e*PW - (1-e)*NW

Example based on OP comments:

Question 1:

k = 4, e = 0.25 (hardest). Therefore H = 15

PW = 1, NW = 5, e = 0.25. Therefore E = 0.25*1 - 0.75*5 = -3.5

c = 5. DI = 15 - 5*(-3.5) = 32.5

Question 2:

k = 4, e = 0.95 (very easy). Therefore H = 1

PW = 1, NW = 5, e = 0.95. Therefore E = 0.95*1 - 0.05*5 = 0.7

c = 5. DI = 1 - 5*(0.7) = -2.5