php algorithm similarity levenshtein-distance

"Personality quiz" style comparison [PHP]

I'm trying to think of an efficient or reasonable algorithm to take the results of a test submitted by the user and compare them with the values of several profiles to find a match (like how online dating services match your answers to suitable mates).

I really have no idea how to go about this. If the user answers 10 questions about himself and there are 10 candidates to match him with, we're looking at thousands of comparisons through the database. There must be a better way to do this.

Of the research I've done, maybe I could accomplish this with the Levenshtein distance function, but I don't know how to go about it because I'm not entirely familiar with this and I don't understand it that well. But maybe I could do something like compare the user's results concatenated into a string (e.g. 'AEBCDAABEAD') with the answers of each candidate and measure similarity that way?

Any suggestions?

Thanks much.

Solution

I think comparing the exact answers is not flexible enough for every purpose, because some answers may have little impact on certain profile types. Someone who answers 1-2 and someone who answers 3-4 will still count as a non-match, just as much as a person who picks 20-25 and is way off. AFAIK, with Levenshtein 'AB' and 'AC' are exactly as similar as 'AB' and 'AZ'.

So although the Levenshtein algorithm is a good idea, I guess you will get poor matches in some cases if you compare the answers question by question.
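To make the point about Levenshtein concrete, here is a minimal sketch using PHP's built-in `levenshtein()` function. The answer strings are just the two-letter examples from above; the variable names are made up for illustration.

```php
<?php
// levenshtein() is PHP's built-in edit-distance function.
// It counts edits; it has no idea that 'C' is "closer" to 'B' than 'Z' is.
$reference = 'AB';

$near = levenshtein($reference, 'AC'); // one substitution (B -> C)
$far  = levenshtein($reference, 'AZ'); // also one substitution (B -> Z)

var_dump($near === $far); // bool(true): both distances are 1
```

Both candidates end up at the same distance, so a slightly different answer and a completely different answer cannot be told apart.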

Let me describe the technique that came to mind when I read your question.

Profile categories and answer weight

I'm thinking of a configuration in which you describe a few profiles or attribute categories. Let's take food tastes as an example. Our categories might look like: sweet, sour, spicy, normal, etc.

Now, for each question in your survey, I would configure a weight per category for every possible answer. These weights are accumulated over the whole survey.

Example

Do you like chili con carne?
Yes - spicy +3
No - spicy -1
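
A rough sketch of how such a configuration could be accumulated in PHP. Everything here is an assumption for illustration: the array layout, the `scoreAnswers()` helper, and the second question (`lemon_sorbet`) with its weights are invented; only the chili con carne weights come from the example above.

```php
<?php
// Hypothetical survey configuration: question id => answer => category weights.
$questions = [
    'chili_con_carne' => [
        'yes' => ['spicy' => 3],
        'no'  => ['spicy' => -1],
    ],
    // Invented second question, only here to show several categories at once.
    'lemon_sorbet' => [
        'yes' => ['sour' => 2, 'sweet' => 1],
        'no'  => ['sour' => -1],
    ],
];

// Sum the category weights of every answer the user gave.
function scoreAnswers(array $questions, array $answers): array
{
    $scores = ['sweet' => 0, 'spicy' => 0, 'sour' => 0, 'normal' => 0];
    foreach ($answers as $questionId => $answer) {
        // Unknown questions or answers simply contribute nothing.
        $weights = $questions[$questionId][$answer] ?? [];
        foreach ($weights as $category => $weight) {
            $scores[$category] += $weight;
        }
    }
    return $scores;
}

$profile = scoreAnswers($questions, [
    'chili_con_carne' => 'yes',
    'lemon_sorbet'    => 'no',
]);
print_r($profile); // sweet 0, spicy 3, sour -1, normal 0
```

The resulting array is the per-category score vector for one person, which is what gets compared later.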

Now you can use an algorithm to determine the distance in each category and weight those distances in a calculation.

(sweet | spicy | sour | normal)
    -5      15      2        8  // Person 1
    10      -5     10        2  // Person 2
     8      -8      7       12  // Person 3

Now you can compare, for example, the results of each person and see that the distance between [2] and [3] is much smaller than between [1] and [2]. Note: I'm not talking about Levenshtein distance here. Because these values are numeric, a calculation on them gives better results than just counting non-matching characters.
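
One way to put a number on that distance is the Euclidean distance between the score vectors. That particular metric is my assumption (any numeric distance would do), and `profileDistance()` is a hypothetical helper, not an existing PHP function. The scores are the three persons from the table above.

```php
<?php
// Euclidean distance between two category => score arrays.
// Assumes both arrays contain the same category keys.
function profileDistance(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $category => $value) {
        $sum += ($value - $b[$category]) ** 2;
    }
    return sqrt($sum);
}

$people = [
    1 => ['sweet' => -5, 'spicy' => 15, 'sour' => 2,  'normal' => 8],
    2 => ['sweet' => 10, 'spicy' => -5, 'sour' => 10, 'normal' => 2],
    3 => ['sweet' => 8,  'spicy' => -8, 'sour' => 7,  'normal' => 12],
];

printf("%.2f\n", profileDistance($people[1], $people[2])); // 26.93
printf("%.2f\n", profileDistance($people[2], $people[3])); // 11.05
```

To find the best match for a user, you would compute this distance against every candidate and pick the smallest one. If some categories matter more than others, each squared difference could be multiplied by a per-category factor before summing.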

I'm not sure if this is helpful to you, but this is what came to mind, and it seemed like a neat solution.