Tags: database-design, data-mining, keyword, analysis, data-analysis

Analyse and rate users, and rank search results based on criteria


I'm not sure if this should be posted here (as it is a programming-related question) or on maths (as it could be statistical) or even if it is a valid question. Please let me know if you think I've posted on the wrong site!

For my final year project, I'm developing an online portfolio website where users create an account and enter their skills, a rating for each skill (for example, 1 = bad, 5 = ok, 10 = excellent), their employment history, relevant experience, examples of work, and contact information. This was refused on the grounds that it's too simple.

To make the project more substantial, I've proposed an employer perspective, allowing employers to intelligently search for potential employees, where each result (user) is ranked on how well they and their skills meet the criteria.

Due to a hiccup on my part, I'm no longer able to change my project and just have to make do with what I've proposed.

edited to make my question clearer

Let's say I have the following two users:

User 1: "PHP" with a skill level of 10 (excellent) and "jQuery" with a skill level of 6 (ok)
User 2: "PHP" with a skill level of 5 (ok) and "jQuery" with a skill level of 9 (excellent).

And let's say an employer searches for "PHP". Are there any tools, theories, or techniques I can research that would allow me to develop a ranking algorithm based on all the relevant users' skills against given criteria?
In the example above, User 1 would be ranked before User 2 as they have a higher skill level in PHP. But if "PHP, jQuery" was searched, then User 2 would rank first as they are more relevant to the search.

I hope that explains my problem a bit better!


Solution

  • This is not yet data mining.

    What you are talking about is ranking, as it is done all over the place in information retrieval.

    However, what you mostly seem to be undecided about is which similarity function to use. Well, that is up to you; none of the tools can answer this for you. They'll just give you more choices that you hadn't thought of before. The simplest would be Manhattan distance, i.e. sum up the absolute differences between the requested and actual value in each given criterion.

    I'd expect this to still be "too easy" for your advisor. You should be able to compile the request into a single SQL query (which literally is: compute score sum, order by sum), and have the database answer it for you with good performance easily. After all, you won't be handling billions of résumés. Sorry.
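
To make the "compute score sum, order by sum" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module and the two users from the question. The table name `user_skills`, its columns, the helper names, and the maximum level of 10 are all assumptions made for illustration, not anything from the original project. The `distance` column is the Manhattan distance to a hypothetical ideal candidate who has level 10 in every searched skill; a skill the user has not listed counts as level 0.

```python
import sqlite3

MAX_LEVEL = 10  # assumption: top of the rating scale, as in the example users


def build_demo_db():
    """Create an in-memory database holding the two example users."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_skills (user_name TEXT, skill TEXT, level INTEGER)"
    )
    conn.executemany(
        "INSERT INTO user_skills VALUES (?, ?, ?)",
        [
            ("User 1", "PHP", 10),
            ("User 1", "jQuery", 6),
            ("User 2", "PHP", 5),
            ("User 2", "jQuery", 9),
        ],
    )
    return conn


def rank_users(conn, wanted_skills):
    """Rank users by the summed level of the searched skills, best first."""
    if not wanted_skills:
        return []
    placeholders = ", ".join("?" for _ in wanted_skills)
    # Score of the hypothetical ideal candidate: MAX_LEVEL in every skill.
    ideal = MAX_LEVEL * len(wanted_skills)
    query = f"""
        SELECT user_name,
               SUM(level)     AS score,
               ? - SUM(level) AS distance
        FROM user_skills
        WHERE skill IN ({placeholders})
        GROUP BY user_name
        ORDER BY score DESC, user_name ASC
    """
    # Parameters are bound in order of appearance: ideal first, then skills.
    return conn.execute(query, [ideal, *wanted_skills]).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(rank_users(conn, ["PHP"]))
    print(rank_users(conn, ["PHP", "jQuery"]))
```

Note that with a plain, unweighted sum, User 1 still comes first for "PHP, jQuery" (16 against 14), which is not the order the question expects. Getting a different order would need per-skill weights or another similarity function, and that choice remains yours to make.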