Search code examples
javasqldatabase-designfull-text-searchsearch-engine

Ranking search Results with Keywords in SQL


I am having trouble starting this idea. I have an SQL database and an html interface with JAVA/Apache(tomcat). I was trying to make a simple database that sorts names of songs and matches to keywords about the songs. Example I have entered keywords like "epic , movies , queen , sing" and a song like queen bohemian rhapsody meets most of the criteria it is an "epic" song from a "movie" and preformed by "queen" so this would make a value of 3. and also in the database a song like queen we are the champion would meet all keywords (epic,movie,queen,sing) -> so this song would be a 4.

I've dabbled with the count functions in sql and some JAVA code but i cant seen to find a way to shift through the keywords, or count them to see which result is better.

Where should I look to learn how to make something like this ?


Thank You very much for the explanation and the links . After hours of reading and re-reading I understand more about databases structure. And have a near functional webpage. I have one last question . The queries give me a result of the song_ID and the count. I have thought of a few ways to get the song name to also display.

1st idea I had was to add the name to the junction table.

2nd was create a temp table (view) and then use that info to seek data from the other tables.

3rd was write a java code to take the orginal results and then only display the song name and count. ....

I think I am missing something easy in SQL abilities to handle the data better (and more efficiently).


Solution

  • Let's say your database has two columns: tag and song_id, as such:

    CREATE TABLE tags(tag STRING KEY, song_id INT);
    

    Then you could do everything in a SQL query:

    SELECT song_id, count(*)
    FROM tags
    WHERE tag IN ("epic", "movie", "queen", "sing")
    GROUP BY song_id
    ORDER BY count(*) DESC;
    

    If you want more complicated text search, i.e. if this solution ends up being too slow, you should look into Apache Lucene and/or Solr.