I am trying to implement a search algorithm in my simple datastructure. However, this is not a "HOW DO I DO THIS?"-question, but rather a "how could i optimize the algorithm?"
I am trying to keep a index of files, and each file can be associated with any number of tags (which acts like a category)
This is how my data is structured:
Entries:
------------------------------------
| id | description | short | score |
------------------------------------
Tags:
-------------
| id | text |
-------------
EntryTags:
-------------------
| entry_id | tag_id |
-------------------
In the search field, the search request, will always be turned into single words split apart with a plus (+).
in the following example i will be searching for "blue+website+simple+layout"
- split searchterm up into array named t
- convert each word in array t into a number using the id from "Tags" table
- for each element in array t, select make new array for each element with "EntryTags" matching the search
- generate array A, where elements that are in all 4 arrays are put into
- generate array B, where elements that are in 3 of the 4 arrays are put into
- generate array C, where elements that are in 2 of the 4 arrays are put into
- generate array D with the last elemenets rest
- sort array A,B,C and D by the score parameter from the table
- output array A, then B, then C, then D
ofcourse this is not optimized or anything, but my lack of experience with more complex SQL is kicking my butt :(
In the end, all of this will be written in PHP and the mysqli library (and i will ofcourse keep the thread updated as i progress further)
You could use a kind of Bloom filter (at least this is part of Google's strategy). First you look up for entries with all of the entered tags. If you find nothing, try out all combinations with one tag missing, then with two tags missing ... until you have enough matches. The lookup in the Bloom filter is very fast, so it's okay to do many lookups.