I have a table which has two columns: ID
(primary key, auto increment) and keyword
(text, full-text index).
The values entered in the keyword
column include the following:
keyword
Suppose that we have this sentence as an input:
"Find sports car sales statistics in Manhattan."
I'm looking (and I have been searching for quite a while) to find either a MySQL query or an algorithm which takes in the given input, and detects the keywords used from the keywords
column, resulting in an output of:
"Sports cars", "Car sales", "Statistics"
In other words, I'm trying to take an input that's in the form of a sentence, and then match all the existing (and most relevant) keyword values in the database that are found in the sentence. Note that these keywords could be phrases that consist of words separated by a space.
After researching I got to know that MySQL does a similar job through its full-text search feature. I have tried all the natural language, boolean, and query expansion options, but they include keyword records that only have half of its contents matching with the input. For example, it outputs:
"Car", "Car sales", "Sports cars", "Sports foo", "Cars bar", "Statistics".
I don't want this to happen because it includes words that aren't even in the input (i.e. foo and bar).
Here's the MySQL query for the above mentioned search:
SELECT * FROM tags WHERE MATCH(keyword) AGAINST('Find sports car sales statistics in Manhattan.' IN BOOLEAN MODE)
I also tried to improve on the relevancy, but this one only returns a single record:
SELECT *, SUM(MATCH(keyword) AGAINST('Find sports car sales statistics in Manhattan.' IN BOOLEAN MODE)) as score FROM tags WHERE MATCH(keyword) AGAINST('Find sports car sales statistics in Manhattan.' IN BOOLEAN MODE) ORDER BY score DESC
If we suppose that you have your column in a list as a pythonic way for such tasks you can use set.intersection
to get the intersection between two set (the second element could be another iterables like list or tuple) :
>>> col={'Car','Car sales','Cars','Sports cars','Sports foo','Car bar','Statistics'}
>>> col={i.lower() for i in col}
>>> s="Find sports car sales statistics in Manhattan."
>>> col.intersection(s.strip('.').split())
set(['car', 'statistics'])
And in your case you can put the result of your query within a set
or convert it to set
.
Note : the following set comprehension will convert the elements if your column to lower case :
>>> col={i.lower() for i in col}
But this recipe will find the intersection between your column and the splitted string with white spaces. so the result will be :
set(['car', 'statistics'])
As another way you can use re.search
:
>>> col={'Car','Car sales','Cars','Sports cars','Sports foo','Car bar','Statistics'}
>>> s='Find sports car sales statistics in Manhattan.'
>>> for i in col:
... g=re.search('{}'.format(i),s,re.IGNORECASE)
... if g:
... print g.group(0)
...
statistics
car sales
car
As a simple way you can use a function like following to get a combinations of your phrases :
from itertools import permutations
def combs(phrase):
sp=phrase.split()
com1=[map(lambda x:' '.join(x),li) for li in [permutations(sp,j) for j in range(1,len(sp)+1)]]
for i,k in enumerate(sp):
if not k.endswith('s'):
sp[i]=k+'s'
com2=[map(lambda x:' '.join(x),li) for li in [permutations(sp,j) for j in range(1,len(sp)+1)]]
return com1+com2
print {j for i in combs('Car sales') for j in i}
set(['Car', 'sales', 'sales Cars', 'Car sales', 'Cars sales', 'sales Car', 'Cars'])
Note that this function could be more efficient and complete.