I am building a search engine for a list of articles I have. Several people advised me to use Elasticsearch for full-text search. I wrote the following code, and it works, but I have a few questions.
1) If the same article is added twice, that is, if indexdoc is run twice for the same article, Elasticsearch accepts it and stores the article twice. Is there a way to have a "unique key" in the search index?
2) How can I change the scoring / ranking function? I want to give more importance to the title.
3) Is this the correct way to do it anyway?
4) How do I show related results if there is a spelling mistake?
from elasticsearch import Elasticsearch
from crsq.models import ArticleInfo

es = Elasticsearch()

def indexdoc(articledict):
    # Build the document that will be stored in the index
    doc = {
        'text': articledict['articlecontent'],
        'title': articledict['articletitle'],
        'url': articledict['url']
    }
    res = es.index(index="article-index", doc_type='article', body=doc)

def searchdoc(keywordstr):
    res = es.search(index="article-index", body={"query": {"query_string": {"query": keywordstr}}})
    print("Got %d Hits:" % res['hits']['total'])
    for hit in res['hits']['hits']:
        print("%(url)s: %(text)s" % hit["_source"])

def indexurl(url):
    # .values() yields one dict per row; take the first match (None if there is none)
    articledict = ArticleInfo.objects.filter(url=url).values().first()
    if articledict:
        indexdoc(articledict)
1) You have to specify an id for your document. Add the id parameter when you index it; indexing again with the same id overwrites the existing document instead of adding a duplicate:
res = es.index(index="article-index", doc_type='article', body=doc, id="some_unique_id")
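As a minimal sketch of point 1, one option is to derive the id from the article URL, so that indexing the same article twice always targets the same document. The helper name `article_id` and the choice of a SHA-1 hash of the URL are assumptions made for illustration, not something the question prescribes. The client is passed in as `es`, and the `es.index` call mirrors the one in the question; newer Elasticsearch versions have deprecated or removed mapping types, so `doc_type` may need to be dropped there.

```python
import hashlib


def article_id(url):
    """Derive a stable document id from the article URL (an assumed choice of key)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def indexdoc(es, articledict, index="article-index"):
    """Index one article; re-running with the same URL reuses the same id."""
    doc = {
        "text": articledict["articlecontent"],
        "title": articledict["articletitle"],
        "url": articledict["url"],
    }
    # `es` is an elasticsearch.Elasticsearch client supplied by the caller.
    # Same call as in the question, plus an explicit id.
    return es.index(index=index, doc_type="article", id=article_id(doc["url"]), body=doc)
```

Any value that is unique per article would work as the id; if the model already has a primary key, using that directly is probably simpler than hashing the URL.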
2) There is more than one way to do this, but for example you can boost the title by changing your query slightly:
{"query": {"query_string": {"query": keywordstr, "fields" : ["text", "title^2"]}}}
With this change, a match in the title field carries twice the weight of a match in the text field.
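To show how the boosted query from point 2 might slot into the search function, here is a small sketch. The function `build_query` and its `title_boost` parameter are hypothetical names introduced for this example; the query body itself is the one given above. The client is again passed in as `es`.

```python
def build_query(keywordstr, title_boost=2):
    """Build a query_string body that weights title matches more than text matches."""
    return {
        "query": {
            "query_string": {
                "query": keywordstr,
                # "title^2" means: multiply the score contribution of title by 2
                "fields": ["text", "title^%s" % title_boost],
            }
        }
    }


def searchdoc(es, keywordstr, index="article-index"):
    """Run the boosted query and return the raw hits from the response."""
    res = es.search(index=index, body=build_query(keywordstr))
    return res["hits"]["hits"]
```

Keeping the body in its own function makes it easy to try different boost values and compare the resulting ranking.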
3) As a proof of concept, it is not bad.
4) This is a big topic. I think you should check the documentation on suggesters.
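As a starting point for point 4, the sketch below builds a term suggester request and pulls the candidate corrections out of the response. The names `build_suggest_body`, `extract_suggestions` and `title-suggestion` are made up for this example, and the code assumes that the `title` field is the right one to suggest from and that your Elasticsearch version accepts a `suggest` section in the search body. Treat it as an illustration, not a complete "did you mean" feature.

```python
def build_suggest_body(keywordstr, field="title", name="title-suggestion"):
    """Build a search body asking the term suggester for corrections of each token."""
    return {"suggest": {name: {"text": keywordstr, "term": {"field": field}}}}


def extract_suggestions(res, name="title-suggestion"):
    """Map each input token that has candidates to its list of suggested terms."""
    suggestions = {}
    for entry in res.get("suggest", {}).get(name, []):
        options = [opt["text"] for opt in entry["options"]]
        if options:
            suggestions[entry["text"]] = options
    return suggestions
```

A call would look like `res = es.search(index="article-index", body=build_suggest_body(keywordstr))` followed by `extract_suggestions(res)`. If no hits come back for the original query, the search can be re-run with the suggested terms.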