I have a PostgreSQL database containing news articles parsed from the web. The parser runs every hour, collects new news items and stores them in the DB. Users of the app can add keywords to their profile so that they are notified whenever a new news item containing one of those keywords is found. Currently I am using an SQL query for this: whenever I get a new news article, I try to match it against all the keywords added by users and then send out notifications, but this takes a lot of time. So I am thinking of integrating Elasticsearch. I have come across the percolation query, but I can't find good documentation about it, so I'm not sure whether I will be able to create complex queries with it. Search needs to take into account the following:
Thanks for the clarification.
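To use the percolate query in your case, you would have to go through three steps. First, create an index whose mapping contains the fields of your articles, a field of type `percolator` that stores the queries, and the user information: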
```
PUT /percolated_queries_index
{
  "mappings": {
    "properties": {
      "article": {
        // Mapping for your article
      },
      "query": {
        "type": "percolator"
      },
      "user": {
        // Mapping for the information related to the user
      }
    }
  }
}
```
The `article` field is required because the article documents that you will percolate use this mapping. It should probably be the same mapping as the one you use in your `article` index. As mentioned in the documentation, you should think of this mapping as the preprocessing applied to the documents you will match: for example, if you want stemming, this is where you have to specify a stemming analyzer.
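To make the mapping more concrete, here is a minimal sketch that builds the index body as a Python dictionary. The article fields (`title`, `content`, `author`) and user fields (`id`, `email`) are hypothetical placeholders, and the text fields use Elasticsearch's built-in `english` analyzer, which includes stemming; replace both with whatever your actual `article` index uses.

```python
import json


def build_percolator_index_mapping(text_analyzer: str = "english") -> dict:
    """Body for PUT /percolated_queries_index (a sketch with hypothetical fields)."""
    return {
        "mappings": {
            "properties": {
                # Same structure as the article index, so percolated articles
                # are analyzed the same way (here: English stemming).
                "article": {
                    "properties": {
                        "title": {"type": "text", "analyzer": text_analyzer},
                        "content": {"type": "text", "analyzer": text_analyzer},
                        "author": {"type": "keyword"},
                    }
                },
                # Each document stores one user preference as a query.
                "query": {"type": "percolator"},
                # Who to notify when the stored query matches.
                "user": {
                    "properties": {
                        "id": {"type": "keyword"},
                        "email": {"type": "keyword"},
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_percolator_index_mapping(), indent=2))
```

The printed JSON can be used as the body of the `PUT /percolated_queries_index` request, sent with the official Python client or any HTTP library.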
Then, index the user preferences as documents in this index, storing each query in the `percolator` field:

```
POST /percolated_queries_index/_doc
{
  "query": {
    // The elasticsearch query corresponding to the user preferences
  },
  "user": {
    // Information for the user, e.g., id, email
  }
}
```
You should not set an `article` field in these documents. The `query` corresponds to the user preferences rewritten as an Elasticsearch query: for example, a `match` query for the author of the article and `bool` queries for the AND, OR and NOT keywords. This will probably be the difficult part, because you will have to write something that transforms the user's query into an Elasticsearch query. If you can use the query string syntax (the `query_string` query), it should be much easier.
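Here is a minimal sketch of that translation, assuming a hypothetical `Preference` shape with AND, OR and NOT keyword lists plus an optional exact author, and assuming articles are stored under `article` with `article.author` mapped as `keyword`. Required conditions go in a `filter` clause because scoring does not matter for notifications.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical text fields, assuming articles are stored under "article".
TEXT_FIELDS = ["article.title", "article.content"]


@dataclass
class Preference:
    """Hypothetical shape of one saved user preference."""
    all_of: List[str] = field(default_factory=list)   # AND keywords
    any_of: List[str] = field(default_factory=list)   # OR keywords
    none_of: List[str] = field(default_factory=list)  # NOT keywords
    author: Optional[str] = None                      # exact author name


def keyword_clause(keyword: str) -> dict:
    # Match the keyword (or multi-word phrase) in any of the text fields.
    return {"multi_match": {"query": keyword, "fields": TEXT_FIELDS, "type": "phrase"}}


def preference_to_query(pref: Preference) -> dict:
    """Translate a preference into a bool query for the percolator field."""
    required = [keyword_clause(k) for k in pref.all_of]
    if pref.author:
        # "term" is an exact, case-sensitive match on a keyword field.
        required.append({"term": {"article.author": pref.author}})
    optional = [keyword_clause(k) for k in pref.any_of]
    if not required and not optional:
        # A bool query with only must_not clauses would match almost everything.
        raise ValueError("a preference needs at least one positive condition")

    bool_query: dict = {}
    if required:
        bool_query["filter"] = required
    if optional:
        bool_query["should"] = optional
        bool_query["minimum_should_match"] = 1  # at least one OR keyword
    if pref.none_of:
        bool_query["must_not"] = [keyword_clause(k) for k in pref.none_of]
    return {"bool": bool_query}


def percolator_document(user_id: str, email: str, pref: Preference) -> dict:
    """Document for POST /percolated_queries_index/_doc (no "article" field)."""
    return {"query": preference_to_query(pref), "user": {"id": user_id, "email": email}}
```

For example, `Preference(all_of=["election"], any_of=["senate", "congress"], none_of=["sports"])` yields a query that requires "election", at least one of "senate" or "congress", and rejects articles mentioning "sports". Setting `minimum_should_match` explicitly matters: when a `bool` query has `filter` clauses, its `should` clauses become optional otherwise.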
Finally, when a new article is parsed, run a `percolate` search query with this article in the `document` parameter. If the article is already indexed, you can instead reference it directly by its index and id (the syntax is given in the documentation).

```
GET /percolated_queries_index/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        // The content of the article
      }
    }
  },
  "_source": "user"
}
```
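A minimal sketch that builds this search body for both variants, assuming the article fields are nested under `article` as in the percolator index mapping. The `index` and `id` parameters of the `percolate` query make Elasticsearch fetch the stored article and percolate its `_source` as is, so that variant only lines up with `article.*` queries if the stored article has the same structure. The function name is hypothetical.

```python
from typing import Optional


def percolate_search_body(article: Optional[dict] = None,
                          article_index: Optional[str] = None,
                          article_id: Optional[str] = None) -> dict:
    """Body for GET /percolated_queries_index/_search (a sketch).

    Either pass the new article inline, or point to an already indexed article.
    """
    if article is not None:
        # Inline document, wrapped under "article" to match the mapping.
        percolate = {"field": "query", "document": {"article": article}}
    elif article_index is not None and article_id is not None:
        # Elasticsearch fetches this document and percolates its _source.
        percolate = {"field": "query", "index": article_index, "id": article_id}
    else:
        raise ValueError("pass either an article or an article_index and article_id")
    return {
        "query": {"percolate": percolate},
        "_source": ["user"],  # only return who to notify, not the stored queries
    }
```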
The response to this query returns the documents whose stored query matches the article document, including the information of the corresponding user. Since you are usually not interested in the stored query itself, you can use `_source` filtering to return only the `user` field.

The response therefore gives you all the users to whom the new article should be sent.
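Reading the users out of the response could look like the sketch below, assuming each hit's `_source` has the shape `{"user": {"id": ..., ...}}`. A user with several matching preferences appears once per matching percolator document, so the hits are de-duplicated on the user id. Keep in mind that, like any search, a percolate search returns only 10 hits by default, so with many users you would have to raise `size` or paginate (for example with `search_after`).

```python
def users_to_notify(response: dict) -> list:
    """Collect the users from a percolate search response, without duplicates."""
    users = {}
    for hit in response.get("hits", {}).get("hits", []):
        user = hit.get("_source", {}).get("user")
        if user is not None and "id" in user:
            # Keep the first occurrence of each user id; dicts keep insertion order.
            users.setdefault(user["id"], user)
    return list(users.values())
```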