Elasticsearch keyword search over data migrated from RDBMS tables with more than 20 association relationships
Hello. I want to migrate RDBMS data to Elasticsearch and use it as a search engine.
I’m not sure whether my implementation is heading in the right direction, and I’m worried I may have gone wrong from the start, so I decided to post a question.
I would really appreciate any advice or opinions after you read the details below.
About 20 RDBMS tables are subject to search, including mapping tables (1:N relationships).
I want users to enter a search term in a search box and have the search run against every field in those tables.
Is there a way to preserve the RDBMS associations while migrating the data to Elasticsearch, so that a keyword search returns all of the related information stored across those tables?
(Currently, I use Logstash to create an index per table and add a common field that can be searched across all indexes. I then take the key of the original table referenced by the join table and send another request to the RDBMS to fetch the data with that key, so I suspect I’m not using Elasticsearch’s features properly.)
Thank you for reading this long post.
Hey and welcome to the community,
Well, the short answer to your problem is no; you can't replicate the logic of a relational database in a non-relational, distributed system like Elasticsearch, and here is why!
The answer to your problem is denormalizing your data. Elasticsearch wrote about this in the docs for an older version, but it's still helpful; you can read it in detail to understand more.
You can still implement the same relational logic with the join field type, but it's not recommended at all because of performance issues; read this and this too.
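For reference, a parent/child setup with the join field type looks roughly like the sketch below, shown only for contrast with the denormalized approach. The index structure, relation name, and field names here are made up for illustration:

```python
# Sketch of an Elasticsearch "join" field mapping (parent/child).
# The relation and field names are illustrative, not from the post.
join_mapping = {
    "mappings": {
        "properties": {
            "full_name": {"type": "text"},
            "title": {"type": "text"},
            # One join field defines the parent/child relation:
            # a "user" document can have "blogpost" children.
            "user_blogpost_relation": {
                "type": "join",
                "relations": {"user": "blogpost"},
            },
        }
    }
}

# A child document names its parent and must be indexed with
# routing set to the parent's id, so both land on the same shard.
child_doc = {
    "title": "aaaa",
    "user_blogpost_relation": {"name": "blogpost", "parent": "1"},
}
```

Every parent/child query (has_child, has_parent) then pays an extra join cost at search time, which is exactly why denormalizing is usually preferred.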
Before jumping straight into Logstash and indexing your data, you first need the full relation between the tables. For example, consider a one-to-many relation between users and blogposts: each user can write many blogposts. A simple SQL database would be:
Table: users
id   full_name
__________________
1    John Smith
2    Adam Doe

Table: blogposts
title   user_id   url
__________________
aaaa    1         https://example.com/aaaa
bbbb    2         https://example.com/bbbb
Denormalizing means that instead of storing two tables (i.e. two indexes) in Elasticsearch, you store only one, holding a redundant copy of your data; this improves search performance. In other words, you index the result of the SQL query that performs the join. In this case, the result of the SQL query would be something like:
title   user_id   url                        full_name
_______________________________________________________
aaaa    1         https://example.com/aaaa   John Smith
bbbb    2         https://example.com/bbbb   Adam Doe
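The denormalizing join can be sketched end to end with an in-memory SQLite database holding the sample data above (table and column names as in the example); each row of the join becomes one Elasticsearch document:

```python
import sqlite3

# Build the two sample tables in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE blogposts (title TEXT, user_id INTEGER, url TEXT);
    INSERT INTO users VALUES (1, 'John Smith'), (2, 'Adam Doe');
    INSERT INTO blogposts VALUES
        ('aaaa', 1, 'https://example.com/aaaa'),
        ('bbbb', 2, 'https://example.com/bbbb');
""")

# The denormalizing join: one result row per blogpost, with the
# author's name pulled in from the users table.
rows = conn.execute("""
    SELECT b.title, b.user_id, b.url, u.full_name
    FROM blogposts b JOIN users u ON u.id = b.user_id
    ORDER BY b.user_id
""").fetchall()

# Each joined row becomes one document to index.
docs = [
    {"title": t, "user_id": uid, "url": url, "full_name": name}
    for (t, uid, url, name) in rows
]
```

Your real query will join many more tables, but the principle is the same: the SELECT list defines the fields of the documents you index.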
And in Elasticsearch you'll create a single index, called for example users_blogpost,
and the documents indexed would be the rows of the SQL result:
{
  "title": "aaaa",
  "user_id": 1,
  "url": "https://example.com/aaaa",
  "full_name": "John Smith"
},
{
  "title": "bbbb",
  "user_id": 2,
  "url": "https://example.com/bbbb",
  "full_name": "Adam Doe"
}
You can write a script (here is a list of ES clients; choose your favorite language) that processes your SQL table in batches (chunks) and indexes the SQL result into Elasticsearch. But before indexing anything, you need to write your mapping and define the appropriate type for each field.
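The two steps above can be sketched like this: define a mapping, then build a newline-delimited payload for the _bulk API. The field types here are assumptions based on the sample data, so adjust them to your own schema; sending the payload is left to whichever client you pick:

```python
import json

# Mapping for the denormalized index. The types below are assumptions
# based on the sample data; adjust them to your own schema.
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "user_id": {"type": "integer"},
            "url": {"type": "keyword"},
            "full_name": {"type": "text"},
        }
    }
}

docs = [
    {"title": "aaaa", "user_id": 1, "url": "https://example.com/aaaa",
     "full_name": "John Smith"},
    {"title": "bbbb", "user_id": 2, "url": "https://example.com/bbbb",
     "full_name": "Adam Doe"},
]

# The _bulk API expects newline-delimited JSON: an action line
# followed by the document source line, with a trailing newline.
lines = []
for doc in docs:
    lines.append(json.dumps({"index": {"_index": "users_blogpost"}}))
    lines.append(json.dumps(doc))
bulk_body = "\n".join(lines) + "\n"
```

With an official client you would send `mapping` when creating the index, and index the documents in chunks with the client's bulk helper instead of hand-building `bulk_body`.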
Hope it helps,
Marwane.