elasticsearch · logstash · data-modeling

How to perform a keyword search on Elasticsearch after migrating from RDBMS tables with more than 20 association relationships?



Hello. I want to migrate RDBMS data to Elasticsearch and use it as a search engine.

I’m not sure if I’m heading in the right direction with the implementation details, and I’m worried that I might have done something wrong from the start, so I decided to post a question.

I would really appreciate it if you could give me some advice or opinions after reading the content.

About 20 RDBMS tables are subject to search, including mapping tables (set up as 1:N relationships).

I want users to enter a search term in a search box and have the search run across all fields that exist in the tables above.

Is there a way to preserve the RDBMS associations while migrating the data to Elasticsearch, so that a keyword search returns all of the related information stored in those tables?

(Currently, I use Logstash to create an index for each table, add a common field that can be searched across every index, and get the key of the original table referenced by the join table.

I then send another request to the RDBMS with that key to fetch the data, so I suspect I'm not using Elasticsearch's features properly.)

Thank you for reading this long post.


Solution

  • Hey and welcome to the community,

    Well, the short answer to your problem is no; you can't replicate the logic of a relational database in a non-relational distributed system like Elasticsearch, and here is why!

    The answer to your problem is denormalizing your data. Elasticsearch wrote about it in an older version of the documentation, but it is still helpful; you can read it in detail to understand more.

    You can still implement the same relational logic by using join field types, but it's not recommended at all because of performance issues; read this and this too.
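
    For reference, a join field is declared in the index mapping. Here is a minimal sketch using the official Python client (the index name and the user/blogpost relation are made-up examples, not something I'd recommend reaching for here):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Declares a parent/child relation (user -> blogpost) inside one index.
    es.indices.create(
        index="users_blogposts_joined",  # hypothetical index name
        mappings={
            "properties": {
                "relation": {
                    "type": "join",
                    "relations": {"user": "blogpost"},
                }
            }
        },
    )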

    How to denormalize your data

    Before jumping straight to Logstash and starting to index your data, you first need the full relations between your tables. For example, consider a one-to-many relation between users and blogposts, where each user can write many blogposts. A simple SQL database would be:

    Table: users

    id    Full name
    __________________
    1     John Smith
    2     Adam Doe


    Table: blogposts

    title    user_id    url
    ________________________________________
    aaaa     1          https://example.com/aaaa
    bbbb     2          https://example.com/bbbb

    

    Denormalizing means that instead of storing two tables (aka indexes) in Elasticsearch, you'll store only one (a redundant copy of your data; this improves performance in terms of search). In other words, you'll index the result of the SQL query that performs the join. In this case the result of that query would be something like:

    title    user_id    url                        Full name
    ________________________________________________________
    aaaa     1          https://example.com/aaaa   John Smith
    bbbb     2          https://example.com/bbbb   Adam Doe
    
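    For reference, the denormalizing join itself could look something like the query below (a sketch; it assumes the columns shown above are named id, full_name, title, user_id, and url):

    SELECT b.title, b.user_id, b.url, u.full_name
    FROM blogposts b
    JOIN users u ON u.id = b.user_id;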

    And in Elasticsearch you'll create an index, call it for example users_blogpost, and the documents indexed would be the rows of the SQL result:

    {
      "title": "aaaa",
      "user_id": 1,
      "url": "https://example.com/aaaa",
      "full_name": "John Smith"
    },
    {
      "title": "bbbb",
      "user_id": 2,
      "url": "https://example.com/bbbb",
      "full_name": "Adam Doe"
    }
    

    You can write a script (here is a list of ES clients, choose your favorite language) that reads your SQL tables in batches (chunks) and indexes the SQL result into Elasticsearch. But before indexing anything, you need to write your mapping and define the appropriate type for each field; a sketch follows below.
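
    As an illustration, here is a minimal sketch of such a script using the official Python client and its bulk helper (the SQLite database, the field types, and the chunk size are assumptions based on the example above; adapt them to your own schema):

    import sqlite3

    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk

    es = Elasticsearch("http://localhost:9200")

    # 1. Define the mapping first, with an appropriate type per field.
    es.indices.create(
        index="users_blogpost",
        mappings={
            "properties": {
                "title": {"type": "text"},
                "user_id": {"type": "keyword"},
                "url": {"type": "keyword"},
                "full_name": {"type": "text"},
            }
        },
    )

    # 2. Stream the rows of the denormalizing join as bulk index actions.
    def generate_actions():
        conn = sqlite3.connect("blog.db")  # placeholder data source
        rows = conn.execute(
            "SELECT b.title, b.user_id, b.url, u.full_name "
            "FROM blogposts b JOIN users u ON u.id = b.user_id"
        )
        for title, user_id, url, full_name in rows:
            yield {
                "_index": "users_blogpost",
                "_source": {
                    "title": title,
                    "user_id": user_id,
                    "url": url,
                    "full_name": full_name,
                },
            }
        conn.close()

    # helpers.bulk chunks the actions into batched _bulk requests.
    bulk(es, generate_actions(), chunk_size=500)

    Once the denormalized index is built, a single keyword query across its fields replaces the extra round trip you are currently making to the RDBMS.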

    Hope it helps,

    Marwane.