Search code examples
javamysqldistributed-system

Pagination between separated data


I am troubling in this issue.

I am having 2 separated systems.

But I need do to a query that need the data from System A to query some data from the System B.

And the data from system B need to paged (20 records per page).

I can do this with with something like get 100 first record in System A then query in system B and maybe it can take 15 record from it then I need to take 100 more in system A to fill for page 1 of system B.

But for the total page I think i need to loop over all record in system A to do that.

I think it's very bad for prod when I have million record in each systems.

Please help me with this issue.

Thanks so much and sorry for my bad English.


Solution

  • Short answer: use a db view ( and consider a materialized view ).

    Take database for system A, build a dblink on database for system B, create a view that joins the two data ( consider materialized view too ). If you have this view, use in system A.

    If you cannot create a dblink, create an etl that reads periodically ( each night ? each hour ? depends on your requirements ) from system B, store in a table in database for system A and then make a view in your system A to use these data.

    When you have all data on one system, you can consider some in memory database (like Redis, or similar ) to store key data and be able to query this "big data" efficently.

    Consideration to do before start coding:

    • always consider first why you do this and frequency: how many times each minute, etc..
    • you have millions of data, but what about data growth ? How much this data will become in one month? One year? Ten year ?
    • don't do a query each client request, instead cache result when it's possible. Not every client request should call two databases and consider how much your queries will impact on sytem B ( and if there are some limitation on this system too, lime rate limit, etc.. )
    • maybe you don't want all data ( like all columns ) but instead a subset of data. Don't do a "join" with everything, consider instead only what you really need
    • test with real world data, don't stop on local machine / test environment
    • assume the worst: 10x data, 10x frequency requests
    • consider how do you document all this and how to share this knowledge. Link a system to another is always a pain point in your architecture