Search code examples
mysqljoinstored-proceduresinner-join

Filter large number of records on mysql when using INNER JOIN with two fields


I'm working on existing database with millions of inserts per day. Database design itself pretty bad and filtering records from it takes huge amount of time. we are in the process of moving this to ELK cluster but in the mean time I have to filter some records for immediate use.

I have two tables like this

table - log_1

        datetime         |   id     |   name    |   ip

    2017-01-01 01:01:00  |  12345   |    sam    |     192.168.100.100


table - log_2

         datetime        |   mobile    |   id

    2017-01-01 01:01:00  |  999999999  |  12345   

I need to filter my data using ip and from the log_1 and datetime on both log_1 and log_2. to do that I use below query

SELECT log_1.datetime, log_1.id, log_1.name, log_1.ip, log_2,datetime, log_2.mobile, log_2.id
FROM log_1
INNER JOIN log_2
ON log_1.id = log_2.id AND log_1.datetime = log_2.datetime
where log_1.ip = '192.168.100.100'
limit 100

Needless to say this take forever to retrieve results with such large number of records. is there any better method I can do the same thing without waiting long time mysql to respond ?. In other words how can I optimized my query against such large database.

database is not production and it's for just analytics


Solution

  • First of all, your current LIMIT clause is fairly meaningless, because the query has no ORDER BY clause. It is not clear which 100 records you want to retain. So, you might want to use something like this:

    SELECT
        l1.datetime,
        l1.id,
        l1.name,
        l1.ip,
        l2.datetime,
        l2.mobile,
        l2.id
    FROM log_1 l1
    INNER JOIN log_2 l2
        ON l1.id = l2.id AND l1.datetime = l2.datetime
    WHERE
        l1.ip = '192.168.100.100'
    ORDER BY
        l1.datetime DESC
    LIMIT 100;
    

    This would return the 100 most recent matching records. As for speeding up this query, one way to at least make the join faster would be to add the following index on the log_2 table:

    CREATE INDEX idx ON log_2 (datetime, id, mobile);
    

    Assuming MySQL chooses to use this index, it should make the join much faster, because each id and datetime value can be looked up in a B-tree instead of doing a manual scan of the entire table. Note that the index also covers the mobile column, which is needed in the select.