Search code examples
phpmysqlsymfonyphone-numbercdr

Longest Prefix between two MySQL Tables


I have a MySQL database with 2 tables:

Table A:

  • Number
  • Location

Table B:

  • Calling Code
  • Area Code
  • Location

Initially, I have about 60,000 entries in table A, which has the Location column empty at the beginning. In table B I have about 250,000+ entries with a lot of area codes, calling codes (1, 011) and their respective location in the world. What I want is a FAST way of populating the table A's location column with the location of the number.

So for example if the first entry in Table A is (17324765600, null) I want to read trough table B and get the location for that number. Right now I am getting the location of a number with this query:

SELECT b.location
FROM 
  tableB b
  LEFT JOIN tableA a
     ON a.number LIKE CONCAT(b.calling_code, b.code, '%')
ORDER BY CHAR_LENGTH(b.code) DESC
LIMIT 1;

That gives me the proper location (even though I have my doubts that it can fail..). The problem is that performance wise this method is a no go. If I loop over all the 50k number

Update 1

Allow me to put some sample data with the expected output: Sample Table A:

number  location
17324765600 NULL
01134933638950  NULL
0114008203800   NULL
…60k Records + at the moment..

Sample Table B:

calling_code    code    location
1   7324765 US-NJ
011 34933   Spain
011 400820  China
…250,000+ records at the moment

Expected output after the processing: Table A:

number  location
17324765600 US-NJ
01134933638950  Spain
0114008203800   China

The best I’ve come up with is the following update statement:

UPDATE tableA a JOIN tableB b ON a.location LIKE CONCAT(b.calling_code, b.code, '%') SET a.location = b.location

Of course here I am not sure if it will always return the longest prefix of the code, for example if in the above tables there was another code starting with 73247XX let’s say that code is for Iowa (just as an example).. I am not sure if the query will always return the longest code so here I would also need help.

Let me know if the samples help.

.SQL for the database structure: Download

Update 2:

I am thinking on doing this the following way:

Before inserting the data in table A I am thinking of exporting Table B into a CSV and sort it by area code, that way I can have 2 pointers one for the array of entries for table A and one for the csv, both sorted by area code that way I can make a kind of parallel search and populate the entry's location on PHP and not having to do this in MySQL.

Let me know if this approach seems like a better option if so I will test it out and publish the answer.


Solution

  • I decided to take the below approach since I did not received any clear response:

    Prior to the process I prepared 2 new tables, a table for country codes and a table for state codes (since I also need to know the state in case the number is within the US). Both tables will have: country, state, calling_code, code …

    As for these 2 tables I broke down all the numbers with the prefixes and grouped them by area code so instead of having full 6 numbers to identify a country/state I grouped them by the first 3 numbers and if the code is within the USA or not, hence the 2 tables.

    With this modifications I was able to break the 250,000 + rows table to only about 300 rows (each table).

    After this I will follow these steps:

    1. I get the list of phone numbers
    2. I first execute a query very similar as the one I posted to update all the numbers that belong to the country_code table
    3. I then update the rows that are still without location assigned with the table of state_code
    4. I had to put some kind of cron in order to get this done every x amount of time to avoid having a huge amount of phones.

    This may not be the best approach but for the 50k numbers that are in place at the moment I was able to (manually executing query by query with some more polishing) get it down to about 10 seconds, executing this every x amount of time (which will allow performing this process to less than 10k numbers) will make this smoothly.

    I will mark this as the answer but if someone else magically comes up with a better answer I will make sure to update this.

    Divide and conquer!