I have a MySQL database with 2 tables:
Table A:
Table B:
Initially, I have about 60,000 entries in table A, which has the Location column empty at the beginning. In table B I have about 250,000+ entries with a lot of area codes, calling codes (1, 011) and their respective location in the world. What I want is a FAST way of populating the table A's location column with the location of the number.
So for example if the first entry in Table A is (17324765600, null) I want to read trough table B and get the location for that number. Right now I am getting the location of a number with this query:
SELECT b.location
FROM
tableB b
LEFT JOIN tableA a
ON a.number LIKE CONCAT(b.calling_code, b.code, '%')
ORDER BY CHAR_LENGTH(b.code) DESC
LIMIT 1;
That gives me the proper location (even though I have my doubts that it can fail..). The problem is that performance wise this method is a no go. If I loop over all the 50k number
Update 1
Allow me to put some sample data with the expected output: Sample Table A:
number location 17324765600 NULL 01134933638950 NULL 0114008203800 NULL …60k Records + at the moment..
Sample Table B:
calling_code code location 1 7324765 US-NJ 011 34933 Spain 011 400820 China …250,000+ records at the moment
Expected output after the processing: Table A:
number location 17324765600 US-NJ 01134933638950 Spain 0114008203800 China
The best I’ve come up with is the following update statement:
UPDATE tableA a JOIN tableB b ON a.location LIKE CONCAT(b.calling_code, b.code, '%') SET a.location = b.location
Of course here I am not sure if it will always return the longest prefix of the code, for example if in the above tables there was another code starting with 73247XX let’s say that code is for Iowa (just as an example).. I am not sure if the query will always return the longest code so here I would also need help.
Let me know if the samples help.
.SQL for the database structure: Download
Update 2:
I am thinking on doing this the following way:
Before inserting the data in table A I am thinking of exporting Table B into a CSV and sort it by area code, that way I can have 2 pointers one for the array of entries for table A and one for the csv, both sorted by area code that way I can make a kind of parallel search and populate the entry's location on PHP and not having to do this in MySQL.
Let me know if this approach seems like a better option if so I will test it out and publish the answer.
I decided to take the below approach since I did not received any clear response:
Prior to the process I prepared 2 new tables, a table for country codes and a table for state codes (since I also need to know the state in case the number is within the US). Both tables will have: country, state, calling_code, code …
As for these 2 tables I broke down all the numbers with the prefixes and grouped them by area code so instead of having full 6 numbers to identify a country/state I grouped them by the first 3 numbers and if the code is within the USA or not, hence the 2 tables.
With this modifications I was able to break the 250,000 + rows table to only about 300 rows (each table).
After this I will follow these steps:
This may not be the best approach but for the 50k numbers that are in place at the moment I was able to (manually executing query by query with some more polishing) get it down to about 10 seconds, executing this every x amount of time (which will allow performing this process to less than 10k numbers) will make this smoothly.
I will mark this as the answer but if someone else magically comes up with a better answer I will make sure to update this.
Divide and conquer!