I have two tables viz. Total_Data and Distinct_S1.
Total_Data has 3.5 million rows. Fields: "S1", "S2", "S3", "S4"
Distinct_S1 has 1 million rows. Fields: "S1", "frequency". "S1" of Distinct_S1 consists of all distinct values which occur in "S1" of Total_Data.
Task at hand: "frequency" of Distinct_S1 for S1 = 'xxyy' (suppose) should consist of the number of times 'xxyy' occurs in Total_Data.
I used the following query from within a python script(MySQLdb imported) to accomplish the Task at hand:
cur.execute("update Distinct_S1 set frequency=(select count(*) from Total_Data where S1='%s') where S1='%s'"%(S1_val, S1_val)
The above query works fine but it seems to take a lot of time. Is there any other faster way to achieve the same?
You can use the multi-table update syntax with a join to update all rows at once. See the manual for the syntax.
But a much easier way is to also insert all the values in the table directly without first populating it with the distinct values.
INSERT INTO Distinct_S1 (S1, frequency)
SELECT S1, COUNT(*)
FROM Total_Data
GROUP BY S1