Search code examples
mysqlhashgroup-bycountfrequency

Calculating frequency of password hashes efficiently in MySQL


For my bachelor thesis I have to analyze a password leak and I have a table with 2 colums MEMBER_EMAIL and MEMBER_HASH

I want to calculate the frequency of each hash efficiently

So that the output looks like:

Hash    | Amount
----------------
2e3f..  | 345
2f2e..  | 288
b2be..  | 189

My query until now was straight forward:

SELECT MEMBER_HASH AS hashed, count(*) AS amount 
FROM thesis.fulllist 
GROUP BY hashed 
ORDER BY amount DESC

While it works fine for smaller tables, i have problems computing the query on the whole list (112 mio. entries), where it takes me over 2 days, ending in a weird connection timeout error even if my settings regarding that are fine.

So I wonder if there is a better way to calculate (as i can't really think of any), would appreciate any help!


Solution

  • Your query can't be optimized as it's quite simple. The only way I think to improve the way the query is executed is to index the "MEMBER_HASH".

    This is how you can do it :

    ALTER TABLE `table` ADD INDEX `hashed` (`MEMBER_HASH`);