performance of big INSERT with data from huge select

We are generating some crosssell-data for a shop. We want to display stuff like "customers who looked at this product, also looked at these products". To generate this data, we do this query on a daily routine from session-based product-viewed-data.

            INSERT INTO
                product_viewed_together
            (
                product,
                product_associate,
                viewed
            )
            SELECT
                v.product,
                v2.product,
                COUNT(*)
            FROM
                product_view v
            INNER JOIN
                product_view v2
            ON
                v2.session = v.session
                AND v2.product != v.product
                AND DATE_ADD(v2.created, INTERVAL %d DAY) > NOW()
            WHERE
                DATE_ADD(v.created, INTERVAL %d DAY) > NOW()
            GROUP BY
                v.product,
                v2.product;

Table product_view is joined to itself. As this table is quite big (circa 26 million rows), the result is even bigger. The query issues a huge amount of performance and time.

I am not use, we choosed a layout fitting the problem in a good way. Is there a better way to store and generate this data?

Solution

Make the date tests sargable:

DATE_ADD(v.created, INTERVAL %d DAY) > NOW()

-->

v.created > NOW - INTERVAL %d DAY

Is product_view a VIEW? Or a TABLE? If a table, provide two "covering" indexes:

INDEX(created, session, product)  -- (for v)
INDEX(session, created, product)  -- (for v2)

Perhaps all the counts you get are even? This bug can be fixed in about 3 ways, each will double the speed. I think the optimal one is to change one line in the ON to

DATE_ADD(v2.created, INTERVAL %d DAY) > NOW()

-->

v2.created > v.created

I think that will double the speed.

However, the counts may not be exactly correct if you can have two different products with the same created.

Another issue: You will end up with

prod  assoc  CT
123   234    43
234   123    76  -- same pair, opposite order

My revised test says that 234 came before 123 more often than the other way.

Give those things a try. But if you still need more; I have another, more invasive, thought.