MySQL / SQLAlchemy — JOIN on REGEX substring match between columns

I'd like to join (e.g. LEFT JOIN) the following two tables on a match of the main URL substring (ignoring the leading https:// and an optional www.):

first just using MySQL
then also implemented in SQLAlchemy

For example:

https://www.example.com/path1/  =matches=  https://example.com/path1/

but (http vs https)

http://www.example.com/path1/  =!does not match!=  https://example.com/path1/

First_Table

id	Page	Visits
0	https://www.example.com/path1/	164
1	https://www.example.com/path2/	4685
2	https://www.example.com/	12005

Second_Table

id	Page	Visits
5	https://example.com/path1/	546
8	https://example.com/path2/	4689
9	https://example.com/	8466

A regex that captures the main substring is, for example, (?<=https://)(?:www\.)?(.+)
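To illustrate what this regex is meant to capture, here is a minimal sketch using Python's standard re module. The helper name url_key is hypothetical, and this runs in Python's regex engine rather than MySQL's, so it only shows the intended matching behavior.

```python
import re

# The regex from the question: require https://, skip an optional www.,
# and capture everything after it.
PATTERN = re.compile(r"(?<=https://)(?:www\.)?(.+)")

def url_key(url):
    """Return the captured main substring, or None if the URL is not https."""
    match = PATTERN.search(url)
    return match.group(1) if match else None

with_www = url_key("https://www.example.com/path1/")
without_www = url_key("https://example.com/path1/")
plain_http = url_key("http://www.example.com/path1/")

print(with_www, without_www, plain_http)
```

Both https variants yield the same key, while the http URL yields no key at all, which is the matching rule described above.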

How would you approach this? What is the proper MySQL syntax and how can it be implemented with SQLAlchemy?

Solution

I'm not an SQL specialist, and I'm not sure exactly what you want to do, but I would do the following:

Derive a variant of one table by replacing //example.com with //www.example.com (or the inverse, //www.example.com with //example.com) so that both tables use the same page URL form for the comparison.
Create a union of this derived table with the other one.
Group by page URL so that the visits can be summed.

The SQL code:

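SELECT
  Page,
  SUM(Visits) AS `Total_Visits`
FROM
  (
    SELECT
      t1.id,
      t1.Page,
      t1.Visits
    FROM
      First_Table t1
    -- UNION ALL keeps every row; plain UNION would drop rows that are
    -- identical in both tables and so undercount the visits
    UNION ALL
      (
        SELECT
          t2.id,
          -- rewrite the host so both tables use the www. form
          REPLACE(t2.Page, '//example.com', '//www.example.com') AS `Page`,
          t2.Visits
        FROM
          Second_Table t2
      )
  ) Union_Table
GROUP BY
  Page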

Result:

Page	Total_Visits
https://www.example.com/	20471
https://www.example.com/path1/	710
https://www.example.com/path2/	9374

You could also add an ORDER BY on the page URL to sort the result.

Test and play with it here: http://sqlfiddle.com/#!9/d14e11/32
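Since the question asked for a LEFT JOIN, here is a minimal sketch of that variant. It assumes Python's built-in sqlite3 as a stand-in for MySQL (the REPLACE call and the join syntax used here are the same in both), and it adds one hypothetical http:// row, not in the original tables, to show that http and https stay unmatched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE First_Table (id INTEGER PRIMARY KEY, Page TEXT, Visits INTEGER);
CREATE TABLE Second_Table (id INTEGER PRIMARY KEY, Page TEXT, Visits INTEGER);
INSERT INTO First_Table VALUES
  (0, 'https://www.example.com/path1/', 164),
  (1, 'https://www.example.com/path2/', 4685),
  (2, 'https://www.example.com/', 12005),
  (3, 'http://www.example.com/path1/', 7);  -- hypothetical row for illustration
INSERT INTO Second_Table VALUES
  (5, 'https://example.com/path1/', 546),
  (8, 'https://example.com/path2/', 4689),
  (9, 'https://example.com/', 8466);
""")

# Join on the URL with the www. removed; the scheme is left untouched,
# so an http:// page never matches an https:// page.
rows = conn.execute("""
SELECT t1.Page, t1.Visits, t2.Page, t2.Visits
FROM First_Table t1
LEFT JOIN Second_Table t2
  ON REPLACE(t1.Page, '://www.', '://') = t2.Page
ORDER BY t1.id
""").fetchall()

for row in rows:
    print(row)
```

In SQLAlchemy, the same ON condition could presumably be expressed as func.replace(First.Page, '://www.', '://') == Second.Page inside an outer join, where First and Second are hypothetical mapped classes for the two tables. Note that REPLACE rewrites every occurrence of '://www.', so a URL containing that text again later (for example in a query string) would need more careful handling.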