Search code examples
mysqlsymfonydoctrinedql

Possible solutions for slow Left Joins on complex doctrine query


I have a symfony repository method, which fetches a quite complex data set, which will then be placed in a CSV file by an export manager class. I don't want to place the whole code of handling the export job, but I managed to chatch the point at which the query slows down, so my question is regarding any other alternative to make that query faster, rather than the code itself. So the data fetched is some "site" data, which has multpiple "memberships", which then has a "user" . So the problem is, that when my query is trying to join the user information to the site, it slows down the execution. It looks like this:

$qb->leftJoin('s.memberships', 'ex_sm', 'WITH', 'ex_sm.revokedAt IS NULL');
$qb->leftJoin('ex_sm.user', 'ex_jappr', 'WITH', 'ex_sm.approverJobReactiveWeight IS NOT NULL');  

There are a few things (I tried or crossed through my mind could help) to mention:

  • I checked the tables, all the linked columns have an index and they are the same int data type.
  • I red an article about the DQL performance issues, where it was mentioned that overuse of DQL Left Join calls can kill performance as they re-map the same entity object over and over. A possible solution mentioned there was to fetch the main data set, and then looping through the collection, add the additional (joining data fields) to each element directly from the field's entity class. That possibly could work (not sure how much impact it would have), the problem is that what I have is a really complex legacy code, I don't want to touch the export manager's logic, because that would require far too much testing. The export manager expects a query builder class, so I have to find a solution within the query itself.
  • The issue is defintely caused by the join, and not the 'WITH' clause or the addtional conditions. I tried to call the query with the plain leftJoin call, same result.
  • I know the leftJoin methods can be called chained to each other, the code looks this way because some of these call are used in if statements.
  • I spend 2 days trying all sort things found here and other websites.

There are 6 different user types, for now I just called the script fetching the one above and it took 33 minutes to return the data. We are talking about 512 sites, which is not a huge collection of data. So my question is : Is there another DQL or any Doctrine way to simlify or reduce the call nulber of leftJoins in such a complex query, and somehow improve the performance?

Update: I think the problem comes from indexes so I give some details on the relationships: The 'memberships' entity comes from a table named 'access' the relationship to the user in its model looks like this:

/**
 * The user this membership encapsulates.
 *
 * @ORM\ManyToOne(targetEntity="User", inversedBy="siteMemberships", cascade={"persist"})
 * @ORM\JoinColumn(name="security_identity_id", referencedColumnName="id")
 *
 * @var User
 */
protected $user; 

Here is a screenshot of the indexes assigend to the 'security_identity_id' columns enter image description here

And the related User comes from a 'user' table having this relationship pointing to the membershipt

/**
 * @ORM\OneToMany(targetEntity="SiteMembership", mappedBy="user", cascade={"persist"}, fetch="EXTRA_LAZY")
 */
protected $siteMemberships;

Primary key is 'id' in the entity. Hope this gives a better view of the problem. I'm not an expert with sql, but tried everything I found and could understand so far.

Update: Here is the query executed:

SELECT s0_.name AS name_0, s0_.id AS id_1, GROUP_CONCAT(DISTINCT u1_.name SEPARATOR ', ') AS sclr_2 FROM site s0_ 
  LEFT JOIN access a2_ ON s0_.id = a2_.entity_id 
  AND a2_.type IN ('site_member') 
  AND (a2_.revoked_at IS NULL) 
  LEFT JOIN user u1_ ON a2_.security_identity_id = u1_.id 
  AND (a2_.approver_job_reactive_weight IS NOT NULL)

This returns the first site record with it's joined membershipt and user peroperty. But even the fething of this one single row takes more than 2 minutes.

Here is the create table information of the access (membership entity) table

'CREATE TABLE `access` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `buddy_id` int(11) DEFAULT NULL,
  `security_identity_id` int(11) DEFAULT NULL,
  `revoked_at` datetime DEFAULT NULL,
  `created_at` datetime NOT NULL,
  `updated_at` datetime NOT NULL,
  `type` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `approver_job_reactive_weight` int(11) DEFAULT NULL,
  `entity_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `access_idx` (`type`,`security_identity_id`,`entity_id`,`buddy_id`),
  KEY `IDX_6692B54395CE8D6` (`buddy_id`),
  KEY `IDX_6692B54DF9183C9` (`security_identity_id`),
  KEY `IDX_6692B5481257D5D` (`entity_id`),
  KEY `idx_revoked_id_approver_type` (`revoked_at`,`entity_id`,`approver_job_reactive_weight`,`approver_job_planned_weight`,`type`),
  KEY `idx_user_site_access` (`revoked_at`,`security_identity_id`,`buddy_id`,`type`),
  KEY `idx_user` (`security_identity_id`),
  KEY `idx_user_id` (`security_identity_id`),
  CONSTRAINT `FK_6692B54DF9183C9` FOREIGN KEY (`security_identity_id`) REFERENCES `user` (`id`)
)
ENGINE=InnoDB AUTO_INCREMENT=262441 DEFAULT CHARSET=utf8 
COLLATE=utf8_unicode_ci'

I removed some of the columns which are not relevant.


Solution

  • When doing a LEFT JOIN, the ON needs to say how the tables are related. The WHERE clause normally has IS NULL or IS NOT NULL to say whether to exclude or include the right-hand rows.

    LEFT JOIN and INNER JOIN are mostly the same speed. But I need to see the indexes (SHOW CREATE TABLE) and the SQL of the SELECT to see if there are other issues.

    More

    Replace

    KEY `IDX_6692B5481257D5D` (`entity_id`),
    

    With

    INDEX(entity_id, type, revoked_at)