Search code examples
mysqlactiverecordyii2database-performance

Yii2 Dynamic Relational Query Junction with Sort uses 2 queries instead of a JOIN, why?


I'm working with Yii2 Relational Database / Active Query Models and I ran into an issue trying to use the magic method getModelName() with $this->hasMany()->viaTable() to set the relation while trying to sort by a sort column in my junction table.

First I tried to just add my ->orderBy() clause to the main query:

    return $this->hasMany(Category::class,
        ['id' => 'categoryId'])
            ->viaTable('{{kits_x_categories}}',
                ['kitId' => 'id'])
            ->orderBy('{{kits_x_categories}}.sort asc');

That didn't work as expected and upon further digging I found out that this results in two separate queries, the first one selects my category Ids into an array, then uses said array for a WHERE IN() clause in the main (2nd) query to get the actual models that are related.

My first thought was to use the 3rd function($query) {} callback parameter of the ->viaTable() call and putting my $query->orderBy() clause on there:

    return $this->hasMany(Category::class,
        ['id' => 'categoryId'])
            ->viaTable('{{kits_x_categories}}',
                ['kitId' => 'id'],
                function($query) {
                    return $query->orderBy('{{kits_x_categories}}.sort asc');
                }
            );

However, all that did was return the category ID's in my desired order but ultimately had no effect on the main query that does the IN() condition with said ids since the order of the ids in the IN() condition have no effect on anything.

Finally, I ended up with this which lets it do what it wants, but then forces in my join to the main query with the IN() condition so that I can have the main query sort by my junction table sort column. This works as desired:

    return $this->hasMany(Category::class,
        ['id' => 'categoryId'])
            ->viaTable('{{kits_x_categories}}',
                ['kitId' => 'id'])
        ->leftJoin('{{kits_x_categories}}', '{{kits_x_categories}}.categoryId = {{categories}}.id')
        ->where(['{{kits_x_categories}}.kitId' => $this->id])
        ->orderBy('{{kits_x_categories}}.sort asc');

This results in 2 queries.

First the query gets the category ids from the join table:

SELECT * FROM `kits_x_categories` WHERE `kitId`='49';

Then the main query with the IN() condition and my forced join for sort:

SELECT `categories`.* FROM `categories` 
    LEFT JOIN `kits_x_categories` ON `kits_x_categories`.categoryId = `categories`.id 
    WHERE (`kits_x_categories`.`kitId`='49') AND (`categories`.`id` IN ('11', '7', '9')) 
    ORDER BY `kits_x_categories`.`sort`

So here is my actual question... This seems largely inefficient to me but I am by no means a database/SQL god so maybe I just don't understand fully. What I want is to understand.

Why does Yii do this? What is the point of making one query to get the IDs first, then making another query to get the objects based on the ids of the relation? Wouldn't it be more efficient to just do a regular join here? Then, in my opinion, sorting by a junction sort column would be intuitive rather than counter-intuitive.

The only thing I can think of is has to do with the lazy vs eager loading of data, maybe in lazy in only gets the IDs first then when it needs to load the data it pulls the actual data using IN()? If I used joinWith() instead of viaTable() would that make any difference here? I didn't dig into this as I literally just thought of that as I was typing this.

Lastly, In this scenario, There is only going to be a few categories for each kit so efficiency is not a big deal but I'm curious are there any performance implications in my working solution if I were to use this in the future on a different model set that could have thousands+ of relations?


Solution

  • Yii 2 does that:

    1. To support lazy loading.
    2. To support cross-database relations such as MySQL -> Redis.
    3. To reduce number of edge-cases significantly so internal AR code becomes less complicated.