Tags: mysql, mariadb, query-optimization

Should I rely on the EXPLAIN command when analyzing query efficiency?


EXPLAIN SELECT * FROM house JOIN street ON house.STREET_ID = street.ID WHERE street.type <> 'out_of' AND street.DISTRICT_ID = 4 ORDER BY house.num;
EXPLAIN SELECT * FROM house JOIN street ON house.STREET_ID = street.ID WHERE street.type <> 'out_of' AND house.DISTRICT_ID = 4 ORDER BY house.num;

The street table has a compound index on (DISTRICT_ID, type); the house table has an index on DISTRICT_ID.
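
For reference, a sketch of what those indexes probably look like, inferred from the index names in the EXPLAIN output below (column order and exact definitions are assumptions):

-- Assumed definitions; index names taken from the EXPLAIN output, column order is a guess
ALTER TABLE street ADD INDEX IDX_STREET_DISTRICT_ID_STREET_TYPE (DISTRICT_ID, type);
ALTER TABLE house ADD INDEX IDX_HO_DISTRICT_NUM (DISTRICT_ID, num);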

  1. EXPLAIN with street.DISTRICT_ID = 4 (street is read first, house is joined to it):
<data>
<row>
 <id>1</id>
 <select_type>SIMPLE</select_type>
 <table>street</table>
 <type>ref</type>
 <possible_keys>PRIMARY,FK_STREET_DISTRICT,IDX_STREET_DISTRICT_ID_STREET_TYPE</possible_keys>
 <key>IDX_STREET_DISTRICT_ID_STREET_TYPE</key>
 <key_len>8</key_len>
 <ref>const</ref>
 <rows>16</rows>
 <Extra>Using index condition; Using temporary; Using filesort</Extra>
</row>
<row>
 <id>1</id>
 <select_type>SIMPLE</select_type>
 <table>house</table>
 <type>ref</type>
 <possible_keys>FK_HO_ST,idx_or_st_entity</possible_keys>
 <key>FK_HO_ST</key>
 <key_len>8</key_len>
 <ref>street.ID</ref>
 <rows>70695</rows>
 <Extra></Extra>
</row>
</data>

  2. EXPLAIN with house.DISTRICT_ID = 4 (house is read first, street is joined to it):
<data>
<row>
  <id>1</id>
  <select_type>SIMPLE</select_type>
  <table>house</table>
  <type>ALL</type>
  <possible_keys>FK_ST_HO,FK_ST_DISTRICT,idx_or_st_entity,IDX_HO_DISTRICT_NUM</possible_keys>
  <key>null</key>
  <key_len>null</key_len>
  <ref>null</ref>
  <rows>989734</rows>
  <Extra>Using where; Using filesort</Extra>
</row>
<row>
  <id>1</id>
  <select_type>SIMPLE</select_type>
  <table>street</table>
  <type>eq_ref</type>
  <possible_keys>PRIMARY</possible_keys>
  <key>PRIMARY</key>
  <key_len>8</key_len>
  <ref>house.STREET_ID</ref>
  <rows>1</rows>
  <Extra>Using where</Extra>
</row>
</data>

DISTRICT_ID holds the same value for house and street, but the estimated row counts are very different. Does that mean the first query, which filters on street.DISTRICT_ID and shows estimated rows = 70,695 for house, is more efficient?


Solution

  • The first query is actually examining an estimated 1,131,120 rows. For JOIN queries, you should look at the product of the rows field between the joined tables. In your case, it estimates it will examine 16 rows from street, and for each of these rows, it estimates it will examine 70,695 rows in the joined table house. 16*70695 = 1131120.

    Whereas the second query estimates it will examine 989,734 rows in house (as a table scan, indicated by type: ALL), and for each of the matching rows, it will look up 1 row in the joined table street by primary key (indicated by type: eq_ref).

    Take the numbers with a grain of salt. They are only estimates, and they can be off by quite a bit. It's better to look at them not as precise numbers, but as orders of magnitude. I would consider both of them to be inefficient, unless you expect the final result of the query to have hundreds of thousands of rows.
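
    If you want actual numbers rather than estimates, MariaDB's ANALYZE statement (or EXPLAIN ANALYZE in MySQL 8.0.18+) executes the query and reports the real row counts alongside the estimates. A minimal sketch, reusing the first query from the question:

    -- MariaDB: runs the query and adds r_rows (actual rows examined) for each table
    ANALYZE SELECT * FROM house JOIN street ON house.STREET_ID = street.ID
    WHERE street.type <> 'out_of' AND street.DISTRICT_ID = 4
    ORDER BY house.num;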

    Without seeing your tables, it's hard to guess at a better index strategy. Normally questions about query-optimization should include either a link to a dbfiddle, or else the output of SHOW CREATE TABLE for each table in the query.
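
    That is, something like:

    SHOW CREATE TABLE house;
    SHOW CREATE TABLE street;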

    I guess from your queries that both tables have a DISTRICT_ID column. Is that redundant, or can it be used to refine the join condition? For example:

    ... 
    FROM house h JOIN street s 
    ON h.STREET_ID = s.ID AND h.DISTRICT_ID = s.DISTRICT_ID
    ...
    

    What I'm thinking is that if this can be used to narrow down the number of examined rows, it would be worth adding it to the index (see the sketch below). But I can't tell from the limited information you've given about these tables.
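
    Purely as an illustration (the index name is hypothetical, and whether it helps depends on your actual table definitions and data distribution), that could look like:

    -- Hypothetical index: house already has FK_HO_ST on STREET_ID alone; adding
    -- DISTRICT_ID lets the lookup from street into house apply the district filter too
    ALTER TABLE house ADD INDEX idx_house_street_district (STREET_ID, DISTRICT_ID);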