本文同时发表在
当前mysql执行的策略很简单:mysql对任何关联都执行嵌套循环操作,即mysql先在一个表中循环取出单条数据,然后再嵌套循环到下一个表中寻打匹配的行,依次下去,直到描述到所表表中匹配的行为止。然后根据各个表匹配的行,返回查询中需要的各个列。mysql会尝试在最后一个关联表中打到所有匹配的行,如果最后一个关联表无法找到更多的行以后,mysql返回到上一层次关联表,看是否能够找到更多的匹配记录,依此类推迭代执行。
按照这样的方式查找第一条表记录,再嵌套查询下一个关联表,然后回溯到上一个表,在mysql中是通过嵌套循环的方式来实现的--正如其名‘嵌套循环关联’。请看下面的例子中的简单查询:
select tbl1.col1,tbl2.col2 from tbl1 inner join tbl2 using(col3) where tbl1.col1 in(5,6);
假设mysql按照查询中的表顺序进行关联操作,我们则可以用下面的伪代码表示mysql将如何完成这个查询:
outer_iter = iterator_over tbl1 where col1 in(3,4)outer_row = outer_iter.next while outer_row inner_iter = iterator over tbl2 where col3=outer_row.col3 inner_row = inner_iter.next while inner_row output[outer_row.col1,inner_row.col2] inner_row = inner_iter.next end out_row = outer_iter.nextend
上面的执行计划对于单表查询和多表关联查询都适用,如果是一个单表查询,那么只需要完成上面的外层的基本操作。对于外连接和上面的执行过程任然适用。例如我们将上面的查询修改如下:
SELECT tbl1.col1 ,tbl2.col2 FROM tbl1 left outer join tbl2 using (col3) WHERE tbl1.col1 in (3,4)
对应的伪代码:
outer_iter = iterator over tbl1 where col1 in(3,4)outer row = outer_iter.nextwhile outer_row inner_iter = iterator over tbl2 where col3 = outer_row.col3 inner_row = inner_iter.next if inner row -> 手动加粗 while inner_row out_put [outer_row.col1,inner_row.col2] inner_row = inner_iter.next end else -> 手动加粗 out_put[outer_row.col1,NULL] -> 手动加粗 end outer_row = outer_iter.nextend
从上面两个例子也可以看出,对于主表来说,是先进行主表的where条件筛选,再进行表联接,而不是先进行整表联接再进行where条件的筛选。
举个例子:
数据表结构:
mysql> create table a( -> id int unsigned not null primary key -> );mysql> create table b like a;
表中数据:
mysql> select * from a;+----+| id |+----+| 1 || 2 || 3 || 4 || 5 |+----+5 rows in set (0.00 sec)mysql> select * from b;+----+| id |+----+| 4 || 5 || 6 || 7 |+----+4 rows in set (0.00 sec)
explain查询:
mysql> explain select a.id as aid,b.id as bid from a left join b using(id) where a.id>3;+----+-------------+-------+--------+---------------+---------+---------+----------+------+--------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+--------+---------------+---------+---------+----------+------+--------------------------+| 1 | SIMPLE | a | range | PRIMARY | PRIMARY | 4 | NULL | 3 | Using where; Using index || 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4 | com.a.id | 1 | Using index |+----+-------------+-------+--------+---------------+---------+---------+----------+------+--------------------------+2 rows in set (0.00 sec)
可以看出,首先在a表上进行范围查询,筛选出a.id>3的数据,然后在进行"嵌套查询"。
注意,on后面的筛选条件主要是针对的是关联表,而对于主表筛选并不适用,比如:
mysql> select a.id as aid,b.id as bid from a left join b on a.id=b.id and a.id>3;+-----+------+| aid | bid |+-----+------+| 1 | NULL || 2 | NULL || 3 | NULL || 4 | 4 || 5 | 5 |+-----+------+5 rows in set (0.00 sec)mysql> explain select a.id as aid,b.id as bid from a left join b on a.id=b.id and a.id>3;+----+-------------+-------+--------+---------------+---------+---------+----------+------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+-------+--------+---------------+---------+---------+----------+------+-------------+| 1 | SIMPLE | a | index | NULL | PRIMARY | 4 | NULL | 5 | Using index || 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4 | com.a.id | 1 | Using index |+----+-------------+-------+--------+---------------+---------+---------+----------+------+-------------+2 rows in set (0.00 sec)
我们发现a表的id<=3的数据并未被筛选走,explain的结果是a表进行了index类型的查询,即主键索引的全部扫描。
如果在on的筛选条件是针对b表的呢,情况会怎么样?
下面的例子数据表结构和数据变了,我们只关注查询的结果区别:mysql> select * from a left join b on a.data=b.data and b.id<=20;+----+------+------+------+| id | data | id | data |+----+------+------+------+| 1 | 1 | NULL | NULL || 2 | 2 | NULL | NULL || 3 | 3 | NULL | NULL || 4 | 4 | 1 | 4 || 4 | 4 | 6 | 4 || 4 | 4 | 11 | 4 || 4 | 4 | 16 | 4 || 5 | 5 | 2 | 5 || 5 | 5 | 7 | 5 || 5 | 5 | 12 | 5 || 5 | 5 | 17 | 5 |+----+------+------+------+11 rows in set (0.00 sec)mysql> select * from a left join b on a.data=b.data where b.id<=20;+----+------+------+------+| id | data | id | data |+----+------+------+------+| 4 | 4 | 1 | 4 || 5 | 5 | 2 | 5 || 4 | 4 | 6 | 4 || 5 | 5 | 7 | 5 || 4 | 4 | 11 | 4 || 5 | 5 | 12 | 5 || 4 | 4 | 16 | 4 || 5 | 5 | 17 | 5 |+----+------+------+------+8 rows in set (0.00 sec)
由此,我们可以根据伪码来分析两者的区别:
outer_iter = iterator over aouter row = outer_iter.nextwhile outer_row inner_iter = iterator over b where data = outer_row.date where id<=20 inner_row = inner_iter.next if inner row while inner_row out_put [outer_row,inner_row] inner_row = inner_iter.next end else out_put[outer_row,NULL] end outer_row = outer_iter.nextend
outer_iter = iterator over aouter row = outer_iter.nextwhile outer_row inner_iter = iterator over b where data = outer_row.date ->手动加粗 inner_row = inner_iter.next if inner row while inner_row out_put [outer_row,inner_row] inner_row = inner_iter.next end else out_put[outer_row,NULL] end outer_row = outer_iter.nextendleft join的结果集中 where b.id<=20 ->手动加粗
参考资料:《高性能MySQL》