博客 从执行计划看NOT IN、NOT EXISTS 和 LEFT JOIN效率,记住内外关联条件不要乱放-SQL开发实战系列(六)

从执行计划看NOT IN、NOT EXISTS 和 LEFT JOIN效率,记住内外关联条件不要乱放-SQL开发实战系列(六)

   数栈君   发表于 2023-02-28 18:31  336  0

一、从执行计划看NOT IN、NOT EXISTS 和 LEFT JOIN效率
有些单位的部门(如40)中一个员工也没有,只是设了一个部门名字,如下列语句:

select count(*) from dept where deptno=40;
1
如何通过关联查询把这些信息查出来?
同样有三种写法:NOT IN、NOT EXISTS 和LEFT JOIN。
语句及PLAN如下(版本为11.2.0.4.0 )。
环境:

alter table dept add constraints pk_dept primary key (deptno); --如果你有就不用建了
1
NOT IN用法
EXPLAIN PLAN FOR select *
FROM dept
WHERE deptno NOT IN (SELECT emp.deptno FROM emp WHERE emp.deptno IS NOT NULL);
SELECT * FROM TABLE(dbms_xplan.display());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1353548327
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Ti
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 23 | 6 (17)| 00
| 1 | MERGE JOIN ANTI | | 1 | 23 | 6 (17)| 00
| 2 | TABLE ACCESS BY INDEX ROWID| DEPT | 4 | 80 | 2 (0)| 00
| 3 | INDEX FULL SCAN | PK_DEPT | 4 | | 1 (0)| 00
|* 4 | SORT UNIQUE | | 14 | 42 | 4 (25)| 00
|* 5 | TABLE ACCESS FULL | EMP | 14 | 42 | 3 (0)| 00
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("DEPTNO"="EMP"."DEPTNO")
filter("DEPTNO"="EMP"."DEPTNO")
5 - filter("EMP"."DEPTNO" IS NOT NULL)

19 rows selected


NOT EXISTS 用法
EXPLAIN PLAN FOR SELECT*
FROM dept
WHERE NOT EXISTS ( SELECT NULL FROM emp WHERE emp.deptno = dept.deptno) ;
SELECT * FROM TABLE(dbms_xplan.display());
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1353548327
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Ti
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 23 | 6 (17)| 00
| 1 | MERGE JOIN ANTI | | 1 | 23 | 6 (17)| 00
| 2 | TABLE ACCESS BY INDEX ROWID| DEPT | 4 | 80 | 2 (0)| 00
| 3 | INDEX FULL SCAN | PK_DEPT | 4 | | 1 (0)| 00
|* 4 | SORT UNIQUE | | 14 | 42 | 4 (25)| 00
|* 5 | TABLE ACCESS FULL | EMP | 14 | 42 | 3 (0)| 00
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")
filter("EMP"."DEPTNO"="DEPT"."DEPTNO")
5 - filter("EMP"."DEPTNO" IS NOT NULL)

19 rows selected


LEFT JOIN 用法
根据前面介绍过的左联知识,LEFT JOIN 取出的是左表中所有的数据,其中与右表不匹配的就表示左表NOT IN右表。
所以这里LEFT JOIN加上条件TS NULL,就是LEFT JOIN的写法:

EXPLAIN PLAN FOR
SELECT dept.*
FROM dept
LEFT JOIN emp ON emp.deptno = dept.deptno WHERE emp.deptno IS NULL;

SELECT * FROM TABLE(dbms_xplan.display());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 1353548327
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Ti
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 23 | 6 (17)| 00
| 1 | MERGE JOIN ANTI | | 1 | 23 | 6 (17)| 00
| 2 | TABLE ACCESS BY INDEX ROWID| DEPT | 4 | 80 | 2 (0)| 00
| 3 | INDEX FULL SCAN | PK_DEPT | 4 | | 1 (0)| 00
|* 4 | SORT UNIQUE | | 14 | 42 | 4 (25)| 00
|* 5 | TABLE ACCESS FULL | EMP | 14 | 42 | 3 (0)| 00
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")
filter("EMP"."DEPTNO"="DEPT"."DEPTNO")
5 - filter("EMP"."DEPTNO" IS NOT NULL)

19 rows selected


通过看上面的执行计划,三个SQL用的都是 MERGE JOIN ANTI, 说明这三种方法的效率一样。
如果想改写,就要对比改写前后的PLAN,根据PLAN来判断并测试哪种方法的效率高,一定要记住不能凭借某些结论来碰运气。

二、外连接中的条件不要乱放,建议大家使用join而非(+)
对于系列三博客介绍的左联语句,见下面的数据。

SELECT l.str AS left_str, r.str AS right_str,r.status FROM l
LEFT JOIN r ON l.v = r.v
ORDER BY 1 , 2 ;
LEFT_STR RIGHT_STR STATUS
-------- --------- ----------
left_1
left_2
left_3 right_3 1
left_4 right_4 0

那现在有这么一个需求:对于其中的L表,四条数据都返回。而对于R表,我们需要只显示其中的status=1的数据,也就是下面这样的结果:

LEFT_STR RIGHT_STR STATUS
-------- --------- ----------
left_1
left_2
left_3 right_3 1
left_4

对于这个需求,可能有些人会加一个where条件!然后结果就变成了下面这样了:
left join写法:

SELECT l.str AS left_str, r.str AS right_str,r.status
FROM l
LEFT JOIN r ON (l.v = r.v)
where r.status=1
ORDER BY 1 , 2;
LEFT_STR RIGHT_STR STATUS
-------- --------- ----------
left_3 right_3 1

(+)写法:

SELECT l.str AS left_str, r.str AS right_str, r.status
FROM l, r
where l.v = r.v(+)
and r.status = 1
ORDER BY 1, 2;
LEFT_STR RIGHT_STR STATUS
-------- --------- ----------
left_3 right_3 1

而此时的执行计划:

SQL> EXPLAIN PLAN FOR
2 SELECT l.str AS left_str, r.str AS right_str,r.status
3 FROM l
4 LEFT JOIN r ON (l.v = r.v)
5 where r.status=1
6 ORDER BY 1 , 2;

Explained

SQL> SELECT * FROM TABLE(dbms_xplan.display());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 688663707
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 42 | 7 (15)| 00:00:01 |
| 1 | SORT ORDER BY | | 2 | 42 | 7 (15)| 00:00:01 |
|* 2 | HASH JOIN | | 2 | 42 | 6 (0)| 00:00:01 |
|* 3 | TABLE ACCESS FULL| R | 2 | 24 | 3 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| L | 4 | 36 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("L"."V"="R"."V")
3 - filter("R"."STATUS"=1)

17 rows selected


很明显,结果以及执行计划(HASH JOIN)与我们期望得到的结果都不一致!!!这是很多人在写查询或更改查询时常遇到的一种错误。问题就在于所加条件的位置及写法,正确的写法分别如下:

SQL> SELECT l.str AS left_str, r.str AS right_str, r.status
2 FROM l
3 LEFT JOIN r
4 ON (l.v = r.v and r.status = 1)
5 ORDER BY 1, 2;

LEFT_STR RIGHT_STR STATUS
-------- --------- ----------
left_1
left_2
left_3 right_3 1
left_4



SQL> SELECT l.str AS left_str, r.str AS right_str, r.status
2 FROM l, r
3 where l.v = r.v(+)
4 and r.status(+) = 1
5 ORDER BY 1, 2;

LEFT_STR RIGHT_STR STATUS
-------- --------- ----------
left_1
left_2
left_3 right_3 1
left_4


看一下这时候的执行计划:

SQL> EXPLAIN PLAN FOR
2 SELECT l.str AS left_str, r.str AS right_str, r.status
3 FROM l
4 LEFT JOIN r
5 ON (l.v = r.v and r.status = 1)
6 ORDER BY 1, 2;

Explained


SQL> SELECT * FROM TABLE(dbms_xplan.display());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 2310059642
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 84 | 7 (15)| 00:00:01 |
| 1 | SORT ORDER BY | | 4 | 84 | 7 (15)| 00:00:01 |
|* 2 | HASH JOIN OUTER | | 4 | 84 | 6 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| L | 4 | 36 | 3 (0)| 00:00:01 |
|* 4 | TABLE ACCESS FULL| R | 2 | 24 | 3 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("L"."V"="R"."V"(+))
4 - filter("R"."STATUS"(+)=1)

17 rows selected


以上两种写法结果均正确,且根据执行计划HASH JOIN OUTER明确走的是外连接。而且根据上面查询我们能够看出来JOIN的方式明显更容易辨别,这也是我反复建议使用JOIN的原因。
对于上面SQL我们还可以使用先过滤再关联的方式,即R表先过滤:

(select * from r where status=1) r
1
总结
同上一篇博客所说,在使用in exists或则NOT IN、NOT EXISTS 和 LEFT JOIN时候,不要想当然的认为in和not in效率极其低下,在本章案例中通过执行计划能够直观的看到,三者效率竟然一致了!!所以,读万卷书不如行万里路,网上别人做的总结再好,也不如自己实践一把来的真实。还有就是,在使用关联查询时候,关联条件和过滤条件一定要想好放哪里,不然你会想当然的错了!
————————————————

免责申明:

本文系转载,版权归原作者所有,如若侵权请联系我们进行删除!

想了解或咨询更多有关袋鼠云大数据产品、行业解决方案、客户案例的朋友,浏览袋鼠云官网:https://www.dtstack.com/?src=bbs
同时,欢迎对大数据开源项目有兴趣的同学加入「袋鼠云开源框架钉钉技术群」,交流最新开源技术信息,群号码:30537511,项目地址:https://github.com/DTStack

0条评论
社区公告
  • 大数据领域最专业的产品&技术交流社区,专注于探讨与分享大数据领域有趣又火热的信息,专业又专注的数据人园地

最新活动更多
微信扫码获取数字化转型资料
钉钉扫码加入技术交流群