Window Functions with OVER (PARTITION BY) in Detail: Syntax, Functions, and the ROWS and RANGE Window Clauses

Posted by 数栈君 on 2023-09-19 10:07

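I. Function syntax

function_name(arguments) OVER (PARTITION BY clause ORDER BY clause ROWS/RANGE clause)

A window function call has three parts:
Function name: an aggregate function such as sum, max, min, count or avg, or a row-comparison function such as lead or lag;
OVER: the keyword that marks the preceding function as an analytic (window) function rather than an ordinary aggregate function;
Analytic clause: the content inside the parentheses after the OVER keyword.

The analytic clause in turn has three parts:
PARTITION BY: the partition clause. It defines the groups the function is computed over; the groups are independent of each other;
ORDER BY: the ordering clause. It defines how rows are ordered inside each partition;
ROWS/RANGE: the window clause. It defines a sub-group (the window, also called the frame) inside each partition; when it is present, the function is computed over that window rather than over the whole partition. There are two kinds of window, ROWS and RANGE.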
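To make the three parts concrete, here is a minimal sketch. It is not from the original article: it runs the SQL through Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for MySQL, and the table `demo` with columns `grp` and `n` is hypothetical.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (grp TEXT, n INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [("a", 1), ("a", 2), ("a", 3), ("b", 10), ("b", 20)],
)

query = """
SELECT grp, n,
       -- PARTITION BY only: one total per group, repeated on every row
       SUM(n) OVER (PARTITION BY grp) AS partition_total,
       -- PARTITION BY + ORDER BY + ROWS: previous row plus current row
       SUM(n) OVER (PARTITION BY grp ORDER BY n
                    ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS window_total
FROM demo
ORDER BY grp, n
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Unlike GROUP BY, every input row is kept; the window function only adds a column computed over the partition or the window.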

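II. Window extent: ROWS and RANGE

1. Boundary keywords

●CURRENT ROW: the current row

●UNBOUNDED: without limit

●UNBOUNDED PRECEDING: the first row of the partition

●UNBOUNDED FOLLOWING: the last row of the partition

●UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: from the first row to the last row of the partition, that is, every row in the partition

●PRECEDING: before. N PRECEDING: with ROWS, the N rows before the current row; with RANGE, the rows whose ORDER BY value lies within N before the current row's value. N can be a number or, where the database allows it, an expression that evaluates to a number

●FOLLOWING: after. N FOLLOWING: with ROWS, the N rows after the current row; with RANGE, the rows whose ORDER BY value lies within N after the current row's value. N can be a number or, where the database allows it, an expression that evaluates to a number

●ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: the rows from the first row through the current row

●ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING: the rows from the current row through the last row

●ROWS BETWEEN 1 PRECEDING AND CURRENT ROW: the rows from the previous row (ROWNUM-1) through the current row

●ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING: the rows from the previous row (ROWNUM-1) through the next row (ROWNUM+1)

●RANGE BETWEEN CURRENT ROW AND 350 FOLLOWING: the rows whose ORDER BY value lies between the current row's value and that value + 350

●RANGE BETWEEN 5 PRECEDING AND 5 FOLLOWING: the rows whose ORDER BY value lies between the current row's value - 5 and the current row's value + 5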
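The ROWS boundaries in the list above can be compared side by side with a small sketch. As an assumption for illustration, it uses Python's sqlite3 module instead of MySQL and a hypothetical single-column table `nums` holding the values 1 to 5; the column aliases are made up as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (v INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO nums VALUES (?)", [(v,) for v in range(1, 6)])

query = """
SELECT v,
       SUM(v) OVER (ORDER BY v ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS to_current,
       SUM(v) OVER (ORDER BY v ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS from_current,
       SUM(v) OVER (ORDER BY v ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS neighbours,
       SUM(v) OVER (ORDER BY v ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS everything
FROM nums
ORDER BY v
"""
frame_rows = conn.execute(query).fetchall()
for row in frame_rows:
    print(row)
# First row printed: (1, 1, 15, 3, 15)
# Last row printed:  (5, 15, 5, 9, 15)
```

At the edges of the partition a frame such as 1 PRECEDING AND 1 FOLLOWING simply contains fewer rows, which is why the first and last `neighbours` values are smaller than the middle ones.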

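2. The difference between ROWS and RANGE

ROWS bounds the window by a number of rows.

RANGE bounds the window by a range of values.

(1) ROWS: bounded by row count
Table structure and test data: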

DROP TABLE IF EXISTS `test`;
CREATE TABLE `test` (
    `video_id` int NOT NULL COMMENT 'video ID',
    `dt` date NULL DEFAULT NULL,
    `if_follow` tinyint NULL DEFAULT NULL COMMENT 'followed or not'
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci ROW_FORMAT = Dynamic;
-- ----------------------------
-- Records of test
-- ----------------------------
INSERT INTO `test` VALUES (2001, '2021-09-24', 1);
INSERT INTO `test` VALUES (2001, '2021-10-03', 1);
INSERT INTO `test` VALUES (2001, '2021-10-02', 1);
INSERT INTO `test` VALUES (2001, '2021-10-01', 1);
INSERT INTO `test` VALUES (2002, '2021-09-25', 1);
INSERT INTO `test` VALUES (2002, '2021-09-25', 1);
INSERT INTO `test` VALUES (2002, '2021-09-26', 1);
INSERT INTO `test` VALUES (2002, '2021-09-27', 1);
INSERT INTO `test` VALUES (2002, '2021-09-28', 1);
INSERT INTO `test` VALUES (2002, '2021-09-29', 1);
INSERT INTO `test` VALUES (2002, '2021-09-30', 1);
INSERT INTO `test` VALUES (2002, '2021-10-01', 1);
INSERT INTO `test` VALUES (2002, '2021-10-02', 1);
INSERT INTO `test` VALUES (2002, '2021-10-03', 1);
Query:

select video_id,dt, sum(if_follow) over(partition by video_id order by dt rows BETWEEN CURRENT ROW and 1 following ) from test ;
http://dtstack-static.oss-cn-hangzhou.aliyuncs.com/2021bbs/files_user1/article/8c3d6014395b3056b1a00beda861755f..png
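For readers who cannot open the screenshot, the same query can be replayed in a small sketch. This is an approximation, not the author's setup: it uses Python's sqlite3 module rather than MySQL, stores `dt` as text, and adds the alias `follow_sum` and an outer ORDER BY for stable output.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (video_id INTEGER NOT NULL, dt TEXT, if_follow INTEGER)")
data = [
    (2001, "2021-09-24", 1), (2001, "2021-10-03", 1), (2001, "2021-10-02", 1),
    (2001, "2021-10-01", 1), (2002, "2021-09-25", 1), (2002, "2021-09-25", 1),
    (2002, "2021-09-26", 1), (2002, "2021-09-27", 1), (2002, "2021-09-28", 1),
    (2002, "2021-09-29", 1), (2002, "2021-09-30", 1), (2002, "2021-10-01", 1),
    (2002, "2021-10-02", 1), (2002, "2021-10-03", 1),
]
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", data)

query = """
SELECT video_id, dt,
       SUM(if_follow) OVER (PARTITION BY video_id ORDER BY dt
                            ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS follow_sum
FROM test
ORDER BY video_id, dt
"""
sums = defaultdict(list)  # video_id -> window sums in date order
for video_id, dt, follow_sum in conn.execute(query):
    sums[video_id].append(follow_sum)
    print(video_id, dt, follow_sum)
```

Because every if_follow value is 1, each row sums to 2 (itself plus the next row), except the last row of each video_id, which has no following row and sums to 1.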
  

(2) RANGE: bounded by value range
Table structure and test data:

DROP TABLE IF EXISTS `test`;
CREATE TABLE `test` (
    `video_id` int NOT NULL COMMENT 'video ID',
    `dt` date NULL DEFAULT NULL,
    `if_follow` tinyint NULL DEFAULT NULL COMMENT 'followed or not'
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci ROW_FORMAT = Dynamic;
-- ----------------------------
-- Records of test
-- ----------------------------
INSERT INTO `test` VALUES (2001, '2021-09-24', 1);
INSERT INTO `test` VALUES (2001, '2021-10-03', 9);
INSERT INTO `test` VALUES (2001, '2021-10-02', 2);
INSERT INTO `test` VALUES (2001, '2021-10-01', 6);
INSERT INTO `test` VALUES (2002, '2021-09-25', 1);
INSERT INTO `test` VALUES (2002, '2021-09-25', 1);
INSERT INTO `test` VALUES (2002, '2021-09-26', 6);
INSERT INTO `test` VALUES (2002, '2021-09-27', 1);
INSERT INTO `test` VALUES (2002, '2021-09-28', 1);
INSERT INTO `test` VALUES (2002, '2021-09-29', 8);
INSERT INTO `test` VALUES (2002, '2021-09-30', 7);
INSERT INTO `test` VALUES (2002, '2021-10-01', 1);
INSERT INTO `test` VALUES (2002, '2021-10-02', 9);
INSERT INTO `test` VALUES (2002, '2021-10-03', 1);
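The following statement fails. When RANGE is combined with N PRECEDING/FOLLOWING, the ORDER BY must be a single expression of numeric or temporal type, and for a temporal column such as dt the offset has to be written as a time interval rather than a bare number: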

select video_id,dt, sum(if_follow) over(partition by video_id order by dt range BETWEEN 3 preceding and CURRENT ROW ) from test ;
The error is:
select video_id,dt, sum(if_follow) over(partition by video_id order by dt range BETWEEN 3 preceding and CURRENT ROW ) from test
> 3587 - Window '<unnamed window>' with RANGE N PRECEDING/FOLLOWING frame requires exactly one ORDER BY expression, of numeric or temporal type

ORDER BY a number
Example 1: the summed range is [current row value, current row value + 3]
select video_id,dt, sum(if_follow) over(partition by video_id order by if_follow range BETWEEN CURRENT ROW and 3 following) from test ;
http://dtstack-static.oss-cn-hangzhou.aliyuncs.com/2021bbs/files_user1/article/e7691ee9fee0e8e0bfe1af53df656ae6..png
  

Example 2: the summed range is [current row value - 3, current row value]
select video_id,dt, sum(if_follow) over(partition by video_id order by if_follow range BETWEEN 3 PRECEDING and CURRENT ROW ) from test ;
http://dtstack-static.oss-cn-hangzhou.aliyuncs.com/2021bbs/files_user1/article/ace250f83f33c8b6947af488e5c25c12..png
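The two numeric RANGE examples can be checked with the sketch below. It assumes Python's sqlite3 module as a stand-in for MySQL (numeric RANGE offsets need SQLite 3.28 or later); the aliases `up_to_plus_3` and `down_to_minus_3` are made up, and dt is left out of the SELECT list so that tied rows print identically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (video_id INTEGER NOT NULL, dt TEXT, if_follow INTEGER)")
data = [
    (2001, "2021-09-24", 1), (2001, "2021-10-03", 9), (2001, "2021-10-02", 2),
    (2001, "2021-10-01", 6), (2002, "2021-09-25", 1), (2002, "2021-09-25", 1),
    (2002, "2021-09-26", 6), (2002, "2021-09-27", 1), (2002, "2021-09-28", 1),
    (2002, "2021-09-29", 8), (2002, "2021-09-30", 7), (2002, "2021-10-01", 1),
    (2002, "2021-10-02", 9), (2002, "2021-10-03", 1),
]
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", data)

query = """
SELECT video_id, if_follow,
       -- Example 1: values in [current value, current value + 3]
       SUM(if_follow) OVER (PARTITION BY video_id ORDER BY if_follow
                            RANGE BETWEEN CURRENT ROW AND 3 FOLLOWING) AS up_to_plus_3,
       -- Example 2: values in [current value - 3, current value]
       SUM(if_follow) OVER (PARTITION BY video_id ORDER BY if_follow
                            RANGE BETWEEN 3 PRECEDING AND CURRENT ROW) AS down_to_minus_3
FROM test
ORDER BY video_id, if_follow
"""
range_rows = conn.execute(query).fetchall()
for row in range_rows:
    print(row)
```

For video_id 2001 the values are 1, 2, 6, 9, so the row with value 6 sums 6 + 9 = 15 in Example 1 and only 6 in Example 2. Rows with equal values are peers and always fall into each other's window, which is why each of the six rows with value 1 under video_id 2002 gets a sum of 6.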
  

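ORDER BY a date or time
When the ORDER BY expression has a temporal type (date, datetime), the offset must be written with INTERVAL.

Example 1: [current row date, current row date + 2 days]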
select video_id,dt, sum(if_follow) over(partition by video_id order by dt range BETWEEN CURRENT ROW and interval 2 day following) from test ;
http://dtstack-static.oss-cn-hangzhou.aliyuncs.com/2021bbs/files_user1/article/f207ea64cd44d12f9934ada8ad1639df..png
  

Example 2: [current row date - 2 days, current row date]
select video_id,dt, sum(if_follow) over(partition by video_id order by dt range BETWEEN interval 2 day PRECEDING and CURRENT ROW ) from test ;
http://dtstack-static.oss-cn-hangzhou.aliyuncs.com/2021bbs/files_user1/article/d78fd000da231e11b2b211451248701f..png
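The date-based windows can be approximated outside MySQL as well. SQLite has no INTERVAL syntax, so the sketch below swaps in a different technique: it orders by julianday(dt), which turns each date into a day number, and uses a plain numeric offset of 2. It is an emulation under that assumption, restricted to the video_id 2001 rows to keep the output short, and the aliases are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (video_id INTEGER NOT NULL, dt TEXT, if_follow INTEGER)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", [
    (2001, "2021-09-24", 1), (2001, "2021-10-03", 9),
    (2001, "2021-10-02", 2), (2001, "2021-10-01", 6),
])

query = """
SELECT dt,
       -- emulates: RANGE BETWEEN CURRENT ROW AND INTERVAL 2 DAY FOLLOWING
       SUM(if_follow) OVER (PARTITION BY video_id ORDER BY julianday(dt)
                            RANGE BETWEEN CURRENT ROW AND 2 FOLLOWING) AS next_2_days,
       -- emulates: RANGE BETWEEN INTERVAL 2 DAY PRECEDING AND CURRENT ROW
       SUM(if_follow) OVER (PARTITION BY video_id ORDER BY julianday(dt)
                            RANGE BETWEEN 2 PRECEDING AND CURRENT ROW) AS prev_2_days
FROM test
ORDER BY dt
"""
date_rows = conn.execute(query).fetchall()
for row in date_rows:
    print(row)
```

The row for 2021-09-24 has no other row within two days on either side, so both sums are just its own value, 1. The row for 2021-10-01 picks up 10-02 and 10-03 going forward (6 + 2 + 9 = 17) and nothing going backward (6).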
  

III. Functions

The following can be used in MySQL.

1. Ranking functions

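rank(): tied rows share a rank and use up the positions that follow. For example, if three students tie for first place with a score of 100, the student with 99 is ranked fourth rather than second, so the ranks are 1, 1, 1, 4;
dense_rank(): tied rows share a rank but do not use up the positions that follow; in the same example the ranks are 1, 1, 1, 2;
row_number(): ignores ties and numbers the rows consecutively; in the same example the numbers are 1, 2, 3, 4;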
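The score example above can be reproduced with a short sketch. It assumes Python's sqlite3 module in place of MySQL and a hypothetical `scores` table with three students at 100 and one at 99.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student TEXT, score INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("A", 100), ("B", 100), ("C", 100), ("D", 99)],
)

query = """
SELECT score,
       RANK()       OVER (ORDER BY score DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk,
       ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num
FROM scores
ORDER BY row_num
"""
rank_rows = conn.execute(query).fetchall()
for row in rank_rows:
    print(row)
# (100, 1, 1, 1)
# (100, 1, 1, 2)
# (100, 1, 1, 3)
# (99, 4, 2, 4)
```

Which of the three tied students receives row number 1, 2 or 3 is not defined by this ORDER BY; add a tie-breaking column if the numbering has to be reproducible.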

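2. Aggregate functions

count() over(partition by ... order by ...): the row count within the partition;
max() over(partition by ... order by ...): the maximum within the partition;
min() over(partition by ... order by ...): the minimum within the partition;
avg() over(partition by ... order by ...): the average within the partition;
Note that when ORDER BY is present and no ROWS/RANGE clause is given, the default window runs from the start of the partition to the current row (including rows tied with it), so these become running values. Leave out ORDER BY to aggregate over the whole partition.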
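The effect of ORDER BY on aggregate window functions is easy to see in a sketch. As before, this assumes Python's sqlite3 module instead of MySQL; the `sales` table and all aliases are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (grp TEXT, n INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("a", 1), ("a", 2), ("a", 3)])

query = """
SELECT n,
       COUNT(*) OVER (PARTITION BY grp)            AS cnt_partition,  -- whole partition
       COUNT(*) OVER (PARTITION BY grp ORDER BY n) AS cnt_running,    -- up to current row
       MAX(n)   OVER (PARTITION BY grp ORDER BY n) AS max_running,
       MIN(n)   OVER (PARTITION BY grp ORDER BY n) AS min_running,
       AVG(n)   OVER (PARTITION BY grp ORDER BY n) AS avg_running
FROM sales
ORDER BY n
"""
agg_rows = conn.execute(query).fetchall()
for row in agg_rows:
    print(row)
# (1, 3, 1, 1, 1, 1.0)
# (2, 3, 2, 2, 1, 1.5)
# (3, 3, 3, 3, 1, 2.0)
```

`cnt_partition` is 3 on every row, while the ordered variants grow row by row because their window ends at the current row.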

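3. Comparison functions

lag() over(partition by ... order by ...): returns the value from the row n rows before the current row.
lead() over(partition by ... order by ...): returns the value from the row n rows after the current row.

lag(arg1,arg2,arg3) and lead(arg1,arg2,arg3):
the first argument is the column name,
the second argument is the offset, which cannot be negative,
the third argument is the default value returned when the offset falls outside the window.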
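The three arguments can be tried out with a short sketch. It assumes Python's sqlite3 module as a stand-in for MySQL; the `events` table, the default value 'none' and the aliases are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (dt TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [("2021-09-24",), ("2021-10-03",), ("2021-10-02",), ("2021-10-01",)],
)

query = """
SELECT dt,
       LAG(dt, 1, 'none') OVER (ORDER BY dt) AS prev_dt,    -- default given
       LEAD(dt, 2)        OVER (ORDER BY dt) AS next_2_dt   -- no default: NULL
FROM events
ORDER BY dt
"""
lag_rows = conn.execute(query).fetchall()
for row in lag_rows:
    print(row)
# ('2021-09-24', 'none', '2021-10-02')
# ('2021-10-01', '2021-09-24', '2021-10-03')
# ('2021-10-02', '2021-10-01', None)
# ('2021-10-03', '2021-10-02', None)
```

When the third argument is omitted, rows whose offset falls outside the window get NULL, which sqlite3 returns as Python's None.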

Table structure and test data:

DROP TABLE IF EXISTS `test`;
CREATE TABLE `test` (
    `video_id` int NOT NULL COMMENT 'video ID',
    `dt` date NULL DEFAULT NULL,
    `if_follow` tinyint NULL DEFAULT NULL COMMENT 'followed or not'
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_ai_ci ROW_FORMAT = Dynamic;
-- ----------------------------
-- Records of test
-- ----------------------------
INSERT INTO `test` VALUES (2001, '2021-09-24', 1);
INSERT INTO `test` VALUES (2001, '2021-10-03', 9);
INSERT INTO `test` VALUES (2001, '2021-10-02', 2);
INSERT INTO `test` VALUES (2001, '2021-10-01', 6);
INSERT INTO `test` VALUES (2002, '2021-09-25', 1);
INSERT INTO `test` VALUES (2002, '2021-09-25', 1);
INSERT INTO `test` VALUES (2002, '2021-09-26', 6);
INSERT INTO `test` VALUES (2002, '2021-09-27', 1);
INSERT INTO `test` VALUES (2002, '2021-09-28', 1);
INSERT INTO `test` VALUES (2002, '2021-09-29', 8);
INSERT INTO `test` VALUES (2002, '2021-09-30', 7);
INSERT INTO `test` VALUES (2002, '2021-10-01', 1);
INSERT INTO `test` VALUES (2002, '2021-10-02', 9);
INSERT INTO `test` VALUES (2002, '2021-10-03', 1);
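Example 1: lag with a negative offset (offset = -1)
This is a syntax error, because the offset cannot be negative. (The default value '偏移超出了' used in these queries means 'offset out of range'.)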

select video_id,dt, lag(dt,-1,'偏移超出了') over(order by dt ) from test ;
1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-1,'偏移超出了') over(order by dt ) from test' at line 1

Example 2: lag with offset 0, which returns the value from the current row itself
select video_id,dt, lag(dt,0,'偏移超出了') over(order by dt ) from test ;
http://dtstack-static.oss-cn-hangzhou.aliyuncs.com/2021bbs/files_user1/article/078adc74d641c05bf60c5adb805b86f6..png
  

Example 3: lag with offset 2, which returns the value from 2 rows back
select video_id,dt, lag(dt,2,'偏移超出了') over(order by dt ) from test ;
http://dtstack-static.oss-cn-hangzhou.aliyuncs.com/2021bbs/files_user1/article/f569d0752be2f82d3fe3b2d319777119..png
  

Example 4: a different column, again with lag and offset 2 (2 rows back)
select video_id,dt, lag(video_id,2,'偏移超出了') over(order by dt ) from test ;
http://dtstack-static.oss-cn-hangzhou.aliyuncs.com/2021bbs/files_user1/article/15b5d9eb83a52f5ebf1f8dd82481da40..png
  

Example 5: lead with offset 2, which returns the value from 2 rows ahead
select video_id,dt, lead(video_id,2,'偏移超出了') over(order by dt ) from test ;
http://dtstack-static.oss-cn-hangzhou.aliyuncs.com/2021bbs/files_user1/article/a9981bf61d0c1d5795c15081a890c4df..png
  

Example 6: lead with offset 2 (2 rows ahead) and no default value
select video_id,dt, lead(video_id,2) over(order by dt ) from test ;
http://dtstack-static.oss-cn-hangzhou.aliyuncs.com/2021bbs/files_user1/article/bb7ae0bf5af4614dd81ccb3a8d0a558a..png
  

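The following may be Oracle functions; they did not work in the MySQL used for this article. Note that MySQL 8.0 does document first_value(), last_value() and percent_rank() as window functions, while ratio_to_report() is Oracle-specific:

first_value(expr) over(...) and last_value(expr) over(...): the value of expr from the first row and from the last row of the window, respectively

ratio_to_report(expr) over(partition by ...): the expression inside ratio_to_report() is the numerator, and the sum of that expression over the rows selected by over() is the denominator

percent_rank() over(partition by ... order by ...): the relative rank of the row within its partition, computed as (rank - 1) / (number of rows in the partition - 1)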




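Disclaimer:

This article is a repost and the copyright belongs to the original author. If it infringes your rights, please contact us and we will remove it.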
