概述
分析函数是以一定的方法在一个与当前行相关的结果子集中进行计算,也称为窗口函数。
一般结构为
Function(arg1 , arg2 ……) over(partition by clause order by clause windowing clause )
Windowing clause : rows | range between start_expr and end_expr
Start_expr is unbounded preceding | current row | n preceding | n following
End_expr is unbounded following | current row | n preceding | n following
不是所有的分析函数都支持开窗子句。
创建测试表
SH@ prod> create table sales_fact as 2 select country_name country , country_subregion region , prod_name product , calendar_year year , calendar_week_number week , 3 sum(amount_sold) sale , sum(amount_sold* 4 (case when mod(rownum , 10 ) = 0 then 1.4 5 when mod(rownum , 5)= 0 then 0.6 6 when mod(rownum , 2)= 0 then 0.9 7 when mod(rownum , 2)=1 then 1.2 8 else 1 end ) ) receipts 9 from sales , times , customers , countries , products 10 where sales.time_id = times.time_id and 11 sales.prod_id = products.prod_id and 12 sales.cust_id = customers.cust_id and 13 customers.country_id = countries.country_id 14 group by country_name , country_subregion , prod_name , calendar_year , calendar_week_number ; Table created.
把聚合函数当作分析函数使用
分析函数列只是一列数值,每一行对应一个值,对于查询的其它方面没有任何影响。
从以下查询可以得出以下几点:
1.over分区条件中的列可以不在select列表中,但是必须在数据源中。
2.over排序条件中的列可以不在select列表中,但是必须在数据源中。
3.over排序条件是对所在分区中的数据进行排序,与select语句中的排序无关。但是会影响到分析函数的结果。
4.over中的开窗条件的范围一般仅限于分区本身。rows between unbounded preceding and current row表示从分区的最开始到当前行。
5.分析函数的数据来自结果集(施加了where条件之后的)。
下面的查询中的分析列表示该年从开始到该周的销售累计。
SH@ prod> select year , week , sale , 2 sum(sale) over( partition by region , year 3 order by week 4 rows between unbounded preceding and current row ) running_sum_ytd 5 from sales_fact 6 where country in ('Australia') and product='Xtend Memory' and week < 10 7 order by year , week ; YEAR WEEK SALE RUNNING_SUM_YTD ---------- ---------- ---------- --------------- 1998 1 58.15 58.15 1998 2 29.39 87.54 1998 3 29.49 117.03 1998 4 29.49 146.52 1998 5 29.8 176.32 1998 6 58.78 235.1 1998 9 58.78 293.88 1999 1 53.52 53.52 1999 3 94.6 148.12 1999 4 40.5 188.62 1999 5 80.01 268.63 1999 6 40.5 309.13 1999 8 103.11 412.24 1999 9 53.34 465.58 2000 1 46.7 46.7 2000 3 93.41 140.11 2000 4 46.54 186.65 2000 5 46.7 233.35 2000 7 70.8 304.15 2000 8 46.54 350.69 2001 1 92.26 92.26 2001 2 118.38 210.64 2001 3 47.24 257.88 2001 4 256.7 514.58 2001 5 93.44 608.02 2001 6 22.44 630.46 2001 7 69.96 700.42 YEAR WEEK SALE RUNNING_SUM_YTD ---------- ---------- ---------- --------------- 2001 8 46.06 746.48 2001 9 92.67 839.15 29 rows selected.
结果与上面相同,只是排序不同方式,分析列看起来就没有规律了。
SH@ prod> select year , week , sale , 2 sum(sale) over( partition by region , year 3 order by week 4 rows between unbounded preceding and current row ) running_sum_ytd 5 from sales_fact 6 where country in ('Australia') and product='Xtend Memory' and week < 10 7 order by year , sale ; YEAR WEEK SALE RUNNING_SUM_YTD ---------- ---------- ---------- --------------- 1998 2 29.39 87.54 1998 4 29.49 146.52 1998 3 29.49 117.03 1998 5 29.8 176.32 1998 1 58.15 58.15 1998 6 58.78 235.1 1998 9 58.78 293.88 1999 4 40.5 188.62 1999 6 40.5 309.13 1999 9 53.34 465.58 1999 1 53.52 53.52 1999 5 80.01 268.63 1999 3 94.6 148.12 1999 8 103.11 412.24 2000 4 46.54 186.65 2000 8 46.54 350.69 2000 1 46.7 46.7 2000 5 46.7 233.35 2000 7 70.8 304.15 2000 3 93.41 140.11 2001 6 22.44 630.46 2001 8 46.06 746.48 2001 3 47.24 257.88 2001 7 69.96 700.42 2001 1 92.26 92.26 2001 9 92.67 839.15 2001 5 93.44 608.02 YEAR WEEK SALE RUNNING_SUM_YTD ---------- ---------- ---------- --------------- 2001 2 118.38 210.64 2001 4 256.7 514.58 29 rows selected.
分区中的排序选取不恰当,则分析列结果没有什么意义了。分区开窗排序的选取与分析列的结果密切相关。
SH@ prod> select year , week , sale , 2 sum(sale) over( partition by region , year 3 order by sale 4 rows between unbounded preceding and current row ) running_sum_ytd 5 from sales_fact 6 where country in ('Australia') and product='Xtend Memory' and week < 10 7 order by year , week ; YEAR WEEK SALE RUNNING_SUM_YTD ---------- ---------- ---------- --------------- 1998 1 58.15 176.32 1998 2 29.39 29.39 1998 3 29.49 88.37 1998 4 29.49 58.88 1998 5 29.8 118.17 1998 6 58.78 235.1 1998 9 58.78 293.88 1999 1 53.52 187.86 1999 3 94.6 362.47 1999 4 40.5 40.5 1999 5 80.01 267.87 1999 6 40.5 81 1999 8 103.11 465.58 1999 9 53.34 134.34 2000 1 46.7 186.48 2000 3 93.41 350.69 2000 4 46.54 46.54 2000 5 46.7 139.78 2000 7 70.8 257.28 2000 8 46.54 93.08 2001 1 92.26 277.96 2001 2 118.38 582.45 2001 3 47.24 115.74 2001 4 256.7 839.15 2001 5 93.44 464.07 2001 6 22.44 22.44 2001 7 69.96 185.7 YEAR WEEK SALE RUNNING_SUM_YTD ---------- ---------- ---------- --------------- 2001 8 46.06 68.5 2001 9 92.67 370.63 29 rows selected.
分析函数的执行计划
虽然有分析函数还是只需要一次全表扫描,但是需要排序。
WINDOW SORT是分析函数的典型特征。
SH@ prod> explain plan for 2 select year , week , sale , 3 sum(sale) over( partition by region , year 4 order by sale 5 rows between unbounded preceding and current row ) running_sum_ytd 6 from sales_fact 7 where country in ('Australia') and product='Xtend Memory' and week < 10 8 order by year , week ; Explained. SH@ prod> select * from table(dbms_xplan.display()) ; PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------ Plan hash value: 173857439 ---------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 18 | 1890 | 311 (1)| 00:00:04 | | 1 | SORT ORDER BY | | 18 | 1890 | 311 (1)| 00:00:04 | | 2 | WINDOW SORT | | 18 | 1890 | 311 (1)| 00:00:04 | |* 3 | TABLE ACCESS FULL| SALES_FACT | 18 | 1890 | 309 (1)| 00:00:04 | ---------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("COUNTRY"='Australia' AND "PRODUCT"='Xtend Memory' AND "WEEK"<10) Note ----- - dynamic sampling used for this statement (level=2) 说明该表还没有统计信息。 20 rows selected.
不加分析列,只是少了一步window sort。
SH@ prod> explain plan for 2 select year , week , sale 3 from sales_fact 4 where country in ('Australia') and product='Xtend Memory' and week < 10 5 order by year , week ; Explained. SH@ prod> select * from table(dbms_xplan.display()) ; PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------ Plan hash value: 1978576542 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 18 | 1584 | 310 (1)| 00:00:04 | | 1 | SORT ORDER BY | | 18 | 1584 | 310 (1)| 00:00:04 | |* 2 | TABLE ACCESS FULL| SALES_FACT | 18 | 1584 | 309 (1)| 00:00:04 | --------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("COUNTRY"='Australia' AND "PRODUCT"='Xtend Memory' AND "WEEK"<10) Note ----- - dynamic sampling used for this statement (level=2) 19 rows selected.
如何使窗口充满整个分区
SH@ prod> select year , week , sale , max(sale) over(partition by product , country , region , year 2 order by week 3 rows between unbounded preceding and unbounded following ) 4 max_sale 5 from sales_fact 6 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 7 order by product , country , year , week ; YEAR WEEK SALE MAX_SALE ---------- ---------- ---------- ---------- 1998 1 58.15 58.78 1998 2 29.39 58.78 1998 3 29.49 58.78 1998 4 29.49 58.78 1998 5 29.8 58.78 1998 6 58.78 58.78 1998 9 58.78 58.78 1999 1 53.52 103.11 1999 3 94.6 103.11 1999 4 40.5 103.11 1999 5 80.01 103.11 1999 6 40.5 103.11 1999 8 103.11 103.11 1999 9 53.34 103.11 2000 1 46.7 93.41 2000 3 93.41 93.41 2000 4 46.54 93.41 2000 5 46.7 93.41 2000 7 70.8 93.41 2000 8 46.54 93.41 2001 1 92.26 256.7 2001 2 118.38 256.7 2001 3 47.24 256.7 2001 4 256.7 256.7 2001 5 93.44 256.7 2001 6 22.44 256.7 2001 7 69.96 256.7 YEAR WEEK SALE MAX_SALE ---------- ---------- ---------- ---------- 2001 8 46.06 256.7 2001 9 92.67 256.7 29 rows selected.
两个边界都滑动的窗口
下面语句的窗口是往前两周,加往后两周,加当前周,一共五周。(到达边界时窗口会自动缩小)
SH@ prod> select year , week , sale , max(sale) over(partition by product , country , region , year 2 order by week 3 rows between 2 preceding and 2 following ) 4 max_sale 5 from sales_fact 6 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 7 order by product , country , year , week ; YEAR WEEK SALE MAX_SALE ---------- ---------- ---------- ---------- 1998 1 58.15 58.15 1998 2 29.39 58.15 1998 3 29.49 58.15 1998 4 29.49 58.78 1998 5 29.8 58.78 1998 6 58.78 58.78 1998 9 58.78 58.78 1999 1 53.52 94.6 1999 3 94.6 94.6 1999 4 40.5 94.6 1999 5 80.01 103.11 1999 6 40.5 103.11 1999 8 103.11 103.11 1999 9 53.34 103.11 2000 1 46.7 93.41 2000 3 93.41 93.41 2000 4 46.54 93.41 2000 5 46.7 93.41 2000 7 70.8 70.8 2000 8 46.54 70.8 这里只所以是70.8因为窗口缩小了。 2001 1 92.26 118.38 2001 2 118.38 256.7 2001 3 47.24 256.7 2001 4 256.7 256.7 2001 5 93.44 256.7 2001 6 22.44 256.7 2001 7 69.96 93.44 YEAR WEEK SALE MAX_SALE ---------- ---------- ---------- ---------- 2001 8 46.06 92.67 2001 9 92.67 92.67 29 rows selected.
默认窗口是什么?
一看便知。
SH@ prod> select year , week , sale , max(sale) over(partition by product , country , region , year 2 order by week ) 3 max_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , week ; YEAR WEEK SALE MAX_SALE ---------- ---------- ---------- ---------- 1998 1 58.15 58.15 1998 2 29.39 58.15 1998 3 29.49 58.15 1998 4 29.49 58.15 1998 5 29.8 58.15 1998 6 58.78 58.78 1998 9 58.78 58.78 1999 1 53.52 53.52 1999 3 94.6 94.6 1999 4 40.5 94.6 1999 5 80.01 94.6 1999 6 40.5 94.6 1999 8 103.11 103.11 1999 9 53.34 103.11 2000 1 46.7 46.7 2000 3 93.41 93.41 2000 4 46.54 93.41 2000 5 46.7 93.41 2000 7 70.8 93.41 2000 8 46.54 93.41 2001 1 92.26 92.26 2001 2 118.38 118.38 2001 3 47.24 118.38 2001 4 256.7 256.7 2001 5 93.44 256.7 2001 6 22.44 256.7 2001 7 69.96 256.7 YEAR WEEK SALE MAX_SALE ---------- ---------- ---------- ---------- 2001 8 46.06 256.7 2001 9 92.67 256.7 29 rows selected.
Lead和Lag(不支持开窗的函数)
有开窗语句时会报这样的错
rows between 2 preceding and 2 following ) * ERROR at line 3: ORA-00907: missing right parenthesis
LEAD是求下一个,而不是前一个。在分区的下边界处,LEAD处回空值。
SH@ prod> select year , week , sale , lead(sale) over(partition by product , country , region , year 2 order by week ) 3 former_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , week ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 1 58.15 29.39 1998 2 29.39 29.49 1998 3 29.49 29.49 1998 4 29.49 29.8 1998 5 29.8 58.78 1998 6 58.78 58.78 1998 9 58.78 1999 1 53.52 94.6 1999 3 94.6 40.5 1999 4 40.5 80.01 1999 5 80.01 40.5 1999 6 40.5 103.11 1999 8 103.11 53.34 1999 9 53.34 2000 1 46.7 93.41 2000 3 93.41 46.54 2000 4 46.54 46.7 2000 5 46.7 70.8 2000 7 70.8 46.54 2000 8 46.54 2001 1 92.26 118.38 2001 2 118.38 47.24 2001 3 47.24 256.7 2001 4 256.7 93.44 2001 5 93.44 22.44 2001 6 22.44 69.96 2001 7 69.96 46.06 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 8 46.06 92.67 2001 9 92.67 29 rows selected.
LAG求上一个,也就是前一个。在分区的上边界处返回空值。
SH@ prod> select year , week , sale , lag(sale) over(partition by product , country , region , year 2 order by week ) 3 former_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , week ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 1 58.15 1998 2 29.39 58.15 1998 3 29.49 29.39 1998 4 29.49 29.49 1998 5 29.8 29.49 1998 6 58.78 29.8 1998 9 58.78 58.78 1999 1 53.52 1999 3 94.6 53.52 1999 4 40.5 94.6 1999 5 80.01 40.5 1999 6 40.5 80.01 1999 8 103.11 40.5 1999 9 53.34 103.11 2000 1 46.7 2000 3 93.41 46.7 2000 4 46.54 93.41 2000 5 46.7 46.54 2000 7 70.8 46.7 2000 8 46.54 70.8 2001 1 92.26 2001 2 118.38 92.26 2001 3 47.24 118.38 2001 4 256.7 47.24 2001 5 93.44 256.7 2001 6 22.44 93.44 2001 7 69.96 22.44 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 8 46.06 69.96 2001 9 92.67 46.06 29 rows selected.
复杂的Lead和Lag
Lead和lag函数的第一参数为返回的列,第二参数为相隔行数(非负),第三个参数为不存在时的默认值(可以指定为当前行的值)。
SH@ prod> select year , week , sale , lag(sale , 2 , 0 ) over(partition by product , country , region , year 2 order by week ) 3 former_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , week ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 1 58.15 0 1998 2 29.39 0 1998 3 29.49 58.15 1998 4 29.49 29.39 1998 5 29.8 29.49 1998 6 58.78 29.49 1998 9 58.78 29.8 1999 1 53.52 0 1999 3 94.6 0 1999 4 40.5 53.52 1999 5 80.01 94.6 1999 6 40.5 40.5 1999 8 103.11 80.01 1999 9 53.34 40.5 2000 1 46.7 0 2000 3 93.41 0 2000 4 46.54 46.7 2000 5 46.7 93.41 2000 7 70.8 46.54 2000 8 46.54 46.7 2001 1 92.26 0 2001 2 118.38 0 2001 3 47.24 92.26 2001 4 256.7 118.38 2001 5 93.44 47.24 2001 6 22.44 256.7 2001 7 69.96 93.44 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 8 46.06 22.44 2001 9 92.67 69.96 29 rows selected.
将默认值指定为当前行的值。
SH@ prod> select year , week , sale , lag(sale , 2 , sale ) over(partition by product , country , region , year 2 order by week ) 3 former_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , week ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 1 58.15 58.15 1998 2 29.39 29.39 1998 3 29.49 58.15 1998 4 29.49 29.39 1998 5 29.8 29.49 1998 6 58.78 29.49 1998 9 58.78 29.8 1999 1 53.52 53.52 1999 3 94.6 94.6 1999 4 40.5 53.52 1999 5 80.01 94.6 1999 6 40.5 40.5 1999 8 103.11 80.01 1999 9 53.34 40.5 2000 1 46.7 46.7 2000 3 93.41 93.41 2000 4 46.54 46.7 2000 5 46.7 93.41 2000 7 70.8 46.54 2000 8 46.54 46.7 2001 1 92.26 92.26 2001 2 118.38 118.38 2001 3 47.24 92.26 2001 4 256.7 118.38 2001 5 93.44 47.24 2001 6 22.44 256.7 2001 7 69.96 93.44 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 8 46.06 22.44 2001 9 92.67 69.96 29 rows selected.
LEAD与LAG关于数据缺口的问题
LAG(sale , 10 ) 这表示与它相隔10行的数据,可是我想访问的10周前的数据。如果中间数据有缺口会出现严重的问题。
FIRST_VALUE和LAST_VALUE
这两个函数都可以与order by条件配合得到最大值和最小值。
First_value返回窗口中的第一个值。Ignore nulls表示忽略空值,如果第一个是空值返回第二个。
SH@ prod> select year , week , sale , first_value(sale ignore nulls) over(partition by product , country , region , year 2 order by week 3 rows between unbounded preceding and unbounded following ) 4 former_sale 5 from sales_fact 6 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 7 order by product , country , year , week ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 1 58.15 58.15 1998 2 29.39 58.15 1998 3 29.49 58.15 1998 4 29.49 58.15 1998 5 29.8 58.15 1998 6 58.78 58.15 1998 9 58.78 58.15 1999 1 53.52 53.52 1999 3 94.6 53.52 1999 4 40.5 53.52 1999 5 80.01 53.52 1999 6 40.5 53.52 1999 8 103.11 53.52 1999 9 53.34 53.52 2000 1 46.7 46.7 2000 3 93.41 46.7 2000 4 46.54 46.7 2000 5 46.7 46.7 2000 7 70.8 46.7 2000 8 46.54 46.7 2001 1 92.26 92.26 2001 2 118.38 92.26 2001 3 47.24 92.26 2001 4 256.7 92.26 2001 5 93.44 92.26 2001 6 22.44 92.26 2001 7 69.96 92.26 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 8 46.06 92.26 2001 9 92.67 92.26 29 rows selected.
Last_value返回窗口中的最后一个值。Respect nulls表示识别空值,如果最后一个是空值也将其返回。
SH@ prod> select year , week , sale , last_value(sale respect nulls) over(partition by product , country , region , year 2 order by week 3 rows between unbounded preceding and unbounded following ) 4 former_sale 5 from sales_fact 6 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 7 order by product , country , year , week ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 1 58.15 58.78 1998 2 29.39 58.78 1998 3 29.49 58.78 1998 4 29.49 58.78 1998 5 29.8 58.78 1998 6 58.78 58.78 1998 9 58.78 58.78 1999 1 53.52 53.34 1999 3 94.6 53.34 1999 4 40.5 53.34 1999 5 80.01 53.34 1999 6 40.5 53.34 1999 8 103.11 53.34 1999 9 53.34 53.34 2000 1 46.7 46.54 2000 3 93.41 46.54 2000 4 46.54 46.54 2000 5 46.7 46.54 2000 7 70.8 46.54 2000 8 46.54 46.54 2001 1 92.26 92.67 2001 2 118.38 92.67 2001 3 47.24 92.67 2001 4 256.7 92.67 2001 5 93.44 92.67 2001 6 22.44 92.67 2001 7 69.96 92.67 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 8 46.06 92.67 2001 9 92.67 92.67 29 rows selected.
NTH_VALUE访问分区别的任意指定行
FIRST_VALUE相当于NTH_VALUE(sale , 1 )或者NTH_VALUE(sale , 1 )from first respect nulls。
可以与排序配合求第几大,第几小。
SH@ prod> select year , week , sale , nth_value(sale , 1 ) from last ignore nulls over(partition by product , country , region , year 2 order by week 3 rows between unbounded preceding and unbounded following ) 4 former_sale 5 from sales_fact 6 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 7 order by product , country , year , week ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 1 58.15 58.78 1998 2 29.39 58.78 1998 3 29.49 58.78 1998 4 29.49 58.78 1998 5 29.8 58.78 1998 6 58.78 58.78 1998 9 58.78 58.78 1999 1 53.52 53.34 1999 3 94.6 53.34 1999 4 40.5 53.34 1999 5 80.01 53.34 1999 6 40.5 53.34 1999 8 103.11 53.34 1999 9 53.34 53.34 2000 1 46.7 46.54 2000 3 93.41 46.54 2000 4 46.54 46.54 2000 5 46.7 46.54 2000 7 70.8 46.54 2000 8 46.54 46.54 2001 1 92.26 92.67 2001 2 118.38 92.67 2001 3 47.24 92.67 2001 4 256.7 92.67 2001 5 93.44 92.67 2001 6 22.44 92.67 2001 7 69.96 92.67 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 8 46.06 92.67 2001 9 92.67 92.67 29 rows selected.
RANK函数(不能开窗,作用于整个分区)
必须有排序条件,rank就是根据order by条件中的列来定排名的。
RANK函数的排名中,如果出现并列,排名将不连续。
如:1 2(2) 4 5 6 7 8 9 。 如果有两个第二名,那么第三名就不存在了。
请注意空值,在排序子句中可以使用NULLS LAST来把空值放在最后面。
SH@ prod> select year , week , sale , rank() over(partition by product , country , region , year 2 order by sale ) 3 former_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , week ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 1 58.15 5 没有3 1998 2 29.39 1 1998 3 29.49 2 1998 4 29.49 2 1998 5 29.8 4 1998 6 58.78 6 1998 9 58.78 6 1999 1 53.52 4 1999 3 94.6 6 1999 4 40.5 1 1999 5 80.01 5 1999 6 40.5 1 1999 8 103.11 7 1999 9 53.34 3 2000 1 46.7 3 2000 3 93.41 6 2000 4 46.54 1 2000 5 46.7 3 2000 7 70.8 5 2000 8 46.54 1 2001 1 92.26 5 2001 2 118.38 8 2001 3 47.24 3 2001 4 256.7 9 2001 5 93.44 7 2001 6 22.44 1 2001 7 69.96 4 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 8 46.06 2 2001 9 92.67 6 29 rows selected.
DENSE_RANK(与RANK的区别在于排名一是连续的)
SH@ prod> select year , week , sale , dense_rank() over(partition by product , country , region , year 2 order by sale ) 3 former_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , week ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 1 58.15 4 第三名是存在的 1998 2 29.39 1 1998 3 29.49 2 1998 4 29.49 2 1998 5 29.8 3 1998 6 58.78 5 1998 9 58.78 5 1999 1 53.52 3 1999 3 94.6 5 1999 4 40.5 1 1999 5 80.01 4 1999 6 40.5 1 1999 8 103.11 6 1999 9 53.34 2 2000 1 46.7 2 2000 3 93.41 4 2000 4 46.54 1 2000 5 46.7 2 2000 7 70.8 3 2000 8 46.54 1 2001 1 92.26 5 2001 2 118.38 8 2001 3 47.24 3 2001 4 256.7 9 2001 5 93.44 7 2001 6 22.44 1 2001 7 69.96 4 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 8 46.06 2 2001 9 92.67 6 29 rows selected.
ROW_NUMBER(不支持开窗,不确定性函数)
为分区中的每一行指定一个递增的编号,如果排序的列的值相同,谁先谁后是随机的。
SH@ prod> select year , week , sale , row_number() over(partition by product , country , region , year 2 order by sale ) 3 former_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , sale ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 2 29.39 1 1998 4 29.49 2 1998 3 29.49 3 1998 5 29.8 4 1998 1 58.15 5 1998 6 58.78 6 1998 9 58.78 7 1999 4 40.5 1 1999 6 40.5 2 1999 9 53.34 3 1999 1 53.52 4 1999 5 80.01 5 1999 3 94.6 6 1999 8 103.11 7 2000 4 46.54 1 2000 8 46.54 2 2000 5 46.7 3 2000 1 46.7 4 2000 7 70.8 5 2000 3 93.41 6 2001 6 22.44 1 2001 8 46.06 2 2001 3 47.24 3 2001 7 69.96 4 2001 1 92.26 5 2001 9 92.67 6 2001 5 93.44 7 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 2 118.38 8 2001 4 256.7 9 29 rows selected.
Ratio_to_report(当前行的值与分区总和的比值)
这个函数不支持排序和开窗。
求各周的销量在每年中的比例以及在整个产品销量中的比例。
SH@ prod> select year , week , sale , 2 trunc(100* ratio_to_report(sale) over(partition by year ) , 2) sales_yr , 3 trunc(100* ratio_to_report(sale) over() , 2 ) sales_prod 4 from sales_fact 5 where country in ('Australia') and product = 'Xtend Memory' and week < 10 6 order by year , week ; YEAR WEEK SALE SALES_YR SALES_PROD ---------- ---------- ---------- ---------- ---------- 1998 1 58.15 19.78 2.98 1998 2 29.39 10 1.5 1998 3 29.49 10.03 1.51 1998 4 29.49 10.03 1.51 1998 5 29.8 10.14 1.52 1998 6 58.78 20 3.01 1998 9 58.78 20 3.01 1999 1 53.52 11.49 2.74 1999 3 94.6 20.31 4.85 1999 4 40.5 8.69 2.07 1999 5 80.01 17.18 4.1 1999 6 40.5 8.69 2.07 1999 8 103.11 22.14 5.28 1999 9 53.34 11.45 2.73 2000 1 46.7 13.31 2.39 2000 3 93.41 26.63 4.79 2000 4 46.54 13.27 2.38 2000 5 46.7 13.31 2.39 2000 7 70.8 20.18 3.63 2000 8 46.54 13.27 2.38 2001 1 92.26 10.99 4.73 2001 2 118.38 14.1 6.07 2001 3 47.24 5.62 2.42 2001 4 256.7 30.59 13.16 2001 5 93.44 11.13 4.79 2001 6 22.44 2.67 1.15 2001 7 69.96 8.33 3.58 YEAR WEEK SALE SALES_YR SALES_PROD ---------- ---------- ---------- ---------- ---------- 2001 8 46.06 5.48 2.36 2001 9 92.67 11.04 4.75 29 rows selected.
Percent_rank(排在前百分之几)
用来求当前行的排名的相对百分位置。
比如你对人说自己是第10名,别人可能觉得没什么,如果是100000中的第10名,那就是前1/10000,那就非常牛了。
这个函数与RANK的推导公式为:
PERCENT_RANK = (RANK – 1) / (N – 1) , N代表总行数。
RANK – 1代表排名大于自己的人数。
N – 1代表除自己以外的总人数。
总体的意思是除自己之外的其它中人,排名比自己高的人所占的比例。
SH@ prod> select year , week , sale , rank() over(partition by product , country , region , year 2 order by sale ) 3 former_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , sale ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 2 29.39 1 1998 4 29.49 2 1998 3 29.49 2 1998 5 29.8 4 1998 1 58.15 5 1998 6 58.78 6 1998 9 58.78 6 1999 4 40.5 1 1999 6 40.5 1 1999 9 53.34 3 1999 1 53.52 4 1999 5 80.01 5 1999 3 94.6 6 1999 8 103.11 7 2000 4 46.54 1 2000 8 46.54 1 2000 5 46.7 3 2000 1 46.7 3 2000 7 70.8 5 2000 3 93.41 6 2001 6 22.44 1 2001 8 46.06 2 2001 3 47.24 3 2001 7 69.96 4 2001 1 92.26 5 2001 9 92.67 6 2001 5 93.44 7 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 2 118.38 8 2001 4 256.7 9 29 rows selected. SH@ prod> select year , week , sale , 100*percent_rank() over(partition by product , country , region , year 2 order by sale ) 3 former_sale 4 from sales_fact 5 where country in ( 'Australia') and product = 'Xtend Memory' and week < 10 6 order by product , country , year , sale ; YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 1998 2 29.39 0 1998 4 29.49 16.6666667 1998 3 29.49 16.6666667 1998 5 29.8 50 1998 1 58.15 66.6666667 1998 6 58.78 83.3333333 1998 9 58.78 83.3333333 1999 4 40.5 0 1999 6 40.5 0 1999 9 53.34 33.3333333 1999 1 53.52 50 1999 5 80.01 66.6666667 1999 3 94.6 83.3333333 1999 8 103.11 100 2000 4 46.54 0 2000 8 46.54 0 2000 5 46.7 40 2000 1 46.7 40 2000 7 70.8 80 2000 3 93.41 100 2001 6 22.44 0 2001 8 46.06 12.5 2001 3 47.24 25 2001 7 69.96 37.5 2001 1 92.26 50 2001 9 92.67 62.5 2001 5 93.44 75 YEAR WEEK SALE FORMER_SALE ---------- ---------- ---------- ----------- 2001 2 118.38 87.5 2001 4 256.7 100 29 rows selected.
Percentile_cont(大体意思求排在某个百分比时所需的数值)
也可以说是,现在说这样一个值,向分区里面插入这个值,其排名在百分之N(percent_rank为N%),求这个值。
如果有一个行的percent_rank正好等于N,那么就是这个么的值。如果没有匹配的,则要计算概率最大的。
SH@ prod> select year , week , sale , 2 percentile_cont(0.5) within group(order by sale desc )over(partition by year) pc , 3 percent_rank() over( partition by year order by sale desc ) pr 4 from sales_fact 5 where country in ('Australia') and product = 'Xtend Memory' and week < 11 ; YEAR WEEK SALE PC PR ---------- ---------- ---------- ---------- ---------- 1998 10 117.76 43.975 0 1998 9 58.78 43.975 .142857143 1998 6 58.78 43.975 .142857143 1998 1 58.15 43.975 .428571429 1998 5 29.8 43.975 .571428571 1998 3 29.49 43.975 .714285714 1998 4 29.49 43.975 .714285714 1998 2 29.39 43.975 1 1999 8 103.11 62.76 0 1999 3 94.6 62.76 .142857143 1999 5 80.01 62.76 .285714286 1999 10 72 62.76 .428571429 1999 1 53.52 62.76 .571428571 1999 9 53.34 62.76 .714285714 1999 6 40.5 62.76 .857142857 1999 4 40.5 62.76 .857142857 2000 3 93.41 46.7 0 2000 7 70.8 46.7 .2 2000 5 46.7 46.7 .4 2000 1 46.7 46.7 .4 2000 4 46.54 46.7 .8 2000 8 46.54 46.7 .8 2001 4 256.7 81.11 0 2001 2 118.38 81.11 .111111111 2001 5 93.44 81.11 .222222222 2001 9 92.67 81.11 .333333333 2001 1 92.26 81.11 .444444444 YEAR WEEK SALE PC PR ---------- ---------- ---------- ---------- ---------- 2001 7 69.96 81.11 .555555556 2001 10 69.05 81.11 .666666667 2001 3 47.24 81.11 .777777778 2001 8 46.06 81.11 .888888889 2001 6 22.44 81.11 1 32 rows selected.
Percentile_disc(功能与Percentile_cont大体相同)
区别在于这个函数取到的值一定是在这个分区的行中的。
如果没有匹配的,Percentile_disc会按照排序取上一个。
SH@ prod> select year , week , sale , 2 percentile_disc(0.5) within group(order by sale desc )over(partition by year) pc , 3 percent_rank() over( partition by year order by sale desc ) pr 4 from sales_fact 5 where country in ('Australia') and product = 'Xtend Memory' and week < 11 ; YEAR WEEK SALE PC PR ---------- ---------- ---------- ---------- ---------- 1998 10 117.76 58.15 0 1998 9 58.78 58.15 .142857143 1998 6 58.78 58.15 .142857143 1998 1 58.15 58.15 .428571429 1998 5 29.8 58.15 .571428571 1998 3 29.49 58.15 .714285714 1998 4 29.49 58.15 .714285714 1998 2 29.39 58.15 1 1999 8 103.11 72 0 1999 3 94.6 72 .142857143 1999 5 80.01 72 .285714286 1999 10 72 72 .428571429 1999 1 53.52 72 .571428571 1999 9 53.34 72 .714285714 1999 6 40.5 72 .857142857 1999 4 40.5 72 .857142857 2000 3 93.41 46.7 0 2000 7 70.8 46.7 .2 2000 5 46.7 46.7 .4 2000 1 46.7 46.7 .4 2000 4 46.54 46.7 .8 2000 8 46.54 46.7 .8 2001 4 256.7 92.26 0 2001 2 118.38 92.26 .111111111 2001 5 93.44 92.26 .222222222 2001 9 92.67 92.26 .333333333 2001 1 92.26 92.26 .444444444 YEAR WEEK SALE PC PR ---------- ---------- ---------- ---------- ---------- 2001 7 69.96 92.26 .555555556 2001 10 69.05 92.26 .666666667 2001 3 47.24 92.26 .777777778 2001 8 46.06 92.26 .888888889 2001 6 22.44 92.26 1 32 rows selected. SH@ prod> select year , week , sale , 2 percentile_cont(0.5) within group(order by sale desc )over(partition by year) pc , 3 percent_rank() over( partition by year order by sale desc ) pr 4 from sales_fact 5 where country in ('Australia') and product = 'Xtend Memory' and week < 11 ; YEAR WEEK SALE PC PR ---------- ---------- ---------- ---------- ---------- 1998 10 117.76 43.975 0 1998 9 58.78 43.975 .142857143 1998 6 58.78 43.975 .142857143 1998 1 58.15 43.975 .428571429 1998 5 29.8 43.975 .571428571 1998 3 29.49 43.975 .714285714 1998 4 29.49 43.975 .714285714 1998 2 29.39 43.975 1 1999 8 103.11 62.76 0 1999 3 94.6 62.76 .142857143 1999 5 80.01 62.76 .285714286 1999 10 72 62.76 .428571429 1999 1 53.52 62.76 .571428571 1999 9 53.34 62.76 .714285714 1999 6 40.5 62.76 .857142857 1999 4 40.5 62.76 .857142857 2000 3 93.41 46.7 0 2000 7 70.8 46.7 .2 2000 5 46.7 46.7 .4 2000 1 46.7 46.7 .4 2000 4 46.54 46.7 .8 2000 8 46.54 46.7 .8 2001 4 256.7 81.11 0 2001 2 118.38 81.11 .111111111 2001 5 93.44 81.11 .222222222 2001 9 92.67 81.11 .333333333 2001 1 92.26 81.11 .444444444 YEAR WEEK SALE PC PR ---------- ---------- ---------- ---------- ---------- 2001 7 69.96 81.11 .555555556 2001 10 69.05 81.11 .666666667 2001 3 47.24 81.11 .777777778 2001 8 46.06 81.11 .888888889 2001 6 22.44 81.11 1 32 rows selected.
NTILE(类型于建立直方图,不支持开窗)
将排序后的数据均匀分配到指定个数据桶中,返回桶编号,如果不能等分,各个桶中的行数最多相差一行。
在以后的处理中可以通过去除首桶或尾去除异常值。
注意:并不是按值分配的。
SH@ prod> select year , week , sale , 2 ntile(10) over(order by sale ) group# 3 from sales_fact 4 where country in ('Australia') and product = 'Xtend Memory' and year = 1998 order by year , sale; YEAR WEEK SALE GROUP# ---------- ---------- ---------- ---------- 1998 50 28.76 1 1998 2 29.39 1 1998 4 29.49 1 1998 3 29.49 1 1998 5 29.8 2 1998 43 57.52 2 1998 35 57.52 2 1998 40 57.52 2 1998 46 57.52 3 1998 27 57.52 3 1998 45 57.52 3 1998 44 57.52 3 1998 47 57.72 4 1998 29 57.72 4 1998 28 57.72 4 1998 1 58.15 4 1998 41 58.32 5 1998 51 58.32 5 1998 14 58.78 5 1998 9 58.78 5 1998 15 58.78 6 1998 17 58.78 6 1998 6 58.78 6 1998 19 58.98 6 1998 21 59.6 7 1998 12 59.6 7 1998 52 86.38 7 YEAR WEEK SALE GROUP# ---------- ---------- ---------- ---------- 1998 34 115.44 8 1998 39 115.84 8 1998 42 115.84 8 1998 38 115.84 9 1998 23 117.56 9 1998 18 117.56 9 1998 26 117.56 10 1998 10 117.76 10 1998 48 172.56 10 36 rows selected.
Stddev计算标准差(方差的平方根,支持开窗)
SH@ prod> select year , week , sale , 2 stddev(sale) over( 3 partition by product , country , region , year 4 order by sale desc 5 rows between 2 preceding and 2 following ) stddv 6 from sales_fact 7 where country in ('Australia') and product = 'Xtend Memory' and week < 10 8 order by year , week ; YEAR WEEK SALE STDDV ---------- ---------- ---------- ---------- 1998 1 58.15 15.8453416 1998 2 29.39 .057735027 1998 3 29.49 .178021534 1998 4 29.49 12.7945918 1998 5 29.8 15.815738 1998 6 58.78 .36373067 1998 9 58.78 14.3880654 1999 1 53.52 22.178931 1999 3 94.6 21.7319902 1999 4 40.5 7.46550065 1999 5 80.01 22.9761992 1999 6 40.5 7.41317746 1999 8 103.11 11.6825953 1999 9 53.34 16.1305511 2000 1 46.7 21.0022332 2000 3 93.41 23.3589605 2000 4 46.54 .092376043 2000 5 46.7 10.8139207 2000 7 70.8 22.4285538 2000 8 46.54 .092376043 2001 1 92.26 20.3811452 2001 2 118.38 78.5152276 2001 3 47.24 26.5077898 2001 4 256.7 87.947194 2001 5 93.44 71.309193 2001 6 22.44 13.9900965 2001 7 69.96 22.9124643 YEAR WEEK SALE STDDV ---------- ---------- ---------- ---------- 2001 8 46.06 19.407678 2001 9 92.67 17.1409691 29 rows selected.
Listagg(把分区中的列按照顺序拼接起来,不支持开窗)
SH@ prod> col stddv for a60 SH@ prod> select year , week , sale , 2 listagg(sale , ' , ')within group(order by sale desc) over( 3 partition by product , country , region , year ) stddv 4 from sales_fact 5 where country in ('Australia') and product = 'Xtend Memory' and week < 5 6 order by year , week ; YEAR WEEK SALE STDDV ---------- ---------- ---------- ------------------------------------------------------------ 1998 1 58.15 58.15 , 29.49 , 29.49 , 29.39 1998 2 29.39 58.15 , 29.49 , 29.49 , 29.39 1998 3 29.49 58.15 , 29.49 , 29.49 , 29.39 1998 4 29.49 58.15 , 29.49 , 29.49 , 29.39 1999 1 53.52 94.6 , 53.52 , 40.5 1999 3 94.6 94.6 , 53.52 , 40.5 1999 4 40.5 94.6 , 53.52 , 40.5 2000 1 46.7 93.41 , 46.7 , 46.54 2000 3 93.41 93.41 , 46.7 , 46.54 2000 4 46.54 93.41 , 46.7 , 46.54 2001 1 92.26 256.7 , 118.38 , 92.26 , 47.24 2001 2 118.38 256.7 , 118.38 , 92.26 , 47.24 2001 3 47.24 256.7 , 118.38 , 92.26 , 47.24 2001 4 256.7 256.7 , 118.38 , 92.26 , 47.24 14 rows selected.
分析函数对谓词前推的影响
使用了分析函数的视图,会影响视图前推,因为分析函数的结果是跨行引用得来的,如果对数据源进行的剪裁,结果可能会不一样。
SH@ prod> create or replace view max_5_weeks_vw as 2 select country , product , region , year , week , sale , 3 max(sale) over( 4 partition by product , country , region , year order by year , week 5 rows between 2 preceding and 2 following ) max_weeks_5 6 from sales_fact ; View created. SH@ prod> select year , week , sale , max_weeks_5 from max_5_weeks_vw 2 where country in ('Australia' ) and product = 'Xtend Memory' 3 and region = 'Australia' and year = 2000 and week < 14 4 order by year , week ; YEAR WEEK SALE MAX_WEEKS_5 ---------- ---------- ---------- ----------- 2000 1 46.7 93.41 2000 3 93.41 93.41 2000 4 46.54 93.41 2000 5 46.7 93.41 2000 7 70.8 93.74 2000 8 46.54 93.74 2000 11 93.74 117.5 2000 12 46.54 117.67 2000 13 117.5 117.67 9 rows selected. SH@ prod> explain plan for 2 select year , week , sale , max_weeks_5 from max_5_weeks_vw 3 where country in ('Australia' ) and product = 'Xtend Memory' 4 and region = 'Australia' and year = 2000 and week < 14 5 order by year , week ; Explained. SH@ prod> select * from table(dbms_xplan.display()); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------ Plan hash value: 4167461139 -------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 90 | 5220 | 310 (1)| 00:00:04 | |* 1 | VIEW | MAX_5_WEEKS_VW | 90 | 5220 | 310 (1)| 00:00:04 | | 2 | WINDOW SORT | | 90 | 9450 | 310 (1)| 00:00:04 | |* 3 | TABLE ACCESS FULL| SALES_FACT | 90 | 9450 | 309 (1)| 00:00:04 | -------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("WEEK"<14) 3 - filter("COUNTRY"='Australia' AND "PRODUCT"='Xtend Memory' AND "REGION"='Australia' AND "YEAR"=2000) Note ----- - dynamic sampling used for this statement (level=2) 21 rows selected.
对比没有分析函数的视图。直接将谓词推入到视图里面。
SH@ prod> create or replace view max_5_weeks_vw1 as 2 select country , product , region , year , week , sale 3 from sales_fact ; View created. SH@ prod> explain plan for 2 select year , week , sale from max_5_weeks_vw1 3 where country in ('Australia' ) and product = 'Xtend Memory' 4 and region = 'Australia' and year = 2000 and week < 14 5 order by year , week ; Explained. SH@ prod> select * from table(dbms_xplan.display()); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------------------------ Plan hash value: 1978576542 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 105 | 310 (1)| 00:00:04 | | 1 | SORT ORDER BY | | 1 | 105 | 310 (1)| 00:00:04 | |* 2 | TABLE ACCESS FULL| SALES_FACT | 1 | 105 | 309 (1)| 00:00:04 | --------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("COUNTRY"='Australia' AND "PRODUCT"='Xtend Memory' AND "REGION"='Australia' AND "YEAR"=2000 AND "WEEK"<14) Note