PostgreSQL LIST、RANGE 表分区的实现方案

简介

pg分区：就是把逻辑上的一个大表分割成物理上的几块。

分区的优点

1. 某些类型的查询性能得到提升

2. 更新的性能也可以得到提升，因为某块的索引要比在整个数据集上的索引要小。

3. 批量删除可以通过简单的删除某个分区来实现。

4. 可以将很少用的数据移动到便宜的、转速慢的存储介质上。

分区实现原理

10.x版本之前pg表分区的实现原理：pg中是通过表的继承来实现的，建立一个主表，里面是空的，然后每个分区去继承它。无论何时，该主表里面都必须是空的

官网建议：只有当表本身大小超过了机器物理内存的实际大小时，才考虑分区。

原分区用法

以继承表的方式实现：

create table tbl( a int, b varchar(10) ); 
create table tbl_1 ( check ( a <= 1000 ) ) inherits (tbl); 
create table tbl_2 ( check ( a <= 10000 and a >1000 ) ) inherits (tbl);
create table tbl_3 ( check ( a <= 100000 and a >10000 ) ) inherits (tbl);

再通过创建触发器或者规则，实现数据分发，只需要向子表插入数据则会自动分配到子表中

create or replace function tbl_part_tg() 
returns trigger as $$ 
begin 
 if ( new. a <= 1000 ) then 
 insert into tbl_1 values (new.*); 
 elsif ( new. a > 1000 and new.a <= 10000 ) then 
 insert into tbl_2 values (new.*); 
 elsif ( new. a > 10000 and new.a <= 100000 ) then 
 insert into tbl_3 values (new.*); 
 elsif ( new. a > 100000 and new.a <= 1000000 ) then 
 insert into tbl_4 values (new.*); 
 else raise exception 'data out of range!'; 
 end if;
 return null; 
end;
 $$ 
language plpgsql; 
create trigger insert_tbl_part_tg
  before insert on tbl 
for each row execute procedure tbl_part_tg();

分区创建成功

如何实现分区过滤？

对于分区表来说，如果有50个分区表，对于某个条件的值如果能确定，那么很可能直接过滤掉49个分区，大大提高扫描速度，当然分区表也能放在不同的物理盘上，提高io速度。

对于查询是怎么实现分区表过滤呢？

约束排除是否使用约束排除通过postgresql.conf中参数constraint_exclusion 来控制，

只有三个值

 constraint_exclusion = on

on:所有情况都会进行约束排除检查

off:关闭，所有约束都不生效

partition：对分区表或者继承表进行约束排查，默认为partition

如：

select *from tbl where a = 12345;

首先找到主表tbl，然后通过tbl找到它的子表，找到后再对再拿着谓词条件a = 12345对一个个子表约束进行检查，不符合条件表就去掉不扫描，实现分区表过滤，下面简单介绍下约束排除源码逻辑。

如何实现数据分发?

基于规则的话，会在查询重写阶段按时替换规则生成新的插入语句，基于触发器会在insert主表前触发另外一个insert操作，这两个逻辑都比较简单，相关代码不再介绍。

错误描述：在新建分区主表时提示以下错误信息

错误原因：在本地postgresql.conf 配置了 search_path = ‘$user’ ,所以在使用的时候需要先创建当前用户对应的schema,如果不存在，则会提示错误

解决方法：在创建表时指定创建的schemal,即可成功。

postgresql 10.x list分区方案

postgres=# create table list_parted (
postgres(# a int
postgres(# ) partition by list (a);
create table
postgres=# create table part_1 partition of list_parted for values in (1);
create table
postgres=# create table part_2 partition of list_parted for values in (2);
create table
postgres=# create table part_3 partition of list_parted for values in (3);
create table
postgres=# create table part_4 partition of list_parted for values in (4);
create table
postgres=# create table part_5 partition of list_parted for values in (5);
create table
postgres=#
postgres=# insert into list_parted values(32); --faled
error: no partition of relation "list_parted" found for row
detail: failing row contains (32).
postgres=# insert into part_1 values(1);
insert 0 1
postgres=# insert into part_1 values(2);--faled
error: new row for relation "part_1" violates partition constraint
detail: failing row contains (2).
postgres=# explain select *from list_parted where a =1;
       query plan
-----------------------------------------------------------------
 append (cost=0.00..41.88 rows=14 width=4)
 -> seq scan on list_parted (cost=0.00..0.00 rows=1 width=4)
   filter: (a = 1)
 -> seq scan on part_1 (cost=0.00..41.88 rows=13 width=4)
   filter: (a = 1)
(5 rows)

上面是list分区表，建表是先建主表，再建子表，子表以 partition of 方式说明和主表关系，约束条件应该就是后面的in里面。

explain 执行sql解析计划

cost：数据库自定义的消耗单位，通过统计信息来估计sql消耗。（查询分析是根据analyze的固执生成的，生成之后按照这个查询计划执行，执行过程中analyze是不会变的。所以如果估值和真是情况差别较大，就会影响查询计划的生成。）

rows：根据统计信息估计sql返回结果集的行数。

width：返回结果集每一行的长度，这个长度值是根据pg_statistic表中的统计信息来计算的。

postgresql 10.x range分区

创建range分区

postgres=# create table range_parted (
postgres(# a int
postgres(# ) partition by range (a);
create table
postgres=# create table range_parted1 partition of range_parted for values from (1) to (1000);
create table
postgres=# create table range_parted2 partition of range_parted for values from (1000) to (10000);
create table
postgres=# create table range_parted3 partition of range_parted for values from (10000) to (100000);
create table
postgres=#
postgres=# insert into range_parted1 values(343);
insert 0 1
postgres=#
postgres=# explain select *from range_parted where a=32425;
        query plan
---------------------------------------------------------------------
 append (cost=0.00..41.88 rows=14 width=4)
 -> seq scan on range_parted (cost=0.00..0.00 rows=1 width=4)
   filter: (a = 32425)
 -> seq scan on range_parted3 (cost=0.00..41.88 rows=13 width=4)
   filter: (a = 32425)
(5 rows)
postgres=# set constraint_exclusion = off;
set
postgres=# explain select *from range_parted where a=32425;
        query plan
---------------------------------------------------------------------
 append (cost=0.00..125.63 rows=40 width=4)
 -> seq scan on range_parted (cost=0.00..0.00 rows=1 width=4)
   filter: (a = 32425)
 -> seq scan on range_parted1 (cost=0.00..41.88 rows=13 width=4)
   filter: (a = 32425)
 -> seq scan on range_parted2 (cost=0.00..41.88 rows=13 width=4)
   filter: (a = 32425)
 -> seq scan on range_parted3 (cost=0.00..41.88 rows=13 width=4)
   filter: (a = 32425)
(9 rows)

上述操作中的 a的取值范围为【0，1000）即插入值若为1000边界值，则会保存在第二个分区表中和list差不多，就是语法略有不同，范围表值是一个连续的范围，list表是单点或多点的集合。

从上面例子可以看到，显然还是走的约束排除过滤子表的方式。

constraint_exclusion = “on ,off,partition ”; 该参数为postgresql.conf中的参数

on 表示所有的查询都会执行约束排除

off 关闭，所有的查询都不会执行约束排除

partition :表示只对分区的表进行约束排除

分区列的类型必须支持btree索引接口(几乎涵盖所有类型, 后面会说到检查方法)。

更新后的数据如果超出了所在分区的范围，则会报错

postgresql 分区注意事项

语法

1、创建主表

[ partition by { range | list } ( { column_name | ( expression ) } [ collate collation ] [ opclass ] [, ... ] ) ]

2、创建分区

partition of parent_table [ ( 
 { column_name [ column_constraint [ ... ] ] 
 | table_constraint } 
 [, ... ] 
) ] for values partition_bound_spec 
and partition_bound_spec is: 
{ in ( expression [, ...] ) -- list分区 
 | 
 from ( { expression | unbounded } [, ...] ) to ( { expression | unbounded } [, ...] ) } -- range分区, unbounded表示无限小或无限大

语法解释

partition by 指定分区表的类型range或list指定分区列，或表达式作为分区键。

range分区表键：支持指定多列、或多表达式，支持混合（键，非表达式中的列，会自动添加not null的约束）

list分区表键：支持单个列、或单个表达式

分区键必须有对应的btree索引方法的ops(可以查看系统表得到)

select typname from pg_type where oid in (select opcintype from pg_opclass);

主表不会有任何数据，数据会根据分区规则进入对应的分区表

如果插入数据时，分区键的值没有匹配的分区，会报错

不支持全局的unique, primary key, exclude, foreign key约束，只能在对应的分区建立这些约束

分区表和主表的列数量，定义必须完全一致，（包括oid也必须一致，要么都有，要么都没有）

可以为分区表的列单独增加default值，或约束。

用户还可以对分区表增加表级约束

如果新增的分区表check约束，名字与主表的约束名一致，则约束内容必须与主表一致

当用户往主表插入数据库时，记录被自动路由到对应的分区，如果没有合适的分区，则报错

如果更新数据，并且更新后的key导致数据需要移动到另一分区，则会报错，（意思是分区键可以更新，但是不支持更新后的数据移出到别的分区表）

修改主表的字段名，字段类型时，会自动同时修改所有的分区

truncate 主表时，会清除所有继承表分区的记录（如果有多级分区，则会一直清除到所有的直接和间接继承的分区）

如果要清除单个分区，请对分区进行操作

如果要删除分区表，可以使用drop table的ddl语句，注意这个操作会对主表也加access exclusive lock。

补充：对postgresql语法分析中 targetlist 的理解

在 gram.y 中:

simple_select:                
      select  opt_distinct  target_list          
      into_clause   from_clause   where_clause          
      group_clause   having_clause  window_clause          
        {        
          selectstmt *n = makenode(selectstmt);      
          n->distinctclause = $2;      
          n->targetlist = $3;      
          n->intoclause = $4;      
          n->fromclause = $5;      
          n->whereclause = $6;      
          n->groupclause = $7;      
          n->havingclause = $8;      
          n->windowclause = $9;      
          $$ = (node *)n;      
        }        
……

把它修改一下，增加：

simple_select:                
      select  opt_distinct  target_list          
      into_clause   from_clause   where_clause          
      group_clause   having_clause  window_clause          
        {        
          selectstmt *n = makenode(selectstmt);      
          n->distinctclause = $2;      
          n->targetlist = $3;      
          n->intoclause = $4;      
          n->fromclause = $5;      
          n->whereclause = $6;      
          n->groupclause = $7;      
          n->havingclause = $8;      
          n->windowclause = $9;      
          $$ = (node *)n;
          fprintf(stderr,"length of list: %d\n", n->targetlist->length);      
        }      
……

psql 中执行：

select id, name from a8;

后台出现：

length of list: 2

以上为个人经验，希望能给大家一个参考，也希望大家多多支持www.887551.com。如有错误或未考虑完全的地方，望不吝赐教。