使用single table hash cluster提高检索性能

single table hash cluster是cluster的一种,它的行存储位置是根据hash列的hash value来的,
当数据插入的时候oracle会运用hash function来计算出一个hash value,然后根据hash value
存放在特定的block.当我们读取一条记录的时候,也运用同样的hash value计算出所在的block,
这样通常只需要一次logical(physical)read就够了.下面来对比一下single table hash cluster
和普通heap table及btree index的性能对比.


创建一张heap table

SQL 10G>create table test_clu as select * from dba_objects;

Table created.

在object_id列上创建index并分析表和索引

SQL 10G>create index ind_object_id on test_clu(object_id);

Index created.

SQL 10G>analyze index ind_object_id compute statistics;

Index analyzed.

SQL 10G>analyze table test_clu compute statistics;

Table analyzed.

查看平均行长,用于指定创建hash cluter时的SIZE

SQL 10G>select AVG_ROW_LEN from user_tables where table_name='TEST_CLU';

AVG_ROW_LEN
-----------
97

查看记录数,用于指定创建hash cluter时的HASHKEYS number

SQL 10G>SELECT COUNT(*) FROM TEST_CLU;

COUNT(*)
----------
51714

创建hash cluster,指定size 为110,比97稍大,hashkeys为52000,比51714稍大

drop table test_hash;
drop cluster clu_a;
CREATE CLUSTER clu_a (object_id number)
size 110
single table HASHKEYS 52000;

create table test_hash
cluster clu_a(object_id)
as
select *
from test_clu;
来看一下在heap table加btree index下执行的性能

select * from test_clu where object_id=20;
Execution Plan
----------------------------------------------------------

----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 87 | 2 (0)|
| 1 | TABLE ACCESS BY INDEX ROWID| TEST_CLU | 1 | 87 | 2 (0)|
|* 2 | INDEX RANGE SCAN | IND_OBJECT_ID | 1 | | 1 (0)|
----------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
1201 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

可以看到上面的语句走了index range scan,一共是4个logical read
再看看hash cluster的效率

SQL 10G>select * from test_hash where object_id=20;
Execution Plan
———————————————————-

——————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
——————————————————————–
| 0 | SELECT STATEMENT | | 10 | 1770 | 0 (0)|
|* 1 | TABLE ACCESS HASH| TEST_HASH | 10 | 1770 | |
——————————————————————–
Statistics
———————————————————-
0 recursive calls
0 db block gets
1 consistent gets
0 physical reads
0 redo size
1197 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

从执行计划可以看出上面的语句走的是TABLE ACCESS HASH,用了一个logical read
,是heap table的4倍.
看看当前的记录保存在哪里,在下图可以看到记录保存在file 12 block 8150 row 32

SQL 10G>select object_id,dbms_rowid.ROWID_RELATIVE_FNO(rowid) file#,
2 dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block#,
3 dbms_rowid.ROWID_ROW_NUMBER(rowid) row# from
4 test_hash where object_id=20;

OBJECT_ID FILE# BLOCK# ROW#
———- ———- ———- ———-
20 12 8150 32
再看看hash cluster出现overflow的时候效率怎么样

SQL 10G>insert into test_hash select * from test_hash where object_id=20;

1 row created.

SQL 10G>commit;

Commit complete.

插入一条记录,从下面可以看到hash cluster发生了overflow,这条新的记录被
保存在file 12 block 8266 row 5

SQL 10G>select object_id,dbms_rowid.ROWID_RELATIVE_FNO(rowid) file#,
2 dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block#,
3 dbms_rowid.ROWID_ROW_NUMBER(rowid) row# from
4 test_hash where object_id=20;

OBJECT_ID FILE# BLOCK# ROW#
———- ———- ———- ———-
20 12 8150 32
20 12 8266 5

dump一下file 12 block 8150

SQL 10G>alter system dump datafile 12 block 8150;
tab 0, row 32, @0×1e6f
tl: 9 fb: K-H-FLPN lb: 0×1 cc: 0 nk: 0×0300204a.2

从dump文件里面可以看到,在block 8150的row 32里面有个link指向了下一个overflow的
block 8266,nk: 0×0300204a.2

SQL 10G>select to_number(’204a’,'xxxxxxx’) from dual;

TO_NUMBER(’204A’,'XXXXXXX’)
—————————
8266

看一下存在overflow时的效率

SQL 10G>alter system flush buffer_cache;

System altered.

设置10046事件跟踪读取了哪些block

SQL 10G>alter session set events’10046 trace name context forever,level 12′;

Session altered.

设置10200事件跟踪一致性读

SQL 10G>alter session set events ‘10200 trace name context forever,level 1′;
Session altered.

SQL 10G>select * from test_hash where object_id=20;
Execution Plan
———————————————————-

——————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
——————————————————————–
| 0 | SELECT STATEMENT | | 10 | 1770 | 0 (0)|
|* 1 | TABLE ACCESS HASH| TEST_HASH | 10 | 1770 | |
——————————————————————–
Statistics
———————————————————-
0 recursive calls
0 db block gets
3 consistent gets
2 physical reads
0 redo size
1234 bytes sent via SQL*Net to client
385 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed

SQL 10G>alter session set events’10046 trace name context off’;

Session altered.

SQL 10G>alter session set events ‘10200 trace name context off’;
Session altered.
Consistent read started for block 4 : 03001fd6
WAIT #4: nam=’db file sequential read’ ela= 37 file#=12 block#=8150 blocks=1 obj#=107023 tim=1147597232487274
Consistent read finished for block 4 : 3001fd6
Consistent read started for block 4 : 03001fd6
Consistent read finished for block 4 : 3001fd6
Consistent read started for block 4 : 0300204a
WAIT #4: nam=’db file sequential read’ ela= 16 file#=12 block#=8266 blocks=1 obj#=107023 tim=1147597232488233
Consistent read finished for block 4 : 300204a

上面的语句依然走的是TABLE ACCESS HASH,发生了2次physical read,3次
logical read.

block 8150发生一次physical read,2次logical read
block 8266发生一次physical read,1次logical read

可以看到如果出现overflow的话hash cluster的性能会有一些损失.

另外需要注意的是hash cluster对于range scan的话无能为力,它会采取full table scan来操作
SQL 10G>select * from test_hash where object_id between 20 and 30
2 ;

24 rows selected.
Execution Plan
———————————————————-

——————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
——————————————————————–
| 0 | SELECT STATEMENT | | 14 | 1302 | 620 (38)|
|* 1 | TABLE ACCESS FULL| TEST_HASH | 14 | 1302 | 620 (38)|
——————————————————————–

Predicate Information (identified by operation id):
—————————————————

1 - filter(”OBJECT_ID”< =30 AND “OBJECT_ID”>=20)

Note
—–
- ‘PLAN_TABLE’ is old version
Statistics
———————————————————-
1 recursive calls
0 db block gets
1763 consistent gets
0 physical reads
0 redo size
2085 bytes sent via SQL*Net to client
395 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
24 rows processed
综上所述,hash cluster比较适用的范围是用于单值扫描,最好使用唯一列做hash key,在设定
合理的size和hashkeys后能获得比较理想的性能.

具体的hash cluster的用法和注意事项请参考oracle document
http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14231/hash.htm
http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14211/data_acc.htm#i7690

5 Responses to “使用single table hash cluster提高检索性能”


  1. 1 anysql

    为什么不是hash cluster的表上再加一个索引试试呢?

  2. 2 Fenng

    重出江湖啦?

    厉害!

  3. 3 blue_prince

    比较适合单值访问较多,并且列长度固定的表。anysql的意思是在HASH KEY上创建一个索引?可以预见这个索引的cluster_factor将会很差。

  4. 4 木匠

    我已经用了1年多了,单行主键检索速度平均提高了50倍.

    FYI,
    CREATE CLUSTER abelisting.book_clus(listingsid NUMBER(20,0))
    SIZE 23 SINGLE TABLE
    HASH IS listingsid
    pctfree 5
    HASHKEYS 7200000
    tablespace abe_data_8m;

    CREATE TABLE abelisting.book_delete_check (
    listingsid NUMBER(20,0), –PRIMARY KEY,
    upd_dd NUMBER(2,0)
    )
    CLUSTER abelisting.book_clus (listingsid);

  5. 5 swg credits

    我得试试看

Leave a Reply




Subscribe

Subscribe to my RSS Feeds