Sql loader的使用分享

sql loader的使用分享

1. sqlldr概述

sqlldr 为一种大量数据加载工作，在cmd命令行模式下实现，调用语句如下：

sqlldr userid/password @servicename control=ctlfilename。

满足前提是存储数据的oracle表是已经存在的。

2. ctl文件内容

options ( {[skip=integer] [ load = integer ] [errors = integer] [rows=integer][bindsize=integer] [silent=(all|feedback|error|discard) ] } )

load[data]

[

{ infile | inddn } {file | * }

[stream | record | fixed length [blocksizesize]|variable [length] ]

[ { badfile | baddn } file ]

{discards | discardmax} integr

]

[ {inddn |infile} . . . ]

[ append |replace | insert ]

[reclentinteger]

[ {concatenate integer | continueif { [this | next] (start[: end])last } operator{ ‘string’ | x ‘hex’ } } ]

into table[user.]table

[append |replace | insert]

[whencondition [and condition]…]

[fields[delimiter] ]

(

column {

recnum | constant value | sequence

( { integer | max |count} [, increment] ) |[position ( { start [end] | * [ + integer] }) ]

datatype

[terminated [ by ] {whitespace| [x]’character’ } ]

[ [optionally] enclose[by] [x]’charcter’]

[nullif condition ]

[defaultif condotion]

}

[ ,…]

)

[intotable…]

[begindata]

line1：

option({[skip==integer] [ load = integer] [ errors = integer] [ rows = integer][ bindsize = integer][silent = (all|feedback|error|discard)]})

其中skip，skip = 1 表示用来跳过数据中的第一行；load = 200000表示不导入所有的数据，只导入跳过skip参数后的200000条数据；errors = 100 表示出错100次后，停止加载；rows=1000表示一次加载的行数，默认值为64；bindsize=33554421，表示每次提交记录缓冲区的大小，默认为256k。

line2:

load data

line3:

infile [*]‘datapath’

[ { badfile | baddn } file ]

{discards | discardmax} integer ]

其中*是在ctl文件中没有包含数据文件时使用，若有数据文件的时候最好使用绝对路径并且需要带上单引号,badfile内为抛出的坏文件名，integer为文件的记录大小，discards为被抛弃的文件名。

line4：

insert/append/truncate/replace

insert为默认插入数据的方式，只有在表为空表的时候才可进行数据导入

append 为在表中数据的末端将数据导入

replace：（用 delete from table 语句），替换成新装载的记录。是数据操作语句（dml），这个操作会放到 rollbacksegement 中，事务提交之后才生效；如果有相应的 trigger，执行的时候将被触发。

truncate：删除旧记录(用 truncate table 语句)，替换成新装载的记录是定义语言(ddl)，操作立即生效，原数据不放到 rollbacksegment 中，不能回滚，操作不触发 trigger。。

line5：

into table table_name

表示将数据插入某一个表中

line6：

field terminatedby”,”optionally enclosed by “”

表示数据以，（逗号）划分，以“”（空格）换行

line7：

when condition

比如when id = id _ demo为提取id 为id_demo的过滤条件

line8：

(

（1）virtual column filter 表示过率没有名字的第一行

filler：控制文件中指定 filler，表示该列值不导入表中

（2）position(m:n)：指从第 m 个字符开始截止到第 n 个字符作为列值

　 position(*+2:15)：直接指定数值的方式叫做绝对偏移量，如果使用*号，则为相对偏移量，表示上一个字段哪里结束，这次就哪里开始，相对便宜量也可以再做运算。

　 position(*) char(9)：这种相对偏移量+类型和长度的优势在于，你只需要为第一列指定开始位置，其他列只需要指定列长度就可以。

（3）若导入的列比表中的列要少，需要在末尾加上comm”0”

（4）column + date ‘yyyy-mm-dd’表示将数据转换为日期类型

（5）

)

line9:

若要导入不同表，即在表的末尾加上into table anothertablename，再加上限制条件以及column相关的内容

line10：

begindate

+输入的数据

3. sql*loader 的性能与并发操作

（1）、rows 的默认值为 64，你可以根据实际指定更合适的 rows 参数来指定每次提交记录数。

（2）、常规导入可以通过使用 insert语句来导入数据。direct导入可以跳过数据库的相关逻辑(direct=true)，而直接将数据导入到数据文件中，可以提高导入数据的性能。当然，在很多情况下，不能使用此参数(如果主键重复的话会使索引的状态变成unusable!)。

（3）、通过指定 unrecoverable选项，可以关闭数据库的日志(是否要 alter table table_namenologging?)。这个选项只能和 direct 一起使用。

（4）、对于超大数据文件的导入就要用并发操作了，即同时运行多个导入任务.

sqlldr userid=/ control=result1.ctl direct=true parallel=true

sqlldr userid=/ control=result2.ctl direct=true parallel=true

当加载大量数据时（大约超过10gb），最好抑制日志的产生：

alter tabletable_name nologging;

这样不产生redo log，可以提高效率。然后在 control文件中load data上面加一行unrecoverable，此选项必须要与direct共同应用。

在并发操作时，oracle声称可以达到每小时处理100gb数据的能力！其实，估计能到 1－10g 就算不错了，开始可用结构相同的文件，但只有少量数据，成功后开始加载大量数据，这样可以避免时间的浪费。

4. 实例

load data

infile ‘c:\users\shen_potato\desktop\emps.csv’

into table emp10

fields terminated by “,”

(

virtual_column filler,

empno “seq_eseq.nextval”,

ename,

job,

mgr,

hiredate date ‘yyyy-mm-dd’,

sal,

comm,

deptno

)

Sql loader的使用分享

相关推荐