比较direct方式使用并行和非并行选项的性能差异。并行测试创建测试表
create table sqlldr_paral
nologging
partition by hash (sfzh)
(partition p1 tablespace users , partition p2tablespace zdry_tbs2 )
as select *from sfxx2 where 1=2 ;
sqlldr_paral.ctl
load data
infile 'd:\flatfile\flatfile.dat'
append
into table sqlldr_paral
fields terminated by x'23'
trailing nullcols
(zj,
…字段太多省略
cjdwdm)
sqlldr_paral.par
userid=zdrygk/zdrygk
control='d:\flatfile\sqlldr_paral.ctl'
data='d:\flatfile\flatfile.dat'
log='d:\flatfile\sqlldr_paral.log'
direct=true
parallel=true
errors=10000
执行
d:\flatfile>sqlldrparfile=d:\flatfile\sqlldr_paral.par
sql*loader: release 11.2.0.1.0 - production onmon jan 7 23:21:42 2013
copyright (c) 1982, 2009, oracle and/or itsaffiliates. all rights reserved.
load completed - logical record count 3022375.
sqlldr_paral.log
control file: d:\flatfile\sqlldr_paral.ctl
data file: d:\flatfile\flatfile.dat
badfile: d:\flatfile\flatfile.bad
discardfile: none specified
(allowall discards)
number to load: all
number to skip: 0
errors allowed: 10000
continuation: none specified
path used: direct- with parallel option. -----这里指定了并行选项
table sqlldr_paral, loaded from every logicalrecord.
insert option in effect for this table: append
trailing nullcols option in effect
table sqlldr_paral:
3017264rows successfully loaded.
5111rows not loaded due to data errors.
0 rows not loaded because all when clauseswere failed.
0 rowsnot loaded because all fields were null.
dateconversion cache disabled due to overflow (default size: 1000)
partition p1: 1509091 rows loaded.
partition p2: 1508173 rows loaded.
bind array size not used in direct path.
column array rows : 5000
stream buffer bytes: 256000
read buffer bytes: 1048576
total logical records skipped: 0
total logical records read: 3022375
total logical records rejected: 5111
total logical records discarded: 0
total stream buffers loaded by sql*loader mainthread: 1301
total stream buffers loaded by sql*loader loadthread: 3901
run began on mon jan 07 23:21:42 2013
run ended on mon jan 07 23:22:18 2013
elapsed time was: 00:00:35.91
cpu time was: 00:00:25.79
非并行测试创建测试表
create table sqlldr_nopal
nologging
partition by hash (sfzh)
(partition p1 tablespace users , partition p2tablespace zdry_tbs2 )
as select *from sfxx2 where 1=2 ;
sqlldr_nopar.ctl
load data
infile 'd:\flatfile\flatfile.dat'
append
into tablesqlldr_nopar
fields terminated by x'23'
trailing nullcols
(zj,
…字段太多省略
cjdwdm)
sqlldr_nopar.par
这里删除了并行选项
userid=zdrygk/zdrygk
control='d:\flatfile\sqlldr_nopar.ctl'
data='d:\flatfile\flatfile.dat'
log='d:\flatfile\sqlldr_nopar.log'
direct=true
errors=10000
执行
d:\flatfile>sqlldrparfile=d:\flatfile\sqlldr_nopar.par
sql*loader: release 11.2.0.1.0 - production onmon jan 7 23:20:54 2013
copyright (c) 1982, 2009, oracle and/or itsaffiliates. all rights reserved.
load completed - logical record count 3022375.
sqlldr_nopar.log
control file: d:\flatfile\sqlldr_nopar.ctl
data file: d:\flatfile\flatfile.dat
badfile: d:\flatfile\flatfile.bad
discardfile: none specified
(allowall discards)
number to load: all
number to skip: 0
errors allowed: 10000
continuation: none specified
path used: direct ----这里只是指定了直接路径加载
table sqlldr_nopar, loaded from every logicalrecord.
insert option in effect for this table: append
trailing nullcols option in effect
table sqlldr_nopar:
3017264rows successfully loaded.
5111rows not loaded due to data errors.
0 rows not loaded because all when clauseswere failed.
0 rowsnot loaded because all fields were null.
dateconversion cache disabled due to overflow (default size: 1000)
partition p1: 1509091 rows loaded.
partition p2: 1508173 rows loaded.
bind array size not used in direct path.
column array rows : 5000
stream buffer bytes: 256000
read buffer bytes: 1048576
total logical records skipped: 0
total logical records read: 3022375
total logical records rejected: 5111
total logical records discarded: 0
total stream buffers loaded by sql*loader mainthread: 1301
total stream buffers loaded by sql*loader loadthread: 3901
run began on mon jan 07 23:20:54 2013
run ended on mon jan 07 23:21:28 2013
elapsed time was: 00:00:33.81
cpu time was: 00:00:25.32
分析 load mode
elapsed time(ss.99)
direct
33.81
direct + parallel
35.91
笔者做了多次试验,除了第一次消耗时间较长(因为表空间自动拓展),后面的实验中两种模式的用时大致相同。自己猜测在使用direct 直接路径加载的情况下,多线程并不能提高效率。
在util中找到关于sql*loader command line的描述
parallel(parallel load)
default: false
parallelspecifies whether direct loads canoperate in multiple concurrent sessions
to load data into the same table.
parallel这个参数用来设定使用direct loads的时候是否使用并发的session去加载数据到相同的表中。
于是测试了多文件,多session加载数据的情况。
d:\flatfile>sqluldr2 user=zdrygk/zdrygk@orclquery=sfxx2 degree=8 file=d:\flatfile\ldrfiles%t.dat field=0x23 size=300mb
0rows exported at 2013-01-08 00:07:26, size 0 mb.
output file d:\flatfile\ldrfiles1357574846.dat closed at 707972 rows,size 304 mb.
output file d:\flatfile\ldrfiles1357574861.dat closed at 697498 rows,size 604 mb.
output file d:\flatfile\ldrfiles1357574876.dat closed at 701379 rows,size 904 mb.
output file d:\flatfile\ldrfiles1357574891.dat closed at 692556 rows,size 1204 mb.
222868rows exported at 2013-01-08 00:08:31, size 1300 mb.
output file d:\flatfile\ldrfiles1357574906.dat closed at 222868 rows,size 1300 mb.
一共5个文件。
多文件并行测试如果指定了parallel选项,sqlldr运行的时候会对表加4级锁,允许多个sqlldr session同时对同一个表执行加载作用。在我们执行了truncate 操作后对表进行并行加载操作。
parallelpar1文件
userid=zdrygk/zdrygk
control='d:\flatfile\multifile\paral\sqlldr_paral_1.ctl'
data='d:\flatfile\multifile\ldrfiles1357574846.dat'
log='d:\flatfile\multifile\paral\sqlldr_paral_1.log'
direct=true
parallel=true
errors=10000
parallel par2、3、4、5的内容大致相同。
在windows下只能打开多个窗口,通过复制粘贴的方法保证并行了。
执行
sqlldrparfile=d:\flatfile\multifile\paral\sqlldr_paral_1.par
sqlldrparfile=d:\flatfile\multifile\paral\sqlldr_paral_2.par
sqlldrparfile=d:\flatfile\multifile\paral\sqlldr_paral_3.par
sqlldr parfile=d:\flatfile\multifile\paral\sqlldr_paral_4.par
sqlldrparfile=d:\flatfile\multifile\paral\sqlldr_paral_5.par
最后得到了5个log文件关键内容:
log1
run began on tue jan 08 01:51:11 2013
run ended on tue jan 08 01:51:21 2013
elapsed time was: 00:00:09.42
cpu time was: 00:00:06.43
…
中间的几个log文件省略
log5
run began on tue jan 08 01:51:34 2013
run ended on tue jan 08 01:51:38 2013
elapsed time was: 00:00:03.27
cpu time was: 00:00:02.03
因为最后一个sqlldr session 是最后一个执行结束的,所以需要计算并行sqlldr的执行消耗时间,只需要使用log5中的结束时间减去log1中的开始时间即可。
01:51:38-01:51:11=22秒
因为不是使用程序执行这几个sqlldr程序,复制、粘贴、执行之间会有空隙。真实用时应该比22秒这个数字要小的多。
多文件非并行测试因为如果没有指定parallel选项,sqlldr运行的时候会对表加6级锁,其他sqlldr session会无法工作,所以只能串行的执行sqlldr操作。在我们执行了truncate操作之后对表进行串行加载操作。
noparallelpar1文件
userid=zdrygk/zdrygk
control='d:\flatfile\multifile\nopar\sqlldr_nopar_1.ctl'
data='d:\flatfile\multifile\ldrfiles1357574846.dat'
log='d:\flatfile\multifile\nopar\sqlldr_nopar_1.log'
direct=true
errors=10000
noparallel par2、3、4、5的内容大致相同。
执行
sqlldrparfile=d:\flatfile\multifile\nopar\sqlldr_nopar_1.par
sqlldr parfile=d:\flatfile\multifile\nopar\sqlldr_nopar_2.par
sqlldrparfile=d:\flatfile\multifile\nopar\sqlldr_nopar_3.par
sqlldrparfile=d:\flatfile\multifile\nopar\sqlldr_nopar_4.par
sqlldrparfile=d:\flatfile\multifile\nopar\sqlldr_nopar_5.par
最后得到了5个log文件关键内容:
log1
run began on tue jan 08 01:47:35 2013
run ended on tue jan 08 01:47:48 2013
elapsed time was: 00:00:12.61
cpu time was: 00:00:06.92
…
中间的几个log文件省略
log5
run began on tue jan 08 01:48:26 2013
run ended on tue jan 08 01:48:30 2013
elapsed time was: 00:00:03.68
cpu time was: 00:00:02.01
最后执行时间相加
12.61+ 12.54+ 12.09+ 11.40+ 03.68=52.32秒
总结 load mode
elapsed time(ss.99)
direct
52.32
direct + parallel
22
sqlldr是cpu密集型的操作,并且多线程,只有在使用多个数据文件,并且开启多个sqlldr session的时候才会有效果。并且效果非常明显。在消耗更多cpu和内存的同时能够给予非常高的效率,在真实情况下与单文件加载相比效率相差有1倍以上。
