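One of our ETL applications has run into a strange problem. The process opens a cursor to extract data from one DB, performs some transformations, and then inserts into another DB using batched inserts.
For every table in the ETL, our commit interval is set to 1000 rows. So after reading each chunk of 1k rows and performing the transformations, we do a single batch insert into the target database (using Java, Spring Batch, OJDBC7 v12.1.0.2).
However, some tables are very slow. We first made sure the FKs were turned off (they were). Then we checked that triggers were disabled (they were). We added logging to capture the number of rows in each batch insert (it is 1000, except for each thread's final insert).
Finally, querying v$sql, it appears that for some tables we see close to 1000 rows per execution, which is what we would expect. For the painful tables, however, it often hovers around one! We would expect most tables to come in somewhere above 900, since a thread's final commit might not have a full 1k rows, but the extremely low rows per execution on some tables is a real head-scratcher.
Some wide tables (100 columns) have the problem, but others are fine. Some heavily partitioned tables (100 partitions) are slow, but others are fine. So I'm baffled. Has anyone seen this before? I've run out of ideas!
Thanks!
Here is what we see in v$sql (table names obfuscated):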
SELECT *
FROM
(SELECT REGEXP_SUBSTR (sql_text, 'Insert into [^\(]*') sql_text,
sql_id,
TRUNC(
CASE
WHEN SUM (executions) > 0
THEN SUM (rows_processed) / SUM (executions)
END,2) rows_per_execution
FROM v$sql
WHERE parsing_schema_name = 'PFT010_RAPP_PRO'
AND sql_text LIKE 'Insert into%'
GROUP BY sql_text,
sql_id
)
ORDER BY rows_per_execution ASC;
SQL_STATEMENT SQL_ID ROWS_PER_EXECUTION
---------------------------------------------------------------------------------
Insert into C__PFT010.S_T___V_R_L_A_ agwu1dd1wr2ux 1.04
Insert into C__PFT010.S_T___G_L_A___T_ 7ymw7jtdd9g53 1.25
Insert into C__PFT010.S_T___F_L_A_ 7cynt9fmtpz83 1.44
Insert into C__PFT010.S_T___Q_L_A___A_ 27v3fuj028cy6 1.57
Insert into C__PFT010.S_T___E_R_P_Y_A_P_S_A_ 2t544j11a286z 1.80
Insert into C__PFT010.S_T___I_S_R_ anu8aac070sut 1.84
Insert into C__PFT010.S_T___R_C_R___T_T_ 0ydz33s6guvcn 2.05
Insert into C__PFT010.S_T_R___D_R_P_Y_A_P_ 7y76r10dmzqvh 2.14
Insert into C__PFT010.S_T___S_L_A___Y_T_S_S_ d7136fg9w033w 2.25
Insert into C__PFT010.S_T___R_C_R___T_T_ 2pswt3cmp48s4 2.31
Insert into C__PFT010.S_T___F_R_P_Y_A_P_S_P_ 170c7v23yyrms 2.46
Insert into C__PFT010.C_M_N_C___R_S_ fw3wbt4p08kx4 2.66
Insert into C__PFT010.T_A_H_N_T___E_A_Y_ dk5rwm58qqy8b 2.68
Insert into C__PFT010.O_G_L_A___N_O_ gtd4azc32gku4 3.05
Insert into C__PFT010.N_L_S_D___I_B_S_G_ a1a01vthwf2yk 3.15
Insert into C__PFT010.S_T___Q_L_A___A_ 7ac6dqwb1jfyh 3.56
Insert into C__PFT010.S_T___J_P_M___A_A_ 8n5z68bgkuan1 3.88
Insert into C__PFT010.S_T_R___F_R_P_Y_A_P_S_P 1r62s9qgjucy8 4.25
Insert into C__PFT010.L_A___W_E_S_I_ 19rxcmgvct74c 4.28
Insert into C__PFT010.C___U___T_D_T_P_ fdzfdbpdzd18c 4.40
Insert into C__PFT010.S_T_R___U_T_A_S_E_ gs6z5szk9x1n2 4.61
Insert into C__PFT010.S_T_R___H_S_B_I_Y_L_S_ 0zsz69pa3ahga 6.58
Insert into C__PFT010.C___F___U_R_P_T_ 13xgutdszxab1 8.00
Insert into C__PFT010.S_T_R___J_P_M___A_A_ 355gqx1sspdr0 20.19
Insert into C__PFT010.C___D___O___V_ 4dmu2bqrra0fg 22.40
Insert into C__PFT010.S_T_R___Q_L_A___A_ dsx0nsrxkz5cf 36.14
Insert into C__PFT010.S_T___V_R_L_A___E_R_ 2urs0mbjn3nm2 126.96
Insert into C__PFT010.S_S_C_S___E_A_L_S_G_ awq4fzkk3rsww 179.48
Insert into C__PFT010.S_S_D_S___C_I_I_Y_S_G_ 7hpw0kv2z5nsh 417.87
Insert into C__PFT010.S_T_R___D_P_S___M_I_ cjgdmgfznapdk 502.36
Insert into C__PFT010.C___F___E_ 6hv4smzmm4hx8 531.00
Insert into C__PFT010.N_L_S_E_R___R_ 61zu9j25kgn2u 533.50
Insert into C__PFT010.S_T___B_P_S___A_T_R_ 31xpaj7afk054 714.94
Insert into C__PFT010.S_T_R___C_L_A___O_G_V_ dx4mna12hdh9c 749.66
Insert into C__PFT010.S_T___C_P_S___D_R_S_ b7z4y1mruk714 784.56
Insert into C__PFT010.S_T___S_L_A___Y_T_S_S_ 29qbqkzhmt83h 792.63
Insert into C__PFT010.A_H_C_R_T_ c6kmyt3a410ch 801.67
Insert into C__PFT010.S_T___X_P_S___H_N_ g6cbtus4bccm8 826.19
Insert into C__PFT010.S_T___K_R_B_T___T_T_ 0xps4ddmw322h 873.36
Insert into C__PFT010.C___O___C_L___M_ fz91ju8jw22yc 928.90
Insert into C__PFT010.S_T___H_L_A___T_T_ 44rh8722j51fm 982.16
Insert into C__PFT010.C___C_L_S_C_R_T_ 4vpnstj8qxy80 991.75
Insert into C__PFT010.S_T___P_L_A___E_U_D_ fgunfbpddf2af 994.50
Insert into C__PFT010.S_T___A_S___I___O_S_ 0d0x5ymp2y248 996.09
Insert into C__PFT010.S_T___K_R_B_T___T_T_ 61rmgzvqrbudh 999.25
Insert into C__PFT010.S_T___D_P_S___M_I_ bu3hc03yugc8h 999.88
Insert into C__PFT010.L___R_E___E_L_R_C_P_2_00 bvrxzq2v3npc6 999.91
Insert into C__PFT010.N_L_S_G_A_T_S_N___R_C 7sj2ydm7m2z6u 999.96
Insert into C__PFT010.S_T___V_R_L_A___E_E_E 8n6nbsjfpvu70 999.98
Insert into C__PFT010.S_T___L_I_T_B_N_F_T 5b89j9um2jkuu 999.98
Insert into C__PFT010.S_T___D_P_S___M_I_ 906jnw4jarsxk 999.98
Insert into C__PFT010.S_T___T_E_R_M_T 9a8vnhnbp5jpn 1000.00
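Update: the data is a little stale at this point (all the fast threads have finished), but here are some counts with SQL ID, execution count, and rows per execution. All of these tables have (or will have) tens of millions of rows.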
SQL ID Executions Rows/Execution
agwu1dd1wr2ux 118043 1.04
anu8aac070sut 194768 1.84
dr8qxkcx1xybj 11635084 1.85
a37vqfjqcyd3j 4939754 2.36
8n5z68bgkuan1 2642091 3.95
4sps6y4bkkr6p 268739 13.77
5tdhpn96vpz6d 240227 166.85
Some additional SQL trace data:
Here is an insert into a table that works just fine:
PARSING IN CURSOR #139935830739792 len=315 dep=0 uid=845 oct=2 lid=845 tim=2116001679604 hv=581690290 ad='c168de130' sqlid='906jnw4jarsxk'
Insert into ___PFT010.S_T__A__P_S__E_A_L
(A_A_D_ID, CREATE_DTM, DOC_TXN_ID, EFF_DT, EFF_END_DT, EFF_START_DT, EXTRACT_DT, G_O__A_D__F_G, MAINT_DTM, MAINT_USERID, P_R_O__E_A_L, P_R_O__R_L_, R_C_RD_T_P, S__ID, S_R_I_E__ID )
values (:1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13 , :14 , :15 )
END OF STMT
PARSE #139935830739792:c=0,e=25,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=2116001679603
WAIT #139935830739792: nam='SQL*Net more data from client' ela= 72 driver id=675562835 #bytes=3 p3=0 obj#=-1 tim=2116001679871
WAIT #139935830739792: nam='db file sequential read' ela= 551 file#=99 block#=78343664 blocks=1 obj#=1255124 tim=2116001680643
* * * * * * * * * * * * * * * * * *
* * * a bunch more of these
* * * * * * * * * * * * * * * * * *
WAIT #139935830739792: nam='db file sequential read' ela= 750 file#=99 block#=66416561 blocks=1 obj#=1255124 tim=2116001788121
WAIT #139935830739792: nam='db file sequential read' ela= 176 file#=99 block#=45513746 blocks=1 obj#=1255124 tim=2116001787117
WAIT #139935830739792: nam='db file sequential read' ela= 750 file#=99 block#=66416561 blocks=1 obj#=1255124 tim=2116001788121
* * * * * * * * * * * * * * * * * *
* * * r=1000, indicating 1000 rows were written
* * * * * * * * * * * * * * * * * *
EXEC #139935830739792:c=57991,e=109295,p=131,cr=69,cu=3313,mis=0,r=1000,dep=0,og=1,plh=0,tim=2116001788944
STAT #139935830739792 id=1 cnt=0 pid=0 pos=1 obj=0 op='LOAD TABLE CONVENTIONAL SAT1_AD_PRSN_EMAIL (cr=69 pr=131 pw=0 time=109260 us)'
XCTEND rlbk=0, rd_only=0, tim=2116001789025
CLOSE #139935830739792:c=0,e=12,dep=0,type=1,tim=2116016169474
Here is one of the nasty ones. This time it only gets 1 row in the execution:
PARSING IN CURSOR #139935830737584 len=520 dep=0 uid=845 oct=2 lid=845 tim=2116016176184 hv=1904916192 ad='97e96dc98' sqlid='355gqx1sspdr0'
Insert into ___PFT010.S_TE_R_BJ_P_M__D_T_
(A_A_D_ID, CREATE_DTM, DOC_TXN_ID, EFF_END_DT, EFF_START_DT, ERR_CD, ERR_FIELD, EXTRACT_DT, MAINT_USERID, P_M__A_R_I_T_A_T, P_M__A_T, P_M__C_P_I_T_A_T, P_M__C_T_H_P_AMT, P_M__E_F_DT, P_M__N_G_A_R__A_T, P_M__N_N_C_P_I_T_A_T, P_M__O_T_F_E_A_T, P_M__P_I_B_L_A_T, P_M__T_P, R_C_RD_T_P, S__ID, S_R_I_E__ID, T_A_S_I__D_, Z_R__P_M__I_D )
values (:1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13 , :14 , :15 , :16 , :17 , :18 , :19 , :20 , :21 , :22 , :23 , :24 )
END OF STMT
PARSE #139935830737584:c=0,e=62,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=2116016176183
PARSE #139935830738688:c=0,e=14,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=140787661,tim=2116016176703
EXEC #139935830738688:c=0,e=49,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=140787661,tim=2116016176780
FETCH #139935830738688:c=0,e=38,p=0,cr=3,cu=0,mis=0,r=0,dep=1,og=4,plh=140787661,tim=2116016176837
CLOSE #139935830738688:c=0,e=4,dep=1,type=3,tim=2116016176862
* * * * * * * * * * * * * * * * * *
* * * r=1, indicating only 1 row affected by execution
* * * * * * * * * * * * * * * * * *
EXEC #139935830737584:c=999,e=1065,p=0,cr=4,cu=5,mis=1,r=1,dep=0,og=1,plh=0,tim=2116016177301
STAT #139935830737584 id=1 cnt=0 pid=0 pos=1 obj=0 op='LOAD TABLE CONVENTIONAL SATERR_BJ_PYMT_DATA (cr=1 pr=0 pw=0 time=50 us)'
XCTEND rlbk=0, rd_only=0, tim=2116016177362
WAIT #139935830737584: nam='log file sync' ela= 396 buffer#=92400 sync scn=2454467328 p3=0 obj#=-1 tim=2116016177846
WAIT #139935830737584: nam='SQL*Net message to client' ela= 0 driver id=675562835 #bytes=1 p3=0 obj#=-1 tim=2116016177877
WAIT #139935830737584: nam='SQL*Net message from client' ela= 1045 driver id=675562835 #bytes=1 p3=0 obj#=-1 tim=2116016178938
CLOSE #139935830737584:c=0,e=4,dep=0,type=0,tim=2116016178981
Here is the same table with 34 rows instead of 1. The inconsistency is what really bothers me:
PARSING IN CURSOR #139935830737584 len=520 dep=0 uid=845 oct=2 lid=845 tim=2116016169849 hv=1904916192 ad='97e96dc98' sqlid='355gqx1sspdr0'
Insert into ___PFT010.S_TE_R_BJ_P_M__D_T_
(A_A_D_ID, CREATE_DTM, DOC_TXN_ID, EFF_END_DT, EFF_START_DT, ERR_CD, ERR_FIELD, EXTRACT_DT, MAINT_USERID, P_M__A_R_I_T_A_T, P_M__A_T, P_M__C_P_I_T_A_T, P_M__C_T_H_P_AMT, P_M__E_F_DT, P_M__N_G_A_R__A_T, P_M__N_N_C_P_I_T_A_T, P_M__O_T_F_E_A_T, P_M__P_I_B_L_A_T, P_M__T_P, R_C_RD_T_P, S__ID, S_R_I_E__ID, T_A_S_I__D_, Z_R__P_M__I_D )
values (:1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13 , :14 , :15 , :16 , :17 , :18 , :19 , :20 , :21 , :22 , :23 , :24 )
END OF STMT
PARSE #139935830737584:c=0,e=326,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=2116016169848
PARSE #139935830738688:c=0,e=19,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=140787661,tim=2116016170242
EXEC #139935830738688:c=0,e=59,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=140787661,tim=2116016170329
FETCH #139935830738688:c=0,e=44,p=0,cr=3,cu=0,mis=0,r=0,dep=1,og=4,plh=140787661,tim=2116016170393
CLOSE #139935830738688:c=0,e=3,dep=1,type=3,tim=2116016170421
* * * * * * * * * * * * * * * * * *
* * * r=34, indicating only 34 rows affected by execution. WHAT IS HAPPENING?!?!
* * * * * * * * * * * * * * * * * *
EXEC #139935830737584:c=5000,e=4592,p=0,cr=11,cu=48,mis=1,r=34,dep=0,og=1,plh=0,tim=2116016174513
STAT #139935830737584 id=1 cnt=0 pid=0 pos=1 obj=0 op='LOAD TABLE CONVENTIONAL SATERR_BJ_PYMT_DATA (cr=8 pr=0 pw=0 time=3648 us)'
XCTEND rlbk=0, rd_only=0, tim=2116016174622
WAIT #139935830737584: nam='log file sync' ela= 684 buffer#=92313 sync scn=2454467326 p3=0 obj#=-1 tim=2116016175452
WAIT #139935830737584: nam='SQL*Net message to client' ela= 1 driver id=675562835 #bytes=1 p3=0 obj#=-1 tim=2116016175551
WAIT #139935830737584: nam='SQL*Net message from client' ela= 481 driver id=675562835 #bytes=1 p3=0 obj#=-1 tim=2116016176058
CLOSE #139935830737584:c=0,e=6,dep=0,type=0,tim=2116016176107
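Solution:
OK, this was an interesting one, and unfortunately this answer only solves 99% of the problem...
First, by looking at the bind variables, we determined that the types of the parameters we were binding were flipping, and every time that happened, the previous statement was executed and a new one was parsed (even though we only issue a single executeBatch() call on the PreparedStatement). So we ended up seeing this in the trace log: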
Row # Bind :1 Bind :2 Bind :3 Bind :4 Bind :5
----- -------------- -------------- -------------- -------------- --------------
--parse--
1 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
2 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
3 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
4 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
--execute & parse--
5 VARCHAR2(128) VARCHAR2(32) TIMESTAMP CLOB VARCHAR2(2000)
--execute & parse--
6 VARCHAR2(128) NUMBER TIMESTAMP VARCHAR2(32) VARCHAR2(2000)
--execute & parse--
7 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
8 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
9 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
10 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
--execute & parse--
11 VARCHAR2(2000) NUMBER VARCHAR2(32) CLOB VARCHAR2(2000)
--execute & parse--
12 VARCHAR2(2000) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
13 VARCHAR2(2000) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
--execute--
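To make the cost of those flips concrete, here is a toy model in Java. It is not driver code: it simply assumes, as the trace suggests, that the queued rows are flushed as one execution whenever the bind-type signature of the next row differs from the previous one. The class and method names (BatchSplitModel, rowsPerExecution, traceSignatures) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: counts how many executions a batch turns into if the
 *  pending rows are flushed every time the bind-type signature changes. */
final class BatchSplitModel {

    /** Returns the row count of each execution, in order. */
    static List<Integer> rowsPerExecution(List<String> signatures) {
        List<Integer> executions = new ArrayList<>();
        String current = null;
        int pending = 0;
        for (String signature : signatures) {
            if (pending > 0 && !signature.equals(current)) {
                executions.add(pending); // type flip: flush what is queued
                pending = 0;
            }
            current = signature;
            pending++;
        }
        if (pending > 0) {
            executions.add(pending); // final flush at executeBatch()
        }
        return executions;
    }

    /** The 13 rows from the trace table above, as bind-type signatures. */
    static List<String> traceSignatures() {
        String a = "VARCHAR2(128),NUMBER,TIMESTAMP,CLOB,VARCHAR2(2000)";
        String b = "VARCHAR2(128),VARCHAR2(32),TIMESTAMP,CLOB,VARCHAR2(2000)";
        String c = "VARCHAR2(128),NUMBER,TIMESTAMP,VARCHAR2(32),VARCHAR2(2000)";
        String d = "VARCHAR2(2000),NUMBER,VARCHAR2(32),CLOB,VARCHAR2(2000)";
        String e = "VARCHAR2(2000),NUMBER,TIMESTAMP,CLOB,VARCHAR2(2000)";
        return Arrays.asList(a, a, a, a, b, c, a, a, a, a, d, e, e);
    }

    public static void main(String[] args) {
        // Prints [4, 1, 1, 4, 1, 2]: six executions for thirteen rows.
        System.out.println(rowsPerExecution(traceSignatures()));
    }
}
```

Under this model, 13 rows cost 6 executions, a little over 2 rows per execution, which is the same order of magnitude as the worst tables in the v$sql listing.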
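With more digging, we determined that JDBC cannot automatically determine the data type of a null object the way it can for a non-null value. When a column is consistent (always null or always populated) this isn't a problem, but when the data varies, it's brutal.
Since we are loading from files, we don't have the source data types, but fortunately we WERE able to get the target data types (which should match), so we were able to specify the type when setting each parameter on the PreparedStatement.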
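A minimal sketch of what "specify the type when setting each parameter" can look like with plain JDBC, using the standard PreparedStatement.setNull(int, int) and setObject(int, Object, int) calls. This is not our actual loader: the helper name bindRow, the targetTypes array (assumed to come from target table metadata), and the recording stub used to run the demo without a database are all hypothetical.

```java
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TypedBinds {

    /** Binds one row, always stating the target column's java.sql.Types code,
     *  so a null is bound with the same declared type as a populated value. */
    static void bindRow(PreparedStatement ps, Object[] values, int[] targetTypes)
            throws SQLException {
        for (int i = 0; i < values.length; i++) {
            int index = i + 1; // JDBC parameters are 1-based
            if (values[i] == null) {
                ps.setNull(index, targetTypes[i]);
            } else {
                ps.setObject(index, values[i], targetTypes[i]);
            }
        }
        ps.addBatch();
    }

    /** Demo stub (hypothetical): a PreparedStatement that only records the
     *  calls made on it, so the sketch runs without a database. */
    static PreparedStatement recordingStatement(List<String> calls) {
        return (PreparedStatement) Proxy.newProxyInstance(
                TypedBinds.class.getClassLoader(),
                new Class<?>[] {PreparedStatement.class},
                (proxy, method, args) -> {
                    String shown = (args == null) ? "[]" : Arrays.toString(args);
                    calls.add(method.getName() + shown);
                    return null; // fine here: only void methods are invoked
                });
    }

    public static void main(String[] args) throws SQLException {
        List<String> calls = new ArrayList<>();
        PreparedStatement ps = recordingStatement(calls);
        int[] targetTypes = {Types.VARCHAR, Types.NUMERIC, Types.TIMESTAMP};

        bindRow(ps, new Object[] {"a@b.c", 42, null}, targetTypes);
        bindRow(ps, new Object[] {"d@e.f", null, null}, targetTypes);

        calls.forEach(System.out::println);
    }
}
```

Against a real connection, the stub would be replaced by connection.prepareStatement(...) and the batch flushed with executeBatch(). How a given driver maps each declared type to a bind type is driver-specific, so the effect should be confirmed in a trace.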
This change brought some major improvements, but we were still seeing the following:
Row # Bind :1 Bind :2 Bind :3 Bind :4 Bind :5
----- -------------- -------------- -------------- -------------- --------------
--parse--
1 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
2 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
3 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
4 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
5 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
--execute & parse--
6 VARCHAR2(128) NUMBER TIMESTAMP VARCHAR2(32) VARCHAR2(2000)
--execute & parse--
7 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
8 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
9 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
10 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
--execute & parse--
11 VARCHAR2(2000) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
12 VARCHAR2(2000) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
13 VARCHAR2(2000) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
--execute--
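Definitely an improvement, but we hadn't fixed the CLOBs, and we were seeing the VARCHAR2 sizes grow at times. After some research, we stumbled upon a thread about High Version Count due to bind_mismatch, which sounded promising. The tables where our data is nice and consistent had no problems, but fields of varying length, such as email addresses, were wreaking havoc on performance. So we ran the following command to force VARCHAR2 binds to a size of 4000: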
ALTER SYSTEM SET EVENTS '10503 trace name context forever, level 2001';
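As a rough illustration of why variable-length fields hurt before this setting, the sketch below assumes VARCHAR2 bind lengths are rounded up to the sizes that show up in the traces in this post (32, 128, 2000, 4000), with inclusive upper bounds. The exact boundaries are an assumption, and the class and method names are made up.

```java
final class BindSizeBuckets {

    // Assumed bucket sizes, taken from the bind sizes visible in the traces.
    private static final int[] BUCKETS = {32, 128, 2000, 4000};

    /** Returns the smallest assumed bucket that can hold the given length. */
    static int bucketFor(int byteLength) {
        for (int bucket : BUCKETS) {
            if (byteLength <= bucket) {
                return bucket;
            }
        }
        throw new IllegalArgumentException("longer than largest bucket: " + byteLength);
    }

    public static void main(String[] args) {
        // Two values for the same column landing in different buckets
        // would look like a bind-size change from one row to the next.
        System.out.println(bucketFor(100)); // 128
        System.out.println(bucketFor(129)); // 2000
    }
}
```

Under that assumption, a single value that is one byte longer than the previous rows is enough to change the bind size, which would fit the VARCHAR2(128) to VARCHAR2(2000) switch seen at row 11.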
After that, we tried again and got the following:
Row # Bind :1 Bind :2 Bind :3 Bind :4 Bind :5
----- -------------- -------------- -------------- -------------- --------------
--parse--
1 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
2 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
3 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
4 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
5 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
--execute & parse--
6 VARCHAR2(4000) NUMBER TIMESTAMP VARCHAR2(32) VARCHAR2(4000)
--execute & parse--
7 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
8 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
9 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
10 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
11 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
12 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
13 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
--execute--
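We are now nearly perfect, but we can't figure out how to stop JDBC from binding a VARCHAR2 when we get a null CLOB. Fortunately, we only have a few tables with nullable CLOB columns, so we have greatly improved performance and reduced the impact of the changing binds. But part of me definitely wants to get that last 1%... Any suggestions?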
Title: java - Oracle - DB seems to be breaking up JDBC batch inserts