Database Tech Blog: Writing Efficient SQL Code for Oracle

Introduction

This article aims to provide you with information and tips that will help you write efficient SQL code for Oracle databases. Part 1 covers batch environment.

Why is writing efficient SQL code important?
If you would like your application to scale well, then you would need to have SQL code that executes efficiently.

How do you define efficient SQL code?
Any SQL code that does what it does, using minimum resources and in minimum possible time.

In part 1 of this article I cover the following areas where one can achieve big efficiency gains by writing efficient sql for batch environment.

Huge Inserts
Huge Updates
Huge Deletes
Efficient elimination of duplicates
Efficient verification of foreign keys

Oracle does the following when you issue a DML
After the normal syntax, semantics and security checking etc...
it would...
1. Insert/modify the data into database buffers
- This is the actual data that would subsequently be written to data files on the disks
2. Create redo log records
- These records are written to redo log files
- These are used to recover from an instance failure or crash
3. Create undo records (Rollback info)
- These records are written to rollback segments
- These records are used in case you decide to rollback

So, as you can see each DML, results in a flurry of activity which means batch jobs that deal with huge amounts of data would have unacceptable performance unless special measures are taken as described next.

Handling huge inserts
Normal insert
SQL> insert
2 into abc
3 select * from xyz ;

978816 rows created.

Elapsed: 00:08:18.08

Statistics
-------------------------------------------------
106586580 redo size

Obviously the above insert is not very efficient. It took about 8+ minutes to insert just under a million rows.

Now let us look at a turbo charged version of insert command called Direct Load Insert

SQL> insert /*+ append */
2 into abc
3 select * from xyz
4 nologging ;

978816 rows created.

Elapsed: 00:00:12.30

Statistics
----------------------------------------------------------
241716 redo size

“Direct load insert” works differently than a Normal insert. It completely by passes the 3 steps (mentioned previously) that occur for a normal DML and directly writes to the data files of the oracle database.

You should use “Direct load insert” whenever you do huge inserts.

Handling large Updates

Say you have a 30 million row table and your update affects 5 Millions rows - How would you do it ?
The normal approach would be to execute an update statement similar to the one shown in the example below

update aBigTable
set column1 = column1 + column1 * 0.10
where column3 = 'Y'
;

What do you think will happen if you executed the previous update?

Best case scenario
1. Your update finishes after running zillion hours without actually running out of rollback segment space
2. Because your update brought everything else to a grinding halt - Help desk gets inundated by calls from other users of the database wanting to know if the database has left the building!!!.

Worst case scenario
1. Your update takes zillion hours and then fails after running out of rollback segment space
2. Because your update brought everything else to a grinding halt - Help desk gets inundated by calls from other users of the database .
3. Your access privileges to the database may get revoked
As you can see even the best case scenario is not particularly good…
let us look at a better way of doing large updates
--
-- Create an empty table
--
create table aBigTable_temp
nologging
as select * from aBigTable where rownum < 1 ;
insert /*+ append */
into aBigTable_temp (column1,column2,column3)
select decode (column3, 'Y', (column1 + column1 * 0.10), column1)
, column2
, column3
from aBigTable ;
commit ;
rename aBigTable to aBigTable_old ;
rename aBigTable_temp to aBigTable ;
--
-- Notice how column1 is handled in the select clause of the insert ..select .. statement
--
--
-- create any needed indices on aBigTable
--
Your index creation statements go here.
--
-- Check if data in table aBigTable is OK and drop aBigTable_old
--
drop table aBigTable_old ;

The above steps would complete in far less time than it takes the normal update

Handling large Deletes
Say you have a 30 million row table and you need to delete 5 Millions rows - How would you do it
The normal approach would be to execute the following sql statement

Delete from xyz
where column3 = 'Y‘ ;

What do you think will happen ?
The best and worst case scenarios would be more or less the same as that for an update described previously.

Recommended approach
--
-- Create an empty table
--
create table temp_table
nologging
as select * from xyz where rownum < 1 ;
--
-- Load the rows that you want to keep in the temp table
--
insert /*+ append */
into temp_table (column1,column2,column3)
select column1 , column2 , column3
from xyz
where column3 != 'Y‘ ;
commit ;

rename xyz xyz_old ;
rename xyz_temp xyz ;
--
-- create any needed indices on xyz
--
Your index creation code goes here.
--
-- Check if data in table xyz is OK and drop xyz_old
--
drop table xyz_old ;

Efficient Duplicates elimination

Many times we are faced with situations that require us to design a load script that loads data with duplicates.

Normal method of eliminating duplicates
1.Define a unique index on the table and load data.
2.Load the data using sqlldr conventional path (direct path load can not be used because of the index)
sqlldr rejects Duplicate rows and writes them to badfile. But this unique key checking incurs a very big performance penalty. For example in my testing sqlldr took 28 minutes to load little over a million rows with one duplicate row. Using the normal method, the time it takes to eliminate duplicates grows exponentially with the size of data. So this method is not practical for bulk data loading.

Method to eliminate duplicates efficiently
Oracle assigns a unique rowid to each row in a table. So each row in a table has unique rowid even if the data within the row is duplicated.
In oracle the fastest path to data is the path that uses rowids. So if we can identify the rowids of duplicates then we can use those rowids to delete them from the table very fast.
The sample delete statement given next , identifies the duplicates and deletes them.

This sample delete statement is explained using an example in subsequent slides

delete from xyz
where rowid in
(
--
-- This select gets all the rowids
--
select a.rowid
from xyz
minus
--
-- This select gets rowids of the first row for each duplicated value for the
-- column in the group by clause
--
select min(rowid)
from xyz
group by unique_key_col
) ;

The minus operation gives us the row ids of all the duplicates that can be deleted from the table. Thus, we will have only one row for each set of duplicates in the table after the deletion th
us ridding the table of duplicates.

I use a table called emp_sal to explain the above delete statement whose contents are shown below
EMPID DEPTID SAL
---------- ---------- ----------
1 SALES 36000
2 SALES 38000
3 SALES 42000
4 ACCTS 36000
5 ACCTS 38000
5 ACCTS 42000
5 ACCTS 42000
5 ACCTS 38000
7 SALES 42000
7 SALES 35000
9 IT 42000
10 IT 45000
11 IT 52000
12 IT 55000
13 SALES 35000
15 rows selected.

This gives all rows including duplicates

SQL> select a.rowid , a.empid from emp_sal a
2 order by empid ;
ROWID EMPID
------------------ ----------
AAAYPOAAJAABio4AAA 1
AAAYPOAAJAABio4AAB 2
AAAYPOAAJAABio4AAC 3
AAAYPOAAJAABio4AAD 4
AAAYPOAAJAABio4AAE 5
AAAYPOAAJAABio4AAF 5
AAAYPOAAJAABio4AAO 5
AAAYPOAAJAABio4AAN 5
AAAYPOAAJAABio4AAG 7
AAAYPOAAJAABio4AAH 7
AAAYPOAAJAABio4AAI 9
AAAYPOAAJAABio4AAJ 10
AAAYPOAAJAABio4AAK 11
AAAYPOAAJAABio4AAL 12
AAAYPOAAJAABio4AAM 13
15 rows selected.

The following select statement gives all rows but with only one row per duplicated data

SQL> select min(a.rowid) min_rowid, a.empid
2 from emp_sal a
3 group by empid ;

MIN_ROWID EMPID
------------------ ----------
AAAYPOAAJAABio4AAA 1
AAAYPOAAJAABio4AAB 2
AAAYPOAAJAABio4AAC 3
AAAYPOAAJAABio4AAD 4
AAAYPOAAJAABio4AAE 5
AAAYPOAAJAABio4AAG 7
AAAYPOAAJAABio4AAI 9
AAAYPOAAJAABio4AAJ 10
AAAYPOAAJAABio4AAK 11
AAAYPOAAJAABio4AAL 12
AAAYPOAAJAABio4AAM 13
11 rows selected.

If we do a diff (minus operation) on the above two sets of data then we get the rowids of the duplicates which we use for deleting.

SQL> select a.rowid , a.empid from emp_sal a
2 minus
3 select min(a.rowid) , a.empid
4 from emp_sal a
5 group by empid
6 order by empid ;

ROWID EMPID
------------------ ----------
AAAYPOAAJAABio4AAF 5
AAAYPOAAJAABio4AAN 5
AAAYPOAAJAABio4AAO 5
AAAYPOAAJAABio4AAH 7
4 rows selected.

Here is the delete (With all components combined) …

SQL> delete from emp_sal
2 where rowid in
3 (
4 select a.rowid
5 from emp_sal a
6 minus
7 select min(a.rowid)
8 from emp_sal a
9 group by empid
10 ) ;
4 rows deleted.

So in this method you

1. Load the table using sqlldr direct path
2. Use a delete statement similar to the one shown on the previous slide to eliminate duplicates

Using this method I repeated the test of loading a little over million rows with one duplicate row

1.sqlldr took 58 seconds to load 1065393 rows
2.delete statement took 46 seconds.

So it took less than 2 minutes to load the table and eliminate the duplicates

Efficient Verification of Foreign Key

This is another area that causes performance problems in batch environment.
If you have a foreign key constraint defined for a column in a table, then all the inserts and updates cause Oracle to do a foreign Key checking to make sure that referential integrity is maintained. This means for each insert, Oracle would query the referenced table (parent table) to make sure FK constraint is satisfied. This does not pose a problem if you are dealing with low volumes of data. But this becomes a big issue if you do huge data loads.

So how do you load millions of rows into a table that has foreign keys defined ?

Following two methods are the most widely used but inefficient methods.
Do the load with foreign keys enabled and only rows that satisfy fk constraints get loaded
or
Do the load with foreign keys disabled.

Then try enabling the fk with a stat
ement similar to the one below

ALTER TABLE emp
ENABLE VALIDATE CONSTRAINT fk_deptno
EXCEPTIONS INTO exceptions_table

Both of the above methods are very inefficient and would require big batch windows.

The following method is very efficient for loading tables with FK constraints. It involves 3 steps…
1. Load the table with FK disabled (This will be quick since no FK checking is done)
2. Identify the rows that do not satisfy the FK constraint and delete them
3. Enable foreign key without validating (This will be quick since no FK checking is done)
Following example illustrates the use of the above method.

Desc emp
Name Type
-------------- ------------
EMPNO NUMBER(6)
ENAME VARCHAR2(10)
JOB VARCHAR2(9)
MGR NUMBER(6)
HIREDATE DATE
SAL NUMBER(7,2)
COMM NUMBER(7,2)
DEPTNO NUMBER(2)

Desc dept

Name Type
-------------- -------------
DEPTNO NUMBER(2)
DNAME VARCHAR2(14)
LOC VARCHAR2(13)

Deptno in EMP table references Deptno in DEPT table.
So any new rows inserted into EMP table should have a DEPTNO value that exists in DEPT table.
Now to do a huge data load into EMP table
First Load the data into EMP table with FK disabled.
Then Delete from EMP table deptno values that do not exist in DEPT table as shown next
delete from emp
where rowid in
(-- This gets us the rowids that need to be deleted
select rowid
from emp
where dept_no in
( -- This identifes deptno values that do not exist in DEPT table
select distinct deptno
from emp
minus
select deptno
from dept
)
) ;

After the above step, You can re-enable the foreign key constraint on EMP table without validating. This is because we know that EMP table will have only valid deptno values after the above delete

ALTER TABLE emp
ENABLE NOVALIDATE CONSTRAINT fk_deptno ;

This concludes part 1. I have used all the techniques that were described here to make Oracle batch environment perform optimally. I hope you find these techniques useful.

Database Tech Blog

Saturday, June 30, 2007

Writing Efficient SQL Code for Oracle - Part 1

Introduction

No comments:

Post a Comment

Web Sites

Categories

Blog Archive