Introduction
In part 2 of this article I cover writing Efficient SQL Code for the OLTP environment.
Following topics are covered
- Correlated Queries
- Implicit data type conversions
- Making sure indexes are considered by the Optimizer
- Reduce the network traffic
Correlated Queries
Exercise caution while writing SQL statements using correlated queries. If you use lot of correlated queries in your code and if they get executed frequently then you might want to re-examine them. Correlated queries could get very expensive in terms of resources as explained in the next few slides with the help of an example.
The query shown next lists employees that have been hired between Jan 2002 and March 2002 and have been employed the longest in each division.
SQL> select emp_no, emp_division, (emp_term_dt - emp_hire_dt) emp_span
2 from emp o
3 where emp_hire_dt >= '01-JAN-2002'
4 and emp_hire_dt < '31-MAR-2002'
5 and (o.emp_term_dt - o.emp_hire_dt)
6 in
7 ( select max(i.emp_term_dt - i.emp_hire_dt)
8 from emp i
9 where i.emp_division = o.emp_division
10 and emp_hire_dt >= '01-JAN-2002'
11 and emp_hire_dt < '31-MAR-2002' ) ;
62 rows selected.
Elapsed: 00:00:43.58
Statistics
-------------------------------------------
977818 physical reads
The above sql is a correlated query. The output from the query is not shown here. Notice the execution time and physical reads.Next slide will show this query re-written without using the correlated sub query
The same sql re-written as shown below and it does not use correlated sub query. Output from the query is not shown here. Notice the drastic reduction in the execution time and the physical reads
SQL> select a.emp_division, a.emp_no, b.emp_span
2 from emp a
3 , ( select emp_division, max(i.emp_term_dt - i.emp_hire_dt) emp_span
4 from emp i
5 where emp_hire_dt >= '01-JAN-2002'
6 and emp_hire_dt < '31-MAR-2002'
7 group by emp_division ) b
8 where a.emp_division = b.emp_division
9 and (a.emp_term_dt - a.emp_hire_dt) = b.emp_span
10 and a.emp_hire_dt >= '01-JAN-2002'
11 and a.emp_hire_dt < '31-MAR-2002' ;
62 rows selected.
Elapsed: 00:00:01.82
Statistics
----------------------------------
28725 physical reads
Let us modify the query to consider the whole emp table.
Select statement with a correlated subquery
SQL> select a.emp_division, a.emp_no, (a.emp_term_dt - a.emp_hire_dt) emp_span
2 from emp a
3 , ( select emp_division, max(i.emp_term_dt - i.emp_hire_dt) max_span
4 from emp i
5 group by emp_division ) b
6 where a.emp_division = b. emp_division
7 and (a.emp_term_dt - a.emp_hire_dt) = b.max_span
8 order by emp_span ;
305 rows selected.
Elapsed: 00:29:08.75
Statistics
----------------------------------------------------------
39094846 physical reads
As can be seen above, the resource intensive nature of Correlated queries becomes very evident as the size of the driving table in the outer query increases. Now the above correlated sub query took more than 29 minutes and a huge number of physical reads.
The above query is re-written as shown below and does not use the correlated sub query and notice that the execution time and physical reads still are very low although now we are considering the entire emp table
SQL> select a.emp_division, a.emp_no, b.emp_span
2 from emp a
3 , ( select emp_division
4 , max(i.emp_term_dt - i.emp_hire_dt) emp_span
5 from emp i
6 group by emp_division ) b
7 where a.emp_division = b.emp_division
8 and (a.emp_term_dt - a.emp_hire_dt) = b.emp_span ;
305 rows selected.
Elapsed: 00:00:02.84
Statistics
-----------------------------------
28726 physical reads
Making sure indexes are considered by the Optimizer
Oracle optimizer may not consider the indexes when evaluating query plans if the columns in the where clause have any kind of transformation done on them.
For example in the following select statement notice how the ename column is transformed.
Select * from emp
Where upper(ename) = ‘&ename’
If an index exists on ename column Oracle optimizer will not consider it when evaluating the access paths for executing the above statement.
Next I explain with an example how we can make sure that indexes are considered by the Optimizer.
Consider the emp table that we used in the previous examples. This table has an index on hiredate column.
Some of the columns of emp table are shown below…
Name Type
-------------- ------------
EMPNO NUMBER(6)
ENAME VARCHAR2(10)
HIREDATE DATE
DEPTNO NUMBER(2)
Following select statement retrieves the data for all employees hired yesterday.
Select * from emp
where hiredate + 1 = sysdate ; *
When evaluating the access paths optimizer does not consider the hiredate index, instead it looks for an index on hiredate+10 and does not find one.
* Actually Oracle date columns have time component also, but I am ignoring that to keep this example simple.
If you want the optimizer to use the index then you have to rewrite the above select statement as shown below …
Select * from emp
where hiredate = sysdate -1 ;
Now the optimizer looks for an index on hiredate and finds one and uses it, if that gives a better access path.
To summarize, if you do any transformations on columns in the where clause of your SQL statement, then indexes (if any) on those columns would not be considered by Oracle optimizer while evaluating access paths.
Avoid implicit data type conversions
What is implicit data type conversion?
Implicit data type conversion occurs when the database converts the data types while applying a condition in the where clause during the execution of a query.
Consider the following table
SQL> describe SALESCENTER_CHANNEL_SUMMARY
Name Null? Type
----------------------------------- -------- -----------
SALES_CENTER_NUM NOT NULL VARCHAR2(6)
CHANNEL_NBR NOT NULL NUMBER(3)
DIVISION_NO NOT NULL NUMBER(2)
SALES NOT NULL NUMBER
SALES_CENTER_NUM column is defined as datatype VARCHAR2(6) in the above table.
select count(*)
from SALESCENTER_CHANNEL_SUMMARY
where SALES_CENTER_NUM = 213502 ;
COUNT(*)
----------
37
The above select statement worked because Oracle did the data type conversion for us when it executed the query. And because the sales_center_num data in the table happened to contain only numeric values.
If you looked at what Oracle did behind the scenes for this query, it would be
1. For each row of SALESCENTER_CHANNEL_SUMMARY
apply the where clause condition SALES_CENTER_NUM = 213502
by first converting the SALES_CENTER_NUM
into a number and then comparing against the value 213502
2. Return the filtered result set from step 1
In effect, it is as if your query is rewritten as follows…
select count(*)
from SALESCENTER_CHANNEL_SUMMARY
where to_number(SALES_CENTER_NUM) = 213502 ;
Now let us add another row to the table that has a non numeric values for sales_center num column as shown below.
SALES_ CHANNEL_NBR DIVISION_NO BASIS_COMPANY_ID BA
------ ----------- ----------- ---------------- --
4070ZP 14 40 70 ZP
Now when we execute the same query again we get the following error
SQL> select count(*)
2 from SALESCENTER_CHANNEL_SUMMARY
3 where SALES_CENTER_NUM = 213502 ;
where SALES_CENTER_NUM = 213502
*
ERROR at line 3:
ORA-01722: invalid number
We got an error because now there is a sales center 4070ZP in the data that could not be converted into a number. The query worked as long as the data values could be implicitly converted but failed when there is a data value that could not be converted.
Moral of the story : Do not depend on implicit data conversions.
There is a side effect to implicit conversions that needs to be mentioned also. Implicit conversions prevent the optimizer from choosing the index even if there is an index existing for that column. Implicit data conversions almost always result in full table scans. That is another reason why we should not depend on implicit data conversions.
Reduce the network traffic
High network traffic saturates the available network bandwidth in a WAN causing severe performance problems.
So we need to minimize the network traffic between
- the client application and database server if it is a two tier architecture
- the App server and the database server if it is a three tier architecture
Network traffic consists of
1. Sql statements sent from the application to the database server
2. Results sent from the database server to the application
Minimizing traffic from the application to the database server
To do this we need to minimize the amount of data going from from the application to the database server.
This can be done by moving the sql statements from the application to the database stored procedures. This means instead of sending lengthy sql statements across the network, application just invokes database stored procedures with appropriate parameters.
This results is far less traffic being generated from the application to the database server.
This approach has another benefit as well, it reduces the parsing activity to a minimum – because stored procedures are pre-parsed and can be pinned in the memory. Since parsing is a very CPU intensive activity, less parsing means less load on the CPU.
Minimizing traffic from the database server to the application
In this case, we can not minimize the amount of data sent from the database server to the application. But we can still reduce the traffic by packing more data in each of the tcp/ip packets which results in less number of packets going across the network.
This can be done by tweaking the sqlnet ARRAYSIZE. I am not going into details here but just know that it is possible to tweak it and take the help of your DBA to do that.
Conclusion
Many applications fail to scale because of the SQL code that is not efficient. Writing SQL code that works alone is not enough. It is important to make sure that your SQL code does what it does very efficiently.
I have used the techniques described here very successfully to solve many tuning problems over the last several years. I hope you find the information presented here useful.
No comments:
Post a Comment