The purpose of normalization process , to state in layman's terms - is to eliminate data redundancy, data maintenance (insert, update and delete) anomalies and to establish relationships between the tables that make logical sense
Generally speaking, normalization process involves splitting of a non normalized table (or a lump of data) into multiple normalized tables.
Let us start by looking at the following fictitious invoices and try to design tables that are in 3rd normal form and that can capture the data in the fictitious invoices.
---------------------------------------------------------------------------------------
Inv No : 345987
Inv Dt : 08/08/2005
Customer Type : R Customer Type Desc : Retail
Custome Id : 621
PC Center Inc
Address : 123, Peach street, Atlanta, GA 30035
Item no Description Quantity Amount (USD)
123 Keyboard 50 500.00
987 IDE hard disk 80 GB 30 1200.00
---------------
Total 1700.00
---------------------------------------------------------------------------------------
Inv No : 355987
Inv Dt : 08/10/2006
Customer Type : W Customer Type Desc : Wholesale
Custome Id : 525
PC Mart Inc
Address : 56, Carrington Cove, Atlanta, GA 30029
Item no Description Quantity Amount (USD)
481 Optical Mouse 500 2500.00
987 IDE hard disk 80 GB 300 10000.00
----------------
Total 12500.00
---------------------------------------------------------------------------------------
As a first cut, we create a table that will capture the above data. Each invoice will be a record in this table.
Let us call this table invoice_lump and choose inv_no as the primary key.
shown below is this invoice_lump table with the above invoice data. Click on the table name if you want to see an unwrapped version of this table
Table_Name : invoice_lump Primary key : inv_no
Inv_no Inv_date CustId CustName CustAddr CustCity CustState CustZip CustType CustTypeDesc Item_1_No Item_1_Desc Item_1_Qty Item_1_Amt Item_2_No Item_2_Desc Item_2_Qty Item_2_Amt
345987 08/08/2005 621 PC Center Inc 123, Peach street Atlanta GA 30035 R Retail 123 Keyboard 50 500.00 987 IDE hard disk(80 GB) 30 1200.00
355987 08/10/2006 525 PC Mart Inc 56, Carrington Cove Atlanta GA 30029 W Wholesale 481 Optical Mouse 500 2500.00 987 IDE hard disk(80 GB) 300 10000.00
You can see straight away the problems with the above table as listed below...
If an invoce contains more than 2 items then we will need to add more columns to accomadate those items.
The customer and item information is redundently repeated in the table.
If we need to update a customer address or item desc then we will have to do a full table scan which is ineffecient.
You can not add a new customer that never placed an order or add a new item information that does not have sales.
The process of normalization is aimed at resolving the above issues by designing tables in such a way that the above
problems are eliminated. So let us start normalizing the invoice_lump table.
First Normal Form
A table is said to be in First Normal Form if it has no repeating elements or groups of elements.
The repeating groups should be moved into their own tables to conform to First Normal Form.
We can immediately see that Item information is repeating since each invoice can have multiple items. So we split the invoice_lump table into 2 tables as shown below.
Table : INV_HDR, Primary Key ; Inv_No
Inv_no Inv_date CustId CustName CustAddr CustCity CustState CustZi
p CustType CustTypeDesc
345987 08/08/2005 621 PC Center Inc 123, Peach street Atlanta GA 30035 R Retail
355987 08/10/2006 525 PC Mart Inc 56, Carrington Cove Atlanta GA 30029 W Wholesale
Table : INV_LINES, Primary Key ; Inv_No ,Item_No
Inv_no Item_No Item_Desc Item_Qty Item_Amt
345987 123 Keyboard 50 500.00
345987 987 IDE hard disk(80 GB) 30 1200.00
355987 481 Optical Mouse 500 2500.00
355987 987 IDE hard disk(80 GB) 300 10000.00
The above tables have no repeating elements or groups of elements. So they are in First Normal Form.
The above table still have the problem of data redundancy. To address this we will need to further normalize the tables.
Second Normal Form
A table is said to be in Second Normal Form if it meets the First Normal Form requirement and
if every non-key field in that table is functionally dependent on the entire PK. There should be no partial dependencies.
Now let us examine the above two tables (inv_hdr and inv_lines) and see if they conform to Second Normal Form.
These tables already conform to First Normal Form so they partially meet Second Normal Form requirement already.
We can see that in the inv_hdr table the Customer attributes are not functionally dependent on inv_no (which is the PK ).
Hence the Customer attributes need to be separated into their own table.
In the inv_lines table, the item attributes are only dependent on Item_no column of the PK which constitutes partially dependency.
Hence the item attributes need to be separated into their own table.
So We split inv_hdr table into inv_hdr and Customer_mstr tables and we split inv_lines into inv_lines and Item_mstr tables.
Table : INV_HDR, Primary Key : Inv_No
Inv_no Inv_date CustId
345987 08/08/2005 621
355987 08/10/2006 525
Table : CUSTOMER_MSTR, Primary Key ; CustId
CustId CustName CustAddr CustCity CustState CustZip CustType CustTypeDesc
621 PC Center Inc 123, Peach street Atlanta GA 30035 R Retail
525 PC Mart Inc 56, Carrington Cove Atlanta GA 30029 W Wholesale
Table : INV_LINES, Primary Key : Inv_No ,Item_No
Inv_no Item_No Item_Qty Item_Amt
345987 123 50 500.00
345987 987 30 1200.00
355987 481 500 2500.00
355987 987 300 10000.00
Table : ITEM_MSTR, Primary Key : Item_No
Item_No Item_Desc
123 Keyboard
481 Optical Mouse
987 IDE hard disk(80 GB)
The above tables are in Second Normal Form because
- they meet the requirement for First Normal Form
- all non-key fields are functionally dependent on the PK in each of these tables
Third Normal Form
A table is said to be in Third Normal Form if it meets the Second Normal Form requirement
and there are no transitive dependencies
If a non-key field is functionally dependent on another non-key field and is indirectly dependent on the primary key through that non-key field then that constitutes a transitive dependency .
Of the above 4 tables we can only see the transitive dependency in the customer_mstr table.
The rest of the tables have no transitive dependencies, hence they already conform to TNF.
The Customer_mstr table has a field called CustTypeDesc which is dependent on a non-key field called CustType.
The field CustTypeDesc is dependent on the CustId field (which is the PK) through the CustType field.
This constitutes transitive dependency.
Hence we have to split that relationship into another table in order to bring the Customer_Mstr table into TNF
Table : CUSTOMER_MSTR, Primary Key ; CustId
CustId CustName CustAddr CustCity CustState CustZip CustType
621 PC Center Inc 123, Peach street Atlanta GA 30035 R
525 PC Mart Inc 56, Carrington Cove Atlanta GA 30029 W
Table : CUSTTYPE_LKUP, Primary Key ; CustId
CustType CustTypeDesc
R Retail
W Wholesale
The above tables are in Third Normal Form because
- they meet the requirement for Second Normal Form
- there are no transitive dependencies
This completes the normalization process of the INVOICE_LUMP table.
We started with a single non normalized table and ended up with 5 separate tables that are in third normal form.
You can see that in each of the above normalized tables, every non-key field depends on the primary key, the complete primary key and nothing but the primary key.
I hope this article helps you understand the process of normalization.




No comments:
Post a Comment