Wednesday, January 26, 2011

DATABASE NORMALIZATION

In this post I am going to talk about normalization. No matter at what is the professional age of a software engineer he has to know about database concepts and normalization is one if those important concept. And when it comes to giving interview, I bet every software engineer search internet to revise the concepts regarding normalization and same is the case with me.

So decide to write a post on normalization and dedicate it people who want to give interview. So what is normalization?

Normalization It is the method of breaking down complex table structures in to simple table structures by using certain rules in order to reduce redundancy in a table and eliminate inconsistency problems and disk space usage.

OR

It is a process of elimination of duplication from the set of data and transforms it into specifically identifiable objects.

Benefits from normalization following are the benefits of normalizing database.
  • Avoiding repetitive entries
  • Reducing required storage space
  • Preventing the need to restructure existing tables to accommodate new data.
  • Increased speed and flexibility of queries, sorts, and summaries

There are five stages of normalization from 1st NF to 5th NF. It has been seen that 1st and 2nd normal forms are used extensively and 3rd normal form is used only when it is needed. 4th and 5th normal are generally for academics and as far as there usage is concern they cause performance problems.

First normal form (1st NF) – A table is in first normal form (1NF) if all underlying domains contain atomic values only. So what does atomic values means?

Atomic Values – In most general sense, ‘Atomic’ means anything which has one value. It means each column in a tuple must have one and only one value, it should not be multi valued. To understand this lets look into following example:

Example # 1
Table: Customer
CustomerId
Name
Address
PhoneNumber
Postal Code
Area
City
Province
1
Ali Raza
ABC Society
0345-6687675
0300-6421665
0321-4509861
74600
Defence
Islamabad
Sindh
2
Asif Fattah
XYZ Society
0333-2721835
0321-2125367
45240
Gulberg
Haiderabad
Punjab










In the above example column PhoneNumber does not have atomic values because it has multiple values and because of this column, Customer table is not in the 1st NF. To make this table satisfy 1st NF PhoneNumber column must be placed in a separate table and reference it via foreign key, like I have done in the following example:

Table: Customer
CustomerId
First Name
Last Name
Address
Postal Code
Area
City
Province
1
Ali
Raza
ABC Society
74600
Defence
Islamabad
Sindh
2
Asif
Fattah
XYZ Society
45240
Gulberg
Haiderabad
Punjab

Table: CustomerPhoneNo
CustomerId
PhoneNumber
2
0333-2721835
2
0321-2125367
1
0321-4509861
1
0300-6421665
1
0345-6687675

Example # 2
Table: Meeting
MeetingId
Date
Attendee
123
1/2/2009
Ali, Asif, Asad

In the above example Attendees is multivalve column. To apply 1st NF on this table Attendee column should move in a separate table like I’ve done below:

 Table: Meeting
MeetingId
Date
123
1/2/2009

Table: MeetingAttendee
MeetingId
Attendee
123
Ali
123
Asif
123
Asad


If we have a look of above example it can be figured out that each tuple (row) in Meeting table can have zero/one or many rows in MeetingAttendee table against it and in MeetingAttendee table one or many rows must have one and only one tuple (row) in Meeting table against it.

So, we can say that 1st NF requires to put multi valued column in separate and connect them by making one-to-many relationship.

Second normal form (2nd NF) – An entity is in second normal form (2nd NF) if it fulfill the requirement of 1st NF + (entity who does not have combinatorial key or composite key, all none key elements should be fully dependent on primary key)

Combinatorial Key or Composite Key – key consisted on more than one field is called combinatorial key or Composite Key.

Example # 1
Table: Student
StudentId (pk)
CourseId (pk)
CourseName
StudentName
Address
PhoneNumber
1
1
C++
Ali
ABC Society
0345-6687675
0300-6421665
0321-4509861
2
2
Applied Calculus
Asif
XYZ Society
0333-2721835
0321-2125367


In the above example studentId and CourseId both are primary keys means they are making combinatorial key.  It is obvious from the Student table that not all the fields are completely dependent on StudentId or CourseId. We can figure out that there are two entities stored in one table. To make the Student table satisfy 2nd NF we need to break combinatorial key and put the information dependent on CourseId field in another table let’s say in Course table.

Table: Student
StudentId (pk)
CourseId (fk)
StudentName
Address
PhoneNumber
1
1
Ali
ABC Society
0345-6687675
0300-6421665
0321-4509861
2
2
Asif
XYZ Society
0333-2721835
0321-2125367

Table: Course
CourseId (pk)
CourseName
1
C++
2
Applied Calculus

Table: CustomerPhoneNo
CustomerId(fk)
PhoneNumber
2
0333-2721835
2
0321-2125367
1
0321-4509861
1
0300-6421665
1
0345-6687675

So we can say that 2nd NF requires putting multi valued attributes in separate table.

Third normal form (3rd NF) – An entity is in third normal form (3rd  NF) if it fulfill the requirement of 2nd NF + (all non key element must be functionally dependent on primary key)

Functionally dependent – it means all non key elements must be dependent on primary key.

Example # 1
Table: Customer
CustomerId
Name
Address
Postal Code
Area
City
Province
1
Ali
ABC Society
74600
Nazimabad
Karachi
Sindh
2
Asif
XYZ Society
45240
Sanda
Lahore
Punjab
3
John
DEF society
74600
North Nazimabad
Karachi
Sindh









In the above example City and Province are dependent on Postal Code, both of these fields are not dependent on CustomerId (Primary Key), so to apply 3rd NF on this table we can put City and Country in a separate table and connect both tables via Postal Code field.

If we look it in other perspective we are separating our static data and minimizing storage. Area, City and Country is static data and for each postal code it will remain same (hear I am assuming that postal code is unique for each area in the city).

 Table: Customer
CustomerId
Name
Address
Postal Code
1
Ali
ABC Society
74600
2
Asif
XYZ Society
45240
3
John
DEF society
74600





Table: PostalCode
Postal Code
Area
City
Province
74600
Nazimabad
Karachi
Sindh
45240
Sanda
Lahore
Punjab





So we can say that 3rd form of normalization is about moving that data into other table that is functionally dependent on some other field rather than primary key.I have described the process of normalization in the sequence of 1st, 2nd, and 3rd phase, however; the order of these phases is not that important most of the times. The reason is while designing database the entities are already in 1st or 2nd NF. So you don’t need to apply 1st or 2nd NF. So the rule of thumb is that remember the rules of normalization instead of getting confuse with sequence.

1 comment:

  1. well done good knowledge in very summarize way

    ReplyDelete