Monday, 15 May 2023

Normalization

Functional Dependency
Functional Dependency is a constraint between two sets of attributes in a relation
from a database. A functional dependency is denoted by arrow (→). If an attributed A
functionally determines B, then it is written as A → B.
For example, employee_id → name means employee_id functionally determines the
name of the employee.
A function dependency A → B means for all instances of a particular value of A,
there is the same value of B
For example in the below table A → B is true, but B → A is not true as there are
different values of A for B = 2

A B
1 2
3 2
5 3
1 2
5 3
2 7

 

Trivial Functional Dependency
X → Y is trivial only when Y is subset of X.
Examples
ABC→ AB
ABC→ A
Non Trivial Functional Dependencies
X → Y is a non trivial functional dependency when Y is not a subset of X.
X → Y is called completely non-trivial when X intersects Y is NULL.

 

 

 

Def: Normalization is a body of rules addressing analysis and conversion of data structures in to relations that exhibit more desirable properties of internal consistency, minimal redundancy and maximum stability.


Def: Normalization is the process of decomposing a set of relations with anamolies to produce smaller and well structured relations that contain minimum or no redundancy.


Def: Normalization is the process of reducing /decomposing a relation into a set of small relations free from data redundancy and ensuring data integrity.


Def: Normalization is a procedure of successive reduction of a given collection of relational schemas based on their FDS and primary keys to achieve desirable form of minimized redundancy, minimized insertion, deletion and minimized update analmolies.

Database normalization is the process of organizing the attributes of the database to reduce or eliminate data redundancy (having the same data but at different places) Data redundancy unnecessarily increases the size of the database as the same data is repeated in many places. Inconsistency problems also arise during insert, delete and update operations.

 

 Problems without Normalization


If a table is not properly normalized and have data redundancy then it will not only eat up extra memory space but will also make it difficult to handle and update the database, without facing data loss. Insertion, Updation and Deletion Anomalies are very frequent if database is not normalized. To understand these anomalies let us take an example of a Student table.

Rollno Name Branch Hod Office_tel
101         A     BCA     X     12345
102         B     BCA     X     12345
103         C     BCA     X     12345
104         D     BCA     X     12345 

In the table above, we have data of 4 BCA students. As we can see, data for the fields Branch, Hod (Head of Department) and Office_tel is repeated for the students who are in the same branch in the college, this is Data Redundancy.

OBJECTIVES
· The basic objective of normalization is to reduce the data redundancy and prevent loss of information or avoid inconsistencies when the data base is altered.
· The types of alterations needed for relations are: 

1. Insertion of new data values to a relation.
2. Deletion of tuple or a row in a relation.
3. Updating or changing the value of an attribute in a tuple.
· The normalized relation does not include spurious inform ation when the original schema is reconstructed.
· Preserves dependencies present in the original database /shed


Data redundancy: If data in the database exist in two or more places / locations i.e. direct redundancy or if data can be calculated from other data items i.e. indirect data redundancy. Then the data base is said to be redundant.
· Data should only be stored once and avoid storing data that can be calculated from other data.
· During normalization redundancy is removed of course not at the cost of integrity
(rules) of the database.
Data Integrity: All the data in the database are consistent and satisfies all integrity constraints.
INTEGRITY CONSTRAINTS = An integrity constraint is a rule that restricts the values
that may be present in the database.
· The relational data model includes constraints that are used to verity the validity
or the data as well as adding meaningful structure to it. Value in each row.
Entity Integrity: The primary key must have unique not null.
Referential Integrity: Every foreign key must either be null or its values must be the
actual value of a key in another relation


FIRST NORMAL FORM:
· First Normal Form (INF) deals with the shape of the record type.
· The relation is said to be in INF if and only if, it. It contains no repeating attributes
or groups attributes.
· The relation/table is said to be in INF when each cell of the tuble/relation contains
precisely one value (atomic)
· The relation is said to be INF if each cell value is atomic.
Let us consider a relation student (Roll no, name, dob, subject grade)


Roll No

Name

DOB

Subject

Grade

Teacher

 

 

 

 

 

 

101

Rakesh

14.11.1985

RDBMS

E

Smith

 

 

 

DEC

A

Clark

 

 

 

MATH-II

O

Allen

 

 

 

PHY-II

E

Johnes

 

 

 

NT

A

Martin

 

 

 

 

 

 

102

Tapan

18.07.1986

DEC

A

Clark

 

 

 

MATH-II

E

Allen

 

 

 

PHY-II

O

Johnes

 

 

 

RDBMS

E

Smith

 

 

 

 

 

 


                    

 

 

 

NT

B

Martin

 

 

 

 

 

 

104

Mohan

19.07.1985

RDBMS

O

Smith

 

 

 

MATH-II

A

Allen

 

 

 

PHY-II

E

Johnes

 

 

 

DEC

B

Clark

 

 

 

NT

A

Martin

 

 

 

 

 

 


Here the table students is not normalized because the cells in subjects and grade are not atomic.

 By the definition of INF, each cell value must be atomic so, to normalize the relation we need to fill all the columns of rollno, name, dob by repeating values    for each rows corresponding to subject and grade or each student. 


Roll No

Name

DOB

Subject

Grade

Teacher

 

 

 

 

 

 

101

Rakesh

14.11.1985

RDBMS

E

Smith

101

Rakesh

14.11.1985

DEC

A

Clark

101

Rakesh

14.11.1985

MATH-II

O

Allen

101

Rakesh

14.11.1985

PHY-II

E

Johnes

101

Rakesh

14.11.1985

NT

A

Martin

 

 

 

 

 

 

102

Tapan

18.07.1986

DEC

A

Clark

102

Tapan

18.07.1986

MATH-II

E

Allen

102

Tapan

18.07.1986

PHY-II

O

Johnes

102

Tapan

18.07.1986

RDBMS

E

Smith

102

Tapan

18.07.1986

NT

B

Martin

 

 

 

 

 

 

103

Mohan

19.07.1985

RDBMS

O

Smith

103

Mohan

19.07.1985

MATH-II

A

Allen

103

Mohan

19.07.1985

PHY-II

E

Johnes

103

Mohan

19.07.1985

DEC

B

Clark

103

Mohan

19.07.1985

NT

A

Martin

 

 

 

 

 

 


The table is normalized to INF but it can lead to several undesirable problems

 Redundancy exists which leads to wastage of memory also.  The multiple copies of same fact leads to update anamolies. Ex: Change in date of birth leads to change in all tuples.

 

 If the teacher teaching a subject, it can be entered only if a student enrolls to the subject. 

So it leads to insertion anomalies.

 

If the only student in a given course discontinues the information as to which course a professor is teaching will be lost if the student information is deleted.

 

Prime attribute: An attribute of a relation schema R is called a prime attribute of R if it is a member of some candidate key of R.

 

Non prime attribute: An attribute is called a non prime attribute if it is not a prime attribute i.e. an attribute is said to be a non-prime attribute if it is not a part of the candidate keys.

If every non-key attribute functionally dependent on the primary key, then the relation will be in

(a)    1 NF

(b)    2NF

(c)    3NF

(d)    BCNF

 

No comments:

Post a Comment

Timestamp-Based Protocols

Timestamp-Based Protocols Each transaction T i  is issued a timestamp TS( T i ) when it enters the system. •        Each transac...