ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 1
Copyright © 2015 – 2018, Victorian Institute of Technology.
The contents contained in this document may not be reproduced in any form or by any means, without the written permission of VIT,
other than for the purpose for which it has been supplied. VIT and its logo are trademarks of Victorian Institute of Technology.
NORMALIZATION
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 2
Topics
• Normalization
• Normalization Steps
• Normalization Forms
• First Normal Form
• Second Normal Form
• Third Normal Form
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 3
Normalization
• Having studied functional dependencies and some of their
properties, we are now ready to use them to specify some
aspects of the semantics of relation schemas.
• We assume that a set of functional dependencies is given for
each relation, and that each relation has a designated
primary key; this information combined with the tests
(conditions) for normal forms drives the normalization
process for relational schema design. Most practical
relational design projects take one of the following two
approaches:
– First perform a conceptual schema design using a conceptual model
such as ER or EER and then map the conceptual design into a set of
relations.
– Design the relations based on external knowledge derived from an
existing implementation of files or forms or reports.
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 4
Normalization (Cont…)
• Initially, Codd proposed three normal forms, which he called
first, second, and third normal form. A stronger definition of
3NF-called Boyce-Codd normal form (BCNF)-was proposed
later by Boyce and Codd. All these normal forms are based
on the functional dependencies among the attributes of a
relation.
• Normalization of data can be looked upon as a process of
analyzing the given relation schemas based on their FDs and
primary keys to achieve the desirable properties of
– Minimizing redundancy
– Minimizing the insertion, deletion, and update anomalies
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 5
Normalization Steps
• Normalization is a mechanical process to evaluate and
correct the table structure, with an aim to eliminate data
redundancies
– Goes through a number of stages known as “normal form”
– First stage is called the first normal form (1NF)
• Generally, (n+1)-NF is better than the n-NF
– For most situations, 3NF is the furthest we go
– Higher normal forms has less redundancies
– There are 10 Normal Forms exist
– We will look up to 3rd Normal Form
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 6
Normalization Forms
User Users’ Requirement Specification Sources describing the enterprise such
as Data Dictionary and Data Model
Unnormalized Form (UNF)
Transfer attribute into table format
First Normal Form (1NF)
Remove repeating groups
Second Normal Form (2NF)
Remove partial dependencies
Third Normal Form (3NF)
Remove transitive dependencies
Boyce-Codd normal Form (BCNF)
Make sure every determinant is candidate key
Fourth Normal Form (4NF)
Remove multi-valued dependencies
Special case of 3NF
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 7
First Normal Form
• A table is in First Normal Form when
– It has a primary key
– All fields are atomic (indivisible).
• For example in a shopping Website’s database an attribute
named “Credit Card Details” would NOT be atomic, and should
be broken up into the atomic attributes “credit card number”,
“expiry data”, “account name”, “name on card”, etc.
– Repeating groups have been removed.
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 8
First Normal Form (Cont…)
• Primary Key
– We know that a Primary Key is an attribute that contain a unique
values for each record – an identifier
– We often create a new column containing an “auto number” for this
job, and normally call it “ID”. We can though use composite keys
(also known as compound keys), which means combining 2 or more
attributes to give us that unique value. Take this example of a table
recording when a manager first meets members:
| ManagerID |
MemberID |
IntroductionDate |
| S12 |
C21 |
1/12/2015 |
| S15 |
C21 |
1/12/2015 |
| S15 |
C32 |
3/12/2015 |
| S12 |
C54 |
15/12/2016 |
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 9
First Normal Form (Cont…)
• Primary Key
– None of the 3 fields alone give a record a unique identity. Our
choices for a primary key are either to create a 4th column containing
an auto-number, or combining ManagerID and MemberID into a
composite key. S12C21, for example, will be a unique value in the
table. Only existing for that first record and never being repeated.
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 10
First Normal Form (Cont…)
• All fields are atomic
– An atomic field is one that can’t logically or usefully be subdivided.
– Examples
• A field named “Customer Name” is probably not atomic, because
breaking up into “first name” and “last name” would probably be
logical and useful in your situation e.g. if you think it would useful
to search for customers based on their surnames.
• A field named “Street” could be divided into “street number” and
“street name”, but if that level of detail is unimportant for your
database, “Street” may be considered to be at the atomic level.
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 11
First Normal Form (Cont…)
• All fields are atomic
– Example
• The author details are in this table are non-atomic
• They would be better broken up like this:
• Or even better, like this:
| ID |
Title |
Authors |
| 12301 |
Harry Potter and the Philosopher’s Stone |
A1234 J.K. Rowling |
| ID |
Title |
AuthorID |
AuthorName |
| 12301 |
Harry Potter and the Philosopher’s Stone |
A1234 |
J.K. Rowling |
| ID |
Title |
AuthorID |
GivenNames |
FamilyName |
| 12301 |
Harry Potter and the Philosopher’s Stone |
A1234 |
J.K. |
Rowling |
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 12
First Normal Form (Cont…)
• No Repeating Groups
– You can see that as you add more books, you may find that the
author details become a repeating group
– You should therefore spilt the table up, and maintain a connection by
placing the repeating table’s primary key in the other table (the table
on the many side – the author has many books):
| ID |
Title |
AuthorID |
| 12301 |
Harry Potter and the Philosopher’s Stone |
A1234 |
| 12302 |
Harry Potter and the Chamber of Secrets |
A1234 |
| ID |
Title |
AuthorID |
GivenName |
FamilyName |
| 12301 |
Harry Potter and the Philosopher’s Stone |
A1234 |
J.K. |
Rowling |
| 12302 |
Harry Potter and the Chamber of Secrets |
A1234 |
J.K. |
Rowling |
| AuthorID |
GivenNames |
FamilyName |
| A1234 |
J.K. |
Rowling |
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 13
First Normal Form (Cont…)
• No Repeating Groups
– Some of you may be thinking that a book can be written by many
authors. This would be the case, and so we now recognise that we
once again have a repeating group or non-atomic fields.
– Our solution for this is to
• create multiple author ID rows, but how many? Not a neat
solution.
• Flatten the table, as you can see on the next slide. Again, this is
not a neat solution, because it gets us back to repeating rows
| ID |
Title |
AuthorID |
| 12301 |
Harry Potter and the Philosopher’s Stone |
A1234 |
| 32134 |
Full House |
A9999, A888 |
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 14
First Normal Form (Cont…)
• No Repeating Groups
– Repeating values (the title and ID are repeating)
– So, once again we split them up
| ID |
Title |
AuthorID |
| 12301 |
Harry Potter and the Philosopher’s Stone |
A1234 |
| 32134 |
Full House |
A9999 |
| 32134 |
Full House |
A8888 |
| ID |
Title |
| 12301 |
Harry Potter and the Philosopher’s Stone |
| 32134 |
Full House |
| Book ID |
AuthorID |
| 12301 |
A1234 |
| 32134 |
A9999 |
| 32134 |
A8888 |
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 15
First Normal Form (Cont…)
• No Repeating Groups
– Remember that we always need a primary key in 1NF for each
table. The title table still has one, but not the Book/Author table we
just created. We can either create a new attribute (here called “ID”)
or we can combine the Book ID and Author ID into a composite
Primary key for the new table, or an auto number could be added:
| ID |
Title |
| 12301 |
Harry Potter and the Philosopher’s Stone |
| 32134 |
Full House |
| ID |
BookID |
AuthorID |
| 1 |
12301 |
A1234 |
| 2 |
32134 |
A9999 |
| 3 |
32134 |
A8888 |
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 16
First Normal Form (Cont…)
• To summarise that example, we now have a situation where
an author can write many books and a book can be written
by many authors (and so we have changed one table into 3)
| ID |
Title |
| 12301 |
Harry Potter and the Philosopher’s Stone |
| 32134 |
Full House |
| ID |
BookID |
AuthorID |
| 1 |
12301 |
A1234 |
| 2 |
32134 |
A9999 |
| 3 |
32134 |
A8888 |
| AuthorID |
GivenNames |
FamilyName |
| A1234 |
J.K. |
Rowling |
| A9999 |
Janet |
Evanovich |
| A8888 |
Charlotte |
Hughes |
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 17
First Normal Form (Cont…)
• What about if we add telephone numbers field (Phone)?
– Is it a repeating group?
– It may make sense to more columns – one for home phone, mobile,
work etc.
– But how many columns? How many phone numbers?
– You might decide creating columns is an impractical waste of space.
In that case, what you really have is repeating information, as you
can see if you flatten the table out (see next slide)
| AuthorID |
GivenNames |
FamilyName |
Phone |
| A1234 |
J.K. |
Rowling |
98674321, 0444123123 |
| A9999 |
Janet |
Evanovich |
| A8888 |
Charlotte |
Hughes |
97778888 |
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 18
First Normal Form (Cont…)
– “flattening” the table reveals the repeating group
– So, the solution is, once again, to create 2 tables:
– Would you be confident that ID and Phone would form a composite
key (a unique value)?
– Notice too that we lose the “NULL” value in Janet’s phone number.
| ID |
GivenNames |
LastName |
Phone |
| A1234 |
J.K. |
Rowling |
98674321 |
| A1234 |
J.K. |
Rowling |
0444123123 |
| A9999 |
Janet |
Evanovich |
| A8888 |
Charlotte |
Hughes |
97778888 |
| ID |
Phone |
| A1234 |
98674321 |
| A1234 |
0444123123 |
| A8888 |
97778888 |
| ID |
GivenNames |
LastName |
| A1234 |
J.K. |
Rowling |
| A9999 |
Janet |
Evanovich |
| A8888 |
Charlotte |
Hughes |
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 19
First Normal Form (Cont…)
• Our original guess at a table may have looked like this:
• In 1NF it becomes (with foreign and primary keys marked)
| ID |
Title |
Authors |
| 12301 |
Harry Potter and the Philosopher’s Stone |
A1234 J.K. Rowling 98674321, 0444123123 |
| 32134 |
Full House |
A9999 Janet Evanovich, A8888 Charlotte Hughes 97778888 |
| ID |
GivenNames |
LastName |
| A1234 |
J.K. |
Rowling |
| A9999 |
Janet |
Evanovich |
| A8888 |
Charlotte |
Hughes |
| ID |
Phone |
| A1234 |
98674321 |
| A1234 |
0444123123 |
| A8888 |
97778888 |
| ID |
Title |
| 12301 |
Harry Potter and the Philosopher’s Stone |
| 32134 |
Full House |
| Book ID |
AuthorID |
| 12301 |
A1234 |
| 32134 |
A9999 |
| 32134 |
A8888 |
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 20
Second Normal Form
• Remove Partial Dependency
– 2NF need only be considered if you have a table that uses a
composite Primary Key
– A table is in 2NF if
• It is in 1NF,
and
• A non-key attribute is dependent on all parts of the composite key
(not just one of the composite key columns).
• In other words, if an attribute, doesn’t describe the entity it
belongs to, and instead only describes part of the Primary Key, it
shouldn’t be in the table.
• Removing it helps to eliminate redundant data
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 21
Second Normal Form (Cont…)
• Trivial functional dependency
– For example, our Phone table from the previous example contains
a composite key, and if we were to expand the table like so:
– You can see that the City data is relevant to the author alone – it’s
not connected with the phone number. We therefore have to pull it
out of this table, but where to?
Author table.
– What about Default Number? It is dependent on both the Author ID
and Phone #, so it should stay in this table.
| ID |
Phone |
DefaultNumber |
City |
| A1234 |
98674321 |
Y |
Melbourne |
| A1234 |
0444123123 |
N |
Sydney |
| A8888 |
97778888 |
Y |
Perth |
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 22
Second Normal Form (Cont…)
• Another example (Electric Toothbrush Models) taken from
Wikipedia(https://en.wikipedia.org/wiki/Second_normal_form).
• The composite key is Manufacturer/Model.
• Model Full Name is dependent on both the Manufacturer and
Model (that is, for example Forte X-Prime is relevant to Forte’s
X-Prime, not Hoch’s X-Prime)
| Manufacturer |
Model |
Model Full Name |
ManufacturerCountry |
| Forte |
X-Prime |
Forte X-Prime |
Italy |
| Forte Ultraclean |
Ultraclean |
Forte Ultraclean |
Italy |
| Dent-o-Fresh |
Ezbrush |
Dent-o-Fresh Ezbrush |
USA |
| Kobayashi |
ST-60 |
Kobayashi ST-60 |
Japan |
| Hoch |
Toothmaster |
Hoch Toothmaster |
Germany |
| Hoch |
X-Prime |
Hoch X-Prime |
Germany |
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 23
Second Normal Form (Cont…)
• Manufacturer Country is related to the Manufacturer and
has nothing to do with the Models of the manufacturer. So, it
doesn’t belong to that table.
| Manufacturer |
Model |
ModelFullName |
| Forte |
X-Prime |
Forte X-Prime |
| Forte Ultraclean |
Ultraclean |
Forte Ultraclean |
| Dent-o-Fresh |
Ezbrush |
Dent-o-Fresh Ezbrush |
| Kobayashi |
ST-60 |
Kobayashi ST-60 |
| Hoch |
Toothmaster |
Hoch Toothmaster |
| Hoch |
X-Prime |
Hoch X-Prime |
| Manufacturer |
Manufacturer Country |
| Forte |
Italy |
Forte Ultraclean |
Italy |
| Dent-o-Fresh |
USA |
| Kobayashi |
Japan |
| Hoch |
Germany |
| Hoch |
Germany |
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 24
Third Normal Form
• A table is in 3NF if
– It is in 2NF, and
– Transitive dependencies are removed, which means all non-key
attributes are dependent on the primary key, not another non-key
attribute. If you aren’t in 3NF, it often means you’ve allocated a field
to the wrong table.
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 25
Third Normal Form (Cont…)
• Example (https://en.wikipedia.org/wiki/Third_normal_form)
• Note that 3NF is not concerned with nature of the primary
key (Composite or not). You can see that Winner’s date of
birth is relevant to the Winner column, which is a nonprimary key. It’s not connected directly with Tournament or
Year, and so it doesn’t belong in the table.
| Tournament |
Year |
Winner |
WinnerDateofBirth |
| Indiana Invitational |
1998 |
Al Fredrickson |
21 July 1975 |
| Cleveland Open |
1999 |
Bob Albertson |
28 September 1968 |
| Des Moines Masters |
1999 |
Al Fredrickson |
21 July 1975 |
| Indiana Invitational |
1999 |
Chip Masterson |
14 March 1977 |
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 26
Third Normal Form (Cont…)
| Tournament |
Year |
Winner |
| Indiana Invitational |
1998 |
Al Fredrickson |
| Cleveland Open |
1999 |
Bob Albertson |
| Des Moines Masters |
1999 |
Al Fredrickson |
| Indiana Invitational |
1999 |
Chip Masterson |
| Winner |
DateofBirth |
| Chip Masterson |
14 March 1977 |
| Al Fredrickson |
21 July 1975 |
| Bob Albertson |
28 September 1968 |
• Update anomalies cannot occur in these tables, because
unlike before, Winner is now a primary key in the second
table, thus allowing only one value for Date of Birth for
each Winner.
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 27
Third Normal Form (Cont…)
• Summary
– A table is in 1NF if repeating groups are eliminated (and a primary
key has been defined and all attributes are atomic)
– A table is in 2NF if it’s in 1NF and all partial dependencies have been
removed
– A table is in 3NF if it’s in 2NF and all transitive dependencies have
been removed
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 28
Normalization Forms
• 1NF – First Normal Form
• 2NF – Second Normal Form
• 3NF – Third Normal Form
• EKNF – Elementary Key Normal Form
• BCNF – Boyce–Codd Normal Form
• 4NF – Fourth Normal Form
• ETNF – Essential Tuple Normal Form
• 5NF – Fifth Normal Form
• DKNF – Domain/Key Normal Form
• 6NF – Sixth Normal Form
ITDA1001 [Lesson 4] Copyright © 2018 VIT, All Rights Reserved 29
Summary
• Revision of Key Concepts
• Class Work
– Lesson 4 – Activity 4 (Lesson_4 Activity.docx)
• Questions and Answer
The post without the written permission of VIT appeared first on My Assignment Online.