Definition

Relational databases use a concept called normalisation to reduce the amount of duplication and redundancy in the database, and also to help improve data integrity.

As an example, assume you have a single table holding orders. Some of the fields in that table will hold the customer’s name and address, and if the same customer places multiple orders with you, those details will be repeated many times in your data.

If you split your order table into two, and put the customer names and addresses into their own table, creating only a single record for each customer, then you can link it to the order table using the primary key from the customer table as a foreign key in the order table.

A simple way to think about normalising your data is to consider how a database table could hold only one type of information – a customer, an order, a product, and so on.

Types of normalisation

There are quite a few different ways of normalising data, and if you sit down with a database developer each will probably have their own take on how far down the slippery slope you should be going!

The different levels are called Normal Forms, and they run through a sequence of First Normal Form, Second Normal Form and so on. There are six, plus variants, but most applications will only use the first or second.

First Normal Form

This isn’t intended as a full-on tutorial, but as an indication of what we are talking about so that if anyone throws the phrase “first normal form” at you (in some bizarre situation that I can’t immediately visualise) you will have at least a vague idea what they are talking about.

So, the First Normal Form requires that some fairly basic and logical rules are followed:

  • Each record must have a unique identifier (see primary key above)
  • Each field or column must contain items of the same data type (dates, numbers or text)
  • Each field or column must have a unique name
  • Each field or column should only contain a single data element – don’t stick two phone numbers in the same field!

You may be wondering, in the case of that last one, what happens if the contact has two phone numbers?

This is where normalisation comes in, because what happens there is that you need a separate table to hold phone numbers, joined to the contact table (using the contact’s primary key as a foreign key, remember) and then you can add as many phone numbers as you need, or none, against each contact.

Not only does this make it easier to search and work with your database, but it means that if you have no phone numbers, there is no empty field in a contact record, so no “wasted” space, and that’s what it’s all about.

I think you get the idea, or I hope you do. Think about how you can structure your data to avoid duplication, redundancy, lack of flexibility, and ease of querying, and you are most of the way there. The additional types of normal form just build on this logic, and essentially mean splitting data down into ever more discrete (not necessarily discreet!) chunks.

Image by mcmurryjulie from Pixabay