The database approach
A database is a computer-based system to record and maintain information about anything of significance to an organisation. Modern databases almost always store their contents in structures called tables like the example below.
Imagine a case where we want to store facts about employees in a company. Such facts could include their name, address, date of birth, and salary. A table looks very much like a spreadsheet page with different employees as the rows, and the facts (e.g. their names) as columns. The table must have a name, and this one could be called EMP (short for employee).
Name | Address | Date of Birth | Salary |
---|---|---|---|
Jim Smith | 1 Apple Lane | 1/3/1991 | 11000 |
Jon Greg | 5 Pear St | 7/9/1992 | 13000 |
Bob Roberts | 2 Plum Road | 3/2/1990 | 12000 |
The database contains two types of information:
-
The schema is the structure of data including the names of tables and columns. The schema defines the container in which real data is stored.
-
Data are the "facts" that are stored in the databases tables. The schema defines the rules that the data must obey.
From this information the schema would define that EMP has four components, "NAME", "ADDRESS", "DOB", "SALARY". As designers we can name the columns however we like, but making them meaningful helps. In addition to the name, we want to try and make sure that people don't accidentally store a name in the DOB column, or some other silly error. Protecting the database against bad data is one of the most important database design steps, and is what much of this course is about. From what we know about the facts, we can say things like:
-
NAME is a string, and needs to hold at least 12 characters.
-
ADDRESS is a string, and needs to hold at least 12 characters.
-
DOB is a date... The company forbids people over 100 years old or younger than 18 years old working for them.
-
SALARY is a number. It must be greater than zero. These rules are called constraints and can be enforced by the database management system (DBMS).
Constraints are identified during the design phase of a database schema. The more constraints there are, the harder it is to enter poor quality data.