AFF604A: Modelling Humanities Data

SQL Syntax

Agenda

Modelling Relational Data
Understanding Relationships
How to Normalise Data
Best Practices
Crafting ERDs & Schemas
 

Key concepts

Entity: “table” or “relation”
Attribute: “field” or “column”
Key: value which uniquely identifies a row
Constraint: anything which dictates permissible values (data type, nullable, unique, etc)

What is Relational Modelling?

The key is in the name: relationships
Abstract data to reduce repetition
 
 

Flat Model

What might be some issues with building something very flat?
 
 
modelling_flat.png
modelling_flat_spreadsheet.png

“Normalised” Model

What are some advantages to modelling something more robustly?
 
 
modelling_normal.png

Why Use Relational Modelling?

Groups data into logical collections
Reduces Repetition
Allows for complex relationships between entities
 
 

Understanding Relationships

Groups data into logical collections
Reduces Repetition
Allows for complex relationships between entities
 
 

How Relationships Work

Relationships are reliant upon “keys”
Primary Keys
Composite keys
Foreign Keys
Parent key
Child key
Recursive key
Best practice: make keys a whole number value when possible
 
 

How Relationships Work: Primary Keys

Most important field in your table
Must be unique!
Can NEVER be null!
Assist with retrieval
Often only known by the backend
Should generally be a whole number (but exceptions can exist)
n determine data ordering
 
 

How Relationships Work: Composite Keys

Consists of two separate fields that combined create a discrete value
Same constraints as Primary keys (unique and not null)
Not as commonave performance impact
 
 

How Relationships Work: Foreign Keys

Reference to a primary key in another table
Not required per say but is a best practice
Ensure cascading data is properly handled (eliminates orphaned data)
rent” key is the primary key of the owner
“Child” key is the key being referenced in the table itself.
 
 

How Relationships Work: Recursive Keys

References a relationship to another row in the same table.
Not as common but can be useful.
Can create a very complex structure to navigate
Cannot describe a relationship beyond single parent / child.
 
 

Activity Time: Understanding Keys! (10 mins)

Give me an example of each of the following and why you would use each:
 
Primary Keyposite Key
Foreign Key (parent & child)
Recursive Key
 
Be prepared to discuss.
 
 
learningExercise.jpeg

Normalisation

[T]he process the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity
 
~ Wikipedia.com

Understanding Database Normalisation

riginally developed by Edgar Codd (father of relational model) in 1970
Goal is to remove redundancy
Separates data so that two different types aren’t structurally reliant
Consider the example of customer purchases
Specific rules govern normalisation
 

Understanding Database Normalisation

Originally developed by Edgar Codd (father of relational model) in 1970
Goal is to remove redundancy
Separates data so that two different types aren’t structurally reliant
Consider the example of customer purchases
Specific rules govern normalisation
 

Types of Normalisation

Describe the level to which data is normalised
Data is considered “normalised” to its lowest level
Currently, there are 10 recognised “forms”:
1st Normal Form (1NF)
2nd Normal Form (2NF)
3rd Normal Form (3NF)
Elementary Key Normal Form (EKNF)
Boyce-Codd Normal Form (BCNF)
4th Normal Form (4NF)
Essential Tuple Normal Form (ETNF)
5th Normal Form (5NF)
6th Normal Form (6NF)
Domain Key Normal Form (DKNF)
 
 
Describe the level to which data is normalised
Data is considered “normalised” to its lowest level
Currently, there are 10 recognised “forms”:
1st Normal Form (1NF)
2nd Normal Form (2NF)
3rd Normal Form (3NF)
Elementary Key Normal Form (EKNF)
Boyce-Codd Normal Form (BCNF)
4th Normal Form (4NF)
Essential Tuple Normal Form (ETNF)
5th Normal Form (5NF)
6th Normal Form (6NF)
Domain Key Normal Form (DKNF)
 
 

Types of Normalisation: 1NF

Table must not contain repeating groups of data
Primary key must be present
 

Types of Normalisation: 1NF Example

modeling_0NF.gif
modeling_1NF.gif

Types of Normalisation: 2NF

Meets constraints of 1NF
Any “non-key” attributes are dependent on the primary key
 

Types of Normalisation: 2NF Example

modeling_1NF.gif
modeling_2NF.gif

Types of Normalisation: 3NF

Meets constraints of 2NF
All non-key attributes are “directly dependent” on the primary key
“[Every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key.” ~ Bill Kent
 

Types of Normalisation: 3NF Example

modeling_2NF.gif
modeling_3NF.gif

Types of Normalisation: Beyond 3NF

One method: the removal of data that can be calculated
Nice but not necessary
Can become cumbersome from a data management perspective
Can have an impact on performance
 

Types of Normalisation: Denormalisation

Normalisation creates many tables / joins
Queries across multiple tables can be expensive
Sometimes collapsing the data makes sense (especially for high volume tables)
 

Activity Time! (20 mins)

Break into groups of 3-4. For the table to the left, create 3 models: 1NF, 2NF, 3NF. Each should build off the last. Be prepared to share your work.
 
 
nfExercise.png
28

Students example

Tables:
 
 
 
 
 
Still implemented as files, but behind the scenes can be quite complex
Students:
Takes:
Courses:
data independence” = separate logical view
from physical implementation
 
29

Queries

Find all courses that “Mary” takes
 
 
 
 
 
 
 
What happens behind the scene ?
Query processor figures out how to answer the query efficiently.
SELECT  C.name
FROM  Students S, Takes T, Courses CWHERE  S.name=“Mary” and  S.ssn = T.ssn and T.cid = C.cid
30

Queries, behind the scene

Imperative query execution plan:
SELECT  C.name
FROM Students S, Takes T, Courses C
WHERE S.name=“Mary” and
  S.ssn = T.ssn and T.cid = C.cid
Declarative SQL query
 
 
 
 
 
 
 
 
 
 
The optimizer chooses the best execution plan for a query
 
Students
Takes
 
 
Courses
 
 
 
 
 
 
 
 
 
 
 
sid=sid
 
 
 
sname
 
 
name=“Mary” 
 
 
 
 
cid=cid

Exercise

  • Model Student Course Example
  • Using: http://sqlfiddle.com/

Structured Query Language

  • Syntax overview
  •  
    • https://www.w3schools.com/sql/default.asp

PRIMARY KEY

  • The PRIMARY KEY constraint uniquely identifies each record in a database table.
  • Primary keys must contain UNIQUE values, and cannot contain NULL values.
  • A table can have only one primary key, which may consist of single or multiple fields.
  •  

FOREIGN KEY

  • A FOREIGN KEY is a key used to link two tables together.
  • A FOREIGN KEY is a field (or collection of fields) in one table that refers to the PRIMARY KEY in another table.
  • The table containing the foreign key is called the child table, and the table containing the candidate key is called the referenced or parent table.
  •  

Solution

Crafting ERDs and Schemas

Documenting Your Model

ERDs & Schemas: What are they

Entity Relationship Diagram (ERD) is an abstract model of the database
Shows relationships and attributes
Can be conceptual, logical, or physical
Relational Schema describes the implementation of the data model
Shows all entities, attributes, and relationships
Details data types
Can be easily converted into a database schema
 

Exercise

  • Build an ERD Schema for the course example
  • Expand the schema to include all the entity on the real example
  • Fill the database with real data from the example
  • Count how many students don’t have a course
  • Count how many students enrolled on more than 10 course for each year
  • Expand the schema to take into account exams and grades
  •  
 
Iontas, Room 1.02