From Tables and Spreadsheets to Data Cubes

  • A data warehouse is based on a multidimensional data model which views data in the form of a data cube
  • A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions
    • Dimension tables, such as item (item_name, brand, type), or time(day, week, month, quarter, year)
    • Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables
  • In data warehousing literature, an n-D base cube is called a base cuboid. The top most 0-D cuboid, which holds the highest-level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.

Cube: A Lattice of Cuboids

Conceptual Modeling of Data Warehouses

  • Modeling data warehouses: dimensions & measures
    • Star schema: A fact table in the middle connected to a set of dimension tables
    • Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to snowflake
    • Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation

Example of Star Schema

Example of Snowflake Schema

Example of Fact Constellation

A Concept Hierarchy: Dimension (location)

Data Cube Measures: Three Categories

  • Distributive: if the result derived by applying the function to n aggregate values is the same as that derived by applying the function on all the data without partitioning
      • E.g., count(), sum(), min(), max()
  • Algebraic: if it can be computed by an algebraic function with M arguments (where M is a bounded integer), each of which is obtained by applying a distributive aggregate function
      • E.g., avg(), min_N(), standard_deviation()
  • Holistic: if there is no constant bound on the storage size needed to describe a subaggregate.
      • E.g., median(), mode(), rank()

View of Warehouses and Hierarchies

Multidimensional Data

  • Sales volume as a function of product, month, and region

A Sample Data Cube

Cuboids Corresponding to the Cube

Typical OLAP Operations

  • Roll up (drill-up): summarize data
    • by climbing up hierarchy or by dimension reduction
  • Drill down (roll down): reverse of roll-up
    • from higher level summary to lower level summary or detailed data, or introducing new dimensions
  • Slice and dice: project and select
  • Pivot (rotate):
    • reorient the cube, visualization, 3D to series of 2D planes
  • Other operations
    • drill across: involving (across) more than one fact table
    • drill through: through the bottom level of the cube to its back-end relational tables (using SQL)

Typical OLAP Operations

A Star-Net Query Model

Browsing a Data Cube