
Data Vault: Good practices to create links - Part 3


May 2011

Published by

Luc Durand

This article is the third of a series of articles dealing with an alternative to data warehouse modeling: the Data Vault approach.

In the first article, we introduced the overall Data Vault approach and the three types of entities. The structural entities are represented by hubs that identify the business concepts used by one or several sectors of the organisation and by links that connect the business concepts. The descriptive entities are represented by satellites that describe and contextualize hubs and links.

Figure 1 shows an example of the Data Vault model. The Student, the Admission and the Program entities are hubs and the Admission Program entity is a link.


In the second article, we focused more specifically on hubs.

This third article addresses links between hubs. A link is a dependent entity representing a relationship between at least two concepts (hubs) upon which it depends. As in a star model, a link’s granularity (its level of detail) is dictated by the hubs it relates.

Here are some criteria that a good link should meet.

A link:

  • Is connected to at least two similar concepts (hubs) (note: in the case of a hierarchical relationship, you can connect twice to the same hub);
  • Does not contain any descriptive data element (e.g. the relationship’s start and end dates). The link’s satellites contain the descriptive part;
  • Is the only entity type containing relationships between concepts;
  • Is used regardless of the cardinality of the relationship between two concepts. Every relationship is modeled as many-to-many, so a change in cardinality does not affect the model at all (note: a more specific cardinality can be documented in the metadata);
  • Always contains at least two pieces of information allowing traceability: the source where the link comes from and the moment when the instance of the link was brought into the warehouse;
  • Is implemented by foreign keys: the hubs’ (internal) identifiers;
  • Is defined with the finest grain possible, except for redundant links defined with a higher granularity for performance reasons;
  • Is created as a new link if the grain changes, while the former link stays in place, which avoids reengineering the existing model and guarantees verifiability.
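Several of these criteria can be illustrated with a small sketch. The example below uses Python's built-in sqlite3 module and an in-memory database. The table names, column names, business keys and the `registrar_system` source are assumptions made for illustration only; they do not come from the model in Figure 1 beyond the Admission and Program concepts. The link holds nothing but the hubs' internal identifiers plus the two traceability columns, and it accepts many-to-many rows by design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Hypothetical schema: two hubs and one link between them.
conn.executescript("""
CREATE TABLE hub_admission (
    admission_id  INTEGER PRIMARY KEY,      -- internal identifier
    admission_bk  TEXT NOT NULL UNIQUE,     -- business key
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_program (
    program_id    INTEGER PRIMARY KEY,
    program_bk    TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- The link: foreign keys to the hubs and traceability columns only.
-- No descriptive attributes (dates of validity, status, ...) live here.
CREATE TABLE link_admission_program (
    admission_id  INTEGER NOT NULL REFERENCES hub_admission(admission_id),
    program_id    INTEGER NOT NULL REFERENCES hub_program(program_id),
    load_date     TEXT NOT NULL,            -- when the instance entered the warehouse
    record_source TEXT NOT NULL,            -- where the instance came from
    PRIMARY KEY (admission_id, program_id)
);
""")

LOADED, SOURCE = "2011-05-01T00:00:00", "registrar_system"  # illustrative values

conn.execute("INSERT INTO hub_admission VALUES (1, 'ADM-001', ?, ?)", (LOADED, SOURCE))
conn.executemany(
    "INSERT INTO hub_program VALUES (?, ?, ?, ?)",
    [(1, "PRG-MATH", LOADED, SOURCE), (2, "PRG-PHYS", LOADED, SOURCE)],
)

# The same admission is related to two programs: the link does not care
# whether the business rule is one-to-many or many-to-many.
conn.executemany(
    "INSERT INTO link_admission_program VALUES (?, ?, ?, ?)",
    [(1, 1, LOADED, SOURCE), (1, 2, LOADED, SOURCE)],
)
link_count = conn.execute("SELECT COUNT(*) FROM link_admission_program").fetchone()[0]

# A link depends on its hubs: referencing a program that does not exist fails.
try:
    conn.execute(
        "INSERT INTO link_admission_program VALUES (1, 99, ?, ?)", (LOADED, SOURCE)
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

A stricter cardinality, such as "one program per admission", would be recorded in the metadata rather than enforced by the link's structure, so the table definition stays the same if that rule later changes.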

Several of these criteria ensure that the existing model’s structure will not need reengineering whenever changes occur in the business environment. The Data Vault model’s structure is designed so that changes have no impact on existing parts of the model: links are added without revising the existing structure. The structure is therefore flexible, and updates are added with far less effort. There is no need to convert existing data to a new structure.
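As a rough sketch of this additive behaviour, the example below (Python with the built-in sqlite3 module) starts from an existing link, then introduces a finer grain by adding a new hub and a new link. The Campus concept, all table and column names, and the sample values are hypothetical and serve only as an illustration; the hubs are trimmed to the minimum for brevity. The existing table definitions and rows are compared before and after the change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Existing model (hypothetical names): two hubs and one link, already loaded.
conn.executescript("""
CREATE TABLE hub_admission (admission_id INTEGER PRIMARY KEY, admission_bk TEXT NOT NULL UNIQUE);
CREATE TABLE hub_program   (program_id   INTEGER PRIMARY KEY, program_bk   TEXT NOT NULL UNIQUE);
CREATE TABLE link_admission_program (
    admission_id  INTEGER NOT NULL REFERENCES hub_admission(admission_id),
    program_id    INTEGER NOT NULL REFERENCES hub_program(program_id),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    PRIMARY KEY (admission_id, program_id)
);
INSERT INTO hub_admission VALUES (1, 'ADM-001');
INSERT INTO hub_program   VALUES (1, 'PRG-MATH');
INSERT INTO link_admission_program VALUES (1, 1, '2011-05-01', 'registrar_system');
""")

def snapshot(connection):
    """Return the table definitions and the rows of the existing link."""
    schema = connection.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    rows = connection.execute(
        "SELECT * FROM link_admission_program ORDER BY admission_id, program_id"
    ).fetchall()
    return schema, rows

before_schema, before_rows = snapshot(conn)

# Business change: the relationship must now be tracked per campus.
# Nothing is altered; a new hub and a new, finer-grained link are added.
conn.executescript("""
CREATE TABLE hub_campus (campus_id INTEGER PRIMARY KEY, campus_bk TEXT NOT NULL UNIQUE);
CREATE TABLE link_admission_program_campus (
    admission_id  INTEGER NOT NULL REFERENCES hub_admission(admission_id),
    program_id    INTEGER NOT NULL REFERENCES hub_program(program_id),
    campus_id     INTEGER NOT NULL REFERENCES hub_campus(campus_id),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    PRIMARY KEY (admission_id, program_id, campus_id)
);
INSERT INTO hub_campus VALUES (1, 'CAMPUS-NORTH');
INSERT INTO link_admission_program_campus VALUES (1, 1, 1, '2011-06-01', 'registrar_system');
""")

after_schema, after_rows = snapshot(conn)

# Every pre-existing definition is still present, byte for byte, and the
# rows of the former link were neither converted nor moved.
existing_unchanged = (
    all(entry in after_schema for entry in before_schema) and before_rows == after_rows
)
known_names = {name for name, _ in before_schema}
new_tables = sorted(name for name, _ in after_schema if name not in known_names)
```

The former link remains queryable at its original grain, while new loads can target the finer-grained link.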

In the next article of the series, we will introduce the third Data Vault entity type: satellites.