Data Vault

For a scalable data warehouse

What is Data Vault 2.0

The Data Vault 2.0 System of Business Intelligence was created to address complex challenges such as agility, scalability, flexibility, auditability and consistency. It provides standards and guidelines for building a scalable data warehouse.

The first version of Data Vault focused on modelling; Data Vault 2.0 goes well beyond that.

Foundation pillars of Data Vault

Data Vault consists of methodology, architecture and model. The methodology is repeatable, measurable and agile. It offers a scalable multi-tier architecture, which supports NoSQL. Its hub and spoke models are hash-based and contribute to great flexibility and scalability.

In their foundational book Data Architecture: A Primer for the Data Scientist, W.H. Inmon and Daniel Linstedt describe the system as comprising a methodology, an architecture and a model.

What is Data Vault modelling

Dan Linstedt, the inventor of Data Vault modelling, describes it as a “detail-oriented, historical tracking and uniquely linked set of normalised tables that support one or more functional areas of business”. He calls it “the best of breed between 3rd normal form (3NF) and star schema”.

Data Vault is often referred to as a single source of fact because it integrates source data, keeps the original data intact and stores historical changes as well.

Components

The Data Vault model has three simple components that can be combined to model complex processes in an agile way, much as LEGO bricks can be combined to create buildings.

The three basic entity types are:

Hub

Unique list of business keys

Hubs store all business keys from each source system provided that the semantic meaning and the granularity remain unchanged.

Link

Unique list of relationships

Links connect Hubs. Instead of adding the foreign key to a table, the relationship is stored as data, which makes it very flexible to change.

Satellite

Descriptive data with change history

Satellites can be created for both Hubs and Links, depending on whether we store data related to a business object or to a relationship/event.
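The three entity types can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the in-memory dictionaries and lists, the function names, the `||` delimiter, the choice of SHA-256 and the sample customer and order keys are all assumptions made for the example.

```python
import hashlib
from datetime import datetime, timezone


def _digest(text):
    """Return a fixed-length hex digest for a string (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def hash_key(*business_keys):
    """Hash one or more business keys after trimming and upper-casing them."""
    return _digest("||".join(str(k).strip().upper() for k in business_keys))


def _audit(record_source):
    """Audit columns kept on every row: load time and originating system."""
    return {"load_date": datetime.now(timezone.utc).isoformat(),
            "record_source": record_source}


def load_hub(hub, business_key, record_source):
    """Hub: unique list of business keys. Keys already present are left untouched."""
    hk = hash_key(business_key)
    hub.setdefault(hk, {"business_key": business_key, **_audit(record_source)})
    return hk


def load_link(link, business_keys, record_source):
    """Link: unique list of relationships, stored as data instead of a foreign key."""
    lk = hash_key(*business_keys)
    link.setdefault(lk, {"hub_keys": tuple(hash_key(k) for k in business_keys),
                         **_audit(record_source)})
    return lk


def load_satellite(satellite, parent_key, attributes, record_source):
    """Satellite: descriptive data with history. A row is appended only on change."""
    hash_diff = _digest("||".join(f"{k}={attributes[k]}" for k in sorted(attributes)))
    latest = next((r for r in reversed(satellite) if r["parent_key"] == parent_key), None)
    if latest is not None and latest["hash_diff"] == hash_diff:
        return False  # nothing changed, so no new history row
    satellite.append({"parent_key": parent_key, "hash_diff": hash_diff,
                      "attributes": dict(attributes), **_audit(record_source)})
    return True


def demo():
    """Load one hypothetical customer and order from two hypothetical sources."""
    hub_customer, hub_order, link_customer_order, sat_customer = {}, {}, {}, []
    customer_key = load_hub(hub_customer, "C-1", "crm")
    load_hub(hub_customer, " c-1 ", "webshop")  # same business key, second source
    load_hub(hub_order, "O-9", "webshop")
    load_link(link_customer_order, ["C-1", "O-9"], "webshop")
    load_satellite(sat_customer, customer_key, {"name": "Ada"}, "crm")
    load_satellite(sat_customer, customer_key, {"name": "Ada"}, "crm")     # unchanged
    load_satellite(sat_customer, customer_key, {"name": "Ada L."}, "crm")  # changed
    return hub_customer, hub_order, link_customer_order, sat_customer
```

In this sketch the second source does not create a second customer, because both spellings normalise to the same business key, and the satellite ends up with two rows: the original name and the changed one. Every row also carries a load date and a record source.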

When should you use data vault?

Data Vault is designed to handle complex, large scale data warehouse systems.

It is a good choice when you need:

Integration of systems

Fast-changing systems with multiple sources benefit the most. A decoupled architecture and a business-focused approach let you integrate entities and connections without reengineering or breaking existing processes.

Audit and traceability

If you need to provide audit information because of regulations (GDPR, HIPAA, CCPA), PII handling or data trust issues, you can always show when the information was loaded into the system and what its source was.

Data Warehouse Automation

Template-based, simple components facilitate automation and code generation. You avoid the boring stuff, boost productivity and remove manual errors from the implementation process.

Agile development

You can start by modelling only one part of your system and go to production with it. This would give you business value from sprint one. Then you can just keep adding new features to your solution.

Raw atomic data with history

You may need atomic data with all its changes, which gives you flexibility and a safety net when the business logic changes or you have made a mistake. You can always go back to the raw data vault when the unexpected happens.

Adaptability to change

Decoupled components and architecture are essential when you expect changes. You need minimal to no reengineering effort to follow changes in the source systems or at the business level.

Near real-time & parallel loading

Data Vault entities can be loaded independently, which fits well with real-time reporting and parallel loading. If you are already using Data Vault, you can switch to near real-time (NRT) loading without changing the data models or the loading patterns.
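Independent loading works because hash keys are derived from business keys alone, so a link does not have to look anything up in a hub before it is loaded. The sketch below illustrates this with Python's standard thread pool. The record fields (`customer_id`, `order_id`), the loader names and the use of SHA-256 are assumptions made for the example, not part of any Data Vault standard.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_key(*business_keys):
    """Derive a deterministic key from business keys only (no database lookup)."""
    joined = "||".join(str(k).strip().upper() for k in business_keys)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def hub_customer_rows(records):
    return {hash_key(r["customer_id"]): r["customer_id"] for r in records}


def hub_order_rows(records):
    return {hash_key(r["order_id"]): r["order_id"] for r in records}


def link_customer_order_rows(records):
    # The link computes the hub keys itself, so it never waits for the hub loads.
    return {hash_key(r["customer_id"], r["order_id"]):
            (hash_key(r["customer_id"]), hash_key(r["order_id"]))
            for r in records}


# Hypothetical loader registry: one independent task per target entity.
LOADERS = {
    "hub_customer": hub_customer_rows,
    "hub_order": hub_order_rows,
    "link_customer_order": link_customer_order_rows,
}


def load_in_parallel(records):
    """Run every loader concurrently against the same staged records."""
    with ThreadPoolExecutor(max_workers=len(LOADERS)) as pool:
        futures = {name: pool.submit(loader, records)
                   for name, loader in LOADERS.items()}
        return {name: future.result() for name, future in futures.items()}
```

Calling `load_in_parallel` with a list of staged records returns the same rows as running the loaders one after another, which is the property that makes the order of the loads irrelevant.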

When should you opt for another solution?

Simple system

If you work with a few consistent data sources only and there are no integration challenges and no issues with your system.

One-off solution

If you need a one-off quick solution, you have no repeatable tasks (analytics/reporting) and you don't need a long-term data warehouse solution.

Constant system

If your system is not changing much, so you do not usually experience any reengineering pains or regular integration issues.

You don’t believe in standards

If you do not believe in using standards, patterns and automation or, alternatively, you would like to experiment and create your own standards.

You need direct reporting

Wide tables and dimensional models are the best practice for reporting. Build a dimensional model on top of the Data Vault model to accommodate BI and analytics.
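As a rough illustration of that last point, a reporting-friendly table can be derived by joining each hub row to its most recent satellite row. The row layout below (`business_key`, `parent_key`, `load_date`, `attributes`) and the function names are assumptions made for this sketch. It also assumes load dates are ISO 8601 strings in a single timezone, so that comparing them as text matches chronological order.

```python
def latest_satellite_rows(satellite):
    """Keep only the most recent satellite row for each parent hash key."""
    latest = {}
    for row in satellite:
        current = latest.get(row["parent_key"])
        if current is None or row["load_date"] > current["load_date"]:
            latest[row["parent_key"]] = row
    return latest


def build_customer_dimension(hub, satellite):
    """Flatten a hub and its satellite into one wide row per business key."""
    latest = latest_satellite_rows(satellite)
    dimension = []
    for key, hub_row in hub.items():
        sat_row = latest.get(key)
        attributes = sat_row["attributes"] if sat_row else {}
        dimension.append({"customer_key": key,
                          "customer_id": hub_row["business_key"],
                          **attributes})
    return dimension
```

The raw vault keeps the full history, while the derived table exposes only the current state, so it can be rebuilt at any time if the reporting logic changes.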

Use case

Overview

One of our clients, a successful e-commerce company, was aiming to improve the productivity of their data ecosystem. They had data coming from 36 frequently changing sources, which needed to be integrated.

Issues and actions:

Solution

Using Data Vault, we could integrate new sources without breaking existing processes. Because we stored atomic data with history, we could modify BI reports or create new versions of them whenever the business logic changed.

Passive integration enabled us to consolidate customer data that came from different CRM systems and provided seamless reporting capabilities.

On top of the data vault model, we used wide tables and dimensional models to serve BI requirements and provide a single source of truth.

With agile development we could provide business value starting from sprint one and keep expanding the data model, integrating new data sources and adding new features after each sprint.

Why choose Infinite Lambda

The experience

Infinite Lambda is a dbt preferred consulting provider, having successfully leveraged the technology in over 15 projects over the last 2 years. We continuously upgrade our skills and stay up to date with the latest developments.

The approach

We emphasise full transparency at each stage of the project. Working with us, you will take an active part in the development process and will be familiar with everything we do with your system and resources.

The training

We do not shy away from sharing expertise and will organise engaging training sessions to teach you how to use dbt. Once we complete the project, you will be able to take full care of your infrastructure on your own.

Use Data Vault with dbt

We can help your business make better use of data, so you can:

Fullstack Data

Bring all your real-time or batch data into a single integrated analytics platform

Cloud Development

Develop flexible cloud processes that are robust, secure and scalable

DevOps

Leverage our flexible DevOps services to build and maintain a core cloud infrastructure

Training

Provide your team with regular training on the newest tools in BI, DevOps & more