For a scalable data warehouse
What is Data Vault 2.0
The Data Vault 2.0 System of Business Intelligence was designed to address complex data warehousing challenges such as agility, scalability, flexibility, auditability and consistency. It provides standards and guidelines for building a scalable data warehouse.
The first version of Data Vault focused on modelling; Data Vault 2.0 goes well beyond that.
Foundation pillars of Data Vault
Data Vault consists of a methodology, an architecture and a model. The methodology is repeatable, measurable and agile. The architecture is multi-tier and scalable, and it supports NoSQL. The hub-and-spoke model is hash-based, which contributes to its flexibility and scalability.
In their foundational book Data Architecture: A Primer for the Data Scientist, W.H. Inmon and Daniel Linstedt describe the system as comprising these same three pillars: methodology, architecture and model.
What is Data Vault modelling
Dan Linstedt, the inventor of Data Vault modelling, describes it as a “detail-oriented, historical tracking and uniquely linked set of normalised tables that support one or more functional areas of business”. He says it is “the best of breed between 3rd normal form (3NF) and star schema”.
Data Vault is often referred to as a single version of the facts because it integrates source data, keeps the original data intact and stores historical changes as well.
Components
The Data Vault model has three simple components that can be combined to model complex processes in an agile way, just as LEGO bricks can be combined to create buildings.
The three basic entity types are:
Hub
Unique list of business keys
Hubs store all business keys from each source system provided that the semantic meaning and the granularity remain unchanged.
Link
Unique list of relationships
Links connect Hubs. Instead of adding the foreign key to a table, the relationship is stored as data, which makes it very flexible to change.
Satellite
Descriptive data with change history
Satellites can be attached to either Hubs or Links, depending on whether the data describes a business object or a relationship/event.
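To make the three entity types more concrete, here is a minimal sketch of how they might look as records. The entity names (`HubCustomer`, `LinkCustomerOrder`, `SatCustomerDetails`), the column names and the `hash_key` helper are illustrative assumptions, not a prescribed standard, and SHA-256 stands in for whichever hash function your implementation uses.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic key from one or more business keys (assumed convention)."""
    normalised = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class HubCustomer:
    """Hub: one row per unique business key."""
    customer_hk: str
    customer_id: str  # the business key itself
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class LinkCustomerOrder:
    """Link: one row per unique relationship between hubs."""
    customer_order_hk: str
    customer_hk: str
    order_hk: str
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class SatCustomerDetails:
    """Satellite: descriptive data, one row per detected change."""
    customer_hk: str
    load_date: datetime
    record_source: str
    name: str
    email: str


now = datetime.now(timezone.utc)
hub = HubCustomer(hash_key("C-1001"), "C-1001", now, "crm")
link = LinkCustomerOrder(
    hash_key("C-1001", "O-42"), hub.customer_hk, hash_key("O-42"), now, "shop"
)
sat = SatCustomerDetails(hub.customer_hk, now, "crm", "Ada", "ada@example.com")
```

Note that the relationship between customer and order lives in its own `LinkCustomerOrder` row rather than as a foreign key on either hub, which is what makes it cheap to add or change relationships later.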
When should you use data vault?
Data Vault is designed to handle complex, large scale data warehouse systems.
It is a good choice when you need
Integration of systems
Fast-changing, multiple-source systems benefit the most. A decoupled architecture and a business-focused approach let you integrate entities and connections without reengineering and without breaking processes.
Audit and traceability
Regulations such as GDPR, HIPAA and CCPA, the handling of PII, or data trust issues may require you to provide audit information. With Data Vault, you can always show when a piece of information was loaded into the system and what its source was.
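As a rough illustration of that audit trail, the sketch below assumes every satellite row carries a load date and a record source. The `SatelliteRow` type, the `audit_trail` function and the sample values are hypothetical.

```python
from datetime import datetime
from typing import NamedTuple


class SatelliteRow(NamedTuple):
    hub_key: str
    load_date: datetime   # when the row arrived in the warehouse
    record_source: str    # which system it came from
    email: str


def audit_trail(rows, hub_key):
    """Return (load_date, record_source, email) for one business object, oldest first."""
    history = [r for r in rows if r.hub_key == hub_key]
    history.sort(key=lambda r: r.load_date)
    return [(r.load_date, r.record_source, r.email) for r in history]


rows = [
    SatelliteRow("c1", datetime(2023, 3, 1), "webshop", "ada@new.example"),
    SatelliteRow("c1", datetime(2023, 1, 5), "crm", "ada@old.example"),
    SatelliteRow("c2", datetime(2023, 2, 1), "crm", "bob@example.com"),
]
trail = audit_trail(rows, "c1")
```

Here `trail` lists every version of the customer's email together with the time it was loaded and the system that supplied it.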
Data Warehouse Automatisation
Template-based, simple components facilitate automatisation and code generation. You avoid the boring stuff, boost productivity and remove the manual errors from the implementation process.
Agile development
You can start by modelling only one part of your system and go to production with it. This would give you business value from sprint one. Then you can just keep adding new features to your solution.
Raw atomic data with history
Keeping atomic data with its full change history gives you flexibility and a safety net when the business logic changes or when you have made a mistake. You can always go back to the raw data vault when the unexpected happens.
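One common way to keep that history is an insert-only satellite: a new row is appended only when the descriptive attributes change, and nothing is ever updated or deleted. The sketch below assumes loads arrive in chronological order; `hash_diff`, `load_satellite` and `as_of` are hypothetical helper names.

```python
import hashlib
from datetime import datetime


def hash_diff(attributes: dict) -> str:
    """Fingerprint of the descriptive attributes, used to detect changes."""
    payload = "||".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def load_satellite(satellite: list, hub_key: str, attributes: dict, load_date: datetime) -> bool:
    """Append a new version only if the attributes changed; never update or delete."""
    versions = [r for r in satellite if r["hub_key"] == hub_key]
    latest = max(versions, key=lambda r: r["load_date"], default=None)
    new_diff = hash_diff(attributes)
    if latest is not None and latest["hash_diff"] == new_diff:
        return False  # nothing changed, so nothing is stored
    satellite.append(
        {"hub_key": hub_key, "load_date": load_date, "hash_diff": new_diff, **attributes}
    )
    return True


def as_of(satellite: list, hub_key: str, moment: datetime):
    """Return the version that was current at `moment`, or None if none existed yet."""
    versions = [
        r for r in satellite if r["hub_key"] == hub_key and r["load_date"] <= moment
    ]
    return max(versions, key=lambda r: r["load_date"], default=None)


sat = []
load_satellite(sat, "c1", {"tier": "silver"}, datetime(2023, 1, 1))
load_satellite(sat, "c1", {"tier": "silver"}, datetime(2023, 2, 1))  # unchanged, skipped
load_satellite(sat, "c1", {"tier": "gold"}, datetime(2023, 3, 1))
```

Because old versions are never overwritten, `as_of` can rebuild the state at any earlier point in time, which is what lets you re-run corrected business logic over the raw data.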
Adaptability to change
Decoupled components and architecture are essential when you expect change. You need minimal to zero reengineering effort to follow changes in the source systems or at the business level.
Near real-time & parallel loading
Data Vault entities can be loaded independently of one another, which suits both parallel loading and near real-time (NRT) reporting. If you already use Data Vault, you can switch to NRT without changing the data models or the loading patterns.
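The reason entities can be loaded separately is that hash keys are derived from business keys alone, so a link does not have to wait for its hubs to be loaded before it can reference them. The sketch below shows the idea with a thread pool; the staged rows, loader functions and `hash_key` helper are hypothetical, and in-memory dictionaries stand in for warehouse tables.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_key(*business_keys: str) -> str:
    """Deterministic key derived only from business keys (assumed convention)."""
    normalised = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


staged_orders = [
    {"customer_id": "C-1", "order_id": "O-1"},
    {"customer_id": "C-1", "order_id": "O-2"},
    {"customer_id": "C-2", "order_id": "O-3"},
]


def load_hub_customer(rows):
    return {hash_key(r["customer_id"]): r["customer_id"] for r in rows}


def load_hub_order(rows):
    return {hash_key(r["order_id"]): r["order_id"] for r in rows}


def load_link_customer_order(rows):
    # The link computes the hub keys itself, so it does not depend on the hub loads.
    return {
        hash_key(r["customer_id"], r["order_id"]): (
            hash_key(r["customer_id"]),
            hash_key(r["order_id"]),
        )
        for r in rows
    }


loaders = [load_hub_customer, load_hub_order, load_link_customer_order]
with ThreadPoolExecutor() as pool:
    # All three loads run concurrently; results come back in loader order.
    hub_customer, hub_order, link = pool.map(lambda load: load(staged_orders), loaders)
```

With sequence-generated surrogate keys, the link load would have to look up keys in the hubs first, which forces an ordering between the loads.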
When should you opt for another solution?
Simple system
If you work with a few consistent data sources only and there are no integration challenges and no issues with your system.
One-off solution
If you need a one-off quick solution, you have no repeatable tasks (analytics/reporting) and you don't need a long-term data warehouse solution.
Constant system
If your system is not changing much, so you do not usually experience any reengineering pains or regular integration issues.
You don’t believe in standards
If you do not believe in using standards, patterns and automatisation or, alternatively, you would like to experiment and create your own standards.
You need direct reporting
Wide tables and dimensional models are the best practice for reporting. Rather than reporting directly from the data vault, build a dimensional model on top of it to serve BI and analytics.
If you’re not sure, we’re here to help you decide. Get in touch now to discuss your use case.
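As a simple sketch of that layering, the function below derives a current-state customer dimension by joining each hub row to its most recent satellite row. The table contents, column names and `build_dim_customer` are hypothetical; in practice this would typically be a SQL view or model rather than Python.

```python
from datetime import datetime

hub_customer = [
    {"customer_hk": "hk1", "customer_id": "C-1"},
    {"customer_hk": "hk2", "customer_id": "C-2"},
]
sat_customer = [
    {"customer_hk": "hk1", "load_date": datetime(2023, 1, 1), "country": "DE"},
    {"customer_hk": "hk1", "load_date": datetime(2023, 6, 1), "country": "AT"},
    {"customer_hk": "hk2", "load_date": datetime(2023, 2, 1), "country": "UK"},
]


def build_dim_customer(hubs, satellites):
    """Join each hub row to its most recent satellite row (a current-state dimension)."""
    latest = {}
    for row in satellites:
        current = latest.get(row["customer_hk"])
        if current is None or row["load_date"] > current["load_date"]:
            latest[row["customer_hk"]] = row
    dim = []
    for hub in hubs:
        details = latest.get(hub["customer_hk"], {})
        dim.append({"customer_id": hub["customer_id"], "country": details.get("country")})
    return dim


dim_customer = build_dim_customer(hub_customer, sat_customer)
```

The data vault keeps every version, while the dimension exposes only what reports need, so the reporting layer can be rebuilt at any time without touching the raw data.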
Use case
Overview
One of our clients, a successful e-commerce company, was aiming to improve the productivity of their data ecosystem. They had data coming from 36 frequently changing sources, which needed to be integrated.
Issues and actions:
- Unconnected data silos: we needed to integrate the sources
- Manual reporting using Google Sheets: we needed to automate
- Slow and unreliable BI process: we needed to create a single source of truth
Solution
Using Data Vault, we could integrate new sources without breaking existing processes. Because we stored atomic data with history, we could modify BI reports or create new versions of them whenever the business logic changed.
Passive integration enabled us to consolidate customer data that came from different CRM systems and provided seamless reporting capabilities.
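Passive integration here means that records from different systems meet in the same Hub because they share a business key. The sketch below is a simplified illustration, not the client's actual implementation: the two CRM extracts are made up, and using a normalised email address as the shared business key is an assumption.

```python
import hashlib


def hash_key(business_key: str) -> str:
    """Normalise the business key, then hash it (assumed convention)."""
    return hashlib.sha256(business_key.strip().upper().encode("utf-8")).hexdigest()


# Hypothetical extracts from two CRM systems describing the same customer.
crm_a = [{"email": "ada@example.com", "phone": "111"}]
crm_b = [{"email": " Ada@Example.com", "newsletter": True}]

hub_customer = {}    # hash key -> normalised business key
sat_by_source = []   # descriptive rows, kept separately per source record

for source, records in (("crm_a", crm_a), ("crm_b", crm_b)):
    for record in records:
        hk = hash_key(record["email"])
        # The second source finds the key already present and adds nothing new.
        hub_customer.setdefault(hk, record["email"].strip().upper())
        sat_by_source.append({"customer_hk": hk, "record_source": source, **record})
```

Both records resolve to a single Hub entry, while each source keeps its own descriptive rows, so reports can combine them without either source having been altered.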
On top of the data vault model, we used wide tables and dimensional models to serve BI requirements and provide a single source of truth.
With agile development we could provide business value starting from sprint one and keep expanding the data model, integrating new data sources and adding new features after each sprint.
Why choose Infinite Lambda
The experience
Infinite Lambda is a dbt preferred consulting provider, having successfully leveraged the technology in over 15 projects over the last 2 years. We continuously upgrade our skills and stay up to date with the latest developments.
The approach
We emphasise full transparency at each stage of the project. Working with us, you will take an active part in the development process and will be familiar with everything we do with your system and resources.
The training
We do not shy away from sharing expertise and will organise engaging training sessions to teach you how to use dbt. Once we complete the project, you will be able to take full care of your infrastructure on your own.
Use Data Vault with
We can help your business make better use of data, so you can:
- Client focus / modern stack
- Certification
- Experience