More and more modern companies are looking to get started with data contracts because they are a great way of increasing a business' data maturity and awareness.
One of their most important benefits is that they help create a Data-as-a-Product mindset within an organisation. They also make collaboration among teams easier, because describing data is much quicker than actually building it.
Data contracts also enable parallel development since consumers can start mocking the data while producers build it. They facilitate automation and improve data quality, which optimises resources and contributes to data trust.
Finally, the same data contract can offer a series of other derivative solutions, such as dbt models, tests, other schema formats, SQL queries and more.
How do data contracts work?
Data contracts are a relatively new concept or, rather, a combination of existing concepts repackaged for a data-related context. They work similarly to API specs, which describe how an API behaves, the format of its input and output data, and any other features. The developer of the API commits to honouring those specs, and clients can confidently build their applications against them, knowing that they will get the expected behaviour.
Data contracts apply the same idea to data platforms. In general, anywhere data is produced for consumption by groups other than the producer, a data contract is potentially applicable.
In a nutshell: a data contract acts as an agreement or set of rules between data publishers and subscribers, ensuring mutual understanding during data transactions and maintaining consistency. It prevents confusion during the data exchange process.
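To make the idea of an agreement between publishers and subscribers more concrete, here is a minimal sketch in Python. The contract layout, the `orders` dataset, the field names and the `validate_record` helper are all assumptions made for illustration and do not follow any particular open standard.

```python
from typing import Any

# Hypothetical contract for an "orders" dataset; every name here is illustrative.
ORDERS_CONTRACT: dict[str, Any] = {
    "name": "orders",
    "version": "1.0.0",
    "fields": {
        "order_id": {"type": "string", "required": True},
        "amount": {"type": "number", "required": True},
        "coupon_code": {"type": "string", "required": False},
    },
}

# Mapping from contract type names to Python types (an assumption of this sketch).
PYTHON_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "number": (int, float),
    "boolean": (bool,),
}


def validate_record(record: dict[str, Any], contract: dict[str, Any]) -> list[str]:
    """Return a list of violations; an empty list means the record conforms."""
    errors: list[str] = []
    for field_name, spec in contract["fields"].items():
        if field_name not in record:
            if spec["required"]:
                errors.append(f"missing required field: {field_name}")
            continue
        value = record[field_name]
        type_ok = isinstance(value, PYTHON_TYPES[spec["type"]])
        # bool is a subclass of int in Python, so reject it explicitly for numbers.
        if spec["type"] == "number" and isinstance(value, bool):
            type_ok = False
        if not type_ok:
            errors.append(f"wrong type for {field_name}: expected {spec['type']}")
    return errors


print(validate_record({"order_id": "A1", "amount": 9.5}, ORDERS_CONTRACT))
print(validate_record({"order_id": 1}, ORDERS_CONTRACT))
```

A producer could run a check like this before publishing, and a consumer could run the same check on arrival, so both sides rely on one shared definition.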
The basics of getting started with data contracts
- Start small
Embrace open formats and standards and customise them to fit your culture. Experiment and slowly ramp up.
- Automate, automate, automate
Leverage the endless possibilities of code and metadata: test, generate, get infinitely creative.
- Be gradual
Getting started with data contracts usually comes with a big culture shift for the entire organisation, so be gradual and seek feedback continuously.
- Commit to it
Make a serious effort to embed data contracts in your processes. A poor initial experience could stop adoption from gaining traction, so be consistent once you get started.
How to introduce data contracts into tech practices
Define the format of data contracts
Data contracts are, first of all, code. A common practice is to use YAML, but any similar structured format, such as JSON or Avro, can be used to the same effect.
Have a look at the available open standards first. Only once you have researched them thoroughly should you decide whether to adopt one of them or create your own. When deciding, consider:
- The systems the contracts will be integrated with, such as Kafka Schema Registry or data catalogues;
- The existing practices within the company;
- The level of data maturity vs the existing open standards / solutions, which might prove too complex when getting started with data contracts.
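As an illustration of what a home-grown format might look like, the sketch below parses a contract written in JSON and checks that it follows an assumed house format. The keys (`name`, `version`, `owner`, `fields`, `tags`), the allowed types and the `check_contract_format` helper are hypothetical and not taken from any existing open standard. The same kind of check could be applied to a YAML file after loading it with a YAML parser.

```python
import json

# Hypothetical contract document; the keys below are an assumed house format.
CONTRACT_JSON = """
{
  "name": "customers",
  "version": "1.0.0",
  "owner": "crm-team",
  "fields": [
    {"name": "customer_id", "type": "string", "required": true},
    {"name": "email", "type": "string", "required": false, "tags": ["pii"]}
  ]
}
"""

REQUIRED_TOP_LEVEL_KEYS = {"name", "version", "owner", "fields"}
REQUIRED_FIELD_KEYS = {"name", "type", "required"}
ALLOWED_TYPES = {"string", "number", "boolean", "timestamp"}


def check_contract_format(contract: dict) -> list[str]:
    """Return formal problems with a contract document; empty means well-formed."""
    missing = sorted(REQUIRED_TOP_LEVEL_KEYS - set(contract))
    problems = [f"missing key: {key}" for key in missing]
    for index, field in enumerate(contract.get("fields", [])):
        for key in sorted(REQUIRED_FIELD_KEYS - set(field)):
            problems.append(f"fields[{index}] missing key: {key}")
        if "type" in field and field["type"] not in ALLOWED_TYPES:
            problems.append(f"fields[{index}] has unknown type: {field['type']}")
    return problems


contract = json.loads(CONTRACT_JSON)
print(check_contract_format(contract))
```

Keeping the format small at first, with only a handful of required keys, makes it easier to tighten the rules later as data maturity grows.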
Define a data contract creation and maintenance process
Consider the specific tech stack and any other applicable variables. Aim to establish a flexible but clear process that everyone can understand and easily leverage to create and update data contracts.
Make sure you:
- Have automated tests in place to check the contracts’ formal validity;
- Require a minimum of reviews before any contract is accepted;
- Enforce backward compatibility;
- Enforce any other conditions you define, including versioning, naming and more.
As a rule, any change should be released as a new contract version. Do not amend an existing contract in place unless that makes sense for your stack or scenario.
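A backward compatibility rule can be enforced automatically, for example in a CI step that compares a proposed contract version with the previous one. The sketch below is one possible interpretation, not a definitive rule set. It assumes that removing a field, changing its type or making a required field optional counts as a breaking change, that adding fields is safe, and that versions follow a `major.minor.patch` scheme. The contract layout and function names are hypothetical.

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """List changes in `new` that could break consumers of `old`."""
    issues: list[str] = []
    new_fields = new["fields"]
    for name, old_spec in old["fields"].items():
        if name not in new_fields:
            issues.append(f"removed field: {name}")
            continue
        new_spec = new_fields[name]
        if new_spec["type"] != old_spec["type"]:
            issues.append(
                f"type changed for {name}: {old_spec['type']} -> {new_spec['type']}"
            )
        elif old_spec["required"] and not new_spec["required"]:
            issues.append(f"field became optional: {name}")
    return issues


def is_acceptable(old: dict, new: dict) -> bool:
    """Allow breaking changes only when the major version number increases."""
    old_major = int(old["version"].split(".")[0])
    new_major = int(new["version"].split(".")[0])
    return not breaking_changes(old, new) or new_major > old_major


# Illustrative contract versions.
V1 = {"version": "1.0.0", "fields": {"id": {"type": "string", "required": True}}}
V1_1 = {
    "version": "1.1.0",
    "fields": {
        "id": {"type": "string", "required": True},
        "country": {"type": "string", "required": False},
    },
}
V1_2_BAD = {"version": "1.2.0", "fields": {"id": {"type": "number", "required": True}}}
V2 = {"version": "2.0.0", "fields": {"id": {"type": "number", "required": True}}}

print(is_acceptable(V1, V1_1), is_acceptable(V1, V1_2_BAD), is_acceptable(V1, V2))
```

Which changes count as breaking depends on your consumers, so treat the rules in `breaking_changes` as a starting point to adapt.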
Look for automation opportunities
It is not easy to get everyone on board with the data contract concept. Make sure to show the organisation the most practical sides of data contracts, such as the way they save time and effort.
One of the most impactful ways to do this is through automation and code generation. Try leveraging these to optimise people’s work and you will start gaining the data contract advocates you need.
Here are a few examples:
- Masking PII fields based on tags;
- dbt source models & contracts;
- Data validators for other languages in use.
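The first example above, masking PII fields based on tags, could be sketched as follows. The contract layout, the `pii` tag name and the `mask_pii` helper are assumptions for illustration. Hashing with unsalted SHA-256 is used only to keep the example short and should not be read as a recommended anonymisation technique.

```python
import hashlib

# Hypothetical contract in which fields carry optional tags.
CUSTOMERS_CONTRACT = {
    "name": "customers",
    "fields": {
        "customer_id": {"type": "string", "tags": []},
        "email": {"type": "string", "tags": ["pii"]},
        "phone": {"type": "string", "tags": ["pii"]},
    },
}


def mask_pii(record: dict, contract: dict) -> dict:
    """Return a copy of `record` with every field tagged `pii` replaced by a hash."""
    masked = dict(record)  # shallow copy so the caller's record is left untouched
    for name, spec in contract["fields"].items():
        if "pii" in spec.get("tags", []) and masked.get(name) is not None:
            raw = str(masked[name]).encode("utf-8")
            masked[name] = hashlib.sha256(raw).hexdigest()
    return masked


original = {"customer_id": "c-1", "email": "a@example.com", "phone": None}
masked = mask_pii(original, CUSTOMERS_CONTRACT)
print(masked["customer_id"], masked["email"][:8], masked["phone"])
```

Because the masking logic reads the tags from the contract, tagging a new field as PII is enough to have it masked, with no change to the pipeline code.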
Useful resources you can leverage to get started with data contracts
Non-technical readers’ guide to data contracts: read the article on the Infinite Lambda Blog
Using data contracts to drive a Data-as-a-Product mindset: online panel discussion with Chad Sanderson (Data Quality Camp), Yali Sassoon (Snowplow), Jean-Georges Perrin (data mesh expert) and Nina Anderson (Infinite Lambda). Available on demand.
Data mesh adoption: A business leader's perspective on the benefits and challenges of federated governance. Read the article on the Infinite Lambda Blog.
Driving a Data-as-a-Product mindset: online panel discussion with Kate Ho (Head of Product, Data Platform & Products at the BBC), Mohit Joshi (Global Head of Data Platforms & Data Science at Collinson Group), Will Blake (Chief Technology Officer at CRU) and Nas Radev (CEO at Infinite Lambda). Available on demand.
Could you use some help to get started with data contracts?
The Infinite Lambda team are experts in helping organisations adopt cutting-edge data technology and practices to become truly data-driven and AI-enabled.
Drop us a line if you have any questions about getting started with data contracts and fostering a Data-as-a-Product mindset.