
How to get started with data contracts

Cristiano Valente
December 28, 2023

More and more modern companies are looking to get started with data contracts, because contracts are an effective way to increase an organisation's data maturity and awareness.

One of their most important benefits is that they help create a Data-as-a-Product mindset within an organisation. They also make collaboration between teams easier, since describing the data is much quicker than actually building it.

Data contracts also enable parallel development: consumers can start working with mock data while producers are still building the real thing. Contracts facilitate automation and improve data quality, which saves resources and builds trust in the data.
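To make the parallel-development point concrete, here is a minimal Python sketch of how a consumer might generate mock records from a contract before the real data exists. The contract shape (a `name` plus a list of typed `fields`) and the type names are hypothetical, chosen for illustration rather than taken from any particular standard.

```python
import random
import string

# Hypothetical, minimal contract: a dataset name plus typed fields.
CONTRACT = {
    "name": "orders",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "quantity", "type": "integer"},
        {"name": "unit_price", "type": "number"},
        {"name": "is_gift", "type": "boolean"},
    ],
}

# One value generator per (hypothetical) contract type; extend as the format grows.
GENERATORS = {
    "string": lambda rng: "".join(rng.choices(string.ascii_lowercase, k=8)),
    "integer": lambda rng: rng.randint(0, 100),
    "number": lambda rng: round(rng.uniform(0, 1000), 2),
    "boolean": lambda rng: rng.random() < 0.5,
}


def mock_records(contract: dict, n: int, seed: int = 42) -> list[dict]:
    """Generate n fake records whose fields and types follow the contract."""
    rng = random.Random(seed)  # seeded so consumer tests are reproducible
    return [
        {field["name"]: GENERATORS[field["type"]](rng) for field in contract["fields"]}
        for _ in range(n)
    ]
```

A consumer team could feed `mock_records(CONTRACT, 100)` into its pipeline tests while the producer is still building the dataset; seeding the generator keeps those tests deterministic.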

Finally, a single data contract can be used to generate a range of derivative artefacts, such as dbt models, tests, other schema formats and SQL queries.
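As an illustration of one such derivative artefact, the sketch below renders a `CREATE TABLE` statement from a contract. The contract shape and the type mapping are assumptions; a real generator would target your warehouse's SQL dialect, and the same loop could just as well emit a dbt model or a test file.

```python
# Hypothetical contract shape; "required" marks columns that must be NOT NULL.
CONTRACT = {
    "name": "orders",
    "fields": [
        {"name": "order_id", "type": "string", "required": True},
        {"name": "quantity", "type": "integer", "required": True},
        {"name": "unit_price", "type": "number", "required": False},
    ],
}

# Illustrative mapping from contract types to generic SQL types;
# adjust it to your warehouse's dialect.
SQL_TYPES = {
    "string": "VARCHAR",
    "integer": "INTEGER",
    "number": "NUMERIC(18, 2)",
    "boolean": "BOOLEAN",
}


def to_create_table(contract: dict) -> str:
    """Render a CREATE TABLE statement from the contract's fields."""
    columns = []
    for field in contract["fields"]:
        column = f"    {field['name']} {SQL_TYPES[field['type']]}"
        if field.get("required", False):
            column += " NOT NULL"
        columns.append(column)
    return f"CREATE TABLE {contract['name']} (\n" + ",\n".join(columns) + "\n);"
```

For the contract above, `to_create_table(CONTRACT)` produces a three-column `orders` table in which `unit_price` is the only nullable column.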

How do data contracts work?

Data contracts are a relatively new concept or, rather, a combination of existing concepts repackaged for a data context. They work much like API specifications: an API spec describes how an API behaves, the format of its input and output data and any other relevant features. The API developer commits to honouring the spec, and clients can confidently build their applications against it, knowing they will get the expected behaviour.

Data contracts apply the same idea to data platforms. In general, wherever data is produced for consumption by groups other than the producer, a data contract is potentially applicable.

In a nutshell: a data contract is an agreement, or set of rules, between data publishers and subscribers. It establishes a shared understanding of the data being exchanged, keeps that data consistent and prevents confusion during the exchange process.
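One way to picture this agreement in practice: either side of the exchange can check individual records against the contract. The following sketch uses a hypothetical contract shape and returns a list of violations, so an empty list means the record honours the contract.

```python
# Hypothetical contract: expected fields, their types and whether they are required.
CONTRACT = {
    "name": "orders",
    "fields": [
        {"name": "order_id", "type": "string", "required": True},
        {"name": "quantity", "type": "integer", "required": True},
        {"name": "note", "type": "string", "required": False},
    ],
}

# Contract type -> accepted Python type(s).
PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def violations(record: dict, contract: dict) -> list[str]:
    """Return human-readable contract violations for one record (empty = compliant)."""
    problems = []
    known = {field["name"] for field in contract["fields"]}
    for field in contract["fields"]:
        name = field["name"]
        if record.get(name) is None:
            if field.get("required", False):
                problems.append(f"missing required field '{name}'")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly for non-boolean fields.
        if isinstance(value, bool) and field["type"] != "boolean":
            problems.append(f"'{name}' should be {field['type']}, got bool")
        elif not isinstance(value, PY_TYPES[field["type"]]):
            problems.append(f"'{name}' should be {field['type']}, got {type(value).__name__}")
    for extra in sorted(set(record) - known):
        problems.append(f"unexpected field '{extra}'")
    return problems
```

A producer could run this before publishing and a consumer on ingestion; both sides are checking against the same document, which is the point of the agreement.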

 

The basics of getting started with data contracts

  • Start small
    Embrace open formats and standards and customise them to fit your culture. Experiment and slowly ramp up.
  • Automate, automate, automate
    Contracts are code and metadata, so put them to work: use them to drive automated tests, generate artefacts and get creative.
  • Be gradual
    Adopting data contracts usually involves a significant culture shift for the entire organisation, so introduce them gradually and seek feedback continuously.
  • Commit to it
    Make a genuine effort to embed data contracts in your processes. A poor initial experience could stop adoption from gaining traction, so be consistent once you start.

How to introduce data contracts into tech practices

Define the format of data contracts

Data contracts are, first of all, code. A common practice is to write them in YAML, but any similar structured format, such as JSON or Avro schemas, can serve the same purpose.
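For example, a small contract might look like the JSON document below, loaded into typed Python objects. The keys (`name`, `version`, `owner`, `fields`, `tags`) are illustrative and do not follow any specific open standard; a YAML version would have the same structure and could be parsed with a YAML library such as PyYAML's `yaml.safe_load`.

```python
import json
from dataclasses import dataclass

# A hypothetical contract written in JSON; the keys are illustrative, not an open standard.
CONTRACT_JSON = """
{
  "name": "orders",
  "version": "1.0.0",
  "owner": "checkout-team",
  "fields": [
    {"name": "order_id", "type": "string", "required": true,
     "description": "Unique order identifier"},
    {"name": "customer_email", "type": "string", "required": false, "tags": ["pii"]}
  ]
}
"""


@dataclass(frozen=True)
class Field:
    name: str
    type: str
    required: bool = False
    description: str = ""
    tags: tuple[str, ...] = ()


@dataclass(frozen=True)
class Contract:
    name: str
    version: str
    owner: str
    fields: tuple[Field, ...]


def load_contract(text: str) -> Contract:
    """Parse a JSON contract into typed, immutable Python objects."""
    raw = json.loads(text)
    # Convert each field's tag list to a tuple so the dataclass stays immutable.
    fields = tuple(Field(**{**f, "tags": tuple(f.get("tags", ()))}) for f in raw["fields"])
    return Contract(name=raw["name"], version=raw["version"], owner=raw["owner"], fields=fields)
```

Because the dataclasses accept only known keys, an unexpected key in a field definition raises a `TypeError`, which surfaces typos in a contract early.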

Review the available open standards first, and only after thorough research decide whether to adopt one of them or create your own.

Consider:

  • The systems the contracts will be integrated with, such as Kafka Schema Registry or data catalogues;
  • Existing practices within the company;
  • Your data maturity relative to the existing open standards and solutions, some of which might prove too complex when you are just getting started with data contracts.
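Integration with a system such as Kafka Schema Registry often means translating the contract into the schema format that system understands. Below is a rough sketch that converts a hypothetical contract into an Avro record schema; the contract shape and type mapping are assumptions, and registering the result is left to whatever client your stack already uses.

```python
import json

# Hypothetical contract shape, as in the other sketches.
CONTRACT = {
    "name": "orders",
    "namespace": "com.example.sales",
    "fields": [
        {"name": "order_id", "type": "string", "required": True},
        {"name": "quantity", "type": "integer", "required": True},
        {"name": "unit_price", "type": "number", "required": False},
    ],
}

# Assumed mapping from contract types to Avro primitive types.
AVRO_TYPES = {"string": "string", "integer": "long", "number": "double", "boolean": "boolean"}


def to_avro_schema(contract: dict) -> dict:
    """Translate the contract into an Avro record schema (as a Python dict)."""
    fields = []
    for field in contract["fields"]:
        avro_type = AVRO_TYPES[field["type"]]
        if field.get("required", False):
            fields.append({"name": field["name"], "type": avro_type})
        else:
            # Optional fields become a union with null. Avro requires a union's
            # default to match its first branch, hence "null" comes first.
            fields.append({"name": field["name"], "type": ["null", avro_type], "default": None})
    return {
        "type": "record",
        "name": contract["name"],
        "namespace": contract["namespace"],
        "fields": fields,
    }


if __name__ == "__main__":
    print(json.dumps(to_avro_schema(CONTRACT), indent=2))
```

Generating the Avro schema from the contract, rather than maintaining both by hand, keeps the contract as the single source of truth.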

Define a data contract creation and maintenance process

Take your specific tech stack and any other relevant factors into account. Aim to establish a flexible but clear process that everyone can understand and easily follow to create and update data contracts.

Make sure you:

  • Have automated tests in place to check the contracts’ formal validity;
  • Require a minimum of reviews before any contract is accepted;
  • Enforce backward compatibility;
  • Enforce any other conditions you define, such as versioning and naming conventions.
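Automated checks on a contract's formal validity can be as simple as a linter that runs in CI on every pull request. The rules below (required top-level keys, `MAJOR.MINOR.PATCH` versions, snake_case field names, a fixed set of types) are hypothetical house rules, included only to show the shape of such a check.

```python
import re

# Hypothetical house rules; adapt them to the conventions your organisation agrees on.
REQUIRED_KEYS = {"name", "version", "owner", "fields"}
ALLOWED_TYPES = {"string", "integer", "number", "boolean", "timestamp"}
SEMVER = re.compile(r"\d+\.\d+\.\d+")
SNAKE_CASE = re.compile(r"[a-z][a-z0-9_]*")


def lint_contract(contract: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the contract passes."""
    errors = [f"missing top-level key '{key}'" for key in sorted(REQUIRED_KEYS - set(contract))]
    version = str(contract.get("version", ""))
    if "version" in contract and not SEMVER.fullmatch(version):
        errors.append(f"version '{version}' is not MAJOR.MINOR.PATCH")
    seen = set()
    for field in contract.get("fields", []):
        name = str(field.get("name", ""))
        if not SNAKE_CASE.fullmatch(name):
            errors.append(f"field name '{name}' is not snake_case")
        if name in seen:
            errors.append(f"duplicate field '{name}'")
        seen.add(name)
        if field.get("type") not in ALLOWED_TYPES:
            errors.append(f"field '{name}' has unknown type '{field.get('type')}'")
    return errors
```

In a CI job, a non-empty result from `lint_contract` would fail the build before reviewers even look at the change, leaving the human review to focus on meaning rather than syntax.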

As a rule, do not amend existing contracts in place unless doing so genuinely makes sense for your stack or scenario; publish any change as a new contract version instead.
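Publishing changes as new versions pairs naturally with an automated backward-compatibility check. The sketch below assumes one consumer-oriented definition of a breaking change (a field is removed, changes type, or stops being required) and derives a semantic version bump from it; your own rules may well differ.

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """List changes in `new` that could break consumers of `old` (assumed rules)."""
    old_fields = {f["name"]: f for f in old["fields"]}
    new_fields = {f["name"]: f for f in new["fields"]}
    problems = []
    for name, before in old_fields.items():
        after = new_fields.get(name)
        if after is None:
            problems.append(f"field '{name}' was removed")
        elif after["type"] != before["type"]:
            problems.append(f"field '{name}' changed type {before['type']} -> {after['type']}")
        elif before.get("required", False) and not after.get("required", False):
            # Consumers may have relied on this field never being null.
            problems.append(f"field '{name}' is no longer required")
    return problems


def next_version(old: dict, new: dict) -> str:
    """Suggest a MAJOR.MINOR.PATCH version for `new`, based on `old`'s version."""
    major, minor, patch = (int(part) for part in old["version"].split("."))
    if breaking_changes(old, new):
        return f"{major + 1}.0.0"  # breaking: new major version
    added = {f["name"] for f in new["fields"]} - {f["name"] for f in old["fields"]}
    if added:
        return f"{major}.{minor + 1}.0"  # additive: new minor version
    return f"{major}.{minor}.{patch + 1}"  # e.g. descriptions only: patch
```

A CI step could compare a proposed contract with the latest published version and reject it if its declared version is lower than the one `next_version` suggests.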

Look for automation opportunities

It is not easy to get everyone on board with data contracts. Make sure you show the organisation their most practical benefits, such as the time and effort they save.

One of the most effective ways to do this is through automation and code generation. Use them to streamline people’s work, and you will start gaining the data contract advocates you need.

Here are a few examples:

  • Masking PII fields based on tags;
  • dbt source models & contracts;
  • Data validators for other languages in use.
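Taking the first example, here is a minimal sketch of tag-based PII masking driven by a contract. It assumes a hypothetical `pii` tag on sensitive fields and replaces their values with a salted SHA-256 digest; in a warehouse, the same tags could instead drive native masking policies, but the principle of reading the tags from the contract is the same.

```python
import hashlib

# Hypothetical contract in which sensitive fields carry a "pii" tag.
CONTRACT = {
    "name": "customers",
    "fields": [
        {"name": "customer_id", "type": "string"},
        {"name": "email", "type": "string", "tags": ["pii"]},
        {"name": "phone", "type": "string", "tags": ["pii", "contact"]},
        {"name": "country", "type": "string"},
    ],
}


def pii_fields(contract: dict) -> set[str]:
    """Names of all fields tagged as PII in the contract."""
    return {f["name"] for f in contract["fields"] if "pii" in f.get("tags", [])}


def mask_record(record: dict, contract: dict, salt: str = "change-me") -> dict:
    """Replace PII values with a salted SHA-256 hex digest; other fields pass through.

    Hashing (rather than blanking) keeps masked values joinable across datasets.
    This is pseudonymisation, not anonymisation. "change-me" is a placeholder salt.
    """
    sensitive = pii_fields(contract)
    masked = {}
    for name, value in record.items():
        if name in sensitive and value is not None:
            masked[name] = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
        else:
            masked[name] = value
    return masked
```

Because the masking rule lives in the contract, tagging a new field as `pii` is enough for it to be masked everywhere this function runs, with no pipeline code to change.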

 

Useful resources you can leverage to get started with data contracts

Non-technical readers’ guide to data contracts: read the article on the Infinite Lambda Blog.

Using data contracts to drive a Data-as-a-Product mindset: online panel discussion with Chad Sanderson (Data Quality Camp), Yali Sassoon (Snowplow), Jean-Georges Perrin (data mesh expert) and Nina Anderson (Infinite Lambda). Available on demand.

Data mesh adoption: A business leader's perspective on the benefits and challenges of federated governance. Read the article on the Infinite Lambda Blog.

Driving a Data-as-a-Product mindset: online panel discussion with Kate Ho (Head of Product, Data Platform & Products at the BBC), Mohit Joshi (Global Head of Data Platforms & Data Science at Collinson Group), Will Blake (Chief Technology Officer at CRU) and Nas Radev (CEO at Infinite Lambda). Available on demand.

 

Could you use some help to get started with data contracts?

The Infinite Lambda team are experts in helping organisations adopt cutting-edge data technology and practices to become truly data-driven and AI-enabled.
Drop us a line if you have any questions about getting started with data contracts and fostering a Data-as-a-Product mindset.

More on the topic

Everything we know, we are happy to share. Head to the blog to see how we leverage the tech.
