...

Sustainability: the Last Frontier in Business Intelligence

Nina Anderson
April 1, 2022

The power of the modern data stack in generating actionable insights out of disparate data is well documented. It’s time to apply this to sustainability.

As the latest instalment of the IPCC report has made clear, we are heading towards an ecological crisis that will be irreversible if not addressed in the next few years. Given that the private sector accounts for 85% of investments globally, it is difficult to overstate its role in any viable solution. Despite this, historically the onus has been on consumers to reduce their personal carbon footprint – largely due to very successful marketing campaigns by vested interests.

Fast forward to today, and the balance is starting to shift: there is increasing consumer pressure for companies to be more transparent about the environmental impact of the goods and services they offer.

Meanwhile, the analytics industry has developed at a breakneck pace: it is valued at more than $200 billion and still growing, largely thanks to its role in helping businesses optimise their every decision. Targeting lookalike audiences to boost sales is now a well-worn technique; we can finely tune processes based on real-time IoT data and even understand protein folding (well, some of us can). So why does the private sector still struggle to understand its own impact on the environment?

There are three main factors that make this a hard problem for businesses:

1. It is a very data-intensive problem.

Understanding a whole organisation’s carbon footprint is currently an expensive and time-consuming exercise, involving process mapping and data collection, often from third-party suppliers who may have little incentive to help. For most companies, the vast majority of emissions relate to activities outside their immediate sphere of control (known as Scope 3 emissions). It can be difficult to get a green light for this kind of exercise when the analytics budget is limited.

2. Emissions factors are not readily available.

Operational data must be combined with emissions factors to generate meaningful sustainability metrics. These factors are often not available in the right format, and they may be locale-specific, patchy, or behind a paywall.
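
To make that combination step concrete, here is a minimal sketch in Python. It assumes the common approach of multiplying a quantity of activity by an emissions factor, and it prefers a locale-specific factor where one exists. The factor values, keys and function name are hypothetical placeholders, not published figures.

```python
from typing import Optional

# Hypothetical emissions factors in kg CO2e per unit of activity, keyed by
# (activity, locale). The numbers are illustrative placeholders only.
EMISSION_FACTORS = {
    ("electricity_kwh", "GB"): 0.2,
    ("electricity_kwh", None): 0.5,  # generic fallback when no locale matches
    ("road_freight_tonne_km", None): 0.1,
}


def estimate_emissions(activity: str, quantity: float,
                       locale: Optional[str] = None) -> float:
    """Return estimated kg CO2e as quantity * factor.

    A locale-specific factor is used when available; otherwise the
    generic factor for the activity is used.
    """
    factor = EMISSION_FACTORS.get((activity, locale))
    if factor is None:
        factor = EMISSION_FACTORS.get((activity, None))
    if factor is None:
        # Patchy coverage is surfaced as an error rather than guessed at.
        raise KeyError(f"No emissions factor available for {activity!r}")
    return quantity * factor
```

Raising an error for a missing factor, rather than silently returning zero, keeps gaps in factor coverage visible to whoever maintains the data.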

3. Companies lack the analytics expertise.

Even if you do manage to gather the right operational data and emissions factors, there are multiple standards for calculating metrics (though the Greenhouse Gas Protocol, or GHGP, is dominant) and you need expertise to navigate them. Unsurprisingly, large traditional consultancies as well as a tranche of new sustainability consultancies are capitalising on this need.

In other words, it is hard, confusing and expensive, while offering little immediate reward. In this context, it is understandable that most businesses that do try to address their environmental impact limit their efforts to (partial) carbon offsetting rather than addressing root causes in their supply chains. But the limitations of this approach are widely documented.

It is time for the data industry to start tackling the last frontier in BI – sustainability. We need to make carbon accounting feasible (read: less hard, confusing and expensive) for businesses so they can focus on the hard work of reaching net zero, and help them prioritise environmental initiatives according to their cost-impact ratio.
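
As a rough illustration of prioritising by cost-impact ratio, the Python sketch below ranks candidate initiatives by cost per tonne of CO2e abated. The initiatives, figures and names are invented for the example, and a real exercise would need to account for uncertainty in both cost and abatement estimates.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    name: str
    annual_cost: float         # in a single, consistent currency
    tonnes_co2e_abated: float  # estimated annual reduction


def rank_by_cost_impact(initiatives):
    """Sort initiatives by cost per tonne of CO2e abated, cheapest first."""
    # Initiatives with no estimated abatement cannot be ranked by this ratio.
    viable = [i for i in initiatives if i.tonnes_co2e_abated > 0]
    return sorted(viable, key=lambda i: i.annual_cost / i.tonnes_co2e_abated)


# Hypothetical candidates with made-up figures.
candidates = [
    Initiative("Switch to a renewable electricity tariff", 20_000, 400),
    Initiative("Consolidate freight shipments", 5_000, 250),
    Initiative("Retrofit warehouse lighting", 12_000, 60),
]
ranked = rank_by_cost_impact(candidates)
```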

The role of the data warehouse in supporting sustainability analytics

Let us consider the role of the data warehouse in this challenge and the reasons it might be a better bet than other tools. As this is a multifaceted issue that can by no means be explored in a single article, we are going to dedicate future posts to other elements as well.

At Infinite Lambda, we often talk about automating the trivial in order to focus on the most difficult problems. It might seem like there is little that is trivial in analysing an organisation’s environmental impact, but the reality of any data-intensive problem is that some heavy lifting must be done to prepare data for analysis.

So far, the emerging solutions in this space are specialised SaaS products focused on removing that heavy lifting from a business’s purview entirely. However, recent developments in the data space, such as the astronomical success of dbt, have validated two ideas: (i) problems should be broken down into modular, reusable components (a software engineering mindset), and (ii) code is the best way to express complex analytic logic.

When applied to sustainability analytics, this points to the data warehouse, rather than a specialised platform, as the best tool. There is precedent for this kind of thinking in the debate over the data warehouse versus the customer data platform (CDP), as commented on by Fivetran.

But there are more reasons:

  • Consistency: your data warehouse probably already contains some of the data you need, such as shipping details, inventory volumes, SKU-level product specifications, and maybe even business travel logs. Maintaining a single source of truth in the data warehouse encourages consistency, plus it’s more efficient.
  • Data quality and freshness: putting sustainability analytics in the remit of the data warehouse means it’s also in the remit of your data team, who will hopefully apply the same exacting standards that they do for everything else.
  • Downstream use cases: the data warehouse was born to support a variety of downstream use cases. It lends itself not just to traditional dashboard-based BI, but also to a range of applications that can be accomplished with reverse ETL, perhaps even automated carbon credit purchases.
  • Flexibility: business structures vary massively, and SaaS tools cannot always account for that, whereas building in your own warehouse gives you full control. It is also easier to adapt if a data source changes, or if you want to create custom data retention policies or snapshots.
  • Transparency: it will be much easier to understand and communicate how metrics have been produced if your data team has produced and documented them, even if relying on external frameworks.
  • Ownership: it allows you to own your own data and to use infrastructure you already have.
  • Actionability: it will be much easier to present environmental insights alongside other KPIs in an existing BI platform. This is crucial in granting environmental metrics a ‘seat at the table’ alongside traditional profitability metrics.
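
To illustrate the consistency and actionability points above, the sketch below joins order data to an emissions factor and reports revenue and freight emissions side by side. It is written in Python, with an in-memory SQLite database standing in for the warehouse. The table names, column names and figures are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; every table, column and
# figure here is a hypothetical placeholder.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER, product_line TEXT,
        revenue REAL, freight_tonne_km REAL
    );
    CREATE TABLE emission_factors (activity TEXT, kg_co2e_per_unit REAL);
    INSERT INTO orders VALUES
        (1, 'furniture', 1000.0, 500.0),
        (2, 'furniture', 3000.0, 1500.0),
        (3, 'textiles',  2000.0, 200.0);
    INSERT INTO emission_factors VALUES ('road_freight_tonne_km', 0.1);
""")

# One query reports a traditional KPI (revenue) next to an environmental one.
rows = conn.execute("""
    SELECT o.product_line,
           SUM(o.revenue) AS revenue,
           SUM(o.freight_tonne_km * f.kg_co2e_per_unit) AS kg_co2e
    FROM orders AS o
    JOIN emission_factors AS f ON f.activity = 'road_freight_tonne_km'
    GROUP BY o.product_line
    ORDER BY o.product_line
""").fetchall()

# Carbon intensity (kg CO2e per unit of revenue) for each product line.
kpis = {
    line: {"revenue": rev, "kg_co2e": kg, "kg_co2e_per_revenue": kg / rev}
    for line, rev, kg in rows
}
```

Because the emissions figure is derived from the same orders table as revenue, both metrics share one source of truth and can sit on the same dashboard.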

All in all, it is a classic use case for the data warehouse: disparate data sources need to be gathered, tamed, and made sense of. Cloud providers such as AWS are now offering their own sustainability monitoring tools and recommendations, indicating momentum in this space, but digital footprint is only one piece of the puzzle.

Why now especially?

The IPCC has emphasised that our climate window of opportunity is narrowing with every increment of warming. Aside from the obvious existential threat, regulation is tightening: while ESG reporting has so far been mostly voluntary, governments are moving towards mandatory reporting following COP26. In the UK, climate-related financial disclosures will be mandatory for large companies from April 2022, and across the economy by 2025. Even so, annual, manual ESG reporting is not enough to spark the actions needed.

At Infinite Lambda, we are encouraging clients to consider sustainability analytics as part of their data platform strategy. Contact us to discuss implementing sustainability analytics for your organisation.
