
Reducing Analytics Platform Costs by 70%

We helped the client migrate their data warehouse and leveraged data science to optimise expenditure on the cloud.

Our client is a successful UK-based adtech company that enables over a hundred B2B retailers to understand and communicate with their customers better.

The company provides a platform that creates individually tailored customer experiences across all channels using an AI-powered intelligence layer. With plans to expand the customer base, it was becoming clear that the existing tech stack would not scale with the increasing data volume.

Infinite Lambda helped them migrate to a new data warehouse that reduced their analytics platform costs by 70%. The new architecture was more cost-efficient and more flexible, allowing for seamless integration of data science solutions.

What we did

A single source of truth in Snowflake
Advanced security and backup capabilities
Machine learning and data science

The Challenge

The client’s original infrastructure, built on clustered PostgreSQL databases, suffered from numerous performance and scaling issues, along with high costs and operational overhead.

With a growing client base on the horizon, and the increase in data volume that would come with it, they needed a scalable and cost-effective architecture.

Additionally, they wanted to experiment with more machine learning and data science projects and needed the expertise to do so.

The Solution

Snowflake migration

We proposed separating the analytical and transactional databases, keeping the transactional one on PostgreSQL and migrating the analytical one to Snowflake. Our team produced a full Snowflake evaluation report that first defined the evaluation goals, such as performance, compute scalability, and the cost of running the ingestion pipeline and the platform’s complex queries and reports.

As a multi-cluster warehouse, Snowflake could sustain a high number of concurrent users with acceptable performance. Considering the planned expansion and the increased concurrency it would bring, we recommended a larger warehouse, which would provide a better performance-cost ratio.
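For illustration, both warehouse size and multi-cluster scaling in Snowflake are plain configuration changes. A minimal sketch using the Snowflake Python connector, with hypothetical warehouse, account and credential names, might look like this:

```python
import snowflake.connector

# Connect to Snowflake (user, password and account are placeholders).
conn = snowflake.connector.connect(
    user="ANALYTICS_USER",
    password="***",
    account="my_account",
)

# Resize the warehouse and allow it to scale out to extra clusters
# under concurrent load; multi-cluster warehouses require the
# Enterprise edition.
conn.cursor().execute(
    "ALTER WAREHOUSE ANALYTICS_WH SET"
    " WAREHOUSE_SIZE = 'LARGE'"
    " MIN_CLUSTER_COUNT = 1"
    " MAX_CLUSTER_COUNT = 4"
)
conn.close()
```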

While evaluating Snowflake, we built out the architecture, including the ingestion pipeline and data transformation. We selected a technology stack that would let Snowflake and PostgreSQL run side by side without disruption.

Finally, we helped the client’s tech team with their machine learning and data science projects, which would be integrated into the new Snowflake-based data architecture. Together, we successfully delivered three data science projects focusing on taste profiling, product replenishment rates and image-based product labelling.

Data engineering

We created a single source of truth in Snowflake, which would enable data users, such as BI analysts, to fetch data far more easily.

Initially, the data was spread across multiple PostgreSQL databases and needed to be consolidated into a single data warehouse, so we built a data pipeline that streams it into Snowflake.

We leveraged AWS Kinesis Firehose to stream raw data in JSON format into an S3 bucket. We then performed data cleansing and transformation with dbt in a staging layer in Snowflake.
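For readers curious about the ingestion side, a minimal sketch of a producer pushing raw JSON events onto a Firehose delivery stream with boto3 could look like the following; the stream name, region and event shape are hypothetical:

```python
import json

import boto3

# The Firehose delivery stream buffers incoming records and writes
# them to an S3 bucket, from where the staging layer picks them up.
firehose = boto3.client("firehose", region_name="eu-west-1")

def send_event(event: dict) -> None:
    """Push one raw JSON event onto the delivery stream."""
    firehose.put_record(
        DeliveryStreamName="raw-events-to-s3",  # hypothetical stream name
        Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
    )

send_event({"event_type": "page_view", "user_id": 123})
```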

We also refactored a segmentation library that served as a JSON-to-SQL translator, making it compatible not only with the original PostgreSQL database setup but also with the new Snowflake data warehouse. The refactored tool had to be tested with thousands of JSON strings to ensure it translated to both PostgreSQL and Snowflake SQL correctly.
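We cannot reproduce the client’s library here, but a toy sketch of the core idea, a translator that renders the same JSON segment definition in either SQL dialect, with all names and the condition grammar hypothetical, might look like this:

```python
def json_field(column: str, key: str, dialect: str) -> str:
    """Render access to a JSON attribute in the target dialect."""
    if dialect == "postgres":
        return f"{column} ->> '{key}'"     # PostgreSQL JSON text operator
    if dialect == "snowflake":
        return f"{column}:{key}::string"   # Snowflake VARIANT path syntax
    raise ValueError(f"unsupported dialect: {dialect}")

def translate(segment: dict, dialect: str) -> str:
    """Turn {"field", "op", "value"} into a WHERE-clause fragment."""
    ops = {"eq": "=", "gt": ">", "lt": "<"}
    lhs = json_field("attributes", segment["field"], dialect)
    return f"{lhs} {ops[segment['op']]} '{segment['value']}'"

# The same segment definition yields dialect-specific SQL:
seg = {"field": "country", "op": "eq", "value": "UK"}
assert translate(seg, "postgres") == "attributes ->> 'country' = 'UK'"
assert translate(seg, "snowflake") == "attributes:country::string = 'UK'"
```

Keeping a single segment definition and switching dialects only at the translation layer is what makes it practical to validate the tool against thousands of JSON inputs for both targets.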

DevOps

We needed to transform the data and then load it into the database for storage and compute. Our engineers helped set up AWS services to load and save data, and leveraged the Apache Airflow orchestration system running on EKS. To ensure robustness, we introduced security measures and created backups for worst-case scenarios.
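As an illustration of the orchestration layer, a minimal Airflow DAG that runs a containerised transformation job on the EKS cluster via the KubernetesPodOperator could look like this; the DAG id, image and schedule are hypothetical:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

# Daily pipeline: each task runs as a pod on the Kubernetes (EKS)
# cluster that Airflow is configured against.
with DAG(
    dag_id="snowflake_transformations",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_dbt = KubernetesPodOperator(
        task_id="run_dbt_models",
        name="run-dbt-models",
        image="my-registry/dbt-runner:latest",  # hypothetical image
        cmds=["dbt", "run"],
        get_logs=True,
    )
```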

Data science and machine learning

We employed machine learning to explore potential use cases for an image-based product labelling tool. While exploring the large catalogue of product images with attributes, we realised that the images were hosted on different platforms and, in some cases, were only partially available or missing entirely. As the data was not standardised, we had to assess the product attributes and the availability and quality of each provided image.
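A minimal sketch of this kind of assessment, assuming a simple reachability and resolution check (the threshold and helper are hypothetical), might look like this:

```python
import io

import requests
from PIL import Image

MIN_SIDE = 128  # assumed minimum acceptable resolution

def assess_image(url: str) -> dict:
    """Check whether a product image is reachable and usable for training."""
    try:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        img = Image.open(io.BytesIO(resp.content))
        width, height = img.size   # header is parsed on open
        img.verify()               # raises if the file is truncated or corrupt
        return {"url": url, "available": True,
                "usable": min(width, height) >= MIN_SIDE}
    except Exception:
        return {"url": url, "available": False, "usable": False}
```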

First, we collected the latest image data (products and attributes) from Snowflake. We downloaded all the images and saved them in S3 buckets for fast retrieval during training, which also saved space on the ML compute instance. Then we trained different ML models.
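A minimal sketch of that collection step, with the table, bucket and credentials all hypothetical, could look like this:

```python
import boto3
import requests
import snowflake.connector

s3 = boto3.client("s3")
conn = snowflake.connector.connect(user="***", password="***", account="***")

# Fetch the latest product image URLs and attributes (hypothetical table).
rows = conn.cursor().execute(
    "SELECT product_id, image_url FROM analytics.products.product_images"
).fetchall()

# Mirror each image into S3 so training jobs read from the bucket
# instead of refetching from third-party hosts.
for product_id, image_url in rows:
    resp = requests.get(image_url, timeout=10)
    if resp.ok:
        s3.put_object(
            Bucket="ml-training-images",  # hypothetical bucket
            Key=f"products/{product_id}.jpg",
            Body=resp.content,
        )
```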

The test server we built presented the accuracy results of the trained models, as well as an interactive interface through which users could manually label random images from the data set and compare the models’ predictions against those manual labels.
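A stripped-down sketch of such a labelling service, here using Flask with hypothetical routes and in-memory data, might look like this:

```python
import random

from flask import Flask, jsonify, request

app = Flask(__name__)

# Model predictions loaded at startup; all entries are illustrative.
IMAGES = {
    "img-1": ("https://example.com/img-1.jpg", "red dress"),
    "img-2": ("https://example.com/img-2.jpg", "blue shirt"),
}
manual_labels = {}  # image_id -> label typed in by the user

@app.get("/sample")
def sample():
    """Serve a random image for the user to label."""
    image_id = random.choice(list(IMAGES))
    url, _ = IMAGES[image_id]
    return jsonify({"image_id": image_id, "url": url})

@app.post("/label/<image_id>")
def label(image_id):
    """Record a manual label and report the model's agreement so far."""
    manual_labels[image_id] = request.json["label"]
    agree = sum(1 for i, lab in manual_labels.items() if IMAGES[i][1] == lab)
    return jsonify({"model_accuracy_vs_manual": agree / len(manual_labels)})
```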

We helped the tech team build a predictive engine to identify the right time to send product replenishment reminders, as well as a smart recommendations engine, experimenting with different ML approaches and algorithms to get the best results.

We handed over the developed and validated ML algorithms to the tech team, so they could train their own models.

The technology we used

The Result

Curbing platform costs by 70%

We helped the client optimise their cloud expenditure and migrated them to newer technologies that require less operational overhead. In fact, our work has helped them reduce monthly platform costs by 70%.

The Snowflake evaluation report gave a clear indication of what performance to expect and which tools to use when changing or scaling the platform in the future. This way, the company could easily move higher-tier customers’ data to the data warehouse for better performance predictability.

The scalable architecture now allows them to tackle the expansion of their customer base without worrying about the increasing volume of data. Moreover, the data science projects we worked on could be integrated into this architecture seamlessly.

The initial goal of training on custom labels would have required additional manual work to craft those labels. Instead, we delivered a powerful tool that enabled the tech team to understand and further enrich the data they already have at hand.
