Data Warehouse Migration and Data Science Experiments

Reducing Analytics Platform Costs by 70%

Industry: AdTech

CASE STUDY / 2021

Client and context

This case study features a successful UK-based AdTech company that enables over a hundred B2B retailers to understand and communicate with their customers better.

The company provides a platform that creates individually tailored customer experiences across all channels using an AI-powered intelligence layer. This empowers retailers to raise brand awareness as well as to engage the customers who visit their online shops.

As the company planned to expand its customer base, it was becoming obvious that the existing technology stack would not scale with the increasing data volume.

Infinite Lambda helped them migrate to a new data warehouse that reduced their analytics platform costs by 70%. The new architecture was not only more cost-efficient but also flexible enough to integrate further data science solutions seamlessly.

“It was exciting to work for a data-driven company with a clear, innovative view on AI-based technology. We delivered the Snowflake PoC and empowered them with artefacts, code and clear instructions, which allowed the team to dive in immediately.”

Petyo Pahunchev / Project Lead

The Challenge

The original infrastructure with clustered PostgreSQL databases was experiencing performance and scaling issues, accompanied by high costs and operational overhead.

With the client base set to expand and data volumes set to grow, they needed a scalable and cost-effective architecture.

The tech team decided to separate their analytical and transactional databases. While keeping the transactional database on PostgreSQL, they wanted to migrate the analytical database to Snowflake, so a proof of concept was needed to evaluate feasibility and suitability. Additionally, they wanted to experiment with more machine learning and data science projects and required expertise.


The Solution

We produced a full Snowflake evaluation report that first defined the evaluation goals, such as performance, compute scalability, and the cost of running the ingestion pipeline and the platform’s complex queries and reports. As a multi-cluster warehouse, Snowflake can sustain a high number of concurrent users with acceptable performance characteristics. We found that in this specific case, a bigger warehouse would provide a better performance-cost ratio considering the planned expansion and the increased concurrency levels that come with it.
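
To give a flavour of what such an evaluation can measure, here is a minimal sketch, not the actual evaluation harness, that times the same query at different warehouse sizes using the snowflake-connector-python library; the account, credentials, warehouse name and query are all placeholders.

```python
import time
import snowflake.connector

# Placeholder query; the real evaluation used the client's ingestion
# workloads and complex platform queries and reports.
QUERY = "SELECT COUNT(*) FROM analytics.events"

conn = snowflake.connector.connect(
    account="my_account",   # hypothetical account identifier
    user="eval_user",       # placeholder credentials
    password="...",
    warehouse="EVAL_WH",    # hypothetical evaluation warehouse
)

cur = conn.cursor()
for size in ["XSMALL", "SMALL", "MEDIUM", "LARGE"]:
    # Resize the warehouse, then time the same query on it.
    cur.execute(f"ALTER WAREHOUSE EVAL_WH SET WAREHOUSE_SIZE = {size}")
    start = time.time()
    cur.execute(QUERY)
    cur.fetchall()
    print(f"{size}: {time.time() - start:.2f}s")
```

Comparing these timings against each size’s credit consumption is what yields the performance-cost ratio discussed above.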

While evaluating Snowflake, we built out the architecture, including the ingestion pipeline and the data transformation layer. We adapted the technology stack so that Snowflake and PostgreSQL could run side by side without disruption.

Meanwhile, we helped the tech team with their machine learning and data science projects, which would be integrated into the new Snowflake-based data architecture. We successfully delivered three data science projects focusing on taste profiling, product replenishment rates, and image-based product labelling.

So What?

The Snowflake evaluation report gave a clear indication of the performance to expect and the tools to use when changing or scaling the platform in the future. This way, the company could easily move higher-tier customers’ data to the data warehouse for better performance predictability.

We helped them optimise their cloud expenditure and migrated them to newer technologies that require less operational overhead. In fact, our work reduced their monthly platform costs by 70%.

The scalable architecture now allows them to tackle the expansion of their customer base without worrying about the increasing volume of data. Moreover, the data science projects we worked on can be integrated into this architecture seamlessly.

Digging Deeper... with Data Engineering

Data Engineering Problem!

“Data was stored in multiple PostgreSQL databases and needed to be migrated into Snowflake so it could live in a single data warehouse.”

Infinite Lambda Solutions!

“We built a data pipeline that streams data into Snowflake. We used AWS Kinesis Firehose to stream raw data in JSON format and save it in an S3 bucket. Then we performed data cleansing and transformation with dbt on a staging layer in Snowflake. We created a single source of truth in Snowflake, which enables data users, such as BI analysts, to fetch data more easily.”

Berta Vincze, Data Engineer
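
To make the ingestion side concrete, here is a minimal sketch, with a hypothetical stream name and event shape, of sending a JSON event to a Kinesis Data Firehose delivery stream configured to land raw records in S3; from there, the staged data is ingested into Snowflake and transformed by dbt.

```python
import json
import boto3

# Hypothetical delivery stream configured to deliver raw JSON to an S3 bucket.
firehose = boto3.client("firehose")

# Hypothetical event shape; the real platform streams customer activity data.
event = {"customer_id": 42, "event_type": "page_view", "url": "/shop"}

# Firehose batches records and writes them to S3; the staged files are then
# loaded into Snowflake, where dbt models clean and transform them.
firehose.put_record(
    DeliveryStreamName="raw-events-stream",
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)
```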

Killer Feature

“The refactoring of a segmentation library that served as a JSON-to-SQL translator. It had to be tested with thousands of JSON strings to ensure the tool generated correct SQL for both PostgreSQL and Snowflake.

I refactored it in a way that was compatible not only with the original PostgreSQL database setup but also with the new Snowflake data warehouse.”

Berta Vincze, Data Engineer
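
The segmentation library itself is proprietary, so the following is only a heavily simplified sketch of the dialect-aware idea: a JSON segment definition is translated into a SQL WHERE clause, with the dialect-specific pieces (here, just a date-difference expression) isolated in a single lookup.

```python
# Heavily simplified, hypothetical JSON-to-SQL translator sketch.
# Real segment definitions and dialect differences are far richer.

DIALECT_FUNCS = {
    "postgres": {"days_since": "CURRENT_DATE - {col}"},
    "snowflake": {"days_since": "DATEDIFF('day', {col}, CURRENT_DATE)"},
}

def translate(segment: dict, dialect: str) -> str:
    """Translate a JSON segment definition into a SQL WHERE clause."""
    clauses = []
    for cond in segment["conditions"]:
        if cond["op"] == "days_since_gt":
            expr = DIALECT_FUNCS[dialect]["days_since"].format(col=cond["field"])
            clauses.append(f"{expr} > {int(cond['value'])}")
        elif cond["op"] == "eq":
            clauses.append(f"{cond['field']} = '{cond['value']}'")
    return " AND ".join(clauses)

segment = {
    "conditions": [
        {"op": "eq", "field": "country", "value": "UK"},
        {"op": "days_since_gt", "field": "last_order_date", "value": 30},
    ]
}

print(translate(segment, "postgres"))
# country = 'UK' AND CURRENT_DATE - last_order_date > 30
print(translate(segment, "snowflake"))
# country = 'UK' AND DATEDIFF('day', last_order_date, CURRENT_DATE) > 30
```

Keeping the dialect differences in one place is what makes it practical to verify the translator against thousands of JSON strings for both engines.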

Digging Deeper... into DevOps

DevOps Problem!

Data needed to be transformed into a more structured form and then loaded into the database for storage and computation.

Infinite Lambda Solutions!

“We helped set up the AWS services that load and save data, as well as Apache Airflow on EKS as the orchestration system. To ensure robustness, we put security measures in place to make sure data is stored safely, and we created backups for worst-case scenarios.”

Peter Orliczki, DevOps Engineer
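
As an illustration of the orchestration layer, here is a minimal Airflow DAG sketch of the kind that could run on such an EKS deployment; the DAG id, task names and schedule are hypothetical rather than the client’s actual pipeline.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def load_to_snowflake():
    """Placeholder for the step that loads staged S3 data into Snowflake."""
    ...

def run_transformations():
    """Placeholder for the dbt transformations on the staging layer."""
    ...

# Hypothetical daily pipeline: load staged data, then transform it.
with DAG(
    dag_id="analytics_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load = PythonOperator(
        task_id="load_to_snowflake", python_callable=load_to_snowflake
    )
    transform = PythonOperator(
        task_id="run_transformations", python_callable=run_transformations
    )
    load >> transform
```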

Digging Deeper... into Data Science


Data Science Problem!

“We employed machine learning to explore potential use cases for the development of an image-based product labelling tool.

While exploring the large catalogue of product images with attributes, we realised that the images were hosted on different platforms and were in some cases unavailable or only partially available. As the data was not standardised, we had to assess the product attributes as well as the availability and quality of the provided images.”

Infinite Lambda Solutions!

“We first collected the latest image data (products and attributes) from Snowflake. We downloaded all the images and saved them in S3 buckets for fast retrieval during training, which also saved space on the ML compute instance. We then trained different ML models.

The test server I built presented the accuracy results of the trained models, as well as an interactive interface through which users could manually label random images from the data set and compare the models’ predictions against those manual labels.”
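
To make the data-collection step concrete, here is a simplified sketch, under assumed table, column and bucket names, of pulling product image URLs from Snowflake, downloading each image and caching it in S3 for training.

```python
import boto3
import requests
import snowflake.connector

s3 = boto3.client("s3")
BUCKET = "ml-training-images"   # hypothetical bucket name

conn = snowflake.connector.connect(
    account="my_account", user="ml_user", password="...",  # placeholders
)
cur = conn.cursor()
# Hypothetical table holding product attributes and image URLs.
cur.execute("SELECT product_id, image_url FROM analytics.products")

for product_id, image_url in cur.fetchall():
    resp = requests.get(image_url, timeout=10)
    if resp.status_code != 200 or not resp.content:
        # Images were sometimes unavailable or only partially available,
        # so failures are recorded rather than crashing the run.
        print(f"skipping {product_id}: HTTP {resp.status_code}")
        continue
    # Cache the image in S3 for fast retrieval during training.
    s3.put_object(Bucket=BUCKET, Key=f"images/{product_id}.jpg", Body=resp.content)
```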

Further Data Science experiments!

“We helped the tech team work on a predictive engine to identify the right time to send product replenishment reminders, as well as ‘taste profiling’, a smart recommendation engine. In both cases, we experimented with different ML models and algorithms to get the best results. We handed over the developed and validated ML algorithms to the tech team so they could train their own models.”

Nikolay Petrov, Data Science / Engineering
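
The replenishment engine itself was handed over to the client, but a minimal sketch of the underlying idea, estimating a typical repurchase interval per product from order history with pandas (hypothetical column names and toy data), could look like this:

```python
import pandas as pd

# Hypothetical order history: one row per (customer, product) purchase.
orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "product_id":  ["soap", "soap", "soap", "soap", "soap"],
    "order_date":  pd.to_datetime(
        ["2021-01-01", "2021-02-01", "2021-03-05", "2021-01-10", "2021-02-12"]
    ),
})

# Days between consecutive purchases of the same product by the same customer.
orders = orders.sort_values("order_date")
intervals = (
    orders.groupby(["customer_id", "product_id"])["order_date"]
    .diff()
    .dt.days
    .dropna()
)

# The median interval is a simple estimate of when a replenishment
# reminder is due for each product.
typical_interval = intervals.groupby(orders["product_id"]).median()
print(typical_interval)
```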

Killer Feature

“The initial goal of training on custom labels would have required additional manual work to craft such labels. Instead, we delivered a powerful tool that enabled the tech team to understand and further enrich the data they had at hand.”

Nikolay Petrov, Data Science / Engineering

Llama Learning

“Producing the Snowflake evaluation report accelerated our Snowflake expertise. Besides sharpening our skills in working with Snowflake, we are now equipped with knowledge of Snowflake’s performance and scalability under different usage scenarios, which allows us to give more accurate estimates to companies that want to work with Snowflake. Moreover, the team benefited from working with Helm in DevOps, which only a few members had used previously. Helm has now become an essential part of our DevOps work.”

Petyo Pahunchev, Project Lead

Reflections

“Personally, I really liked this project because it helped me build new skills. It was my first project working alongside our DevOps Lead and I benefited a lot from the knowledge transfer. The objectives and the architecture of the project were very clear, and we created an open-source template for running Apache Airflow on EKS, which is now used as a foundation for similar projects.”

Peter Orliczki

“The project was interesting, engaging and challenging, with many unknowns at the beginning and various pathways to reach the goal. It was genuinely a constructive and positive experience.”

Nikolay Petrov

“This project has significantly contributed to my development.

It was the first project where I could put my Snowflake knowledge from theory into practice. I learned more about Snowflake SQL and how to transform data with dbt. Moreover, I took my Python and SQL skills to the next level.”

Berta Vincze

And Finally…

Thanks to our client, our team and all of our technology partners who make all this possible.

If you want to find out more about what we do, life at Infinite Lambda, or how we could help you, your company, your government, your planet or your universe… just drop us a line and we can engineer the data in your world.
