Our work with Moshi, a mindfulness app for children, helped them engineer their data to support their mission of enhancing children's wellbeing.
According to the National Sleep Foundation, lack of sleep in children can lead to impaired learning, emotional distress and health issues such as obesity. Moreover, it can be distressing for parents. In times of uncertainty, learning how to be mindful at an early age is seen as an incredibly valuable skill to foster.
Moshi has created an app that brings mindfulness and meditative techniques to children. Not only can this help them sleep, but it can teach them the basics of mindfulness and seed habits that build early resilience.
As the volume and complexity of data increased, the Moshi team realised they needed a more reliable and sustainable solution. In this article, you will learn how Infinite Lambda set up a modern data stack from scratch (Fivetran, Snowflake, Looker) and worked with cutting-edge tools such as Airflow and dbt to automate data pipelines.
Like many organisations, Moshi did not have a data analytics platform in place. KPI reporting was done manually and managed by one team member in Excel, a familiar and understandable story for fast-growing companies. The reporting comprised data from third-party providers, which arrived at different times and in different formats. To aggregate the data, each third-party dataset had to be downloaded and then added manually to one master Excel spreadsheet. This process was time-consuming and prone to human error. Moreover, due to data latency, there were differences in reporting figures.
We brought in our expert team to set up a modern data stack from scratch. Working with multiple technologies, including our partners Fivetran, Snowflake and Looker, we automated the process from data ingestion to final reporting. We built a customised Looker dashboard that mirrored the original KPI report and automated the data pipelines to ensure seamless reporting.
Our work enabled the Moshi team to build automated near real-time reporting and broader analytics capabilities that give them a better understanding of the customer journey. The new customer insights can help with further product development and with creating new features that can lead to more children benefiting from their mindfulness app, giving them a better night's sleep!
“Infinite Lambda enabled us to accelerate our data projects and achieve our goals in rapid time. Delivering us not only an end to end data pipeline but also detailed business KPI reports. They have provided much-needed capacity, knowledge and have listened and adapted to our needs. I would highly recommend.”
— Ian Trayler, CTO at Moshi
With no data analytics platform in place, we drafted the architecture before we started building. To significantly reduce build time, we decided to use Fivetran, a tool that can robustly ingest data from hundreds of data sources with minimal operational overhead. However, during the process, we noticed that two data sources had no direct Fivetran connectors available. As they play a vital role in the reporting, we created customised connectors for them. Our two Data Engineers, Zoltán and Peter, worked on the architecture setup. Here is what they shared with us.
Infinite Lambda Solutions!
“We built an end-to-end data stack from scratch. First, we created a data lake with two different layers that store raw and quality data. That way, we can pull only the data that is relevant for the KPI report. We then implemented dbt for data transformation, which gives the Moshi team a lineage to understand the data transformation process. Throughout the project, we worked closely with our BI team to replicate the original Excel KPI report in Looker.
Moreover, we deployed Airflow with Kubernetes and Docker as infrastructure on AWS and automated data ingestion using Airflow. Furthermore, we set up manual deployment with Docker, EKS and ECR. We then developed custom connectors for two data sources that did not have direct Fivetran connectors.” – Zoltán Csonka
“We built two customised connectors that were not available on Fivetran. Since the two main data sources hold large datasets, we made sure to extract only the data most important to their KPI report, in order to avoid redundant data that creates unnecessary cost and confusion. We managed to ingest 30 days of rolling data to identify deleted records, which could not be done with Fivetran or the data pipeline alone. This helped the Moshi team to work out close estimates of missing data that is delayed by third parties.” – Peter Osztodi
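The rolling-window idea in the quote above can be illustrated with a short sketch. It is not the team's actual connector code: the function name, the record layout (an id mapped to a date) and the comparison by id are all assumptions made for illustration. The idea is that re-extracting the last 30 days on every run makes it possible to spot records that the warehouse still holds but the source no longer returns.

```python
from datetime import date, timedelta

WINDOW_DAYS = 30  # rolling window length mentioned in the text


def find_deleted_ids(stored, fresh, today, window_days=WINDOW_DAYS):
    """Return ids held in the warehouse within the window but absent from the fresh extract.

    stored: dict of record id -> record date, as already loaded (hypothetical layout)
    fresh:  dict of record id -> record date, from the new rolling extract
    """
    window_start = today - timedelta(days=window_days)
    # Only records inside the window can be judged: older ones are not re-extracted.
    stored_in_window = {
        rid for rid, day in stored.items() if window_start <= day <= today
    }
    # Anything we hold for the window that the source no longer returns was deleted upstream.
    return sorted(stored_in_window - set(fresh))
```

Records older than the window are deliberately ignored, since their absence from the fresh extract says nothing about whether they were deleted.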
The deployment of dbt has helped the Moshi team to get an overview of data lineage and understand the data transformation process.
Thanks to a combination of out-of-the-box Fivetran connectors and custom data pipelines that we orchestrated through Airflow for the other data sources, the Moshi team can now not only extract the most important data from large datasets without delay but also work out close estimates of missing data immediately.
The Analytics Expert at Moshi would normally spend hours every day pulling data manually from different data sources to create a KPI report, which was then distributed to different team members. All data was aggregated in Excel by downloading CSV files from each data source, which made the reporting process manual and time-consuming. Moreover, as data volume and complexity increased, they needed a more robust and sustainable process.
“We analysed the original Excel report to get a full understanding of the data and how the KPIs are calculated. Once we had determined all the data sources, we used Fivetran and Python scripting to ingest them into Snowflake and made sure the results matched the data in Excel. Then we created a customised Looker dashboard to mirror the original Excel report template.” – Mincho Ganchev
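The check that the ingested data matched the Excel figures can be pictured as a simple reconciliation. The sketch below is only an illustration under stated assumptions: the function name, the KPI names and the idea of comparing two dictionaries of KPI values are hypothetical, and the source does not describe how the comparison was actually carried out.

```python
import math


def reconcile_kpis(excel_kpis, warehouse_kpis, rel_tol=1e-9):
    """Compare KPI values from two sources and return the mismatches.

    Both arguments map a KPI name to a number. The result maps each
    mismatching KPI name to (excel value, warehouse value), with None
    standing in for a KPI that is missing on one side.
    """
    mismatches = {}
    for name in sorted(set(excel_kpis) | set(warehouse_kpis)):
        excel_value = excel_kpis.get(name)
        warehouse_value = warehouse_kpis.get(name)
        if excel_value is None or warehouse_value is None:
            mismatches[name] = (excel_value, warehouse_value)
        elif not math.isclose(excel_value, warehouse_value, rel_tol=rel_tol):
            mismatches[name] = (excel_value, warehouse_value)
    return mismatches
```

An empty result means every KPI is present in both sources and agrees within the tolerance, which is the condition a replicated dashboard would need to meet before replacing the spreadsheet.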
“We deployed Airflow to automatically run the scripts that fetch data from two APIs and to run the dbt models that transform the data in Snowflake every morning. With the setup complete, data pipelines can now be scheduled and automated to ingest into Snowflake. The next step was to use dbt to transform the data to make it more analysis-friendly. This included cleaning and structuring work such as creating unique identifiers, removing duplicates, and creating data models and aggregated metrics.” – Irina Veleva
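Two of the cleaning steps named in the quote above, creating unique identifiers and removing duplicates, can be sketched in a few lines. In the project these transformations were done in dbt models on Snowflake; the Python below only illustrates the logic, and the field names, the hash-based key and the "keep the most recently loaded row" rule are assumptions rather than the team's actual models.

```python
import hashlib


def surrogate_key(record, key_fields):
    """Build a deterministic identifier by hashing the fields that define uniqueness."""
    raw = "|".join(str(record[field]) for field in key_fields)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def deduplicate(records, key_fields, updated_field):
    """Keep one row per surrogate key, preferring the highest value of updated_field."""
    latest = {}
    for record in records:
        key = surrogate_key(record, key_fields)
        current = latest.get(key)
        if current is None or record[updated_field] > current[updated_field]:
            # Attach the generated identifier so downstream models can join on it.
            latest[key] = {**record, "record_id": key}
    return list(latest.values())
```

Hashing the natural-key columns gives every row a stable identifier even when the source provides none, so the same input always deduplicates to the same output.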
To enable behavioural analysis, raw data needs to be available immediately. However, third-party data comes in different formats and with latency.
“Datasets from third-party providers always arrive a few days late, which makes analysis difficult. We helped the Moshi team to ingest data from their internal database into Snowflake, where available data can be displayed instantly. We further built out the CI/CD pipelines from ingestion to transformation and applied automation, so a report in Looker is generated as soon as the data is refreshed. All end users are informed via email once the report is ready.” – Irina Veleva
Everyone working in BI knows that it can be challenging to create an exact replica of an original Excel report in a BI dashboard. We not only created a replica successfully without changing the report structure but also managed to aggregate all data from multiple Excel sheets into one single Looker dashboard.
Data pipeline automation was another major gain. It normally took hours for an analyst to pull the reports. Now data is automatically ingested from multiple first- and third-party data sources, tested, validated and integrated. This not only allows the growing analytics team at Moshi to focus on more important tasks but also makes data more easily accessible to others within the organisation.
At Infinite Lambda we always make sure that our solutions are flexible and scalable, making data easily accessible and enabling in-depth analytics. We managed to deploy an end-to-end integrated custom analytics platform with automated data ingestion in a short period of time. We refined our DataOps work and set up a robust foundation for near real-time reporting. We have advanced our knowledge of working with Airflow and dbt and are excited to apply this expertise to other projects where automation and data transformation are significant for the workflow.
“It was my first project as a Project Lead and it was great to apply my knowledge in data warehousing and data modelling on this project. It was rewarding to see the progress of transforming the database from Excel into a modern data stack. I really enjoyed accompanying the Moshi team in their data journey and it’s very rewarding that our work has impacted their way of working, sharing the joy in working with modern technologies.”
“It was great to lay the groundwork for the project on my own by writing all the code and setting up the environment. I also learnt new Airflow functionalities while supporting my team in building an Airflow infrastructure from scratch.”
“It was my first project working with the whole team and I really enjoyed the collaboration and learnt a lot from different chapters. It gave me the opportunity to deepen my knowledge in dbt and in return I could contribute with my SQL and data modelling expertise.”
“I was very excited to work with Moshi, helping them to transform their way of working with data. I had the chance to deepen my knowledge of Airflow, which was a crucial tool for automation. The positive feedback from the Moshi team made me feel proud of what I’m doing and helped me to become more self-confident.”
Thanks to our partner, our team and all of our technology partners who make all this possible.
If you want to find out more about what we do, life at Infinite Lambda, or how we could help you, your company, your government, your planet or your universe… just drop us a line and we can engineer the data in your world.