...

Our Looker Power Tools: Automatic Content Validation and Dashboard Recovery

William Horel
May 18, 2020
Read: 4 min

If your organisation is using Looker, great – you’re probably on your way to higher data literacy across teams and a more data-driven culture. But as uptake increases and more users start producing their own content, modifying dashboards, and adding or deleting views and models, you may also have run into two common issues:

  • a dashboard or tile broke and you can’t figure out why
  • something important accidentally got deleted

Part of the beauty of Looker is that the data team does not – and should not – control and review everything. But you still need to ensure that your coworkers have quick and reliable access to this data, and that you can anticipate and prevent problems before they happen. In this blog we’ll talk about two tools we developed precisely for that purpose.

1. Content Validation

With an increasing number of users, it becomes easy to miss when something goes wrong. For example, a BI developer might get a request to modify a table or create a new one, but after doing so could forget to commit their modifications.
This can result in false data in downstream dashboards, or tiles that just stop working. Developers tend to hear about these issues post-hoc, when a business user has already complained about missing or misleading data.
Looker does provide its own solution to this, which is the Content Validator. This is pretty handy when you want to check the impact of some specific modifications, but you need to remember to regularly check this tool whilst on the correct branch. As content multiplies, this becomes unscalable and unreliable.

Our solution

We created a tool that will automatically run the Content Validator and extract any errors on a daily basis and send the results through the messaging platform of your choice on a regular basis.

How?

First, a Python script connects to the Looker API. Then, it checks for any error reported in a dashboard or a look with the content_validation() function. Looker renders a JSON, with all the details of the error, such as the folder it’s in, the hyperlinks to the dashboards it affects, and a description of the error.
After all the errors are collected, we send chosen details as a report on your messaging platform, such as Slack or email.

Reporting errors via Slack

Your team will receive an instant message and will be able to detect and correct errors faster.
The error checker can run at regular intervals of your choice but the beginning or end of the workday tends to work best.

2. Dashboard recovery

Mistakes happen, and someone could have deleted a dashboard that they thought wouldn’t be needed in the future. This can also happen if you have deleted rather than disabled the account of a user who has left your organization, or who no longer needed Looker access, but had created resources that others still rely on.
Unfortunately, Looker doesn’t currently allow erased dashboards to be retrieved.

Our solution

To remedy this, we created a way of storing every dashboard (and its tiles) in Github.

How?

Each day, a python job connects to the Looker API, and pulls details about all dashboards and relevant metadata like the dashboard owner’s full name, title, dashboard ID, and so on. This information for any new or modified dashboard is then added to a Github repository made for this purpose. When a dashboard is accidentally deleted, it can be restored using the backup file from Github.

GitHub backup file

What to do with a Dashboard.json?

If there were any unexpected deletions, you can upload your dashboard with Gazer. After installing Ruby, add the Gazer gem.

Get your access keys to Looker API through the User page on Admin panel. First click on “Edit”.

Then select “Edit Keys” on API3 Keys.

API3 keys

Create a file ~/.netrc in your home directory.
Paste in the API3 credentials as follows:

[php]
machine foo.bar.mycompany.com
login AbCdEfGhIjKlMnOp
password QrStUvWxYz1234567890
[/php]

The login is your Client ID, and the password is your Client Secret. Once you have it, get a clone of your Dashboard.json file on your local machine, and run the following command, where 1 is the number of the folder where you want the dashboard.

[php]

gzr dashboard import Dashboard.json 1 –host foo.bar.mycompany.com

[/php]

What else?

The approaches described above can be expanded to a variety of use cases. For instance, as well as checking for errors, we can set up automatic checks for any soft-deleted dashboards, Looks or folders, with the functions search_dashboards(), search_folders(), or search_looks(), and send this list to your chosen messaging channel, so you won’t be surprised at the last minute.

Alternatively, to get a detailed breakdown of Looker users’ activity, we can also use the python script to check for activities like logins, creations and queries, and send it as a regular report or compile it into a tile or dashboard. Looker also provides some out of the box activity dashboards, read more about them on our blog.

Thank you, @Nina, for proofreading and upgrading this article!

More on the topic

Everything we know, we are happy to share. Head to the blog to see how we leverage the tech.

Apache Airflow start_date and execution_date explained
Airflow start_date and execution_date Explained
Despite Airflow’s popularity in data engineering, the start_date and execution_date concepts remain confusing among many new developers today. This article aims to demystify them. Basic...
June 15, 2022
Breaking Some Myths about the Use of Dual-Track Agile
Bringing both flexibility and transparency, the Dual-Track Agile methodology is increasingly popular. With a growing number of teams that decide to try it out, it...
June 10, 2022
Creating a PostgreSQL to BigQuery Sync Pipeline Using Debezium and Kafka
Many companies today use different database technologies for their application and their data platform. This creates the challenge of enabling analytics on application data without...
June 1, 2022
How to Apply Dual-Track Agile in Practice
This article is a part of a series on the Dual-Track model. Here, I am going to share with you 5 rules on how to...
May 17, 2022
Challenges of Using Dual-Track Agile and How to Handle Them
Welcome to Part II of the Infinite Lambda’s blog series on Dual-Track Agile. You might want to check Part I that explains what this model...
April 15, 2022
Sustainability: the Last Frontier in Business Intelligence
The power of the modern data stack in generating actionable insights out of disparate data is well documented. It’s time to apply this to sustainability....
April 1, 2022

Everything we know, we are happy to share. Head to the blog to see how we leverage the tech.

Optimized by Seraphinite Accelerator
Turns on site high speed to be attractive for people and search engines.