Airflow start_date and execution_date Explained

Quang Anh
June 15, 2022
Read: 3 min

Despite Airflow’s popularity in data engineering, the start_date and execution_date concepts continue to confuse many new developers. This article aims to demystify them.

Basic Concept

Airflow is an orchestration tool, which means that with sufficient permission it is capable of controlling other services in a pre-defined order and timing.

Let’s say you want to ingest some data with Fivetran, transform it with dbt, then run a notebook analysis with Databricks. You can program each of these steps as an Airflow task in that particular order and dependency, scheduling them to run as often as you need, and Airflow will execute them as per your instructions.

Needless to say, in the world of data engineering where gigabytes of data are moved at any second, having an orchestration tool like this is critical.

The start_date

Say you go to work on the 1st of January, 2022 (I know, I know, it is New Year’s Day and nobody should be working, but we do not have a labour union for fictional blog characters here). You finish your DAG at 15:00 and you want it to run regularly every day at midnight. So, you input all of the final settings as below, expecting the pipeline to run immediately after you start it:
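A minimal sketch of what those settings might look like (the DAG id and the omitted tasks are illustrative, not from the original article):

```python
from datetime import datetime

from airflow import DAG

# Hypothetical DAG definition; the Fivetran/dbt/Databricks tasks are omitted.
with DAG(
    dag_id="my_daily_pipeline",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",  # run every day at midnight
) as dag:
    ...
```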

If you expect the pipeline to run at that time, think again. Airflow runs the tasks for a given interval at the end of that interval, so the first run will not start until the end of 01-01-2022, that is, at midnight on the following day (2 January 2022).

The reason is that if you want to ingest data from the 1st of January 2022 (and earlier), you need to wait until the end of the daily interval so that the data source has the full day’s data available before the ingestion starts.
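In pure datetime terms, the scheduling arithmetic looks like this (a sketch of the rule, not Airflow’s internal code):

```python
from datetime import datetime, timedelta

start_date = datetime(2022, 1, 1)  # beginning of the first interval
interval = timedelta(days=1)       # an @daily schedule

# Airflow runs a DAG only once the interval it covers has fully elapsed:
first_run_time = start_date + interval  # midnight on 2 January 2022
logical_date = start_date               # the interval the run covers

print(first_run_time)
```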

If you want your DAG to run today (1st of Jan in our example), do this:
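One way to achieve that is to move start_date one interval back, so the first daily interval ends today (again, a hypothetical sketch):

```python
from datetime import datetime

from airflow import DAG

with DAG(
    dag_id="my_daily_pipeline",
    # The interval 31 Dec -> 1 Jan has already elapsed,
    # so the first run can be triggered right away.
    start_date=datetime(2021, 12, 31),
    schedule_interval="@daily",
) as dag:
    ...
```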

Or, to be safe, why not go a bit overboard:
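That is, pushing start_date far into the past “just to be safe” (illustrative):

```python
from datetime import datetime

from airflow import DAG

with DAG(
    dag_id="my_daily_pipeline",
    start_date=datetime(2020, 1, 1),  # way in the past, "just to be safe"
    schedule_interval="@daily",
) as dag:
    ...
```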

Right? Wrong.

By default, Airflow will backfill every unexecuted interval between a past start_date and the present, a behaviour known as catch-up. So unless you want unnecessary additional runs, do not put your start_date further in the past than you need. This behaviour can be disabled by setting catchup=False.
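With catch-up disabled, a past start_date only schedules the most recent interval (sketch):

```python
from datetime import datetime

from airflow import DAG

with DAG(
    dag_id="my_daily_pipeline",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,  # skip backfilling past intervals; run only the latest
) as dag:
    ...
```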

You might be wondering, “Why not automate the start_date as today? We are using the datetime library after all.”

First of all, your today() does not fall at midnight. It could be 13:45:32. You would never know the exact time of the runs.

Second, this simply will NOT run. In its FAQ, Airflow strongly recommends against using a dynamic start_date. The reason, as stated above, is that Airflow executes the DAG after start_date + interval (daily in our case). If start_date is computed dynamically, it is re-evaluated every time the scheduler parses the DAG file, moving along with the clock. As a result, start_date + interval stays forever in the future.
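The anti-pattern looks like this (illustrative, do not copy):

```python
from datetime import datetime

from airflow import DAG

# Anti-pattern: start_date is re-evaluated every time the scheduler
# parses this file, so start_date + interval never arrives
# and the DAG never runs.
with DAG(
    dag_id="my_daily_pipeline",
    start_date=datetime.today(),
    schedule_interval="@daily",
) as dag:
    ...
```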

The execution_date

Another tricky variable is execution_date (the name used in Airflow versions prior to 2.2). Nowadays, it is called logical_date, or ds for short in templates. This is one of the many parameters that you can reference inside your Airflow task.
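For example, a task can echo the logical date through the ds template variable (a minimal illustrative snippet; the surrounding DAG definition is assumed):

```python
from airflow.operators.bash import BashOperator

# {{ ds }} renders the logical date as a YYYY-MM-DD string
print_date = BashOperator(
    task_id="print_logical_date",
    bash_command="echo {{ ds }}",
)
```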

What date do you think will be printed out on the first run?

If your answer is “2022-01-02”, the date of its first run, then you are once again wrong. By definition, Airflow’s logical date points to the start of the interval, not to the moment when the DAG is actually executed. Hence, the correct answer is still “2022-01-01”.


Scheduled DAGs in Airflow always have a date interval, and tasks are run at the end of it. Both start_date and execution_date (or logical_date) point to the beginning of an interval, but start_date is a constant defined once in the DAG, whereas execution_date is passed to the tasks as a parameter and takes a different value on every run.

If this explanation has been helpful, head to the Infinite Lambda blog for more useful content in the data and cloud space.
