
Airflow

Experimental
Creates: Assets, Lineage, Run History

Configure in the UI

This plugin can be configured directly in the Marmot UI with a step-by-step wizard.


The Airflow plugin ingests metadata from Apache Airflow, including DAGs (Directed Acyclic Graphs), tasks, and dataset lineage. It connects to Airflow's REST API to discover your orchestration layer and track data dependencies through Airflow's native Dataset feature.

Prerequisites

  • Airflow 2.0+ for basic DAG and task discovery
  • Airflow 2.4+ for Dataset-based lineage tracking
  • REST API enabled with authentication configured

Authentication

The plugin supports two authentication methods:

  • Basic Auth: Username and password
  • API Token: For token-based authentication

Configure authentication in your Airflow instance via airflow.cfg:

[api]
auth_backends = airflow.api.auth.backend.basic_auth
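As a quick sanity check that the REST API and basic auth are reachable before running the plugin, a minimal sketch using only the standard library (the `/api/v1/dags` endpoint is part of Airflow 2.x's stable REST API; the helper names here are illustrative):

```python
# Sketch: build the Basic Auth header the plugin would send, then query
# the DAG list from Airflow's stable REST API (/api/v1/dags).
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Return the HTTP Authorization header value for Basic Auth."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"


def list_dags(host: str, username: str, password: str) -> bytes:
    """Fetch the raw DAG list JSON from the Airflow REST API."""
    req = urllib.request.Request(f"{host}/api/v1/dags")
    req.add_header("Authorization", basic_auth_header(username, password))
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

A 200 response with a JSON `dags` array confirms the API backend and credentials are configured correctly; a 401/403 usually points at `auth_backends` in `airflow.cfg`.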

Example Configuration


host: "http://localhost:8080"
username: "admin"
password: "${AIRFLOW_PASSWORD}"
discover_dags: true
discover_tasks: true
discover_datasets: true
include_run_history: true
run_history_days: 7
only_active: true
dag_filter:
  include:
    - "^analytics_.*"
  exclude:
    - ".*_test$"
tags:
  - "airflow"
  - "orchestration"

Configuration

The following configuration options are available:

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| api_token | string | false | API token for authentication (alternative to basic auth) |
| dag_filter | plugin.Filter | false | Filter DAGs by ID pattern (include/exclude regex) |
| discover_dags | bool | false | Discover Airflow DAGs as Pipeline assets |
| discover_datasets | bool | false | Discover Airflow Datasets for lineage (requires Airflow 2.4+) |
| discover_tasks | bool | false | Discover tasks within DAGs |
| external_links | []ExternalLink | false | External links to show on all assets |
| host | string | false | Airflow webserver URL (e.g., http://localhost:8080) |
| include_run_history | bool | false | Include DAG run history in metadata |
| only_active | bool | false | Only discover active (unpaused) DAGs |
| password | string | false | Password for basic authentication |
| run_history_days | int | false | Number of days of run history to fetch |
| tags | TagsConfig | false | Tags to apply to discovered assets |
| username | string | false | Username for basic authentication |
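The `dag_filter` include/exclude behavior can be illustrated with a short sketch. The exact semantics (a DAG ID must match at least one include pattern and no exclude pattern) are an assumption based on the example configuration, and `dag_passes` is a hypothetical helper, not part of the plugin:

```python
# Sketch of include/exclude regex filtering for DAG IDs (assumed semantics:
# match any include pattern, then drop anything matching an exclude pattern).
import re


def dag_passes(dag_id: str, include=None, exclude=None) -> bool:
    """Return True if dag_id survives the include/exclude filter."""
    if include and not any(re.search(p, dag_id) for p in include):
        return False  # no include pattern matched
    if exclude and any(re.search(p, dag_id) for p in exclude):
        return False  # an exclude pattern matched
    return True
```

With the patterns from the example above, `analytics_daily` is kept, while `analytics_smoke_test` (matches the exclude) and `etl_daily` (matches no include) are skipped.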

Available Metadata

The following metadata fields are available:

| Field | Type | Description |
| --- | --- | --- |
| consumer_count | int | Number of DAGs that consume this dataset |
| created_at | string | Dataset creation timestamp |
| dag_id | string | Unique DAG identifier |
| dag_id | string | Parent DAG ID |
| dag_run_id | string | Unique identifier for the DAG run |
| description | string | DAG description |
| downstream_tasks | []string | List of downstream task IDs |
| end_date | string | End time of the run |
| execution_date | string | Logical execution date |
| file_path | string | Path to DAG definition file |
| is_active | bool | Whether DAG is active |
| is_paused | bool | Whether DAG is paused |
| last_parsed_time | string | Last time the DAG file was parsed |
| last_run_date | string | Execution date of the last DAG run |
| last_run_id | string | ID of the last DAG run |
| last_run_state | string | State of the last DAG run (success, failed, running) |
| next_run_date | string | Next scheduled run date |
| operator_name | string | Airflow operator class name (e.g., BashOperator, PythonOperator) |
| owners | string | DAG owners (comma-separated) |
| pool | string | Execution pool for the task |
| producer_count | int | Number of tasks that produce this dataset |
| retries | int | Number of retries configured for the task |
| run_count | int | Number of runs in the lookback period |
| run_type | string | Type of run (scheduled, manual, backfill) |
| schedule_interval | string | DAG schedule (cron expression or preset) |
| start_date | string | Actual start time of the run |
| state | string | Run state (queued, running, success, failed) |
| success_rate | float64 | Success rate percentage over the lookback period |
| task_id | string | Task identifier within the DAG |
| trigger_rule | string | Task trigger rule (e.g., all_success, one_success) |
| updated_at | string | Dataset last update timestamp |
| uri | string | Dataset URI identifier |
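How `run_count` and `success_rate` relate to the `run_history_days` lookback can be sketched as follows. The aggregation itself is an assumption (the field descriptions above imply it, but the plugin's exact computation is not documented here), and `summarize_runs` is a hypothetical helper:

```python
# Sketch: aggregate DAG run history into run_count and success_rate (%)
# over a lookback window of run_history_days days (assumed semantics).
from datetime import datetime, timedelta, timezone


def summarize_runs(runs, lookback_days: int = 7):
    """runs: list of (end_date, state) tuples, end_date tz-aware.

    Returns (run_count, success_rate) over the lookback window.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=lookback_days)
    recent = [(ts, state) for ts, state in runs if ts >= cutoff]
    run_count = len(recent)
    successes = sum(1 for _, state in recent if state == "success")
    success_rate = 100.0 * successes / run_count if run_count else 0.0
    return run_count, success_rate
```

For example, with `run_history_days: 7`, one successful and one failed run inside the window yield `run_count = 2` and `success_rate = 50.0`; runs older than the window are ignored.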