
OpenLineage

OpenLineage is an open standard for data lineage collection and analysis. It provides a unified way to track data flows across different tools and platforms by emitting standardised events during job execution.

Marmot integrates with OpenLineage to automatically discover assets and lineage relationships from your data pipelines, eliminating manual catalog maintenance.

What is OpenLineage?

OpenLineage is a vendor-neutral, open standard for lineage metadata collection. It captures how data flows through your systems without locking you into a specific tool.
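Here is a single run event built as a plain Python dictionary, as a minimal sketch of the event model. The field names (`eventType`, `eventTime`, `run.runId`, `job.namespace`, `job.name`, `inputs`, `outputs`, `producer`, `schemaURL`) follow the OpenLineage RunEvent definition. The job namespace, job name, dataset names and producer URI are hypothetical placeholders. Check the schema version against the one your producer emits.

```python
import json
import uuid
from datetime import datetime, timezone

# Schema URL for the RunEvent definition; confirm the version your producer uses.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/RunEvent"


def make_run_event(event_type: str, run_id: str) -> dict:
    """Build a minimal OpenLineage RunEvent (all names are placeholders)."""
    return {
        "eventType": event_type,  # START, RUNNING, COMPLETE, ABORT, FAIL or OTHER
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},  # shared by every event of the same run
        "job": {"namespace": "example-airflow", "name": "daily_orders.load_orders"},
        "inputs": [{"namespace": "postgres://db.example.com:5432", "name": "shop.public.orders"}],
        "outputs": [{"namespace": "s3://example-bucket", "name": "warehouse/orders"}],
        "producer": "https://example.com/my-pipeline",  # hypothetical producer URI
        "schemaURL": SCHEMA_URL,
    }


run_id = str(uuid.uuid4())
print(json.dumps(make_run_event("START", run_id), indent=2))
```

A producer typically emits a START event when a job begins, then a COMPLETE or FAIL event with the same `runId` when it ends.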

Compatibility

OpenLineage support in Marmot is still experimental and has not been tested with all sources. Please report any issues you encounter on GitHub.

What You Get

Automatic Asset Discovery

Jobs, tables, files and topics are added to your catalog automatically as your pipelines run.

Lineage Relationships

See how data flows between assets with upstream and downstream connections.

Run History

Track execution status, timing and data volumes for every pipeline run.

Stub Assets

Lineage is captured even for undocumented datasets without polluting your catalog.
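The lineage and run history features above rely on two properties of OpenLineage events. Each event names the job's input and output datasets, and all events for one run share a `runId`. The sketch below derives lineage edges and run durations from those fields alone. It is an illustration under that assumption, not Marmot's implementation. The `dataset_key` helper and the sample events are hypothetical.

```python
from __future__ import annotations

from datetime import datetime


def dataset_key(ds: dict) -> str:
    # Hypothetical helper: identify a dataset by namespace + name.
    return f"{ds['namespace']}/{ds['name']}"


def lineage_edges(event: dict) -> set[tuple[str, str]]:
    """Return (upstream, downstream) pairs: each input -> job -> each output."""
    job = f"{event['job']['namespace']}/{event['job']['name']}"
    edges = {(dataset_key(ds), job) for ds in event.get("inputs", [])}
    edges |= {(job, dataset_key(ds)) for ds in event.get("outputs", [])}
    return edges


def run_durations(events: list[dict]) -> dict[str, float]:
    """Seconds from START to a terminal event (COMPLETE, FAIL or ABORT), per runId."""
    starts: dict[str, datetime] = {}
    durations: dict[str, float] = {}
    for ev in events:
        run_id = ev["run"]["runId"]
        ts = datetime.fromisoformat(ev["eventTime"])
        if ev["eventType"] == "START":
            starts[run_id] = ts
        elif ev["eventType"] in ("COMPLETE", "FAIL", "ABORT") and run_id in starts:
            durations[run_id] = (ts - starts[run_id]).total_seconds()
    return durations


# Hypothetical events for one run of one job.
job = {"namespace": "example-airflow", "name": "load_orders"}
orders = {"namespace": "postgres://db:5432", "name": "shop.public.orders"}
events = [
    {"eventType": "START", "eventTime": "2024-01-01T00:00:00+00:00",
     "run": {"runId": "r1"}, "job": job, "inputs": [orders], "outputs": []},
    {"eventType": "COMPLETE", "eventTime": "2024-01-01T00:02:30+00:00",
     "run": {"runId": "r1"}, "job": job, "inputs": [orders],
     "outputs": [{"namespace": "s3://bucket", "name": "warehouse/orders"}]},
]
print(run_durations(events))  # {'r1': 150.0}
```

Keying edges on namespace plus name means the same dataset links up across jobs even if it was never documented in the catalog.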

Supported Asset Types

Marmot maps OpenLineage events to specific asset types:

| Asset Type | Description |
|---|---|
| DAG | Airflow workflows |
| Task | Individual Airflow tasks |
| Model | DBT models |
| Project | DBT projects |
| Table | Database tables |
| File | Data files |
| Topic | Kafka topics |
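OpenLineage's dataset naming conventions encode the source system in the namespace. Examples are `kafka://broker:9092` for Kafka, `s3://bucket` for object storage and plain `bigquery` for BigQuery. The sketch below shows one plausible way to classify datasets by that scheme. The scheme-to-type table is a hypothetical illustration and is not taken from Marmot's code. Job-level types (DAG, Task, Model, Project) are not covered here.

```python
from typing import Optional

# Hypothetical mapping from namespace scheme to asset type (illustration only).
SCHEME_TO_TYPE = {
    "kafka": "Topic",
    "s3": "File",
    "gs": "File",
    "file": "File",
    "postgres": "Table",
    "mysql": "Table",
    "bigquery": "Table",
    "snowflake": "Table",
}


def classify_dataset(namespace: str) -> Optional[str]:
    """Guess an asset type from an OpenLineage dataset namespace.

    The scheme is everything before '://', or the whole string when there is
    no '://' (as with 'bigquery'). Returns None for unrecognised schemes.
    """
    scheme = namespace.split("://", 1)[0].lower()
    return SCHEME_TO_TYPE.get(scheme)


print(classify_dataset("kafka://broker:9092"))  # Topic
```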

Authentication

By default, the OpenLineage endpoint requires authentication via an API key. You can disable authentication for trusted environments if needed.

Generate API Key

  1. Navigate to Profile > API Keys
  2. Click New Key
  3. Copy the generated key
  4. Configure your OpenLineage producer

Endpoint URL

POST /api/v1/lineage
Authorization: X-API-Key <your-api-key>
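A producer without an existing integration can post events to this endpoint directly. The following sketch builds the request with the Python standard library only. It assumes the key travels in an `X-API-Key` request header, which is one reading of the line above, so confirm it against your Marmot version. It also assumes the body is a single OpenLineage run event serialised as JSON. The host name is a placeholder. Passing `api_key=None` omits the header, which is only appropriate when authentication is disabled as described below.

```python
import json
import urllib.request
from typing import Optional

MARMOT_URL = "https://your-marmot-instance.com/api/v1/lineage"  # placeholder host


def build_lineage_request(event: dict, api_key: Optional[str]) -> urllib.request.Request:
    """Prepare (but do not send) a POST of one OpenLineage event to Marmot."""
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        headers["X-API-Key"] = api_key  # assumption: header name, see lead-in
    return urllib.request.Request(
        MARMOT_URL,
        data=json.dumps(event).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(event: dict, api_key: Optional[str], timeout: float = 10.0) -> int:
    """Send the event and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, e.g. a rejected key.
    """
    with urllib.request.urlopen(build_lineage_request(event, api_key), timeout=timeout) as resp:
        return resp.status
```

Calling `send(event, "your-api-key")` performs the request. Keeping request construction separate from sending makes the headers and body easy to inspect before anything goes over the network.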

Disable Authentication

To disable authentication for the OpenLineage endpoint, set the following configuration:

Config file:

openlineage:
  auth:
    enabled: false

Environment variable:

export MARMOT_OPENLINEAGE_AUTH_ENABLED=false
Warning

Disabling authentication allows anyone to send lineage events to your Marmot instance. Only use this in trusted environments.
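The environment variable above appears to follow a simple pattern: the `MARMOT_` prefix followed by the config path in upper case, joined with underscores. This is an inference from a single example, not documented behaviour. Assuming the pattern holds for other keys, the sketch below derives the variable name for a dotted config path. `env_var_name` is a hypothetical helper.

```python
def env_var_name(config_path: str, prefix: str = "MARMOT") -> str:
    """Map a dotted config path such as 'openlineage.auth.enabled' to an env var name.

    Assumption: Marmot upper-cases the path and replaces dots with underscores.
    """
    return "_".join([prefix, *config_path.upper().split(".")])


print(env_var_name("openlineage.auth.enabled"))  # MARMOT_OPENLINEAGE_AUTH_ENABLED
```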

Configuration Examples

Airflow

Configure the OpenLineage provider in airflow.cfg:

[openlineage]
transport = http
url = https://your-marmot-instance.com/api/v1/lineage
api_key = your-api-key

DBT

Add to your profiles.yml:

your_profile:
  outputs:
    prod:
      # ... your connection details
  vars:
    openlineage:
      url: https://your-marmot-instance.com/api/v1/lineage
      api_key: your-api-key

Spark

Set environment variables:

export OPENLINEAGE_URL=https://your-marmot-instance.com/api/v1/lineage
export OPENLINEAGE_API_KEY=your-api-key

OpenLineage Documentation

For detailed configuration options and supported integrations, see the official OpenLineage documentation.
