
CLI

The ingest command discovers metadata from the configured data sources and catalogs what it finds as assets in Marmot. It supports multiple data sources, and it can establish lineage relationships between assets and attach documentation to them.

Installation

curl -fsSL get.marmotdata.io | sh

See the CLI Reference for configuring the host, API key and other global options.

Configuration File

The ingest command requires a YAML configuration file that defines the data sources to ingest. The configuration follows this structure:

name: my_pipeline_name
runs:
  - source_type1:
      # source-specific configuration
  - source_type2:
      # source-specific configuration

Each source_type key (source_type1 and source_type2 above) is one of the supported data source types. All available source types and their configuration options are listed in the Plugins documentation.

Pipeline Names

Give each pipeline a unique name. The name is used to track the state of the ingestion.
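Since the configuration file needs both a pipeline name and a runs list, a quick check before ingesting can catch a missing key early. The following is a minimal sketch, not part of the Marmot CLI: check_config is a hypothetical helper that only looks for the two top-level keys with grep and does not parse or validate the YAML.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of the Marmot CLI.
# Succeeds only if the file is readable and has top-level
# `name:` (with a value) and `runs:` keys. It does not parse YAML.
check_config() {
  config_file="$1"
  [ -r "$config_file" ] || { echo "cannot read $config_file" >&2; return 1; }
  # "name:" at column 0, followed by a non-empty, non-comment value
  grep -Eq '^name:[[:space:]]*[^[:space:]#]' "$config_file" \
    || { echo "missing pipeline name" >&2; return 1; }
  # "runs:" at column 0
  grep -Eq '^runs:' "$config_file" \
    || { echo "missing runs list" >&2; return 1; }
}
```

For example, check_config config.yaml returns a non-zero status and prints a short message when either key is absent.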

Example: Ingesting Kafka Topics

runs:
  - kafka:
      bootstrap_servers: "kafka-broker:9092"
      client_id: "marmot-kafka-plugin"
      client_timeout_seconds: 60
      authentication:
        type: "sasl_plaintext"
        username: "username"
        password: "password"
        mechanism: "PLAIN"
      schema_registry:
        url: "http://schema-registry:8081"
        enabled: true
        config:
          basic.auth.user.info: "username:password"

This configuration connects to a Kafka broker at kafka-broker:9092 with SASL PLAIN authentication and integrates with a Schema Registry at http://schema-registry:8081.

To run the ingestion, pass the configuration file to the ingest command with -c:

marmot ingest -c config.yaml
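In scripts or CI jobs it can help to wrap this command so that a missing configuration file fails with a clear message. A minimal sketch follows; run_ingest and the MARMOT_BIN variable are hypothetical names introduced for this example, not Marmot features, and the only Marmot invocation used is the ingest -c form shown above.

```shell
#!/bin/sh
# Hypothetical wrapper around `marmot ingest -c <file>`.
# MARMOT_BIN is an assumed override for the binary path, used only by
# this sketch; it defaults to `marmot` on the PATH.
run_ingest() {
  config_file="${1:-config.yaml}"
  if [ ! -r "$config_file" ]; then
    echo "config file not found: $config_file" >&2
    return 1
  fi
  "${MARMOT_BIN:-marmot}" ingest -c "$config_file"
}
```

Calling run_ingest config.yaml then runs the same command as above, and the function's exit status is that of the marmot process.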