CLI
The ingest command discovers metadata from configured data sources and catalogs them as assets in Marmot. It supports multiple data sources, can establish lineage relationships between assets and can attach documentation to assets.
Installation
Install with the install script:
curl -fsSL get.marmotdata.io | sh
Alternatively, download the latest binary for your platform from GitHub Releases, then make it executable and move it onto your PATH:
chmod +x marmot && sudo mv marmot /usr/local/bin/
See the CLI Reference for configuring the host, API key and other global options.
Configuration File
The ingest command requires a YAML configuration file that defines the data sources to ingest. The configuration follows this structure:
name: my_pipeline_name
runs:
  - source_type1:
      # source-specific configuration
  - source_type2:
      # source-specific configuration
Each source_type is one of the supported data source types. You can find all available source types and their configuration options in the Plugins documentation.
Give your pipeline a unique name. Marmot uses it to track the state of the ingestion.
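You can check a configuration file against the structure above before running an ingestion. The following is a hypothetical sketch, not Marmot's own validation. It assumes the YAML file has already been parsed into Python dicts and lists (for example with a YAML library). It checks only the documented shape: a non-empty name string, and a runs list whose entries each map exactly one source type to its configuration. The helper name validate_pipeline_config is illustrative. The sketch cannot check that the name is unique across pipelines or that plugin-specific options are valid.

```python
from typing import Any


def validate_pipeline_config(config: Any) -> list[str]:
    """Return a list of problems found in a parsed pipeline config.

    Hypothetical helper: checks only the documented shape (name + runs),
    not any plugin-specific options.
    """
    problems: list[str] = []

    if not isinstance(config, dict):
        return ["config must be a mapping at the top level"]

    # The pipeline name is used to track ingestion state, so require one.
    name = config.get("name")
    if not isinstance(name, str) or not name.strip():
        problems.append("'name' must be a non-empty string")

    runs = config.get("runs")
    if not isinstance(runs, list) or not runs:
        problems.append("'runs' must be a non-empty list")
        return problems

    for index, run in enumerate(runs):
        # Each run entry maps exactly one source type to its settings.
        if not isinstance(run, dict) or len(run) != 1:
            problems.append(f"runs[{index}] must map exactly one source type")
            continue
        source_type, source_config = next(iter(run.items()))
        if not isinstance(source_config, dict):
            problems.append(
                f"runs[{index}] ({source_type}) configuration must be a mapping"
            )

    return problems


if __name__ == "__main__":
    # A parsed equivalent of a minimal single-source pipeline file.
    config = {
        "name": "my_pipeline_name",
        "runs": [{"kafka": {"bootstrap_servers": "kafka-broker:9092"}}],
    }
    print(validate_pipeline_config(config))  # prints []
```

An empty list means the file has the expected shape. Each string in a non-empty result describes one structural problem.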
Example: Ingesting Kafka Topics
name: kafka_pipeline
runs:
  - kafka:
      bootstrap_servers: "kafka-broker:9092"
      client_id: "marmot-kafka-plugin"
      client_timeout_seconds: 60
      authentication:
        type: "sasl_plaintext"
        username: "username"
        password: "password"
        mechanism: "PLAIN"
      schema_registry:
        url: "http://schema-registry:8081"
        enabled: true
        config:
          basic.auth.user.info: "username:password"
This configuration connects to a Kafka broker at kafka-broker:9092 with SASL PLAIN authentication and integrates with a Schema Registry at http://schema-registry:8081.
Run the ingestion by passing the configuration file to the ingest command:
marmot ingest -c config.yaml
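If you trigger ingestion from a Python job rather than a shell, a thin wrapper around the command above is enough. This is a hypothetical sketch, not part of Marmot. It assumes the marmot binary is on your PATH and that the host and API key are already configured as described in the CLI Reference. It uses only the ingest -c invocation shown in this section. The function names build_ingest_command and run_ingest are illustrative.

```python
import shutil
import subprocess
from pathlib import Path


def build_ingest_command(config_path: str, binary: str = "marmot") -> list[str]:
    """Build the argument list for `marmot ingest -c <config>`."""
    return [binary, "ingest", "-c", config_path]


def run_ingest(config_path: str) -> int:
    """Run one ingestion and return the CLI's exit code.

    Hypothetical wrapper: assumes `marmot` is installed on PATH and that
    host/API key settings are already configured for the CLI.
    """
    if not Path(config_path).is_file():
        raise FileNotFoundError(f"config file not found: {config_path}")

    binary = shutil.which("marmot")
    if binary is None:
        raise RuntimeError("marmot binary not found on PATH")

    # Stdout/stderr are inherited, so the CLI's own output stays visible.
    result = subprocess.run(build_ingest_command(config_path, binary))
    return result.returncode


if __name__ == "__main__":
    print(build_ingest_command("config.yaml"))
```

The wrapper returns the exit code instead of raising on failure, which leaves it to the calling job to decide how to handle a failed run.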