Iceberg

Experimental
Creates: Assets, Lineage

Configure in the UI

This plugin can be configured directly in the Marmot UI with a step-by-step wizard.

The Iceberg plugin discovers namespaces, tables and views from Iceberg catalogs. It supports both REST catalogs and AWS Glue Data Catalog as backends.

AWS Glue Catalog Permissions

When using catalog_type: "glue", the following IAM permissions are required:

The s3:GetObject permission is needed because Glue's LoadTable reads Iceberg metadata files from S3.

Example Configuration


# REST catalog (default)
uri: "http://localhost:8181"
warehouse: "my-warehouse"
credential: "client-id:client-secret"
tags:
- "iceberg"

# Glue catalog:
# catalog_type: "glue"
# credentials:
#   region: "us-east-1"
# glue_catalog_id: "123456789012" # optional, defaults to caller's account
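
For comparison, the commented-out Glue variant above might look like this once enabled as a standalone configuration. This is a minimal sketch, not a verified setup: it assumes `region` is nested under `credentials`, and the tag keys under `include_tags` are hypothetical placeholders. The account ID is the illustrative value from the example above.

```yaml
# Glue catalog (sketch; values are illustrative)
catalog_type: "glue"
credentials:
  region: "us-east-1"            # assumed to nest under credentials
glue_catalog_id: "123456789012"  # optional, defaults to caller's account
tags_to_metadata: true           # convert AWS tags to Marmot metadata
include_tags:                    # hypothetical tag keys; omit to include all tags
  - "team"
  - "environment"
tags:
  - "iceberg"
```

`uri` is left out here because the options table lists it as required only for `catalog_type=rest`.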

Configuration

The following configuration options are available:

| Property | Type | Required | Description |
|---|---|---|---|
| catalog_type | string | false | Catalog backend type |
| credential | string | false | Credential for OAuth2 client credentials authentication |
| credentials | AWSCredentials | false | AWS credentials configuration |
| external_links | []ExternalLink | false | External links to show on all assets |
| filter | Filter | false | Filter discovered assets by name (regex) |
| glue_catalog_id | string | false | AWS Glue Data Catalog ID (defaults to caller's account) |
| include_namespaces | bool | false | Whether to discover namespaces as assets |
| include_tags | []string | false | List of AWS tags to include as metadata. By default, all tags are included. |
| include_views | bool | false | Whether to discover views |
| prefix | string | false | Optional prefix for the REST catalog |
| properties | map[string]string | false | Additional catalog properties |
| tags | TagsConfig | false | Tags to apply to discovered assets |
| tags_to_metadata | bool | false | Convert AWS tags to Marmot metadata |
| token | string | false | Bearer token for authentication |
| uri | string | false | REST catalog URI (required for catalog_type=rest) |
| warehouse | string | false | Warehouse identifier |
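
To show how several of these options might combine, here is a hedged sketch of a REST catalog configuration that authenticates with a bearer token instead of OAuth2 client credentials. The token value, the prefix, and the key under `properties` are hypothetical placeholders. Only options whose shape is clear from the table above are used; `filter` and `external_links` are omitted because their nested structure is not described here.

```yaml
# REST catalog with bearer-token auth (sketch; values are illustrative)
catalog_type: "rest"
uri: "http://localhost:8181"
warehouse: "my-warehouse"
token: "example-bearer-token"   # hypothetical; used in place of `credential`
prefix: "example-prefix"        # hypothetical REST catalog prefix
include_namespaces: true        # discover namespaces as assets
include_views: true             # discover views as well as tables
properties:
  example.property: "value"     # hypothetical additional catalog property
tags:
  - "iceberg"
```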

Available Metadata

The following metadata fields are available:

| Field | Type | Description |
|---|---|---|
| current_snapshot_id | string | Current snapshot ID |
| format_version | int | Iceberg format version (1, 2, or 3) |
| format_version | int | View format version |
| last_updated_ms | int64 | Last update timestamp in milliseconds |
| location | string | Table data location |
| location | string | Default location for tables |
| location | string | View metadata location |
| namespace | string | Namespace path |
| partition_spec | string | Partition specification |
| schema_field_count | int | Number of schema fields |
| schema_field_count | int | Number of schema fields |
| snapshot_count | int | Number of snapshots |
| sort_order | string | Sort order specification |
| sql | string | SQL definition of the view |
| sql_dialect | string | SQL dialect of the view definition |
| table_uuid | string | Table UUID |
| total_data_files | string | Total data file count |
| total_file_size | string | Total file size in bytes |
| total_records | string | Total record count |
| view_uuid | string | View UUID |