Version: 0.9

Glue

Experimental

Creates:

Assets

Configure in the UI

This plugin can be configured directly in the Marmot UI with a step-by-step wizard.

The Glue plugin discovers and catalogs AWS Glue resources including jobs, databases, tables and crawlers. It captures metadata such as job configurations, table schemas, crawler schedules and database properties. Iceberg-managed tables are automatically skipped (use the dedicated Iceberg plugin instead).

Required Permissions

AWS Configuration

See AWS Configuration for the supported AWS configuration options.

Example Configuration

credentials:
  region: "us-east-1"
  profile: "production"
  role: "<role>"
tags:
  - "aws"
discover_jobs: true
discover_databases: true
discover_tables: true
discover_crawlers: true

Configuration

The following configuration options are available:

Property	Type	Required	Description
credentials	AWSCredentials	false	AWS credentials configuration
discover_crawlers	bool	false	Whether to discover Glue crawlers
discover_databases	bool	false	Whether to discover Glue databases
discover_jobs	bool	false	Whether to discover Glue jobs
discover_tables	bool	false	Whether to discover Glue tables
external_links	[]ExternalLink	false	External links to show on all assets
filter	Filter	false	Filter discovered assets by name (regex)
include_tags	[]string	false	List of AWS tags to include as metadata. By default, all tags are included.
tags	TagsConfig	false	Tags to apply to discovered assets
tags_to_metadata	bool	false	Convert AWS tags to Marmot metadata

Available Metadata

The following metadata fields are available:

Field	Type	Description
catalog_id	string	ID of the Data Catalog
classification	string	Classification of the table data (csv, parquet, json, etc.)
classifiers	string	Custom classifiers used by the crawler
connections	string	Connections used by the job
create_time	string	Date and time the database was created
create_time	string	Date and time the table was created
created_on	string	Date and time the job was created
creation_time	string	Date and time the crawler was created
database_name	string	Target database for the crawler
database_name	string	Name of the database containing the table
description	string	Description of the database
glue_version	string	Glue version used by the job
input_format	string	Hadoop input format class
last_crawl_error	string	Error message from the last crawl
last_crawl_status	string	Status of the last crawl
last_crawl_time	string	Start time of the last crawl
last_modified_on	string	Date and time the job was last modified
last_updated	string	Date and time the crawler was last updated
location	string	S3 location of the table data
location_uri	string	Location of the database
max_capacity	float64	Maximum number of DPU that can be allocated
max_retries	int	Maximum number of retries
number_of_workers	int32	Number of workers allocated to the job
output_format	string	Hadoop output format class
owner	string	Owner of the table
parameters	string	Database parameters
partition_keys	string	Partition key columns
recrawl_behavior	string	Recrawl behavior policy
retention	int32	Retention period in days
role	string	IAM role ARN assigned to the job
role	string	IAM role ARN assigned to the crawler
schedule	string	Cron schedule expression
schema_delete_behavior	string	Behavior when schema objects are deleted
schema_update_behavior	string	Behavior when schema changes are detected
script_location	string	S3 location of the job script
security_configuration	string	Security configuration applied to the job
serde	string	Serialization/deserialization library
state	string	Current state of the crawler (READY, RUNNING, STOPPING)
table_type	string	Type of table (EXTERNAL_TABLE, VIRTUAL_VIEW, etc.)
targets	string	Summary of crawler targets
timeout	int32	Job timeout in minutes
type	string	Job command type (glueetl, pythonshell, gluestreaming)
update_time	string	Date and time the table was last updated
worker_type	string	Worker type (Standard, G.1X, G.2X, etc.)

Configure in the UI

Required Permissions​

AWS Configuration​

Example Configuration​

Configuration​

Available Metadata​

Required Permissions

AWS Configuration

Example Configuration

Configuration

Available Metadata