BigQuery

This plugin discovers datasets and tables from Google BigQuery projects.

Status: experimental

Example Configuration


project_id: "company-data-warehouse"
credentials_path: "/etc/marmot/bq-service-account.json"
tags:
- "bigquery"
- "data-warehouse"

Configuration

The following configuration options are available:

| Property | Type | Required | Description |
|---|---|---|---|
| project_id | string | false | Google Cloud Project ID |
| credentials_path | string | false | Path to service account credentials JSON file |
| credentials_json | string | false | Service account credentials JSON content |
| use_default_credentials | bool | false | Use default Google Cloud credentials |
| include_datasets | bool | false | Whether to discover datasets |
| include_table_stats | bool | false | Whether to include table statistics (row count, size) |
| include_views | bool | false | Whether to discover views |
| include_external_tables | bool | false | Whether to discover external tables |
| dataset_filter | plugin.Filter | false | Filter configuration for datasets |
| table_filter | plugin.Filter | false | Filter configuration for tables |
| exclude_system_datasets | bool | false | Whether to exclude system datasets (_script, _analytics, etc.) |
| max_concurrent_requests | int | false | Maximum number of concurrent API requests |
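
As a sketch of how these options can be combined, the variant below authenticates with default Google Cloud credentials instead of a service account key file and sets several of the discovery flags explicitly. It uses only property names listed above. All values, including the concurrency limit of 5, are illustrative assumptions rather than documented defaults. The filter options are left out because the structure of plugin.Filter is not described on this page.

```yaml
# Illustrative values only; property names come from the configuration table.
project_id: "company-data-warehouse"

# Use ambient Google Cloud credentials rather than credentials_path or credentials_json.
use_default_credentials: true

# Discovery scope.
include_datasets: true
include_views: true
include_external_tables: false
exclude_system_datasets: true

# Collect row count and size for each table.
include_table_stats: true

# Assumed limit, chosen for illustration; not a documented default.
max_concurrent_requests: 5

tags:
  - "bigquery"
```

Setting include_table_stats to true should correspond to the num_rows and num_bytes fields in the metadata listing, since those are the row count and size statistics the option describes.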

Available Metadata

The following metadata fields are available:

| Field | Type | Description |
|---|---|---|
| access_entries_count | int | Number of access control entries |
| clustering_fields | []string | Clustering fields |
| creation_time | string | Dataset creation timestamp |
| creation_time | string | Table creation timestamp |
| dataset_id | string | Dataset ID |
| default_partition_expiration | string | Default partition expiration duration |
| default_table_expiration | string | Default table expiration duration |
| description | string | Dataset description |
| description | string | Column description |
| description | string | Table description |
| expiration_time | string | Table expiration timestamp |
| external_data_config | map[string]interface | External data configuration for external tables |
| labels | map[string]string | Dataset labels |
| labels | map[string]string | Table labels |
| last_modified | string | Last modification timestamp |
| location | string | Geographic location of the dataset |
| name | string | Column name |
| nested_fields | []map[string]interface | Nested fields for RECORD type columns |
| num_bytes | int64 | Size of the table in bytes |
| num_rows | uint64 | Number of rows in the table |
| partition_expiration | string | Partition expiration duration |
| project_id | string | Google Cloud Project ID |
| range_partitioning_field | string | Range partitioning field |
| source_format | string | Source data format (CSV, JSON, AVRO, etc.) |
| source_uris | []string | Source URIs for external data |
| table_id | string | Table ID |
| table_type | string | Table type (TABLE, VIEW, EXTERNAL) |
| time_partitioning_field | string | Time partitioning field |
| time_partitioning_type | string | Time partitioning type |
| type | string | Column data type |
| view_query | string | SQL query for views |