Version: Preview

LangChain

The LangChain integration ships in the Python and TypeScript SDKs. It has two halves:

  1. catalog_tools(client) returns a list of LangChain tools (search_catalog, get_asset, lookup_asset, get_upstream_lineage) bound to your Marmot client. Drop them into any agent.
  2. MarmotCallbackHandler registers the agent on first run as an asset of type Agent, captures every tool call and writes one batched lineage edge per upstream when the run ends.
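The batching behaviour in point 2 can be modelled in plain Python. This is a hypothetical sketch, not the SDK's implementation. The `LineageRecorder` class, its methods, the `(upstream, agent)` edge tuples and the `"agent-mrn"` placeholder are all illustrative assumptions. It records upstream MRNs during a run, ignores repeats, and emits one edge per distinct upstream when the run ends.

```python
from dataclasses import dataclass, field


@dataclass
class LineageRecorder:
    """Hypothetical model of per-run lineage batching; not the Marmot SDK API."""

    agent_mrn: str
    upstreams: list[str] = field(default_factory=list, init=False)

    def record_source(self, mrn: str) -> None:
        # Keep first-seen order and ignore repeats within the same run.
        if mrn not in self.upstreams:
            self.upstreams.append(mrn)

    def end_run(self) -> list[tuple[str, str]]:
        # One (upstream, agent) edge per distinct upstream, written as a batch.
        edges = [(mrn, self.agent_mrn) for mrn in self.upstreams]
        self.upstreams.clear()  # the next run starts with an empty batch
        return edges


# "agent-mrn" is a placeholder; the real agent MRN comes from registration.
recorder = LineageRecorder(agent_mrn="agent-mrn")
for mrn in ["mrn://table/postgres/orders",
            "mrn://table/postgres/customers",
            "mrn://table/postgres/orders"]:  # orders is touched twice
    recorder.record_source(mrn)
print(recorder.end_run())  # two edges: orders and customers, each once
```

Deduplicating before the run ends is what keeps an agent that reads the same table many times from producing many identical edges.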

Install

```shell
pip install "marmot-sdk[langchain]"
```

The langchain extra adds langchain-core. Install the agent runtime and model providers yourself. For example, the quick start below also imports from langchain and langchain-openai.

Quick start

A minimal agent that searches the catalog, registers itself and writes lineage:

```python
import marmot
from marmot.integrations.langchain import MarmotCallbackHandler, catalog_tools
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

with marmot.connect() as client:
    # LangChain tools bound to this Marmot client
    tools = catalog_tools(client)

    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a data analyst with access to the Marmot catalog."),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ])
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    agent = create_tool_calling_agent(llm, tools, prompt)
    executor = AgentExecutor(agent=agent, tools=tools)

    # Registers the agent on first run and records lineage from tool calls
    handler = MarmotCallbackHandler(
        client,
        name="catalog-explorer",
        model="gpt-4o-mini",
        owner="data-eng",
        tools=tools,
    )

    executor.invoke(
        {"input": "Find a postgres table about orders and summarise it."},
        config={"callbacks": [handler]},
    )

    print("agent registered as:", handler.agent_mrn)
```

After the first run the agent appears in Marmot as type=Agent, service=LangChain, name=catalog-explorer, with lineage edges from every asset it touched.

Catalog tools

catalog_tools(client) returns four tools wrapped around the SDK:

| Tool | Purpose |
| --- | --- |
| search_catalog | Find assets by name, description or metadata |
| get_asset | Fetch full schema and metadata for one asset ID |
| lookup_asset | Resolve an asset by (type, service, name) |
| get_upstream_lineage | Trace ancestors up to N hops |

Their responses include mrn fields, so the callback handler picks them up automatically and records the upstreams.
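The get_upstream_lineage row describes tracing ancestors up to N hops. A breadth-first traversal is one plausible way to bound such a trace. This is a sketch under assumptions: the SDK may compute lineage server-side, and the `upstream_within` function and the in-memory `graph` dict are hypothetical. Breadth-first order means each ancestor is first reached by its shortest path, so its recorded hop count is the minimum.

```python
from collections import deque


def upstream_within(graph: dict[str, list[str]], start: str, max_hops: int) -> dict[str, int]:
    """Return ancestors of `start` reachable in at most `max_hops` hops,
    mapped to their hop distance. `graph` maps each asset to its direct upstreams."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # reached the hop limit: don't expand further
        for parent in graph.get(node, []):
            if parent not in depth:  # BFS: the first visit is the shortest distance
                depth[parent] = depth[node] + 1
                queue.append(parent)
    del depth[start]  # the start asset is not its own ancestor
    return depth


# Hypothetical lineage: a dashboard built from a summary of two tables.
graph = {
    "dashboard": ["orders_summary"],
    "orders_summary": ["orders", "customers"],
    "orders": ["raw_orders"],
}
print(upstream_within(graph, "dashboard", max_hops=2))
# {'orders_summary': 1, 'orders': 2, 'customers': 2}
```

Tracking visited assets in `depth` also keeps the traversal from looping forever if the lineage graph contains a cycle.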

Custom tools

Three ways to attribute lineage from your own tools.

marmot_tool helper

```python
from marmot.integrations.langchain import marmot_tool

@marmot_tool(asset_mrn="mrn://table/postgres/orders")
def query_orders(sql: str) -> list[dict]:
    """Run a read-only SQL query against the orders table."""
    # run_sql stands in for your own database helper
    return run_sql(sql)
```

The MRN is stamped into tool metadata. The handler reads it on every call.
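The stamp-then-read pattern can be modelled with plain functions. This is a hypothetical sketch, not how marmot_tool is implemented. In the real integration the decorator produces a LangChain tool, and the handler reads the metadata from the tool callback. `stamp_mrn`, `SourceCollector`, `call_tool` and the `MRN_KEY` metadata key are all invented for illustration.

```python
from typing import Any, Callable

MRN_KEY = "marmot_asset_mrn"  # hypothetical metadata key


def stamp_mrn(asset_mrn: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical stand-in for marmot_tool: attach an MRN to the tool's metadata."""
    def decorate(fn: Callable[..., Any]) -> Callable[..., Any]:
        fn.metadata = {MRN_KEY: asset_mrn}
        return fn
    return decorate


class SourceCollector:
    """Hypothetical handler side: read the stamped MRN on every tool call."""

    def __init__(self) -> None:
        self.sources: set[str] = set()

    def call_tool(self, tool: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        mrn = (getattr(tool, "metadata", None) or {}).get(MRN_KEY)
        if mrn:  # undecorated tools carry no MRN and record nothing
            self.sources.add(mrn)
        return tool(*args, **kwargs)


@stamp_mrn("mrn://table/postgres/orders")
def query_orders(sql: str) -> list[dict]:
    return [{"order_id": 1}]  # stub result instead of a real query


collector = SourceCollector()
collector.call_tool(query_orders, "SELECT * FROM orders LIMIT 1")
print(collector.sources)  # {'mrn://table/postgres/orders'}
```

Because the MRN is attached once at definition time, the call site needs no lineage code. That is the trade-off against recording sources manually at runtime.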

Manual record_source

Use this when the upstream is only known at runtime, for example a tool that picks one of several tables:

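```python
# handler is the MarmotCallbackHandler passed in the run's callbacks;
# run_sql stands in for your own database helper
def query_table(table: str, sql: str) -> list[dict]:
    # Attribute lineage to whichever table this call actually reads
    handler.record_source(f"mrn://table/postgres/{table}")
    return run_sql(sql)
```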

MRNs in tool output

If your tool returns objects shaped like { mrn, ... } or { results: [{ mrn, ... }] }, the handler walks the output looking for them. This is how catalog_tools produces lineage automatically.
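The output walk can be sketched as a recursive search over dicts and lists. This is a minimal sketch, assuming the output is already a parsed Python object. `find_mrns` is a hypothetical name, and the handler's own traversal rules may differ. It collects every string `mrn` value at any depth, in first-seen order and without duplicates.

```python
from typing import Any


def find_mrns(output: Any) -> list[str]:
    """Collect every string `mrn` value in nested dicts/lists, first-seen order."""
    found: list[str] = []

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            mrn = node.get("mrn")
            if isinstance(mrn, str) and mrn not in found:
                found.append(mrn)
            for value in node.values():  # descend into nested objects
                walk(value)
        elif isinstance(node, (list, tuple)):
            for item in node:
                walk(item)

    walk(output)
    return found


# Hypothetical response shaped like { results: [{ mrn, ... }] }
search_result = {
    "results": [
        {"mrn": "mrn://table/postgres/orders", "name": "orders"},
        {"mrn": "mrn://table/postgres/customers", "name": "customers"},
    ],
}
print(find_mrns(search_result))
# ['mrn://table/postgres/orders', 'mrn://table/postgres/customers']
```

A tool that returns a JSON string instead of a dict would need json.loads before a walk like this could find anything.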

Other frameworks

LlamaIndex, AutoGen and CrewAI can already be used with Marmot by calling the SDK directly. First-class integrations for them will follow based on demand.
