The Tool Contract Pattern: Stop Importing Your Agent's Tools. Declare Them.

Pattern #2 in the Agentic Platform Patterns catalog. See the Graph Blueprint article for what this plugs into, and the introduction for the catalog framing.

Initially, adding a new tool to an agent system we built meant editing five files. But after we wrote down what a tool actually is, it meant defining one piece of data.

That's the pattern. The rest of this article is the five files, the one record that replaced them, and the surprising thing that happened to the orchestrator once the record existed.

A note for anyone who's read the canonical pattern catalogs first: the well-known Tool Use pattern tells you an agent should call external tools, and what the call-and-respond loop looks like. This article is the layer below it. Not whether the agent uses tools, but how a tool's entire interface gets declared so that adding one means defining a single record instead of modifying a bunch of files. Tool Use is a behavior; the Tool Contract is the architecture that keeps that behavior from coupling your codebase to one product.

Five files to add a tool

The orchestrator in our medical AI engine plans by calling tools. The planner LLM looks at a catalog of tools, picks some, fills in their arguments, and the execution layer runs them. Standard agent shape. The problem wasn't the shape. The problem was what it took to add a tool to that catalog.

Before the extraction, adding one new tool meant touching roughly five places:

The orchestrator node, to import the tool and bind it into the function the planner called at execution time.
The planner's prompt, to teach the LLM that this tool existed, what it was for, and when to reach for it.
A config file, to declare any per-deployment knobs the tool needed: search modes, feature toggles, thresholds.
The state class, to add the output fields the tool would write back, so the rest of the graph could read them.
A validation helper, somewhere, to enforce the structural rules. "You can't pass these two arguments together." "This argument requires that flag to be true."

Five files. Five mental steps. And nothing connecting them. A reviewer looking at the orchestrator import had no way to know whether the matching state field, config knob, and validation rule had been added too. Adding a tool was a checklist you reconstructed from memory every time, and the failure mode was silent: the tool worked, right up until the one plan that tripped the invariant nobody remembered to write.

The deeper problem wasn't the volume of edits. It was that the tool's interface with the orchestrator wasn't written down anywhere. It was distributed across five files and held together by convention. When we started asking whether this codebase could host a second product, this was the first thing that screamed. A second team would have to learn five conventions instead of reading one definition.

The pattern in this article is what we built when we asked: what would it look like if a tool's entire interface with the orchestrator lived in one place?

What a tool actually is

Before the code, the conceptual move. The whole pattern falls out of one definition.

A tool, from the orchestrator's point of view, is four things:

A name and a schema. What the LLM sees in its tool catalog. Enough to decide when to call the tool and how to fill in the arguments.
A set of config knobs it needs at runtime. Per-deployment values that shape behavior: modes, toggles, thresholds. These can arrive from the LangGraph configurable payload, from the graph state, or from a declared default, which means they need a resolution order.
A set of invariants. Rules that must hold for any valid call. "You can't pass subquestion_id and subquestion_ids together." "Bundled steps require web_search=True." Some invariants are checkable (a function that returns true or false). Some are correctable (a function that fixes the call instead of rejecting it).
A set of state effects. The fields it writes to graph state when it runs, each with a merge strategy (replace, append, merge into a dict), a flag for whether to preserve the value across conversation turns, and optionally a marker that says "inject this value back into the system prompt at this template variable."

Each of these used to live in a different file. The realization was that all four are facts about the tool, declared by whoever owns the tool. They have no business being scattered across the orchestrator, the prompt, the config, the state class, and a validation helper. They belong together, because they describe one thing.

The piece that makes the abstraction earn its keep is the third one: invariants as data, with optional correctors. That's the design choice that converts "validation is a function someone remembers to call" into "validation is part of the tool's identity." Once invariants are data, the registry runs them automatically. Once correctors are data, the registry can rescue plans that would otherwise be rejected. That is what makes the pattern more than tidy bookkeeping.

The pattern

Name: Tool Contract.

Tagline: Bundle a tool's schema, config, invariants, and state effects into one declarative record: the single seam between an agent's planner and its tools.

Intent: Replace the scattered, convention-held interface between an orchestrator and its tools with a single data record that the orchestrator, the validator, the state layer, and the prompt builder all read from. Make adding a tool mean defining one value, not editing five files.

Structure

The top-level record is a dataclass with no behavior to speak of. It's the place the four facts live:

from __future__ import annotations  # lets the annotations name records defined further down

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolContract:
    """Contract declaring what a tool accepts, enforces, reads, and writes.

    This is the single declaration point for a tool's interface with
    the orchestrator. Adding a new tool means defining one contract.
    """

    name: str
    schema: dict
    invariants: list[ToolInvariant]
    description: str
    config_params: list[ToolConfigParam] = field(default_factory=list)
    state_reads: list[ToolStateRead] = field(default_factory=list)
    state_effects: list[ToolStateEffect] = field(default_factory=list)

Most of those fields are lists of small subordinate records. Three of them carry the weight, and each is worth a look.

ToolConfigParam is (key, default, normalizer):

@dataclass
class ToolConfigParam:
    key: str
    default: Any = None
    normalizer: Callable[[Any], Any] | None = None

The registry resolves every param through one cascade, configurable -> state_config -> declared default, applied uniformly, so no call site ever reimplements "where does this value come from again?" The normalizer is the unsung hero. It absorbs the "should this be lowercased? should an empty string become None?" decisions into the param declaration itself, instead of forcing every reader of the value to remember to clean it. One declaration, normalized once, read everywhere.
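A minimal sketch of that cascade follows. It assumes the LangGraph configurable payload and the state's config both arrive as plain dicts, and that a missing key (not a falsy value) is what falls through to the next source. `resolve_config`, `blank_to_none_lower`, and the `search_mode`/`max_results` params are hypothetical names for illustration, not the engine's code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToolConfigParam:
    key: str
    default: Any = None
    normalizer: Optional[Callable[[Any], Any]] = None


def resolve_config(
    params: list[ToolConfigParam],
    configurable: dict[str, Any],
    state_config: dict[str, Any],
) -> dict[str, Any]:
    """Resolve every param through configurable -> state_config -> default."""
    resolved: dict[str, Any] = {}
    for param in params:
        if param.key in configurable:
            value = configurable[param.key]
        elif param.key in state_config:
            value = state_config[param.key]
        else:
            value = param.default
        # Normalize exactly once, here, so no reader has to clean the value.
        if param.normalizer is not None:
            value = param.normalizer(value)
        resolved[param.key] = value
    return resolved


def blank_to_none_lower(value: Any) -> Any:
    """Hypothetical normalizer: lowercase strings, map empty strings to None."""
    if isinstance(value, str):
        return value.strip().lower() or None
    return value


# Usage: the per-deployment payload wins, then state config, then the default.
params = [
    ToolConfigParam("search_mode", default="hybrid", normalizer=blank_to_none_lower),
    ToolConfigParam("max_results", default=5),
]
resolved = resolve_config(
    params,
    configurable={"search_mode": " Dense "},
    state_config={"max_results": 10},
)
# resolved == {"search_mode": "dense", "max_results": 10}
```

Because the normalizer runs inside the resolver, the declared default passes through it too, so every source is cleaned by the same rule.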

ToolInvariant is (id, rule, check, correct):

@dataclass
class ToolInvariant:
    id: str
    rule: str
    check: Callable           # (call, state, resolved_config) -> bool
    correct: Callable | None = None  # (call, state, resolved_config) -> (call, bool)

The id is a stable identifier the orchestrator logs when an invariant trips. The rule is the human-readable message, and it does double duty: the same string is what the LLM reads when it's taught the tool's rules. The check is a pure function over the call, the state, and the resolved config. The optional correct takes the same arguments and returns a fixed call plus a success flag.

And here's the property that makes it work: the orchestrator never needs to know what any specific invariant checks. It calls contract.validate(call, state, resolved_config) and gets back a list of violation messages. A new invariant is a new entry in a list. The orchestrator code does not change. Validation went from something the orchestrator performs to something a tool declares about itself.
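To make that concrete, here is one way `validate` could be written, as a sketch. The contract is trimmed to the fields validation needs, the message format (id prefix plus rule) is an assumption, and `ids_not_both_set` is a hypothetical stand-in for the engine's `_check_id_fields_not_both_set`, whose real body isn't shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Check = Callable[[dict, dict, dict], bool]    # (call, state, resolved_config) -> ok?
Correct = Callable[[dict, dict, dict], tuple]  # (call, state, resolved_config) -> (call, ok)


@dataclass
class ToolInvariant:
    id: str
    rule: str
    check: Check
    correct: Optional[Correct] = None


@dataclass
class ToolContract:  # trimmed to the fields validation needs
    name: str
    invariants: list[ToolInvariant]

    def validate(self, call: dict, state: dict, resolved_config: dict) -> list[str]:
        """Return one message per violated invariant; an empty list means valid.

        Callers never learn what an invariant checks. They see only the
        id (for logs) and the human-readable rule.
        """
        return [
            f"[{inv.id}] {inv.rule}"
            for inv in self.invariants
            if not inv.check(call, state, resolved_config)
        ]


def ids_not_both_set(call: dict, state: dict, cfg: dict) -> bool:
    """Hypothetical checker: subquestion_id and subquestion_ids are exclusive."""
    return call.get("subquestion_id") is None or call.get("subquestion_ids") is None


contract = ToolContract(
    name="retrieval_llm",
    invariants=[
        ToolInvariant(
            id="MUTUAL_EXCLUSIVE_SUBQUESTION_FIELDS",
            rule="Cannot use both subquestion_id and subquestion_ids in same step",
            check=ids_not_both_set,
        )
    ],
)
violations = contract.validate(
    {"query": "q", "subquestion_id": "s1", "subquestion_ids": ["s2"]}, {}, {}
)
```

Adding a rule means appending another ToolInvariant to the list; neither `validate` nor the orchestrator that calls it changes.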

ToolStateEffect is (key, type_hint, merge, preserve_across_turns, prompt_template_var):

@dataclass
class ToolStateEffect:
    key: str
    type_hint: type = dict
    merge: str = "replace"  # "replace" | "append" | "merge_dict"
    preserve_across_turns: bool = False
    prompt_template_var: str | None = None

merge encodes how the tool's output combines with whatever's already in state. preserve_across_turns tells the guardrails layer whether to clear this field at the start of a new turn. And prompt_template_var is the one worth dwelling on: when it's set, the value in this state field gets serialized and injected into the system prompt at {{VAR_NAME}}. That single field is what lets a tool's output flow back into the next LLM call without anyone wiring it up by hand. The tool says "my output belongs in the prompt here," and it just happens.
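A sketch of how the consumers of ToolStateEffect might read its fields, assuming state is a plain dict and that "serialized" means JSON (the article doesn't specify the format). `apply_state_effects`, `reset_for_new_turn`, `render_prompt`, and the `evidence`/`retrieval_meta` keys are illustrative names, not the engine's.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolStateEffect:
    key: str
    type_hint: type = dict
    merge: str = "replace"  # "replace" | "append" | "merge_dict"
    preserve_across_turns: bool = False
    prompt_template_var: Optional[str] = None


def apply_state_effects(effects: list[ToolStateEffect], state: dict, output: dict) -> dict:
    """Merge a tool's output into a copy of state, per each declared strategy."""
    new_state = dict(state)
    for effect in effects:
        if effect.key not in output:
            continue  # the tool wrote nothing to this field on this run
        value, current = output[effect.key], new_state.get(effect.key)
        if effect.merge == "replace":
            new_state[effect.key] = value
        elif effect.merge == "append":
            new_state[effect.key] = [*(current or []), *value]
        elif effect.merge == "merge_dict":
            new_state[effect.key] = {**(current or {}), **value}
        else:
            raise ValueError(f"unknown merge strategy {effect.merge!r}")
    return new_state


def reset_for_new_turn(effects: list[ToolStateEffect], state: dict) -> dict:
    """Guardrail step: clear every declared field not marked preserve_across_turns."""
    kept = dict(state)
    for effect in effects:
        if not effect.preserve_across_turns:
            kept.pop(effect.key, None)
    return kept


def render_prompt(template: str, effects: list[ToolStateEffect], state: dict) -> str:
    """Inject each field that declares a prompt_template_var at {{VAR_NAME}}."""
    for effect in effects:
        if effect.prompt_template_var and effect.key in state:
            placeholder = "{{" + effect.prompt_template_var + "}}"
            template = template.replace(placeholder, json.dumps(state[effect.key]))
    return template


effects = [
    ToolStateEffect("evidence", type_hint=list, merge="append", prompt_template_var="EVIDENCE"),
    ToolStateEffect("retrieval_meta", merge="merge_dict", preserve_across_turns=True),
]
state = {"evidence": ["a"], "retrieval_meta": {"calls": 1}}
state = apply_state_effects(effects, state, {"evidence": ["b"], "retrieval_meta": {"web": True}})
prompt = render_prompt("Evidence so far: {{EVIDENCE}}", effects, state)
next_turn = reset_for_new_turn(effects, state)
```

None of the three functions mentions a specific tool; each is a loop over whatever effects the registered contracts declare.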

There's a fourth list on the record, and it closes a symmetry the other three only hint at. Where state_effects declares what a tool writes, ToolStateRead declares what it reads: the state keys it needs as input.

@dataclass
class ToolStateRead:
    key: str

That's the whole record, a single key. A tool's state_reads list names the fields the execution layer must pull out of graph state and hand to the tool when it runs. A retrieval tool, for instance, might declare state_reads=[ToolStateRead("normalized_query_text"), ToolStateRead("skills_loaded")]. The payoff is that a tool's inputs become as declarative as its outputs. The execution layer injects the named state into the call without the tool reaching for LangGraph's InjectedState plumbing, and a reader can see at a glance what a tool consumes, not just what it produces. It's a small field, and for a while it was the one part of the contract that stayed implicit while everything around it got written down. Once it's declared, the record finally describes the tool's entire state interface, both directions, in one place.
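Here is a minimal sketch of that injection. It assumes the execution layer holds each tool as a plain callable next to its contract's state_reads, and that a missing key should fail loudly rather than silently pass None. `BoundTool`, `execute`, and the `retrieve` stub are hypothetical; the two state keys are the ones from the example above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolStateRead:
    key: str


@dataclass
class BoundTool:
    """Hypothetical pairing of a tool callable with its declared state reads."""
    fn: Callable[..., Any]
    state_reads: list[ToolStateRead] = field(default_factory=list)


def execute(tool: BoundTool, call_args: dict, state: dict) -> Any:
    """Call the tool with the planner's args plus exactly the declared state keys."""
    missing = [read.key for read in tool.state_reads if read.key not in state]
    if missing:
        raise KeyError(f"state is missing declared reads: {missing}")
    injected = {read.key: state[read.key] for read in tool.state_reads}
    return tool.fn(**call_args, **injected)


def retrieve(query: str, normalized_query_text: str, skills_loaded: list) -> str:
    # Stub standing in for a real retrieval tool.
    return f"{query}|{normalized_query_text}|{len(skills_loaded)}"


tool = BoundTool(
    retrieve,
    [ToolStateRead("normalized_query_text"), ToolStateRead("skills_loaded")],
)
state = {"normalized_query_text": "nq", "skills_loaded": ["a", "b"], "unrelated": 1}
result = execute(tool, {"query": "q"}, state)  # "unrelated" is never passed in
```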

Here's the point the whole pattern turns on. A ToolContract instance is the single source of truth for a tool. The orchestrator reads from it to bind the tool. The validator reads from it to check the plan. The state guardrails read from it to know what to clear and preserve. The prompt builder reads from it to know what to inject. The execution layer reads from it to know what state to feed in and what to collect back out. One declaration, every consumer, zero duplication.

A real example

Here's an actual contract from the medical engine: the model_rag tool, which retrieves clinical evidence by decomposing a question into subquestions. Lightly trimmed:

RETRIEVAL_LLM_CONTRACT = ToolContract(
    name="retrieval_llm",
    schema={
        "subquestion_id": "Optional[str] — single subquestion ID for targeted retrieval",
        "subquestion_ids": "Optional[list[str]] — multiple IDs for parallel (skeleton-of-thoughts) retrieval",
        "web_search": "Optional[bool] — enable web search for this step",
        "query": "str — the search query for retrieval",
    },
    config_params=[],
    invariants=[
        ToolInvariant(
            id="MUTUAL_EXCLUSIVE_SUBQUESTION_FIELDS",
            rule="Cannot use both subquestion_id and subquestion_ids in same step",
            check=_check_id_fields_not_both_set,
            correct=None,  # no correction; rejection is the right response
        ),
        ToolInvariant(
            id="BUNDLED_REQUIRES_WEB_SEARCH",
            rule="Steps using subquestion_ids must have web_search=True",
            check=_check_bundled_has_web_search,
            correct=_correct_bundled_enable_web_search,  # set the flag
        ),
        ToolInvariant(
            id="SINGLE_BUNDLED_STEP_PER_PLAN",
            rule="At most one retrieval_llm step with subquestion_ids per plan",
            check=_check_one_bundled_step_max,
            correct=_correct_drop_extra_bundled,  # prune the duplicate step
        ),
        ToolInvariant(
            id="NO_DUPLICATE_IDENTICAL_CALLS",
            rule="No duplicate (text, web_search) combinations within plan",
            check=_check_calls_are_unique,
            correct=_correct_dedupe_calls,
        ),
    ],
    description=(
        "Retrieve clinical evidence using frontier LLM decomposition.\n"
        "- Use subquestion_id for single-step targeted retrieval\n"
        "- Use subquestion_ids for parallel multi-step retrieval (ONE step per plan)\n"
        "- Steps with subquestion_ids MUST have web_search=True\n"
        "- Deduplication: identical (text, web_search) combinations are pruned"
    ),
)

Three things in this one block are worth pulling out, because they're where the abstraction stops being theoretical.

Three of the four invariants have correctors. This is unusual, and it's the contrarian heart of the pattern. Rejecting a tool call is expensive: the LLM has to re-plan, the user waits, and the rejection signal might not even tell the model what it actually did wrong. Correctors give the orchestrator a second option. If we can deterministically fix the call (set the missing web_search=True flag, prune the redundant step), we do it, log that we did, and move on. The LLM never has to know it almost made a mistake.

Correction isn't binary, either. A corrector can do more than patch-and-keep: it can return a DROP_STEP sentinel that tells the registry to prune the step from the plan entirely. SINGLE_BUNDLED_STEP_PER_PLAN uses exactly this. It doesn't fix the second bundled step in a plan; it removes it. So the real menu an invariant chooses from is three options, not two: correct, prune, or reject. Each invariant declares which of those it's capable of, right there in the contract.

The remaining invariant, the first in the list, deliberately has no corrector. MUTUAL_EXCLUSIVE_SUBQUESTION_FIELDS is an honest "no." The LLM passed two mutually exclusive arguments, and there's no defensible way to guess which one it meant. The correct response is rejection, and the contract says so by leaving correct=None. The design choice is explicit and local: each invariant declares whether it can rescue itself, and the absence of a corrector is itself a statement.
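One way a registry could walk a plan and choose among correct, prune, and reject is sketched below. It assumes a corrector returns a `(call, ok)` pair and signals pruning by returning the `DROP_STEP` sentinel in the call position, and that plan-level rules such as SINGLE_BUNDLED_STEP_PER_PLAN see the steps accepted so far through the state argument. `enforce` and the checker and corrector bodies are hypothetical stand-ins for the `_check_*`/`_correct_*` helpers the contract references.

```python
from dataclasses import dataclass
from typing import Callable, Optional

DROP_STEP = object()  # sentinel: a corrector returns this to prune the step


@dataclass
class ToolInvariant:
    id: str
    rule: str
    check: Callable[[dict, dict, dict], bool]
    correct: Optional[Callable[[dict, dict, dict], tuple]] = None


def enforce(plan: list[dict], invariants: list[ToolInvariant],
            state: dict, cfg: dict) -> tuple[list[dict], list[str]]:
    """Correct, prune, or reject each step. Any error sends the plan back."""
    kept: list[dict] = []
    errors: list[str] = []
    for call in plan:
        step_state = {**state, "accepted_steps": kept}  # plan-level context
        dropped = False
        for inv in invariants:
            if inv.check(call, step_state, cfg):
                continue
            if inv.correct is None:  # reject: no defensible fix exists
                errors.append(f"[{inv.id}] {inv.rule}")
                continue
            fixed, ok = inv.correct(call, step_state, cfg)
            if fixed is DROP_STEP:  # prune: remove the step entirely
                dropped = True
                break
            if ok:  # correct: keep the patched call
                call = fixed
            else:
                errors.append(f"[{inv.id}] {inv.rule}")
        if not dropped:
            kept.append(call)
    return kept, errors


def ids_not_both_set(call, state, cfg):
    return call.get("subquestion_id") is None or call.get("subquestion_ids") is None


def bundled_has_web_search(call, state, cfg):
    return not call.get("subquestion_ids") or call.get("web_search") is True


def enable_web_search(call, state, cfg):
    return {**call, "web_search": True}, True  # copy, never mutate the plan


def at_most_one_bundled(call, state, cfg):
    already = any(s.get("subquestion_ids") for s in state["accepted_steps"])
    return not (call.get("subquestion_ids") and already)


def drop_step(call, state, cfg):
    return DROP_STEP, True


invariants = [
    ToolInvariant("MUTUAL_EXCLUSIVE_SUBQUESTION_FIELDS",
                  "Cannot use both subquestion_id and subquestion_ids in same step",
                  ids_not_both_set),
    ToolInvariant("BUNDLED_REQUIRES_WEB_SEARCH",
                  "Steps using subquestion_ids must have web_search=True",
                  bundled_has_web_search, enable_web_search),
    ToolInvariant("SINGLE_BUNDLED_STEP_PER_PLAN",
                  "At most one retrieval_llm step with subquestion_ids per plan",
                  at_most_one_bundled, drop_step),
]
plan = [
    {"query": "dose", "subquestion_ids": ["s1", "s2"]},                # corrected
    {"query": "risk", "subquestion_ids": ["s3"], "web_search": True},  # pruned
    {"query": "route", "subquestion_id": "s4"},                        # kept as-is
]
kept, errors = enforce(plan, invariants, state={}, cfg={})
```

Invariant order matters in this sketch: the web_search correction runs before the bundled-step rule, so later steps are compared against the corrected call.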

The description string is the LLM's tool documentation. That block of text ends up in the planner's prompt verbatim. Writing the contract is also writing the tool's marketing copy to the model, which makes the contract a content-design surface as much as a software one. The same record that the validator reads for invariants is the record the LLM reads to decide whether to call the tool at all.
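Since the planner's tool documentation comes from the same records, a prompt builder can be little more than a loop over registered contracts. The layout below is an assumption; the article says only that the description lands in the prompt verbatim and that rule strings are what the LLM reads as the tool's rules. `render_tool_catalog` and the trimmed record shapes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToolInvariant:  # trimmed: the prompt only needs the rule text
    id: str
    rule: str


@dataclass
class ToolContract:  # trimmed to what the prompt builder reads
    name: str
    schema: dict[str, str]
    description: str
    invariants: list[ToolInvariant] = field(default_factory=list)


def render_tool_catalog(contracts: list[ToolContract]) -> str:
    """Render each contract's name, description, arguments, and rules for the planner."""
    sections = []
    for contract in contracts:
        lines = [f"Tool: {contract.name}", contract.description, "Arguments:"]
        lines += [f"  {arg}: {spec}" for arg, spec in contract.schema.items()]
        if contract.invariants:
            lines.append("Rules:")
            lines += [f"  - {inv.rule}" for inv in contract.invariants]
        sections.append("\n".join(lines))
    return "\n\n".join(sections)


contract = ToolContract(
    name="retrieval_llm",
    schema={"query": "str, the search query for retrieval"},
    description="Retrieve clinical evidence using frontier LLM decomposition.",
    invariants=[ToolInvariant("BUNDLED_REQUIRES_WEB_SEARCH",
                              "Steps using subquestion_ids must have web_search=True")],
)
catalog = render_tool_catalog([contract])
```

Because the rule strings come from the same ToolInvariant records the validator reports, the model is taught exactly the rules it will be checked against.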

Consequences

What this pattern enables:

Adding a tool is defining one record. Schema, config, invariants, and state effects all live in one file, owned by whoever owns the tool. No orchestrator edit, no scavenger hunt across five files.
The orchestrator stops importing tools. It asks the registry what to bind and gets tools back, instantiated and ready, and the orchestrator file ends up with zero imports from tool packages.
A second product registers a different set of contracts and the orchestrator picks them up unchanged. This is the reuse property the whole catalog is chasing, landing at the tool layer.
Validation, config resolution, and prompt injection all become uniform. They're not per-tool special cases anymore; they're loops over declared data.

What it costs:

A layer of indirection. To understand what a tool enforces, you read the contract and the checker functions it references, not a single inline if in the orchestrator. The tradeoff is worth it the moment you have more than two or three tools, and decisively worth it the moment a second product exists.
Discipline about what belongs in a contract. The temptation is to encode every rule as an invariant, including rules that span several tools. That temptation leads somewhere bad; the close explains where.

When not to use it

A stateless tool with no invariants and no config doesn't need a contract; the registry tolerates tools that have none. And if your agent has three tools that will never grow and never be reused, the scattered-five-files approach will annoy you without ever actually hurting you. The contract earns its keep when tools accumulate, when their rules get subtle, or (the real forcing function) when a second consumer of the platform appears and you do not want them forking your orchestrator to add a tool.

The surprising payoff: the registry became the dispatcher

Before, the orchestrator imported tool modules by name and called validators directly from its node bodies. After, it imports nothing from the tool packages. The execution layer asks the registry what to bind, the validation layer asks what contracts apply, and the state-mutation layer reads state_effects to know what to do with each tool's output. Everything routes through the registry instead of through an import.

Squint at that and you'll recognize the shape: it's plugin architecture without a plugin system. The registry is the plugin host. The contracts are the plugins. The orchestrator never imports a plugin directly. There's no reflection, no entry points, no dynamic-loading magic underneath it, just one startup-time call per contract from the application's bootstrap:

registry.register_contract(RETRIEVAL_LLM_CONTRACT)

The application registers its contracts when the process starts. From then on, the orchestrator discovers everything it can do by asking the registry. A second product that wants a different toolbox registers different contracts at its bootstrap and runs through the same orchestrator, unchanged. The registry didn't just store our tools. It became the seam every other layer dispatches through.
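A minimal sketch of such a registry. The article doesn't show how the registry instantiates tools, so this version assumes a hypothetical `factory` field on the contract that builds the tool on demand; `ContractRegistry`, `bind_tools`, and `bootstrap` are likewise illustrative names. What it shows is the dependency direction: only `bootstrap` knows about concrete tools, and the orchestrator side talks to the registry alone.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolContract:  # trimmed; `factory` is a hypothetical field for this sketch
    name: str
    description: str
    factory: Callable[[], Callable[..., Any]]


class ContractRegistry:
    """Plugin host: stores contracts and hands out bound tools on request."""

    def __init__(self) -> None:
        self._contracts: dict[str, ToolContract] = {}

    def register_contract(self, contract: ToolContract) -> None:
        if contract.name in self._contracts:
            raise ValueError(f"tool {contract.name!r} is already registered")
        self._contracts[contract.name] = contract

    def contracts(self) -> list[ToolContract]:
        return list(self._contracts.values())

    def bind_tools(self) -> dict[str, Callable[..., Any]]:
        # The orchestrator calls this instead of importing tool modules.
        return {name: c.factory() for name, c in self._contracts.items()}


def bootstrap(registry: ContractRegistry) -> None:
    """Application side: the one place that knows which tools exist."""
    registry.register_contract(
        ToolContract("echo", "Echo the query back.", lambda: (lambda query: query))
    )


registry = ContractRegistry()
bootstrap(registry)
tools = registry.bind_tools()  # orchestrator side: no tool imports needed
```

A second product would supply its own `bootstrap` and reuse `ContractRegistry` and the orchestrator untouched.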

That's the same move the Graph Blueprint made for topology, applied one layer down. The blueprint made which nodes exist a value the application supplies. The contract makes which tools exist, and what they enforce, a value the application supplies. Same shape, different seam. By the time you've read a few more patterns in this catalog, you'll stop seeing them as separate tricks and start seeing one idea (the application declares, the platform consumes) expressed at every layer where the two used to be tangled.

The honest close: contracts are about tools, not planners

The ToolContract pattern handles structural invariants: rules that depend only on the call, the state, and the resolved config. "Don't pass these two arguments together." "This flag is required when that one is set." "Don't run the same retrieval twice." Those are facts a tool enforces about itself, and the contract is exactly the right home for them.

It does not handle policy invariants: rules that depend on application-specific business logic that spans multiple tools. "For medical questions, a plan must include both a retrieval_llm call and a source_link_retrieval call." That isn't a fact about either tool. It's a fact about what makes a plan acceptable.

We learned this by getting it wrong first. For a while we tried to express those cross-tool rules as invariants on contracts, and they never sat right. The contract is supposed to mean "this is what this tool enforces about itself." A rule about which combination of tools a valid plan needs has a different owner (the planner, not any one tool), a different scope (the whole plan, not one call), and a different lifecycle (it changes when the product's requirements change, not when the tool does). Cramming it into a contract made both the contract and the rule harder to reason about, and the discomfort was the signal that we'd put the rule in the wrong place.

That's where the next pattern in the catalog picks up. Pattern #3, the Planning Policy, is the separate injection point that owns the cross-tool, application-specific rules the contract pattern deliberately refuses to absorb. For now, the line worth remembering about this pattern is the one we started with: a tool's whole interface (schema, config, invariants, state reads, state effects) collapses onto a single record the application declares and the registry dispatches. Define the data; the orchestrator never imports the tool.

Where the boundary between a tool's rules and a plan's rules actually falls is the subject of the next article.