MCP Tool Discovery and the llms.txt Standard

In 2026, AI agents increasingly ship without hardcoded tool lists. They discover tools at runtime. An agent starts with no knowledge of what services are available, then queries discovery endpoints to find tools, read their documentation, authenticate, and integrate them into its workflow in seconds. This is how modern autonomous systems scale: not through manual integration, but through automated discovery.

The infrastructure for this discovery exists in three places: the llms.txt standard (human-readable tool directories), .well-known/mcp.json (machine-readable discovery), and MCP registries (centralized tool marketplaces like Smithery and PulseMCP). Understanding these is essential if you're building services for agents or scaling autonomous workflows.

What is llms.txt?

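llms.txt is a simple proposed standard: a plain-text (Markdown) file that describes what a domain offers to LLMs and agents. The llms.txt proposal places the file at `/llms.txt` in the site root; the examples in this article serve it from `/.well-known/llms.txt`, so a discovering agent may need to check both paths. The file is human-readable and machine-parseable, designed to help AI systems understand what they can do with your service.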

A basic llms.txt file looks like this:

# SiliconBridge Tools for Autonomous Agents
Product: SiliconBridge
Description: Human infrastructure for AI agents
Documentation: https://siliconbridge.xyz/docs

## Available Tools
- solve_captcha: Solve CAPTCHAs in real-time
- relay_otp: Intercept and relay OTP codes
- web_browse: Browse bot-protected websites
- phone_verify: Handle phone verification

## Authentication
API Key required. Get free credits at https://siliconbridge.xyz

## Rate Limits
Free tier: 100 tasks/day
Paid tier: Unlimited

## Pricing
solve_captcha: $0.50 per task
relay_otp: $1.00 per task
web_browse: $2–5 per task

Agents find this file, parse it, and understand what you offer. LLM-based assistants such as Claude can read llms.txt as plain text and reason about which services to use, while autonomous agents can fetch and cache it on startup to build their capability map.
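To make the "parse it" step concrete, here is a minimal sketch of a parser for a file shaped like the sample above. It assumes the layout shown in this article (a `# ` title, top-level `Key: value` lines, and `## ` sections); the names `LlmsTxt`, `parse_llms_txt`, and `tool_map` are hypothetical, and real llms.txt files may be structured differently.

```python
from dataclasses import dataclass, field


@dataclass
class LlmsTxt:
    title: str = ""
    fields: dict[str, str] = field(default_factory=dict)          # top-level "Key: value" lines
    sections: dict[str, list[str]] = field(default_factory=dict)  # "## Heading" -> body lines


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the simple layout used in the sample file (an assumption, not a spec)."""
    doc = LlmsTxt()
    current = None  # name of the "## " section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith("# "):
            doc.title = line[2:].strip()
        elif current is not None:
            doc.sections[current].append(line)
        elif ":" in line:
            # Split on the first colon only, so URLs in the value stay intact
            key, value = line.split(":", 1)
            doc.fields[key.strip()] = value.strip()
    return doc


def tool_map(doc: LlmsTxt, section: str = "Available Tools") -> dict[str, str]:
    """Turn '- name: description' bullets into a {name: description} map."""
    tools = {}
    for line in doc.sections.get(section, []):
        if line.startswith("- ") and ":" in line:
            name, desc = line[2:].split(":", 1)
            tools[name.strip()] = desc.strip()
    return tools
```

Calling `tool_map(parse_llms_txt(text))` on the sample yields one entry per advertised tool, which an agent could cache as its starting capability map.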

The .well-known/mcp.json Standard

llms.txt works well for humans and for LLMs reading prose, but agents that call tools programmatically need structured, machine-readable metadata. That's where `.well-known/mcp.json` comes in: a discovery endpoint for MCP (Model Context Protocol) servers that returns JSON describing the available tools, their inputs, outputs, and authentication requirements.

A sample mcp.json file:

{
  "tools": [
    {
      "name": "solve_captcha",
      "description": "Solve CAPTCHAs using real-time human operators",
      "input_schema": {
        "type": "object",
        "properties": {
          "image_url": {
            "type": "string",
            "description": "URL of the CAPTCHA image"
          },
          "context": {
            "type": "object",
            "description": "Optional browser context"
          }
        },
        "required": ["image_url"]
      },
      "output_schema": {
        "type": "object",
        "properties": {
          "solution": {
            "type": "string",
            "description": "The CAPTCHA solution"
          },
          "time_seconds": {
            "type": "number",
            "description": "Time to solve"
          },
          "confidence": {
            "type": "number",
            "description": "Confidence score (0-1)"
          }
        }
      }
    },
    {
      "name": "relay_otp",
      "description": "Intercept and relay OTP codes",
      "input_schema": {
        "type": "object",
        "properties": {
          "phone_number": {
            "type": "string"
          },
          "service": {
            "type": "string",
            "description": "Service expecting OTP (e.g., 'sms', 'email')"
          }
        }
      },
      "output_schema": {
        "type": "object",
        "properties": {
          "code": {
            "type": "string"
          },
          "expires_in_seconds": {
            "type": "number"
          }
        }
      }
    }
  ],
  "authentication": {
    "type": "api_key",
    "header": "Authorization",
    "scheme": "Bearer",
    "get_key_url": "https://siliconbridge.xyz/api/signup/wallet"
  },
  "rate_limits": {
    "free": 100,
    "paid": "unlimited"
  },
  "pricing": {
    "solve_captcha": 0.50,
    "relay_otp": 1.00,
    "web_browse": "2-5 per task"
  }
}

When an agent boots up and loads your service, it fetches this JSON, parses the tool schemas, validates its own input against them, and calls your API with proper error handling. All automatic.
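As a rough illustration of the "parse, validate, call" step, the sketch below checks a set of arguments against a tool's `input_schema` and builds the auth header from the `authentication` block. It assumes the manifest layout shown in the sample above; `find_tool`, `validate_args`, and `auth_headers` are hypothetical helpers, and the check covers only `required` and basic `type` rules. A full JSON Schema validator would be the safer choice in production.

```python
# JSON Schema type names mapped to Python types (only the basic ones)
_TYPES = {
    "string": str,
    "number": (int, float),
    "object": dict,
    "array": list,
    "boolean": bool,
}


def find_tool(manifest: dict, name: str) -> dict:
    """Return the tool entry with the given name, or raise KeyError."""
    for tool in manifest.get("tools", []):
        if tool.get("name") == name:
            return tool
    raise KeyError(f"tool not advertised: {name}")


def validate_args(tool: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means args fit input_schema."""
    schema = tool.get("input_schema", {})
    props = schema.get("properties", {})
    errors = [
        f"missing required field: {key}"
        for key in schema.get("required", [])
        if key not in args
    ]
    for key, value in args.items():
        expected = _TYPES.get(props.get(key, {}).get("type"))
        if expected is None:
            continue  # undeclared field or unknown type: nothing to check
        # bool is a subclass of int in Python, so reject it for non-boolean types
        bool_mismatch = isinstance(value, bool) and expected is not bool
        if bool_mismatch or not isinstance(value, expected):
            errors.append(f"wrong type for {key}: expected {props[key]['type']}")
    return errors


def auth_headers(manifest: dict, api_key: str) -> dict:
    """Build the request header described by the manifest's authentication block."""
    auth = manifest.get("authentication", {})
    header = auth.get("header", "Authorization")
    scheme = auth.get("scheme", "")
    return {header: f"{scheme} {api_key}".strip()}
```

An agent would call `validate_args` before sending a request and refuse to call the API while the returned list is non-empty.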

MCP Registries: Discovering Services You've Never Heard Of

llms.txt and mcp.json let agents discover tools from specific domains. But what if an agent doesn't know which domains have useful services? Enter MCP registries — centralized marketplaces where services register themselves.

Smithery (smithery.ai): One of the larger MCP registries. Services can register and be searched by category. An agent looking for "CAPTCHA solving" discovers SiliconBridge, 2Captcha, and others in seconds.

PulseMCP (pulsemcp.com): Focused on real-time integrations. Services exposing webhooks and async workflows register here.

mcp.so: Lightweight registry emphasizing open-source MCP servers.

An agent's discovery flow looks like this (the search endpoint and response fields are illustrative, not a documented registry API):

Agent needs: "solve_captcha"

Query Smithery:
  POST https://smithery.ai/api/tools/search
  { "query": "captcha solver", "category": "human_verification" }

Response:
  [
    { "name": "SiliconBridge", "mcp_endpoint": "mcp.siliconbridge.xyz" },
    { "name": "2Captcha", "mcp_endpoint": "api.2captcha.com/mcp" },
    ...
  ]

Fetch each endpoint's mcp.json, evaluate tool quality, auth
Select: SiliconBridge (best accuracy + cost)

Fetch llms.txt for documentation
Authenticate with API key
Call solve_captcha
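The middle of that flow (fetch each candidate's mcp.json and keep the ones that actually expose the tool) can be sketched as below. This is a minimal sketch under stated assumptions: the candidate records follow the response shape shown above, `fetch_manifest` is a hypothetical callable you supply (for example an HTTP GET that returns the parsed JSON, or `None` on failure), and no real registry API is called.

```python
from typing import Callable, Optional


def providers_offering(
    tool_name: str,
    candidates: list[dict],
    fetch_manifest: Callable[[str], Optional[dict]],
) -> list[dict]:
    """Keep only the candidates whose manifest advertises tool_name."""
    matches = []
    for candidate in candidates:
        manifest = fetch_manifest(candidate["mcp_endpoint"])
        if manifest is None:
            continue  # unreachable endpoint or invalid manifest: skip it
        tool = next(
            (t for t in manifest.get("tools", []) if t.get("name") == tool_name),
            None,
        )
        if tool is not None:
            matches.append(
                {"provider": candidate["name"], "tool": tool, "manifest": manifest}
            )
    return matches
```

Passing the fetcher in as a parameter keeps the filtering logic testable offline and lets you swap in caching or retries without touching it.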

Why This Matters for Autonomous Workflows

Dynamic capability expansion: Instead of agents shipping with a fixed set of tools, they discover new capabilities at runtime. Update your tools on SiliconBridge, and agents pick up the change the next time they refresh their discovery data, with no redeploy.

Tool negotiation: If an agent needs "CAPTCHA solving" but doesn't care how, it can query a registry, find multiple providers, compare cost/accuracy/speed, and pick the best one for its task.

Resilience: If your primary CAPTCHA solver goes down, the agent automatically falls back to alternative providers discovered from the registry.

Vendor independence: Services that implement llms.txt and register with Smithery don't get locked into a single agent framework. LangChain, CrewAI, AutoGPT, custom Python agents — all discover and use the same tools.
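Tool negotiation and resilience fit together naturally: rank the providers you discovered, then walk down the ranking until one succeeds. The sketch below is one way to do that, not a prescribed algorithm. The `Offer` fields, the scoring weight, and the function names are all assumptions for illustration; in practice the price and accuracy figures would come from each provider's discovery data.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Offer:
    provider: str
    price_usd: float              # cost per task
    accuracy: float               # fraction of tasks solved correctly, 0..1
    call: Callable[[dict], Any]   # function that invokes the provider's tool


def rank_offers(offers: list[Offer], max_price_usd: float, w_price: float = 0.5) -> list[Offer]:
    """Drop offers over budget, then sort best-first by accuracy minus weighted price."""
    affordable = [o for o in offers if o.price_usd <= max_price_usd]
    return sorted(
        affordable,
        key=lambda o: o.accuracy - w_price * o.price_usd,
        reverse=True,
    )


class AllProvidersFailed(Exception):
    """Raised when every ranked provider raised an error (or none was affordable)."""


def call_best_available(offers: list[Offer], args: dict, max_price_usd: float):
    """Try providers in ranked order; return (provider, result) from the first success."""
    errors = {}
    for offer in rank_offers(offers, max_price_usd):
        try:
            return offer.provider, offer.call(args)
        except Exception as exc:  # record the failure and fall back to the next provider
            errors[offer.provider] = exc
    raise AllProvidersFailed(errors)
```

The weight `w_price` controls how much a dollar of cost offsets accuracy; tune it per task, since a login flow may justify a pricier, more accurate solver than a bulk scrape.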

SiliconBridge's Discovery Implementation

SiliconBridge publishes both llms.txt and mcp.json:

llms.txt: https://siliconbridge.xyz/.well-known/llms.txt

mcp.json: https://siliconbridge.xyz/.well-known/mcp.json

Registry listing: Registered with Smithery, PulseMCP, and mcp.so

Any agent can discover us without prior integration. See our complete API documentation and Chrome extension for real-time integrations.

Building Discoverable Services

If you're building a service for agents, publish discovery endpoints. Here's a minimal checklist:

1. Create llms.txt: Serve it at `/llms.txt`, the path the llms.txt proposal specifies, and optionally at `/.well-known/llms.txt` as well. Keep it simple: product name, description, tool list, auth info, pricing.

2. Create mcp.json: Include a full JSON Schema for every tool's inputs and outputs, plus the authentication flow. Follow the MCP spec.

3. Implement proper auth: API keys, OAuth, or session-based. Document it clearly in mcp.json.

4. Register with at least one registry: Smithery is the de-facto standard. Submit your service at smithery.ai/submit.

5. Keep documentation updated: When you add tools or change pricing, update both files immediately. Agents cache discovery data but refresh regularly.
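The "cache but refresh" behavior in step 5 can be sketched on the agent side as a small time-to-live cache. This is an assumption about how an agent might do it, not a description of any particular framework: `ManifestCache`, the 300-second default, and the injected `fetch` callable are all hypothetical.

```python
import time
from typing import Callable


class ManifestCache:
    """Cache discovery documents per URL and refetch them once they are older than the TTL."""

    def __init__(
        self,
        fetch: Callable[[str], dict],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}  # url -> (fetched_at, manifest)

    def get(self, url: str) -> dict:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: serve the cached copy
        manifest = self._fetch(url)  # missing or stale: refetch and store
        self._entries[url] = (now, manifest)
        return manifest
```

The practical consequence for service owners is that a change to llms.txt or mcp.json reaches each agent only after its cached copy expires, so avoid breaking changes that assume instant propagation.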

Integration Examples with LangChain and CrewAI

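LangChain agents can load tools from an MCP server through the langchain-mcp-adapters package. The server URL below is an assumption based on the mcp_endpoint value shown earlier; use the URL your provider documents:

import asyncio
from langchain_mcp_adapters.client import MultiServerMCPClient

async def load_siliconbridge_tools(api_key: str):
    client = MultiServerMCPClient({
        "siliconbridge": {
            "url": "https://mcp.siliconbridge.xyz",  # assumed MCP server URL
            "transport": "streamable_http",
            "headers": {"Authorization": f"Bearer {api_key}"},
        }
    })
    # Returns LangChain tool objects for every tool the server advertises
    return await client.get_tools()

tools = asyncio.run(load_siliconbridge_tools("your_api_key"))
# Agent now has solve_captcha, relay_otp, web_browse, etc.
# No manual integration needed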

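CrewAI can load tools from an MCP server with MCPServerAdapter from the crewai-tools package (installed with its MCP extra). Finding the server through a registry is a separate step that runs before this code; the URL below stands in for the result of that search and is an assumption:

from crewai import Agent
from crewai_tools import MCPServerAdapter

# Assumed MCP server URL, e.g. one returned by a registry search
server_params = {
    "url": "https://mcp.siliconbridge.xyz",
    "transport": "streamable-http",
}

with MCPServerAdapter(server_params) as tools:
    agent = Agent(
        role="browser automation",
        goal="Complete web tasks that involve CAPTCHAs",
        backstory="An automation agent that delegates verification steps to external tools.",
        tools=tools,  # Dynamically loaded from the MCP server
    )
# Choosing the cheapest CAPTCHA solver is left to your own selection logic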

The Future: Mesh Networks and Agent Markets

Discovery endpoints are evolving toward agent mesh networks — decentralized marketplaces where services advertise capabilities, agents query them, and trustless transactions happen automatically. This is where autonomous workflows become truly autonomous: agents discovering, negotiating, and paying for services without human involvement.

SiliconBridge implements the L402 protocol natively, meaning agents can pay per task using Bitcoin Lightning. Learn more in our L402 and agent payment guide.
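For a sense of what pay-per-task looks like on the wire, here is a minimal sketch of the client side of an L402 exchange. It assumes the general L402 flow: the server answers with HTTP 402 and a `WWW-Authenticate: L402 macaroon="...", invoice="..."` challenge, the agent pays the Lightning invoice, and then retries with the macaroon and the payment preimage. Paying the invoice is out of scope here, the function names are hypothetical, and SiliconBridge's exact headers may differ.

```python
import re

_CHALLENGE = re.compile(r"^L402\s+(.*)$", re.IGNORECASE)
_PARAM = re.compile(r'(\w+)="([^"]*)"')


def parse_l402_challenge(www_authenticate: str) -> dict:
    """Extract the macaroon and Lightning invoice from a 402 challenge header."""
    match = _CHALLENGE.match(www_authenticate.strip())
    if not match:
        raise ValueError("not an L402 challenge")
    params = dict(_PARAM.findall(match.group(1)))
    if "macaroon" not in params or "invoice" not in params:
        raise ValueError("challenge is missing macaroon or invoice")
    return params


def l402_authorization(macaroon: str, preimage_hex: str) -> str:
    """Build the Authorization header value to send after the invoice is paid."""
    return f"L402 {macaroon}:{preimage_hex}"
```

The agent's loop is then: call the tool, parse the challenge on a 402, pay `invoice` with its Lightning wallet, and repeat the request with the header from `l402_authorization`.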

Conclusion

Tool discovery isn't a luxury — it's essential infrastructure for scaling agents. llms.txt and mcp.json let services publish what they offer. Registries like Smithery let agents find those services. Together, they enable agents to dynamically expand capabilities at runtime, negotiate costs, and discover alternatives when things fail.

If you're building autonomous workflows, design your agent to query discovery endpoints and dynamically load tools. Don't hardcode integrations. See our guides on LangChain integration and CrewAI tools for 2026 for practical examples. Get started with $10 free credits.