The Tool Explosion Problem
Rapid advances in agentic tool-use methodologies, such as Anthropic's Model Context Protocol (MCP), have been a double-edged sword for AI developers.
While these frameworks give our agents impressive abilities, there's a hidden cost: exposing dozens of tools to an agent at once quickly eats into its context window and attention budget. Tool selection becomes more confusing for the agent, and every inference request gets slower and more expensive.
Loading every tool description into the agent's context simply does not scale.
Applied Example: Business Assistant
One of our recent projects was a business assistant that connects to shared Google resources and a ticket management platform to facilitate property management.
We added just four services:
- Gmail
- Google Drive
- Google Calendar
- Trello
The Hidden Cost
Just these four services quickly bloated our context window by about 6,000 tokens.
The problem? Each service includes multiple objects, each with CRUD operations, each with elaborate descriptions and complex tool signatures.
In practice, four services became dozens of tools with incredibly long JSON signatures.
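To see where that overhead comes from, it helps to serialize the tool schemas and count tokens directly. Below is a minimal sketch, assuming the tool definitions are collected in a Python list (the TOOLS variable and its single entry are illustrative) and using tiktoken's cl100k_base encoding for an approximate count; the exact figure depends on how your model provider serializes tool schemas into the prompt.

import json
import tiktoken

# Illustrative tool definitions in the common
# {"name", "description", "parameters"} shape; a real agent would
# collect dozens of these across Gmail, Drive, Calendar, and Trello.
TOOLS = [
    {
        "name": "drive_search",
        "description": "Search for files in Google Drive using Drive query syntax",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Parameter: query"},
                "max_results": {"type": "integer", "default": 10},
            },
            "required": ["query"],
        },
    },
    # ... more tools
]

# Approximate token cost of shipping every schema with every request.
enc = tiktoken.get_encoding("cl100k_base")
overhead = sum(len(enc.encode(json.dumps(tool))) for tool in TOOLS)
print(f"Tool-schema overhead: ~{overhead} tokens per request")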
The consequences were immediate:
- Increased cost on every completion request
- Added latency from processing massive context
- Tool confusion - the agent started selecting the wrong tools
The "Soft Tool" Pattern
After experimenting with optimizing signatures and dynamic tool loading, we landed on a simpler idea:
abstract many tools behind one natural language interface.
Instead of giving the agent 50+ specialized tools, give it 4 intelligent sub-agents.
Hard Tools vs. Soft Tools
Traditional "Hard" Tools:
Available tools:
- drive_search: Search for files in Google Drive using Drive query syntax
  Parameters: {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Parameter: query"
      },
      "max_results": {
        "type": "integer",
        "description": "Parameter: max_results",
        "default": 10
      }
    },
    "required": [
      "query"
    ]
  }
- drive_get_file: ...
- drive_delete_file: ...
- drive_create_folder: ...
[other drive tools]
...
- gmail_search_messages: ...
- gmail_get_message: ...
- gmail_send_message: ...
- gmail_list_labels: ...
[other gmail tools]
...
Soft Tool:
Tool: gmail_agent
Description: Interact with Gmail - search, send, manage, using natural language instructions.
Parameters:
- query: What you want to do with Gmail
Tool: drive_agent
Description: Search Google Drive using natural language queries.
Parameters:
- query: What you want to do with Google Drive
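For concreteness, here is a minimal sketch of how the two soft tools above could be declared for the main agent, using an OpenAI-style function-calling schema; the exact field names depend on your framework, and the dispatch from these declarations to the actual sub-agents (not shown here) is up to your implementation.

# The only tool schemas the main agent ever sees: one natural-language
# "query" parameter per sub-agent instead of dozens of CRUD tools.
SOFT_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "gmail_agent",
            "description": "Interact with Gmail - search, send, manage, "
                           "using natural language instructions.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "What you want to do with Gmail",
                    }
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "drive_agent",
            "description": "Search Google Drive using natural language queries.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "What you want to do with Google Drive",
                    }
                },
                "required": ["query"],
            },
        },
    },
]

When the main agent calls gmail_agent, the framework routes the query string to the Gmail sub-agent described below.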
How It Works
graph TD
A["👤 User
'Check for emails from my manager'"] --> B["🤖 Main Agent
Receives natural language request"]
B --> C["📤 Calls gmail_agent tool
Instruction: 'Find emails today from tom@manager.com'"]
C --> F["🔧 Translates to API call"]
F --> G["📧 Executes: gmail_search_messages()
query='from:tom@manager.com after:2026-01-22'"]
G --> H["📊 Returns results to Main Agent"]
H --> I["✅ Main Agent delivers results to User"]
style A fill:#1e293b,stroke:#60a5fa,stroke-width:2px,color:#e2e8f0
style B fill:#1e293b,stroke:#60a5fa,stroke-width:2px,color:#e2e8f0
style C fill:#1e3a5f,stroke:#3b82f6,stroke-width:2px,color:#e2e8f0
style F fill:#2d1b4e,stroke:#a78bfa,stroke-width:3px,color:#e2e8f0
style G fill:#2d1b4e,stroke:#a78bfa,stroke-width:2px,color:#e2e8f0
style H fill:#1e3a5f,stroke:#3b82f6,stroke-width:2px,color:#e2e8f0
style I fill:#1e293b,stroke:#60a5fa,stroke-width:2px,color:#e2e8f0
The key insight: The main agent speaks natural language. The specialized sub-agent handles the technical complexity.
Sub-agent
These are specialized tool-calling agents whose job is to translate a simple natural-language request into one or more tool calls and then execute them.
Sub-agent workflows come in many shapes, from a simple translate → call pipeline to more elaborate configurations that reflect on the tool output. Either way, their defining quality is specialization in tool-execution accuracy; a minimal sketch of the simple variant follows the list below.
- Tool Context Isolation: The sub-agent receives the full, often lengthy list of available tools with every request, but that list is never exposed to the main agent. This hides the extra complexity from the calling agent while preserving rich descriptions of our tool functions.
- Enhanced Attention: Having fewer tools in context keeps the agent's attention on each tool description, allowing for greater accuracy in tool selection.
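Here is a minimal sketch of the simple translate → call variant for the Gmail sub-agent, using the OpenAI Python SDK. GMAIL_TOOLS (the full, verbose schemas) and execute_gmail_tool (the function that actually hits the Gmail API) are hypothetical placeholders; a production version would likely add reflection on the tool output and error handling.

import json
from openai import OpenAI

client = OpenAI()

# Hypothetical: the full, verbose Gmail tool schemas and their executor.
# Only this sub-agent ever sees them; the main agent does not.
GMAIL_TOOLS = [...]  # gmail_search_messages, gmail_send_message, ...

def execute_gmail_tool(name: str, args: dict) -> dict:
    ...  # call the real Gmail API here

def run_gmail_agent(query: str) -> str:
    """Translate a natural-language instruction into Gmail tool calls."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any tool-calling model works
        messages=[
            {"role": "system",
             "content": "Translate the user's instruction into Gmail tool calls."},
            {"role": "user", "content": query},
        ],
        tools=GMAIL_TOOLS,  # the verbose schemas stay isolated here
    )
    message = response.choices[0].message
    if not message.tool_calls:
        return message.content or "No action taken."

    # Simple translate -> call: execute each requested tool and return
    # the results for the main agent to summarize.
    results = [
        execute_gmail_tool(call.function.name, json.loads(call.function.arguments))
        for call in message.tool_calls
    ]
    return json.dumps(results)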
Real-World Impact
The results were dramatic:
Performance Metrics
Token overhead reduction per call: ~6,000 → 500 tokens
That's a 92% reduction in context overhead
Improved Accuracy
Better tool selection: The agent stopped confusing similar functions once it faced only 4 clear choices instead of 50+ specialized, mostly irrelevant ones. With fewer options, it became more accurate and decisive in tool selection.
Conclusion: Simplicity Through Abstraction
The "Soft Tool" pattern demonstrates a counter-intuitive principle in AI engineering: sometimes the best way to add capabilities is to remove complexity from the interface.
By wrapping specialized tools behind natural language sub-agents, we achieved:
- 92% reduction in context overhead
- Faster response times
- Better tool selection accuracy
- A more scalable tool architecture
When to Use Soft Tools
Consider this pattern when:
- You have multiple tools from the same domain (e.g., Gmail, Drive)
- Tool descriptions are complex or numerous
- You notice tool selection errors
- Context costs are growing unsustainably
Trade-offs
Soft tools add an extra LLM call (and some latency) for each sub-agent invocation, but in practice the verbose tool-description context is sent only when a sub-agent is actually invoked, instead of with every request.
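A rough back-of-the-envelope comparison makes the trade-off concrete. The sketch below reuses the ~6,000 and ~500 token figures from this project; the per-service schema size and the fraction of turns that invoke a sub-agent are illustrative assumptions, not measurements.

# Context overhead across 100 main-agent turns (illustrative numbers).
HARD_TOOL_OVERHEAD = 6_000  # all tool schemas, sent on every turn
SOFT_TOOL_OVERHEAD = 500    # the 4 soft-tool descriptions
SUB_AGENT_OVERHEAD = 1_500  # assumed: one service's schemas per sub-agent call

turns = 100
sub_agent_calls = 30        # assumed: ~30% of turns invoke a sub-agent

hard = turns * HARD_TOOL_OVERHEAD
soft = turns * SOFT_TOOL_OVERHEAD + sub_agent_calls * SUB_AGENT_OVERHEAD
print(f"Hard tools: {hard:,} tokens  |  Soft tools: {soft:,} tokens")
# Hard tools: 600,000 tokens  |  Soft tools: 95,000 tokens

The extra sub-agent calls also add latency, so the pattern pays off most when only a fraction of main-agent turns actually need a tool.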