January 2026 · 5 min read

Soft Tools: Reducing Context Overhead for Complex Tool Use

Aleksandr Titarenko
Syntron Systems Inc.
Diagram showing a main AI agent coordinating with specialized sub-agents for Gmail, Drive, Calendar, and Trello through a message router

The Tool Explosion Problem

Rapid advances in agentic tool use, such as Anthropic's Model Context Protocol (MCP), have been a double-edged sword for AI developers.

While these frameworks give our agents impressive abilities, there is a hidden cost: exposing dozens of tools at once quickly eats into the agent's context window and attention budget. Tool selection gets more confusing for the agent, and every inference request becomes slower and more expensive.

Adding every tool description to the agent's context simply does not scale.

Applied Example: Business Assistant

One of our recent projects was a business assistant that connects to shared Google resources and a ticket management platform to facilitate property management.

We added just four services:

- Gmail
- Google Drive
- Google Calendar
- Trello

The Hidden Cost

Just these four services quickly bloated our context window by about 6,000 tokens.

The problem? Each service includes multiple objects, each with CRUD operations, each with elaborate descriptions and complex tool signatures.

In practice, four services expanded into dozens of tools, each carrying a long JSON signature.
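To see how quickly that adds up, you can serialize the tool schemas and count tokens. A minimal sketch, using tiktoken as an approximate tokenizer; the drive_search schema below mirrors the example later in this post, and exact counts depend on how your provider serializes tool definitions:

import json

import tiktoken  # approximate tokenizer; real counts vary by model and provider

def estimate_tool_overhead(tools: list[dict]) -> int:
    """Rough token count for a list of JSON tool schemas."""
    enc = tiktoken.get_encoding("cl100k_base")
    # Providers serialize tool definitions into the prompt internally;
    # dumping the schemas to JSON is a reasonable approximation.
    return len(enc.encode(json.dumps(tools)))

drive_search = {
    "name": "drive_search",
    "description": "Search for files in Google Drive using Drive query syntax",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "max_results": {"type": "integer", "default": 10},
        },
        "required": ["query"],
    },
}

# One small schema is already on the order of 100 tokens;
# dozens of richer schemas add up to thousands.
print(estimate_tool_overhead([drive_search]))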

The consequences were immediate:

- Roughly 6,000 tokens of tool descriptions attached to every request
- Slower, more expensive inference calls
- An agent that kept confusing similar tools

The "Soft Tool" Pattern

Soft Tool pattern architecture diagram showing the flow from user request through main agent to specialized sub-agents
The Soft Tool pattern: Main agent delegates to specialized sub-agents

After experimenting with optimizing signatures and dynamic tool loading, we landed on a simpler idea:
abstract many tools behind one natural language interface.

Instead of giving the agent 50+ specialized tools, give it 4 intelligent sub-agents.

Hard Tools vs. Soft Tools

Traditional "Hard" Tools:

Available tools:

- drive_search: Search for files in Google Drive using Drive query syntax
  Parameters: {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Parameter: query"
      },
      "max_results": {
        "type": "integer",
        "description": "Parameter: max_results",
        "default": 10
      }
    },
    "required": ["query"]
  }

- drive_get_file: ...
- drive_delete_file: ...
- drive_create_folder: ...
[other drive tools]
...
- gmail_search_messages: ...
- gmail_get_message: ...
- gmail_send_message: ...
- gmail_list_labels: ...
[other gmail tools]
...

Soft Tools:

Tool: gmail_agent
Description: Interact with Gmail - search, send, manage, using natural language instructions.
Parameters:
  - query: What you want to do with Gmail

Tool: drive_agent
Description: Search Google Drive using natural language queries.
Parameters:
  - query: What you want to do with Google Drive
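
In code, the main agent's entire tool surface shrinks to a handful of entries. A minimal sketch, assuming a generic JSON-schema function-tool format (field names vary by provider, and these definitions are illustrative rather than the exact schemas we shipped):

# The main agent sees only these soft tools -- one per service.
SOFT_TOOLS = [
    {
        "name": "gmail_agent",
        "description": "Interact with Gmail - search, send, manage - using natural language instructions.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "What you want to do with Gmail"},
            },
            "required": ["query"],
        },
    },
    {
        "name": "drive_agent",
        "description": "Search Google Drive using natural language queries.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "What you want to do with Google Drive"},
            },
            "required": ["query"],
        },
    },
    # calendar_agent and trello_agent follow the same single-parameter shape
]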

How It Works

graph TD
    A["👤 User<br/>'Check for emails from my manager'"] --> B["🤖 Main Agent<br/>Receives natural language request"]
    B --> C["📤 Calls gmail_agent tool<br/>Instruction: 'Find emails today from tom@manager.com'"]
    C --> F["🔧 Translates to API call"]
    F --> G["📧 Executes: gmail_search_messages()<br/>query='from:tom@manager.com after:2026-01-22'"]
    G --> H["📊 Returns results to Main Agent"]
    H --> I["✅ Main Agent delivers results to User"]
    style A fill:#1e293b,stroke:#60a5fa,stroke-width:2px,color:#e2e8f0
    style B fill:#1e293b,stroke:#60a5fa,stroke-width:2px,color:#e2e8f0
    style C fill:#1e3a5f,stroke:#3b82f6,stroke-width:2px,color:#e2e8f0
    style F fill:#2d1b4e,stroke:#a78bfa,stroke-width:3px,color:#e2e8f0
    style G fill:#2d1b4e,stroke:#a78bfa,stroke-width:2px,color:#e2e8f0
    style H fill:#1e3a5f,stroke:#3b82f6,stroke-width:2px,color:#e2e8f0
    style I fill:#1e293b,stroke:#60a5fa,stroke-width:2px,color:#e2e8f0

The key insight: The main agent speaks natural language. The specialized sub-agent handles the technical complexity.
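
One way to wire this up is a thin dispatch layer: the main agent is called with only the soft-tool schemas, and any tool call it emits is routed to the matching sub-agent. A minimal sketch building on the SOFT_TOOLS list above; call_llm stands in for whatever LLM client you use, and the placeholder sub-agents exist only to keep the snippet self-contained:

from typing import Any, Callable

# Placeholder sub-agents; a real one is sketched in the next section.
SUB_AGENTS: dict[str, Callable[[str], str]] = {
    "gmail_agent": lambda instruction: f"[gmail sub-agent would handle: {instruction}]",
    "drive_agent": lambda instruction: f"[drive sub-agent would handle: {instruction}]",
    "calendar_agent": lambda instruction: f"[calendar sub-agent would handle: {instruction}]",
    "trello_agent": lambda instruction: f"[trello sub-agent would handle: {instruction}]",
}

def main_agent_turn(user_message: str, call_llm: Callable[..., dict[str, Any]]) -> str:
    """One turn of the main agent, which only ever sees the four soft tools."""
    messages: list[dict[str, Any]] = [{"role": "user", "content": user_message}]
    while True:
        # call_llm returns either a final answer ({"content": ...}) or a
        # tool call ({"tool_call": {"name": ..., "arguments": {...}}}).
        response = call_llm(messages=messages, tools=SOFT_TOOLS)
        tool_call = response.get("tool_call")
        if tool_call is None:
            return response["content"]
        result = SUB_AGENTS[tool_call["name"]](tool_call["arguments"]["query"])
        messages.append({"role": "tool", "name": tool_call["name"], "content": result})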

Sub-agents

These are specialized tool-calling agents whose job is to translate a simple request into one or more tool calls and then execute them.

Sub-agent workflows come in many shapes, from a simple translate → call pipeline to more elaborate configurations that reflect on tool output before responding. Whatever the shape, their defining quality is accuracy of tool execution.
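
As a concrete example of the simple translate → call variant, here is what a Gmail sub-agent could look like. The full gmail_* schemas live only inside this function, so the main agent never pays for them; call_llm and execute_gmail_tool are assumed helpers (an LLM client wrapper and a Gmail API binding), not real library functions:

from typing import Any, Callable

def run_gmail_agent(
    instruction: str,
    call_llm: Callable[..., dict[str, Any]],         # assumed LLM client wrapper
    execute_gmail_tool: Callable[[str, dict], str],  # assumed Gmail API binding
    gmail_tool_schemas: list[dict],                  # full gmail_* schemas, hidden from the main agent
) -> str:
    """Translate a natural language instruction into Gmail tool calls and run them."""
    messages: list[dict[str, Any]] = [
        {
            "role": "system",
            "content": "You operate Gmail. Turn the instruction into the minimal "
                       "sequence of gmail_* tool calls and report the result.",
        },
        {"role": "user", "content": instruction},
    ]
    # Simple translate -> call loop; a fancier variant could also reflect on
    # the tool output and retry before answering.
    while True:
        response = call_llm(messages=messages, tools=gmail_tool_schemas)
        tool_call = response.get("tool_call")
        if tool_call is None:
            return response["content"]  # plain-text summary for the main agent
        output = execute_gmail_tool(tool_call["name"], tool_call["arguments"])
        messages.append({"role": "tool", "name": tool_call["name"], "content": output})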

Real-World Impact

The results were dramatic:

Performance Metrics

Tool-description overhead per call: ~6,000 → ~500 tokens

That's a ~92% reduction in context overhead

Improved Accuracy

Better tool selection: The agent stopped confusing similar functions once it faced only 4 clear choices instead of 50+ narrow, mostly irrelevant ones, and it became noticeably more accurate and decisive when picking a tool.


Conclusion: Simplicity Through Abstraction

The "Soft Tool" pattern demonstrates a counter-intuitive principle in AI engineering: sometimes the best way to add capabilities is to remove complexity from the interface.

By wrapping specialized tools behind natural language sub-agents, we achieved:

- A ~92% reduction in per-call context overhead (~6,000 → ~500 tokens)
- More accurate, more decisive tool selection
- Faster, cheaper inference requests

When to Use Soft Tools

Consider this pattern when:

- Your agent connects to several services, each exposing many objects with CRUD operations
- Tool descriptions alone consume thousands of tokens of context
- The agent keeps confusing similar or irrelevant tools

Trade-offs

Soft tools add an extra LLM call for each sub-agent invocation, but in practice the full tool-description context is sent only to the sub-agent that actually needs it, rather than with every main-agent request.
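
A back-of-the-envelope comparison makes the trade-off concrete. Apart from the ~6,000 and ~500 token figures reported above, the numbers below are illustrative assumptions: each of the four services carries a roughly equal share of the hard-tool schemas, and only the invoked sub-agent ever sees its share.

# Illustrative token accounting, not measured data.
HARD_TOOL_OVERHEAD = 6_000   # all hard-tool schemas, paid on every main-agent call
SOFT_TOOL_OVERHEAD = 500     # four soft-tool schemas, paid on every main-agent call
PER_SERVICE_SCHEMAS = 1_500  # assumed ~6,000 / 4, paid only inside the invoked sub-agent

def schema_tokens_per_request(sub_agent_calls: int) -> tuple[int, int]:
    """Schema tokens spent per user request: (hard tools, soft tools)."""
    hard = HARD_TOOL_OVERHEAD
    soft = SOFT_TOOL_OVERHEAD + sub_agent_calls * PER_SERVICE_SCHEMAS
    return hard, soft

for calls in (0, 1, 2):
    hard, soft = schema_tokens_per_request(calls)
    print(f"{calls} sub-agent call(s): hard={hard}, soft={soft}")
# 0 calls: 6000 vs 500 | 1 call: 6000 vs 2000 | 2 calls: 6000 vs 3500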

Aleksandr Titarenko
Syntron Systems Inc.
AI Engineering & Architecture

Building something similar?

We'd love to hear about your experience with agent orchestration and tool management.