In Part 6, we built a support assistant with guardrails, intent detection, and conversation history. It worked — but the routing was Python keyword matching. “Charge” goes to billing. “Password” goes to account. Simple, brittle, and wrong the moment a customer says “I got billed double” instead of “I was charged twice.”

Your e-commerce company gets 200+ support tickets a day. Billing disputes, shipping problems, account issues — often multiple problems in the same message. “I was charged twice for my yoga mat AND I received the wrong color.” That’s two agents, not one. Keyword matching can’t handle it.

This project replaces manual routing with agents that reason. A supervisor agent reads the customer’s message, understands the intent, and delegates to specialized agents — each with their own tools and expertise. We build it with the Strands Agents SDK — an open-source Python framework where tools are just decorated functions.

The Problem

A real support team doesn’t have one person who handles everything. It has specialists — someone for billing, someone for shipping, someone for accounts. When a customer calls with multiple issues, the receptionist figures out who to involve and routes accordingly.

Part 6’s assistant tried to be all three specialists at once. One system prompt, one set of tools, keyword-based routing. It couldn’t handle “I was charged twice AND my package is late” because that’s two specialists, not one.

You need a system where each specialist is an independent agent with its own tools, and a supervisor that understands natural language well enough to route correctly — even when the customer’s message spans multiple domains.

The Pattern

Customer → Supervisor (reasons about routing) → Specialist Agent(s) → Tools → Response

The supervisor doesn’t use rules. It uses an LLM to read the message, understand the intent, and decide which agents to involve. The specialist agents don’t share tools — each one has exactly the tools it needs for its domain.

How It Works — Three Phases

Each phase builds on the previous one’s limitation.

Phase 1: Build a Single Agent with Strands

Strands is an open-source Python SDK where tools are just decorated functions. The agent runs in your process, calls tools directly, and you control every step.

Step 1: Define tools as Python functions. A @tool decorator turns any function into something the agent can call. The function’s docstring tells the agent what it does — and the function body is the actual execution logic.

@tool
def track_order(order_id: str) -> dict:
    """Track the shipping status of an order."""
    order = ORDERS.get(order_id)
    return {
        "order_id": order_id,
        "status": order["status"],
        "carrier": order["shipping_carrier"],
        "tracking_number": order["tracking_number"],
        "estimated_delivery": order["estimated_delivery"],
    }

This is a plain Python function. It works locally, it’s easy to test with mocks, and — as we’ll see in Part 8 — the same function can be deployed as a Lambda handler or inside an AgentCore container without rewriting it.

Step 2: Create the agent with a system prompt and tools. One Agent() call wires everything together — the model, the persona, and the list of tool functions.

from strands import Agent
from strands.models import BedrockModel

agent = Agent(
    model=BedrockModel(model_id="us.anthropic.claude-sonnet-4-20250514-v1:0"),
    system_prompt="You are an order support agent. Use tools to look up real data.",
    tools=[track_order, lookup_order_history, cancel_order],
)

Step 3: The agent handles the entire conversation in one call. You pass the customer’s message. The agent reasons about it, decides which tool to call, executes the function directly in your Python process, reads the result, and writes the response — all in a single call.

result = agent("Customer CUST-1001 asks: Where is my order ORD-50435?")

Here’s what happens behind the scenes. The agent repeats a decision cycle: read the message → decide if a tool is needed → call the tool → read the result → decide if more tools are needed → respond. This cycle runs entirely in your Python process — no API round-trips for tool execution.

Agent reasoning:
  → Reads query: customer wants order tracking
  → Picks tool: track_order(order_id="ORD-50435")
  → Calls the Python function directly — gets result instantly
  → Formats response from the data

Agent: Your order ORD-50435 is currently in transit with FedEx.
       Tracking number: 794644790132. Estimated delivery: April 23rd.

If the agent needs two tools (track the order AND check payment), it calls both in sequence automatically — one agent() call, no extra round-trips.

What single agents do well:

  • Full control over the decision cycle — add custom logic between tool calls
  • Tools are Python functions — test with mocks, run offline, iterate fast
  • Open-source — inspect, extend, contribute

The limitation: Single agents can’t handle “I was charged twice AND my package is late.” That’s two domains, two tool sets. One agent trying to do everything gives worse answers than focused specialists. You need multiple agents collaborating.

Phase 2: Multi-Agent Orchestration — The Supervisor Pattern

Step 1: Define three specialist agents, each with their own tools. Each agent has a focused system prompt and only the tools it needs. The order agent can’t process refunds. The billing agent can’t track packages. This separation means each agent is an expert in its domain.

order_agent = Agent(
    model=model,
    system_prompt="You are the Order Support Agent...",
    tools=[track_order, lookup_order_history, cancel_order],
)
billing_agent = Agent(
    model=model,
    system_prompt="You are the Billing Support Agent...",
    tools=[check_payment_status, process_refund],
)
account_agent = Agent(
    model=model,
    system_prompt="You are the Account Support Agent...",
    tools=[lookup_customer, reset_password, update_account_info],
)

Step 2: Create a supervisor agent that routes requests. The supervisor doesn’t have domain tools. It has one job: read the customer’s message, figure out which specialists to involve, and return a routing decision as JSON.

supervisor = Agent(
    model=model,
    system_prompt="""Analyze the customer's message and route to specialists:
    ORDER AGENT — for: tracking, shipping, cancellations, returns, wrong items
    BILLING AGENT — for: payments, charges, refunds, duplicate charges
    ACCOUNT AGENT — for: login, password, email/phone changes, app issues

    Return JSON: {"agents": [...], "reasoning": "...", "needs_human_review": true/false}""",
    tools=[lookup_customer],
)

How does the supervisor know what to do? It reads the customer’s message and matches it against the agent descriptions in its system prompt. “I got billed double” → the word “billed” matches the billing agent’s description. “My package vanished” → “package” matches the order agent. The LLM understands meaning, not keywords — that’s the fundamental difference from Article 6’s Python keyword matching.

Step 3: The orchestrator ties it all together. When a customer message arrives, the orchestrator sends it to the supervisor first. The supervisor returns which agents to involve. The orchestrator dispatches to each specialist in sequence, collects their responses, and checks for human review flags.

# Supervisor analyzes: "I was charged twice AND received the wrong color"
routing = supervisor(message)  # → {"agents": ["billing", "order"], ...}

# Dispatch to each specialist
for agent_name in routing["agents"]:
    response = agents[agent_name](message)  # Each agent uses its own tools
Supervisor: Routes to [billing, order] — "Two issues: duplicate charge + wrong item"

Billing Agent: Found duplicate charge of $167.76. Refund submitted
               for manager approval (over $50 threshold).

Order Agent: Confirmed wrong item flag on ORD-50398. Escalated to
             fulfillment team for correct color shipment.

Both agents worked independently on their part of the problem, using their own tools and domain knowledge.

What multi-agent orchestration does well:

  • Handles complex, multi-issue requests that single agents can’t
  • Each specialist is focused — better answers than one agent trying to do everything
  • Supervisor reasoning is transparent — you can audit why each agent was chosen
  • Agents are independent — add a new specialist without changing existing ones

The limitation: The agents can do anything their tools allow — including processing a $500 refund without asking anyone. Some actions need human approval.

Phase 3: Human-in-the-Loop — Safety for Sensitive Actions

Step 1: Tools enforce business rules. The process_refund tool has a built-in threshold. Amounts over $50 return a “pending_approval” status instead of processing immediately. The tool doesn’t need the agent’s permission — it’s a hard business rule in the code.

@tool
def process_refund(customer_id: str, order_id: str, amount: float, reason: str) -> dict:
    """Process a refund. Amounts over $50 require manager approval."""
    if amount > 50:
        return {
            "status": "pending_approval",
            "message": f"Refund of ${amount:.2f} requires manager approval (threshold: $50).",
            "approval_required": True,
        }
    return {"status": "approved", "amount": amount, ...}

Step 2: The supervisor independently flags tickets for human review. The supervisor’s routing decision includes a needs_human_review field. It flags based on refund amounts, customer sentiment, and escalation phrases — independent of what the tools do. Two layers of safety: the tool blocks the action, and the supervisor flags the ticket.

Step 3: The orchestrator checks both layers. After all agents respond, the orchestrator checks the supervisor’s flag AND scans agent responses for pending approvals. If either triggers, the ticket is marked for human review.

Customer: I need a full refund of $167.76.

Billing Agent: Refund submitted → PENDING APPROVAL (exceeds $50 threshold)
Supervisor: ⚠️ FLAGGED FOR HUMAN REVIEW

The customer gets a response explaining the refund is pending approval. The human agent gets a ticket with the full context — customer history, agent reasoning, and the specific action that needs approval.

What human-in-the-loop does well:

  • Two independent safety layers — tool-level rules AND supervisor-level flags
  • Business rules are in code, not prompts — the $50 threshold can’t be prompt-injected away
  • Human agents get full context — they don’t start from scratch
  • Configurable thresholds — change the dollar amount without retraining anything

Real Results — 8 Tickets End-to-End

We ran all 8 sample tickets through the full system — supervisor routing, sub-agent execution, tool calls, and response generation:

Ticket     Agents              Review    Latency
TKT-001    order                   No    10601ms
TKT-002    billing            ⚠️ YES    10902ms
TKT-003    account                 No    12471ms
TKT-004    billing,order      ⚠️ YES    22593ms
TKT-005    billing,order      ⚠️ YES    23603ms
TKT-006    order                   No    10633ms
TKT-007    account                 No    12405ms
TKT-008    account                 No    14047ms

Every ticket routed correctly. Multi-issue tickets (TKT-004, TKT-005) went to multiple agents. Money-related issues got flagged for human review. The supervisor considered customer loyalty tier — Platinum customers with billing issues got higher priority.

What Changed from Part 6

Concept Part 6 (AI Support Assistant) Part 7 (This Project)
Routing Python keyword matching LLM-based supervisor reasoning
Agents One agent does everything Three specialists + supervisor
Tools Shared across all intents Domain-specific per agent
Multi-issue Can’t handle Supervisor delegates to multiple agents
Framework Raw Bedrock Converse API Strands Agents SDK

Getting Started

Prerequisites: Python 3.10+, AWS CLI configured, Bedrock model access for Claude Sonnet 4.

git clone https://github.com/srikanthethiraj/agentic-ai-system.git
cd agentic-ai-system
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Run the demo:

python demo.py --region us-east-1

Run all sample tickets end-to-end:

python run_all_tickets.py

Results are saved to output/full_results.json with every routing decision, agent response, and tool call.

Tear Down

No infrastructure to tear down — the demo runs locally against Bedrock APIs.

What’s Next

The multi-agent system works on your machine. But how do you take it to production? In Part 8, we deploy the same system three ways on AWS — Bedrock Agents (fully managed), AgentCore Harness (config-only), and AgentCore with custom code (full control). Same tools, same logic, three deployment paths.

This is Part 7 of an ongoing series:

  • Part 1Insurance Claim Processor: Bedrock basics, prompt templates, simple RAG
  • Part 2Financial Services AI Assistant: benchmarking, circuit breakers, cross-region resilience
  • Part 3Customer Feedback Pipeline: multimodal data processing
  • Part 4RAG Knowledge Base: vector stores, embeddings, document chunking
  • Part 5Advanced Search & Retrieval: hybrid search, reranking, query expansion
  • Part 6AI Support Assistant with Governance: guardrails, prompt management, conversation flows
  • Part 7 — Multi-Agent Support System (this post): Strands Agents, multi-agent orchestration
  • Part 8 — Deploy AI Agents on AWS: Bedrock Agents vs AgentCore
  • Part 9 — Multi-Tier Model Deployment & Enterprise Integration
  • Part 10 — AI Safety — Guardrails, PII Protection & Threat Detection
  • Part 11 — Cost & Latency Optimization for GenAI
  • Part 12 — AI Governance — Compliance, Fairness & Model Cards
  • Part 13 — Testing, Evaluation & Troubleshooting GenAI Systems

Repository: github.com/srikanthethiraj/agentic-ai-system


I’m Srikanth — a cloud engineer at AWS based in Austin, Texas. I learn by building, and I write about what I build. Follow along on this blog or connect with me on LinkedIn.