All right — we have an agent running and performing non-trivial tasks. In this lesson we’ll add robust error handling so the agent behaves predictably and remains resilient when tools fail.
[Slide: "Error Handling & Resilience" (Demo). © Copyright KodeKloud]
Goals for tool and agent behavior
  • Tools fail gracefully and return clear, structured error signals.
  • The agent does not crash or leave sessions in a broken state.
  • Users always receive a helpful next step (for example: “please restart the service”, “check this file”, or “contact IT”).
ADK supplies management for sessions, tools, and traces, but it is our responsibility to ensure tools return consistent, structured success/failure information that the LLM can reliably interpret.
Always return structured results from tools so the LLM can interpret failures and produce helpful user-facing messages.
Why structured tool responses matter
  • Traditional apps throw exceptions, log them, and show a message. LLM-driven agents need machine-readable failure signals.
  • Convert failures into a predictable structured object so the model can:
    • Recognize failure as a structured signal (not a raw stack trace).
    • Produce concise, user-friendly explanations and next steps.
  • This reduces hallucinations, keeps sessions alive, and enables a consistent user experience.
Note about structured returns
  • Instead of letting an exception bubble up as an unstructured Python stack trace, catch exceptions and return a structured object with:
    • status: “success” | “error”
    • payload fields (e.g., “ticket”, “service”, “status_text”)
    • error_message: user-facing text plus an optional short technical hint (exception type or brief details)
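The shape described in this note can be sketched as two small helper functions. This is a minimal illustration, not part of ADK: the names `success` and `failure` are hypothetical, and the "Technical details" suffix simply mirrors the wording used by the tools in this lesson.

```python
from typing import Any, Dict, Optional


def success(**payload: Any) -> Dict[str, Any]:
    """Build a success result; payload fields sit next to 'status'."""
    # 'status' is written last so a payload field can never overwrite it.
    return {**payload, "status": "success"}


def failure(message: str, exc: Optional[BaseException] = None) -> Dict[str, Any]:
    """Build an error result with a user-facing message and a short hint."""
    if exc is not None:
        # Only the exception type is exposed, never the full stack trace.
        message = f"{message} (Technical details: {type(exc).__name__})"
    return {"status": "error", "error_message": message}


print(success(service="email", status_text="operational"))
print(failure("Could not check the service. Please try again later.", KeyError("x")))
```

Routing every tool through helpers like these is one way to keep the field names identical across the toolset.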
Tool registration and basic imports
from typing import Dict, Any

from google.adk.agents import LlmAgent  # Agent is an alias of LlmAgent in ADK
from google.adk.tools import FunctionTool

# Example tool imports (implementations below)
from tools.helpdesk_tools import (
    lookup_user_impl,
    check_service_status_impl,
    create_ticket_impl,
)

# Register function tools
lookup_user_tool = FunctionTool(func=lookup_user_impl)
check_service_status_tool = FunctionTool(func=check_service_status_impl)
create_ticket_tool = FunctionTool(func=create_ticket_impl)
Tool examples and expected structured outputs
| Tool name | Success payload (typical) | Error payload (typical) |
| --- | --- | --- |
| create_ticket_impl | status: “success”, ticket: {…} | status: “error”, error_message: “…” |
| check_service_status_impl | status: “success”, service: “email”, status_text: “operational” | status: “error”, error_message: “Unknown service ‘…’. Known services: …” |
| lookup_user_impl | status: “success”, user: {…} | status: “error”, error_message: “No user found for email ‘…’.” |
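To keep the three tools aligned with these shapes, a small checker can flag results that drift from the convention. The sketch below is an illustration only; `check_result_shape` is a hypothetical helper, and the rules it enforces are the ones stated in this lesson (a status of "success" or "error", and a non-empty error_message on errors).

```python
from typing import Any, Dict, List


def check_result_shape(result: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the shape is consistent."""
    problems: List[str] = []
    status = result.get("status")
    if status not in ("success", "error"):
        problems.append(f"unexpected status: {status!r}")
    elif status == "error":
        message = result.get("error_message")
        if not isinstance(message, str) or not message.strip():
            problems.append("error result needs a non-empty error_message")
    elif "error_message" in result:
        problems.append("success result should not carry error_message")
    return problems


print(check_result_shape({"status": "success", "service": "email", "status_text": "operational"}))
print(check_result_shape({"status": "error"}))
```

A check like this fits naturally into unit tests for each tool.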
Example: create_ticket_impl (structured success return)
from typing import Dict, Any
from datetime import datetime, timezone
from pydantic import BaseModel

class CreateTicketArgs(BaseModel):
    summary: str
    service: str
    user_email: str
    severity: str
    department: str

class Ticket(BaseModel):
    summary: str
    service: str
    user_email: str
    severity: str
    status: str
    department: str
    created_at: datetime

def create_ticket_impl(args: CreateTicketArgs) -> Dict[str, Any]:
    ticket = Ticket(
        summary=args.summary,
        service=args.service.lower(),
        user_email=args.user_email.lower(),
        severity=args.severity,
        status="open",
        department=args.department,
        # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated since Python 3.12
        created_at=datetime.now(timezone.utc),
    )

    return {
        "status": "success",
        # pydantic v2: mode="json" renders created_at as an ISO 8601 string,
        # so the whole result is JSON-serializable
        "ticket": ticket.model_dump(mode="json"),
    }
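The example above covers only the success path. As a rough sketch of how invalid input could be turned into the same error shape, the following uses plain arguments instead of the pydantic model. Both `validate_ticket_input` and the `ALLOWED_SEVERITIES` set are assumptions made for illustration; the lesson does not define which severities are valid.

```python
from typing import Any, Dict

# Hypothetical set of accepted severities; the lesson does not define one.
ALLOWED_SEVERITIES = {"low", "medium", "high"}


def validate_ticket_input(summary: str, severity: str) -> Dict[str, Any]:
    """Return a structured error for bad input, or a success marker."""
    if not summary.strip():
        return {
            "status": "error",
            "error_message": "A ticket needs a short summary of the problem.",
        }
    normalized = severity.strip().lower()
    if normalized not in ALLOWED_SEVERITIES:
        return {
            "status": "error",
            "error_message": (
                f"Unknown severity '{severity}'. "
                f"Valid values: {', '.join(sorted(ALLOWED_SEVERITIES))}."
            ),
        }
    return {"status": "success", "severity": normalized}


print(validate_ticket_input("VPN down", "urgent"))
```

Listing the valid values in the message gives the LLM something concrete to offer the user as a next step.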
Example: check_service_status_impl with defensive handling
  • Normalize the input, check a status store, and on unexpected errors return a structured error with a friendly message and a concise technical hint.
from typing import Dict, Any

_FAKE_SERVICE_STATUS: Dict[str, str] = {
    "email": "operational",
    "gitlab": "degraded",
    "vpn": "operational",
    "wifi": "down",
}

def check_service_status_impl(service_name: str) -> Dict[str, Any]:
    try:
        normalized = service_name.strip().lower()
        status = _FAKE_SERVICE_STATUS.get(normalized)

        if status is None:
            return {
                "status": "error",
                "error_message": (
                    f"Unknown service '{service_name}'. "
                    f"Known services: {', '.join(sorted(_FAKE_SERVICE_STATUS.keys()))}."
                ),
            }

        return {
            "status": "success",
            "service": normalized,
            "status_text": status,
        }
    except Exception as exc:
        # User-facing message with a short technical hint (exception type)
        return {
            "status": "error",
            "error_message": (
                "Internal error while checking service status. "
                "Please try again later or contact IT. "
                f"(Technical details: {type(exc).__name__})"
            ),
        }
Example: lookup_user_impl returning structured error/success
from typing import Dict, Any

_FAKE_USER_DIRECTORY: Dict[str, Dict[str, Any]] = {
    "alice@example.com": {"name": "Alice", "department": "IT", "status": "active"},
    "bob@example.com": {"name": "Bob", "department": "HR", "status": "active"},
}

def lookup_user_impl(email: str) -> Dict[str, Any]:
    # Normalize once so the lookup key and the returned email always match.
    normalized = email.strip().lower()
    user = _FAKE_USER_DIRECTORY.get(normalized)
    if user is None:
        return {
            "status": "error",
            "error_message": f"No user found for email '{email}'.",
        }

    return {
        "status": "success",
        "user": {
            "email": normalized,
            "name": user["name"],
            "department": user["department"],
            "status": user["status"],
        },
    }
Important: do not expose full stack traces to end users
Avoid returning raw stack traces or long technical dumps to users. Return a short user-facing error_message and, if needed, a brief technical hint (exception type). Store full traces in logs/tracing for debugging.
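One way to apply this rule uniformly is a decorator that logs the full traceback and returns only the short hint. The sketch below is an assumption-laden illustration: `safe_tool`, `flaky_tool`, and the logger name are hypothetical, and whether a wrapped function registers cleanly with FunctionTool is worth verifying in your ADK version.

```python
import functools
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("helpdesk_tools")  # hypothetical logger name


def safe_tool(func: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Wrap a tool so unexpected exceptions become structured errors."""

    @functools.wraps(func)  # keeps the original name and docstring
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # The full traceback goes to the log, not to the user.
            logger.exception("Tool %s failed", func.__name__)
            return {
                "status": "error",
                "error_message": (
                    "Internal error. Please try again later or contact IT. "
                    f"(Technical details: {type(exc).__name__})"
                ),
            }

    return wrapper


@safe_tool
def flaky_tool(service_name: str) -> Dict[str, Any]:
    """Hypothetical tool that always fails, to exercise the wrapper."""
    raise RuntimeError("backend unreachable")


print(flaky_tool("email"))
```

The returned message names the exception type but leaves out the exception text and the traceback, which stay in the log.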
How the LLM handles structured failures
  • When a tool fails, it returns status: “error” together with a clear error_message so the LLM can:
    • Produce a concise natural-language explanation (user-facing).
    • Offer an actionable next step (suggest valid service names, propose opening a ticket, etc.).
  • The agent session stays active; the LLM translates structured errors into helpful suggestions rather than crashing or exposing raw exceptions.
Example console run (trimmed and corrected)
(.venv) jeremy@MACSTUDIO ticketpro % adk run helpdesk_agent
Log setup complete: /tmp/agents_log/agent.latest.log
Running agent helpdesk_root_agent, type exit to exit.
[user]: is the 'cause_internal_error' service having issues?
[helpdesk_root_agent]: I can't find a service called 'cause_internal_error'. I can check the status of 'email', 'gitlab', 'vpn', and 'wifi'.
[user]:
  • In this example the tool returned a structured error for the unknown service ‘cause_internal_error’. The LLM read that response and produced a helpful message that lists the services it can check, so the user can pick a valid name and try again.
Inspecting traces in the ADK web UI
  • The ADK web UI traces each function invocation and its structured response. Use the trace tab to inspect the functionResponse and see the exact status and error_message returned by the tool.
Example trace snippet (YAML-like representation from the trace tab)
content:
  parts:
    0:
      functionResponse:
        id: "adk-52a2bfc0-d054-48b5-a206-d695933905ca"
        name: "check_service_status_impl"
        response:
          status: "error"
          error_message: "Unknown service 'cause_internal_error'. Known services: email, gitlab, vpn, wifi."
  role: "user"
invocationId: "e-29b4c468-60e7-4389-9981-405bff665ccc"
author: "helpdesk_root_agent"
id: "88f15d67-ee05-4666-855c-d231bba9269d"
timestamp: 1765835965.212065
title: "functionResponse:check_service_status_impl"
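If trace events are available as plain dictionaries, failed tool calls can be pulled out by following the content, parts, functionResponse, response path shown in the snippet. The sketch below assumes that export shape, with parts as a list; `failed_tool_calls` is a hypothetical helper and not an ADK function.

```python
from typing import Any, Dict, List


def failed_tool_calls(event: Dict[str, Any]) -> List[Dict[str, str]]:
    """Collect tool name and error_message for each failed functionResponse."""
    failures: List[Dict[str, str]] = []
    parts = event.get("content", {}).get("parts", [])
    for part in parts:
        fn_response = part.get("functionResponse")
        if not fn_response:
            continue  # parts without a functionResponse are skipped
        response = fn_response.get("response", {})
        if response.get("status") == "error":
            failures.append(
                {
                    "tool": fn_response.get("name", "unknown"),
                    "error_message": response.get("error_message", ""),
                }
            )
    return failures


# Event dict shaped like the trace snippet (assumed export format).
event = {
    "content": {
        "parts": [
            {
                "functionResponse": {
                    "name": "check_service_status_impl",
                    "response": {
                        "status": "error",
                        "error_message": "Unknown service 'cause_internal_error'.",
                    },
                }
            }
        ]
    }
}
print(failed_tool_calls(event))
```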
Summary — best practices
  • Convert unexpected exceptions into structured error responses (status + payload or error_message).
  • Provide a short user-facing message and an optional brief technical hint (exception type).
  • Keep tool responses consistent across your toolset so the LLM always has predictable signals.
  • Log full traces and technical details to your tracing/log system (e.g., ADK traces) rather than returning them to the user.
Next steps
  • Add a lightweight evaluation loop to verify critical flows automatically (for example: tool returns for common inputs and handling for intentionally injected failures).
  • Apply the same defensive pattern to all function tools so the agent always receives predictable, structured signals about success and failure.
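A lightweight evaluation loop of the kind suggested above might look like the following sketch. It uses a simplified stand-in for check_service_status_impl so the block runs on its own. The `cause_internal_error` trigger that raises inside the tool is an assumption added here to exercise the except branch, and `CASES` and `run_eval` are hypothetical names.

```python
from typing import Any, Callable, Dict, List, Tuple

_STATUS = {"email": "operational", "wifi": "down"}  # stand-in data


def check_service_status(service_name: str) -> Dict[str, Any]:
    """Simplified stand-in for check_service_status_impl."""
    try:
        normalized = service_name.strip().lower()
        if normalized == "cause_internal_error":
            # Injected failure to exercise the except branch (assumption).
            raise RuntimeError("simulated backend failure")
        status = _STATUS.get(normalized)
        if status is None:
            return {"status": "error", "error_message": f"Unknown service '{service_name}'."}
        return {"status": "success", "service": normalized, "status_text": status}
    except Exception as exc:
        return {
            "status": "error",
            "error_message": f"Internal error. (Technical details: {type(exc).__name__})",
        }


# (input, expected status) pairs covering common and injected-failure paths.
CASES: List[Tuple[Any, str]] = [
    ("Email", "success"),
    ("printer", "error"),
    ("cause_internal_error", "error"),
    (None, "error"),  # wrong type: .strip() raises AttributeError
]


def run_eval(tool: Callable[[Any], Dict[str, Any]]) -> List[str]:
    """Run every case and return descriptions of the ones that failed."""
    failures = []
    for value, expected in CASES:
        result = tool(value)
        if result.get("status") != expected:
            failures.append(f"{value!r}: expected {expected}, got {result.get('status')}")
    return failures


print(run_eval(check_service_status) or "all cases passed")
```

The same case table can be extended per tool, which keeps the check cheap enough to run on every change.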
References
  • ADK traces and tooling in the ADK web UI.
  • pydantic documentation: https://docs.pydantic.dev/
  • Follow structured error patterns to improve LLM-agent resiliency and user experience.