Guidelines for implementing structured error handling in LLM agents and tools: produce predictable, machine-readable failures and helpful user-facing messages while keeping sessions resilient.
All right — we have an agent running and performing non-trivial tasks. In this lesson we’ll add robust error handling so the agent behaves predictably and remains resilient when tools fail.
Goals for tool and agent behavior
Tools fail gracefully and return clear, structured error signals.
The agent does not crash or leave sessions in a broken state.
Users always receive a helpful next step (for example: “please restart the service”, “check this file”, or “contact IT”).
ADK manages sessions, tools, and traces for us, but it is our responsibility to ensure tools return consistent, structured success/failure information that the LLM can reliably interpret.
Always return structured results from tools so the LLM can interpret failures and produce helpful user-facing messages.
Why structured tool responses matter
Traditional apps throw exceptions, log them, and show a message. LLM-driven agents need machine-readable failure signals.
Convert failures into a predictable structured object so the model can:
Recognize failure as a structured signal (not a raw stack trace).
Produce concise, user-friendly explanations and next steps.
This reduces hallucinations, keeps sessions alive, and enables consistent user experience.
Note about structured returns
Instead of letting an exception bubble up as an unstructured Python stack trace, catch failures inside the tool and return a structured object with an explicit status plus either a payload or an error_message. For example:
```python
from typing import Dict, Any

_FAKE_USER_DIRECTORY: Dict[str, Dict[str, Any]] = {
    "alice@example.com": {"name": "Alice", "department": "IT", "status": "active"},
    "bob@example.com": {"name": "Bob", "department": "HR", "status": "active"},
}

def lookup_user_impl(email: str) -> Dict[str, Any]:
    user = _FAKE_USER_DIRECTORY.get(email.lower())
    if not user:
        return {
            "status": "error",
            "error_message": f"No user found for email '{email}'.",
        }
    return {
        "status": "success",
        "user": {
            "email": email.lower(),
            "name": user["name"],
            "department": user["department"],
            "status": user["status"],
        },
    }
```
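Handled failures like the unknown-email case above can be returned directly, but tools can also raise unexpectedly. One way to guarantee a structured response in every case is a small wrapper that converts any uncaught exception into the same shape. This is a sketch, not an ADK API; `safe_tool` and the `technical_hint` field are names chosen here for illustration:

```python
import logging
import traceback
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)

def safe_tool(fn: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Wrap a tool so unexpected exceptions become structured error responses."""
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            # Full stack trace goes to logs/tracing, never to the user.
            logger.error("Tool %s failed:\n%s", fn.__name__, traceback.format_exc())
            return {
                "status": "error",
                "error_message": "An internal error occurred while running this tool.",
                "technical_hint": type(exc).__name__,  # brief hint, not a trace
            }
    wrapper.__name__ = fn.__name__
    return wrapper
```

Applying `safe_tool` to every function tool gives the LLM a uniform contract: every response has a `status`, and errors always carry an `error_message`.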
Important: do not expose full stack traces to end users
Avoid returning raw stack traces or long technical dumps to users. Return a short user-facing error_message and, if needed, a brief technical hint (exception type). Store full traces in logs/tracing for debugging.
How the LLM handles structured failures
Tools return a structured “status”: “error” and a clear “error_message” so the LLM can:
Produce a concise natural-language explanation (user-facing).
Offer an actionable next step (suggest valid service names, propose opening a ticket, etc.).
The agent session stays active; the LLM translates structured errors into helpful suggestions rather than crashing or exposing raw exceptions.
Example console run (trimmed and corrected)
```
(.venv) jeremy@MACSTUDIO ticketpro % adk run helpdesk_agent
Log setup complete: /tmp/agents_log/agent.latest.log
Running agent helpdesk_root_agent, type exit to exit.
[user]: is the 'cause_internal_error' service having issues?
[helpdesk_root_agent]: I can't find a service called 'cause_internal_error'. I can check the status of 'email', 'gitlab', 'vpn', and 'wifi'.
[user]:
```
In this example the tool returned an error indicating an unknown service 'cause_internal_error'. The LLM examined that structured response and produced a helpful message listing the available services the user can ask about instead.
Inspecting traces in the ADK web UI
The ADK web UI traces each function invocation and its structured response. Use the trace tab to inspect the functionResponse and see the exact status and error_message returned by the tool.
Example trace snippet (YAML-like representation from the trace tab)
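An illustrative sketch of what such a trace entry might look like. The exact field names and layout depend on your ADK version and the trace tab's rendering; the tool name and values here mirror the console run above and are not copied from a real trace:

```yaml
# Hypothetical shape; inspect your own trace tab for the exact fields.
functionCall:
  name: check_service_status
  args:
    service: cause_internal_error
functionResponse:
  name: check_service_status
  response:
    status: error
    error_message: "Unknown service 'cause_internal_error'."
```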
Summary: defensive error-handling practices
Convert unexpected exceptions into structured error responses (status + payload or error_message).
Provide a short user-facing message and an optional brief technical hint (exception type).
Keep tool responses consistent across your toolset so the LLM always has predictable signals.
Log full traces and technical details to your tracing/log system (e.g., ADK traces) rather than returning them to the user.
Next steps
Add a lightweight evaluation loop to verify critical flows automatically (for example: tool returns for common inputs and handling for intentionally injected failures).
Apply the same defensive pattern to all function tools so the agent always receives predictable, structured signals about success and failure.
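A minimal sketch of such an evaluation loop, using a stand-in tool (replace `lookup_user_impl` here with your real tool functions). It asserts the structured contract on both a known-good input and an intentionally bad one:

```python
from typing import Any, Dict

# Stand-in tool for illustration; substitute your actual tools.
def lookup_user_impl(email: str) -> Dict[str, Any]:
    if email.lower() != "alice@example.com":
        return {"status": "error", "error_message": f"No user found for email '{email}'."}
    return {"status": "success", "user": {"email": email.lower(), "name": "Alice"}}

def check_structured_contract(result: Dict[str, Any]) -> None:
    """Every tool response must carry a status; errors must explain themselves."""
    assert result.get("status") in {"success", "error"}
    if result["status"] == "error":
        assert result.get("error_message"), "errors must include an error_message"

# Critical flows: a common input plus an intentionally injected failure.
check_structured_contract(lookup_user_impl("alice@example.com"))
check_structured_contract(lookup_user_impl("nobody@example.com"))
```

Running checks like these in CI catches a tool that silently drifts away from the structured contract before the LLM ever sees an unpredictable response.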