Guidelines for implementing structured error handling in LLM agents and tools: produce predictable, machine-readable failures and helpful user-facing messages while keeping sessions resilient.
All right — we have an agent running and performing non-trivial tasks. In this lesson we’ll add robust error handling so the agent behaves predictably and remains resilient when tools fail.
Goals for tool and agent behavior
Tools fail gracefully and return clear, structured error signals.
The agent does not crash or leave sessions in a broken state.
Users always receive a helpful next step (for example: “please restart the service”, “check this file”, or “contact IT”).
ADK manages sessions, tools, and traces for us, but it is our responsibility to ensure tools return consistent, structured success/failure information that the LLM can reliably interpret.
Always return structured results from tools so the LLM can interpret failures and produce helpful user-facing messages.
Why structured tool responses matter
Traditional apps throw exceptions, log them, and show a message. LLM-driven agents need machine-readable failure signals.
Convert failures into a predictable structured object so the model can:
Recognize failure as a structured signal (not a raw stack trace).
Produce concise, user-friendly explanations and next steps.
This reduces hallucinations, keeps sessions alive, and enables consistent user experience.
Note about structured returns
Instead of letting an exception bubble up as an unstructured Python stack trace, catch failures inside the tool and return a structured object with an explicit status plus either a payload or an error_message. For example:
```python
from typing import Dict, Any

_FAKE_USER_DIRECTORY: Dict[str, Dict[str, Any]] = {
    "alice@example.com": {"name": "Alice", "department": "IT", "status": "active"},
    "bob@example.com": {"name": "Bob", "department": "HR", "status": "active"},
}

def lookup_user_impl(email: str) -> Dict[str, Any]:
    user = _FAKE_USER_DIRECTORY.get(email.lower())
    if not user:
        return {
            "status": "error",
            "error_message": f"No user found for email '{email}'.",
        }
    return {
        "status": "success",
        "user": {
            "email": email.lower(),
            "name": user["name"],
            "department": user["department"],
            "status": user["status"],
        },
    }
```
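Handled failures like the unknown-email case above can be returned directly, but tools can also raise unexpectedly. One way to guarantee a structured response in every case is a small wrapper that converts any uncaught exception into the same shape. This is a sketch, not an ADK API; `safe_tool` and the `technical_hint` field are names chosen here for illustration:

```python
import logging
import traceback
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)

def safe_tool(fn: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Wrap a tool so unexpected exceptions become structured error responses."""
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            # Full stack trace goes to logs/tracing, never to the user.
            logger.error("Tool %s failed:\n%s", fn.__name__, traceback.format_exc())
            return {
                "status": "error",
                "error_message": "An internal error occurred while running this tool.",
                "technical_hint": type(exc).__name__,  # brief hint, not a trace
            }
    wrapper.__name__ = fn.__name__
    return wrapper
```

Applying `safe_tool` to every function tool gives the LLM a uniform contract: every response has a `status`, and errors always carry an `error_message`.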
Important: do not expose full stack traces to end users
Avoid returning raw stack traces or long technical dumps to users. Return a short user-facing error_message and, if needed, a brief technical hint (exception type). Store full traces in logs/tracing for debugging.
How the LLM handles structured failures
Tools return a structured “status”: “error” and a clear “error_message” so the LLM can:
Produce a concise natural-language explanation (user-facing).
Offer an actionable next step (suggest valid service names, propose opening a ticket, etc.).
The agent session stays active; the LLM translates structured errors into helpful suggestions rather than crashing or exposing raw exceptions.
Example console run (trimmed and corrected)
```
(.venv) jeremy@MACSTUDIO ticketpro % adk run helpdesk_agent
Log setup complete: /tmp/agents_log/agent.latest.log
Running agent helpdesk_root_agent, type exit to exit.
[user]: is the 'cause_internal_error' service having issues?
[helpdesk_root_agent]: I can't find a service called 'cause_internal_error'. I can check the status of 'email', 'gitlab', 'vpn', and 'wifi'.
[user]:
```
In this example the tool returned an error indicating an unknown service 'cause_internal_error'. The LLM examined that structured response and produced a helpful message listing the available services the user can ask about instead.
Inspecting traces in the ADK web UI
The ADK web UI traces each function invocation and its structured response. Use the trace tab to inspect the functionResponse and see the exact status and error_message returned by the tool.
Example trace snippet (YAML-like representation from the trace tab)
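An illustrative sketch of what such a trace entry might look like. The exact field names and layout depend on your ADK version and the trace tab's rendering; the tool name and values here mirror the console run above and are not copied from a real trace:

```yaml
# Hypothetical shape; inspect your own trace tab for the exact fields.
functionCall:
  name: check_service_status
  args:
    service: cause_internal_error
functionResponse:
  name: check_service_status
  response:
    status: error
    error_message: "Unknown service 'cause_internal_error'."
```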
Summary: defensive error-handling practices
Convert unexpected exceptions into structured error responses (status + payload or error_message).
Provide a short user-facing message and an optional brief technical hint (exception type).
Keep tool responses consistent across your toolset so the LLM always has predictable signals.
Log full traces and technical details to your tracing/log system (e.g., ADK traces) rather than returning them to the user.
Next steps
Add a lightweight evaluation loop to verify critical flows automatically (for example: tool returns for common inputs and handling for intentionally injected failures).
Apply the same defensive pattern to all function tools so the agent always receives predictable, structured signals about success and failure.
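A minimal sketch of such an evaluation loop, using a stand-in tool (replace `lookup_user_impl` here with your real tool functions). It asserts the structured contract on both a known-good input and an intentionally bad one:

```python
from typing import Any, Dict

# Stand-in tool for illustration; substitute your actual tools.
def lookup_user_impl(email: str) -> Dict[str, Any]:
    if email.lower() != "alice@example.com":
        return {"status": "error", "error_message": f"No user found for email '{email}'."}
    return {"status": "success", "user": {"email": email.lower(), "name": "Alice"}}

def check_structured_contract(result: Dict[str, Any]) -> None:
    """Every tool response must carry a status; errors must explain themselves."""
    assert result.get("status") in {"success", "error"}
    if result["status"] == "error":
        assert result.get("error_message"), "errors must include an error_message"

# Critical flows: a common input plus an intentionally injected failure.
check_structured_contract(lookup_user_impl("alice@example.com"))
check_structured_contract(lookup_user_impl("nobody@example.com"))
```

Running checks like these in CI catches a tool that silently drifts away from the structured contract before the LLM ever sees an unpredictable response.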