Chapter 3 - Core ADK Concepts and Building Blocks
This article is part of my web book series. All of the chapters can be found here and the code is available on GitHub. For any issues around this book, contact me on LinkedIn.
In the previous chapters, we set up our environment and had a first look at a simple ADK agent. Now, it’s time to delve deeper into the fundamental components that make up the Agent Development Kit. Understanding these core concepts is crucial for designing, building, and managing sophisticated AI agents. We’ll explore each building block, its purpose, and how it interacts with others, often referring back to our `simple_assistant` or `greeting_agent` examples. This chapter is heavy on theory and will have primarily conceptual code examples, including a lot of diagrams to help you wrap your head around these important ADK concepts.
Agents: The Heart of ADK (BaseAgent, LlmAgent)
At the very core of ADK are Agents. An agent is an entity capable of perceiving, reasoning, and acting. In ADK, agents are Python classes that encapsulate specific behaviors or skills.
- `google.adk.agents.BaseAgent`: This is the abstract base class for all agents in ADK. Any custom agent you create must inherit from `BaseAgent`. It defines the fundamental interface for an agent, including:
  - `name: str`: A unique identifier for the agent.
  - `description: str`: A natural language description of the agent’s capabilities, used by other agents (or LLMs) to decide when to delegate tasks to it.
  - `run_async(...)`: The primary asynchronous method that executes the agent’s logic and yields `Event` objects.
  - `run_live(...)`: An experimental asynchronous method for real-time, streaming interactions.
  - `sub_agents: list[BaseAgent]`: A list to hold child agents, enabling multi-agent systems.
  - `parent_agent: Optional[BaseAgent]`: A reference to its parent agent in a hierarchy.
- `google.adk.agents.LlmAgent` (aliased as `Agent`): This is the most commonly used agent type and the one we’ve seen so far. It’s a specialized `BaseAgent` designed to interact with Large Language Models (LLMs).
  - It inherits all properties of `BaseAgent`.
  - Key additional properties:
    - `model: Union[str, BaseLlm]`: Specifies the LLM to use (e.g., `"gemini-1.5-flash-latest"`).
    - `instruction: Union[str, InstructionProvider]`: The system prompt or guiding instructions for the LLM.
    - `tools: list[ToolUnion]`: A list of tools the agent can use.
    - `planner: Optional[BasePlanner]`: For enabling more complex reasoning and planning.

    # From previous examples:
    from google.adk.agents import Agent  # This is LlmAgent

    simple_assistant = Agent(
        name="simple_assistant",
        model="gemini-2.0-flash",
        instruction="You are a friendly and helpful assistant.",
        description="A basic assistant to answer questions."
    )

Best Practice: Clear Agent Descriptions
For LlmAgent, the name should be a concise identifier, while the description should be a clear, natural language explanation of what the agent does and when it should be used. This description is often used by an orchestrating LLM (in multi-agent systems) to decide if this agent is the right one for a task. Make it informative!
Agents are the fundamental actors in an ADK system. They can be simple, single-purpose entities or complex orchestrators managing other sub-agents.
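To make the interface described above more tangible, here is a minimal, framework-free sketch of the same shape: a named agent with a description, child agents, a parent reference, and an asynchronous `run_async` that yields events. `SketchAgent`, `find_agent`, `collect`, and the plain-string events are hypothetical stand-ins written for this illustration, not ADK classes; the real `BaseAgent` does considerably more.

```python
import asyncio
from typing import AsyncGenerator, List, Optional


class SketchAgent:
    """Hypothetical stand-in mirroring the BaseAgent fields described above."""

    def __init__(self, name: str, description: str = "",
                 sub_agents: Optional[List["SketchAgent"]] = None):
        self.name = name
        self.description = description
        self.parent_agent: Optional["SketchAgent"] = None
        self.sub_agents: List["SketchAgent"] = sub_agents or []
        for child in self.sub_agents:
            child.parent_agent = self  # wire up the hierarchy

    def find_agent(self, name: str) -> Optional["SketchAgent"]:
        """Depth-first lookup of an agent by name within this subtree."""
        if self.name == name:
            return self
        for child in self.sub_agents:
            found = child.find_agent(name)
            if found is not None:
                return found
        return None

    async def run_async(self, user_text: str) -> AsyncGenerator[str, None]:
        # An async generator: callers consume events with `async for`.
        yield f"[{self.name}] received: {user_text}"


async def collect(agent: SketchAgent, user_text: str) -> List[str]:
    """Drains the agent's event stream into a list."""
    return [event async for event in agent.run_async(user_text)]


greeter = SketchAgent("greeting_agent", "Greets the user.")
root = SketchAgent("simple_assistant", "Answers general questions.",
                   sub_agents=[greeter])
print(asyncio.run(collect(greeter, "hi")))
```

The point of the sketch is the structure: descriptions live on the agent so an orchestrator can read them, and the parent/child links make it possible to locate an agent by name anywhere in the tree.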
Runners: Executing Your Agents (Runner, InMemoryRunner)
A Runner is responsible for the actual execution of an agent. It takes user input, manages the session, invokes the appropriate agent, and streams back the events generated by the agent.
- `google.adk.runners.Runner`: The primary class for running agents.
  - It requires an `app_name`, the `root_agent` to run, and instances of various services (like `SessionService`, `ArtifactService`, `MemoryService`).
  - Core methods:
    - `run(user_id, session_id, new_message, ...)`: Synchronous wrapper for local testing.
    - `run_async(user_id, session_id, new_message, ...)`: The asynchronous method that yields `Event` objects. This is the primary method used internally and for production scenarios.
    - `run_live(...)`: For experimental bidirectional streaming interactions.
- `google.adk.runners.InMemoryRunner`: A convenient subclass of `Runner` that comes pre-configured with in-memory implementations for `SessionService`, `ArtifactService`, and `MemoryService`. This is ideal for quick local development, testing, and examples where persistence is not required.

    from google.adk.runners import InMemoryRunner
    from google.adk.agents import Agent
    from google.genai.types import Content, Part  # used by the conceptual call below

    root_agent = Agent(name="my_agent", model="gemini-1.5-flash-latest", instruction="Be helpful.")
    runner = InMemoryRunner(agent=root_agent, app_name="MyApp")

    # Conceptual usage for run_async:
    # async for event in runner.run_async(
    #     user_id="user123",
    #     session_id="sessionXYZ",
    #     new_message=Content(parts=[Part(text="Hello")])
    # ):
    #     print(event)

The Runner orchestrates the entire lifecycle of an agent interaction for a given user session.
InMemoryRunner vs. Runner
Use InMemoryRunner for quick local tests, examples, and when you don’t need conversation history or state to persist between runs. Switch to the base Runner class when you need to integrate with persistent services like DatabaseSessionService or VertexAiSessionService for more robust applications.
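The runner’s responsibilities (find the session, record the user message, invoke the agent, and stream events back while logging them) can be sketched without ADK at all. Everything here is hypothetical and simplified for illustration: `SketchRunner`, `echo_agent`, the dictionary-shaped events, and the tuple-keyed session store are assumptions, not ADK’s implementation.

```python
import asyncio
from typing import AsyncGenerator, Dict, List, Tuple

Event = Dict[str, str]  # hypothetical event shape: {"author": ..., "text": ...}


async def echo_agent(text: str) -> AsyncGenerator[Event, None]:
    """Hypothetical agent: an async generator that yields one reply event."""
    yield {"author": "echo_agent", "text": f"You said: {text}"}


class SketchRunner:
    """Toy runner: owns the session store and drives one agent per turn."""

    def __init__(self, agent, app_name: str):
        self.agent = agent
        self.app_name = app_name
        # In-memory "session service": history is lost when the process exits.
        self.sessions: Dict[Tuple[str, str, str], List[Event]] = {}

    async def run_async(self, user_id: str, session_id: str,
                        new_message: str) -> AsyncGenerator[Event, None]:
        key = (self.app_name, user_id, session_id)
        history = self.sessions.setdefault(key, [])
        history.append({"author": "user", "text": new_message})
        async for event in self.agent(new_message):
            history.append(event)  # record the event before handing it on
            yield event

    def run(self, user_id: str, session_id: str, new_message: str) -> List[Event]:
        """Synchronous convenience wrapper around run_async."""
        async def _collect() -> List[Event]:
            return [e async for e in
                    self.run_async(user_id, session_id, new_message)]
        return asyncio.run(_collect())


runner = SketchRunner(agent=echo_agent, app_name="MyApp")
events = runner.run("user123", "sessionXYZ", "Hello")
print(events)
```

Because the store here is a plain dictionary, restarting the process discards every conversation, which is the trade-off to keep in mind when choosing between in-memory and persistent services.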
Tools & Toolsets: Extending Agent Capabilities (BaseTool, BaseToolset, FunctionTool)
LLMs are powerful, but their knowledge is limited to their training data and they can’t inherently interact with the outside world. Tools bridge this gap, allowing agents to:
- Fetch real-time information (e.g., web search, stock prices).
- Interact with external APIs (e.g., booking systems, databases).
- Perform calculations or data manipulations.
- Access local files or user-specific data.
- `google.adk.tools.BaseTool`: The abstract base class for all tools.
  - `name: str`: The name the LLM will use to refer to this tool.
  - `description: str`: A clear explanation of what the tool does, its parameters, and what it returns. This is crucial for the LLM to understand when and how to use the tool.
  - `_get_declaration()`: Returns a `FunctionDeclaration` (OpenAPI-like schema) describing the tool’s parameters. This is what the LLM sees.
  - `run_async(args, tool_context)`: The method that executes the tool’s logic.
- `google.adk.tools.FunctionTool`: A convenient way to wrap any Python callable (function or method) as an ADK `BaseTool`. ADK automatically infers the `FunctionDeclaration` from the Python function’s signature and docstring.

    from google.adk.tools import FunctionTool
    from google.adk.agents import Agent

    def get_current_weather(location: str, unit: str = "celsius") -> str:
        """
        Gets the current weather for a given location.

        Args:
            location: The city and state, e.g., San Francisco, CA
            unit: The temperature unit, either "celsius" or "fahrenheit".

        Returns:
            A string describing the current weather.
        """
        if "london" in location.lower():
            return f"The weather in London is 15 degrees {unit} and cloudy."
        return f"Sorry, I don't have weather information for {location}."

    weather_tool = FunctionTool(func=get_current_weather)

    weather_agent = Agent(
        name="weather_reporter",
        model="gemini-2.0-flash",
        instruction="You are a weather reporter. Use the available tools to answer questions about weather.",
        tools=[weather_tool]
    )

Best Practice: Docstrings are Tool Descriptions
For FunctionTool, the Python function’s docstring becomes the description provided to the LLM. Write clear, comprehensive docstrings explaining what the function does, its parameters (including their types if not obvious from type hints), and what it returns. This directly impacts how well the LLM can understand and use your tool.
Tool Name Uniqueness and LLM Interpretation
Ensure tool names are unique within the set of tools an agent can access. Also, be mindful that LLMs interpret tool names and descriptions literally. A poorly named or described tool can lead to the LLM misusing it or failing to use it when appropriate.
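The idea that a tool’s schema can be derived from a function’s signature and docstring can be sketched with the standard library alone. `describe_function`, `_TYPE_NAMES`, and the dictionary layout below are hypothetical and far simpler than ADK’s actual `FunctionDeclaration`; the sketch only illustrates why type hints, default values, and docstrings matter to what the LLM ends up seeing.

```python
import inspect
from typing import Any, Callable, Dict, List

# Hypothetical mapping from Python annotations to schema type names.
_TYPE_NAMES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_function(func: Callable[..., Any]) -> Dict[str, Any]:
    """Builds a simplified, schema-like description of a Python callable."""
    signature = inspect.signature(func)
    properties: Dict[str, Any] = {}
    required: List[str] = []
    for name, param in signature.parameters.items():
        # Unannotated or unknown types fall back to "string" in this sketch.
        properties[name] = {"type": _TYPE_NAMES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default, so the caller must supply it
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"properties": properties, "required": required},
    }


def get_current_weather(location: str, unit: str = "celsius") -> str:
    """Gets the current weather for a given location."""
    return f"No data for {location} in {unit}."


declaration = describe_function(get_current_weather)
print(declaration)
```

A function with no docstring would produce an empty description in this sketch, leaving the model with only the function name to go on.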
- `google.adk.tools.BaseToolset`: An abstract class for grouping related tools. Toolsets can dynamically provide a list of tools based on context. Examples include `OpenAPIToolset` (from OpenAPI specs) and `GoogleApiToolset` (for Google APIs).
Models: The Brains of Your Agents (BaseLlm, Model Registry)
The “intelligence” of an `LlmAgent` comes from an underlying Large Language Model. ADK provides an abstraction layer for interacting with different LLMs.
- `google.adk.models.BaseLlm`: The abstract base class for all LLM integrations.
  - `model: str`: The specific model identifier (e.g., `"gemini-1.5-flash-latest"`).
  - `generate_content_async(llm_request, stream)`: The core method for sending a request to the LLM and receiving a response.
  - `connect(llm_request)`: For establishing live, bidirectional streaming connections.
- Concrete Implementations:
  - `google.adk.models.Gemini`: For interacting with Google’s Gemini family of models.
  - `google.adk.models.AnthropicLlm`: For Anthropic’s Claude models (via Vertex AI).
  - `google.adk.models.LiteLlm`: A wrapper around the `litellm` library, enabling support for a wide range of models (OpenAI, Azure, Cohere, etc.).
- `google.adk.models.LLMRegistry`: A central registry that maps model name patterns (regex) to their corresponding `BaseLlm` implementation classes. This allows ADK to automatically instantiate the correct LLM client based on the model string provided to an `LlmAgent`.

    # When an LlmAgent is defined:
    # agent = Agent(model="gemini-1.5-pro-latest", ...)
    # ADK's LLMRegistry resolves "gemini-1.5-pro-latest" to the Gemini class.

    # agent = Agent(model=LiteLlm(model="openai/gpt-4"), ...)
    # Here, we explicitly provide a LiteLlm instance.

The `BaseLlm` interface ensures that agents can interact with different models in a consistent way.
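The pattern-to-class lookup described for the registry can be illustrated with a small, self-contained sketch. The classes `SketchLlm`, `SketchGemini`, `SketchClaude`, and `SketchRegistry`, along with the two regex patterns, are invented for this example and are not ADK’s registry code; the sketch only shows the general technique of resolving a model string to an implementation class.

```python
import re
from typing import Dict, Type


class SketchLlm:
    """Hypothetical base class standing in for an LLM client."""

    def __init__(self, model: str):
        self.model = model


class SketchGemini(SketchLlm):
    pass


class SketchClaude(SketchLlm):
    pass


class SketchRegistry:
    """Maps model-name regex patterns to client classes."""

    _patterns: Dict[str, Type[SketchLlm]] = {}

    @classmethod
    def register(cls, pattern: str, llm_cls: Type[SketchLlm]) -> None:
        cls._patterns[pattern] = llm_cls

    @classmethod
    def resolve(cls, model: str) -> Type[SketchLlm]:
        # The first pattern that matches the whole model string wins.
        for pattern, llm_cls in cls._patterns.items():
            if re.fullmatch(pattern, model):
                return llm_cls
        raise ValueError(f"No registered implementation for model {model!r}")

    @classmethod
    def new_llm(cls, model: str) -> SketchLlm:
        return cls.resolve(model)(model)


SketchRegistry.register(r"gemini-.*", SketchGemini)
SketchRegistry.register(r"claude-.*", SketchClaude)

llm = SketchRegistry.new_llm("gemini-1.5-pro-latest")
print(type(llm).__name__, llm.model)
```

Passing an already constructed client instead of a string, as in the `LiteLlm` example above, simply bypasses this lookup.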
Sessions & State: Managing Conversations and Memory (Session, State, BaseSessionService)
Agents often need to remember past parts of a conversation or maintain information across multiple turns. ADK provides `Session` and `State` objects for this.
- `google.adk.sessions.Session`: Represents a single, continuous interaction (conversation) between a user and an agent system.
  - `id: str`: A unique identifier for the session.
  - `app_name: str`: The application this session belongs to.
  - `user_id: str`: The user initiating the session.
  - `events: list[Event]`: A chronological list of all events that have occurred in the session.
  - `state: dict[str, Any]`: A dictionary holding the current state associated with this specific session. This state persists only for the duration of this session unless a persistent `SessionService` is used.
  - `last_update_time: float`: Timestamp of the last modification.
- `google.adk.sessions.State`: A utility class that provides a delta-aware view into the session’s state, typically used within `CallbackContext` or `ToolContext`. It allows modifications to be tracked and persisted. ADK also supports scoped state:
  - Session State (default): `state['my_var'] = value`
  - User State: `state['user:my_user_pref'] = value` (persists across sessions for that user).
  - App State: `state['app:global_config'] = value` (persists across all users and sessions for that app).
  - Temp State: `state['temp:transient_info'] = value` (not persisted by `DatabaseSessionService`). Temp state changes are available only for one user-turn.
Scoped State for Clarity
Using state scopes (user:, app:) helps organize your session data and clarify its intended lifecycle and persistence. For example, user:theme_preference is clearly tied to a specific user across sessions, while app:api_version could be a global setting. temp: is useful for data that should not be persisted by DatabaseSessionService but is needed during a single Runner.run() invocation.
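One way to picture how a persistent store could treat the prefixes is a function that routes each key of a state change to the right bucket. This is a sketch under assumptions: `split_state_delta` is a hypothetical helper, and stripping the prefix before storing is a choice made for the illustration, not a statement about how ADK’s session services work internally.

```python
from typing import Any, Dict, Tuple

StateDict = Dict[str, Any]


def split_state_delta(delta: StateDict) -> Tuple[StateDict, StateDict, StateDict]:
    """Routes keys to (app, user, session) buckets and drops temp: keys."""
    app: StateDict = {}
    user: StateDict = {}
    session: StateDict = {}
    for key, value in delta.items():
        if key.startswith("temp:"):
            continue  # transient data: never written to storage
        if key.startswith("app:"):
            app[key[len("app:"):]] = value
        elif key.startswith("user:"):
            user[key[len("user:"):]] = value
        else:
            session[key] = value  # unprefixed keys belong to the session
    return app, user, session


app_state, user_state, session_state = split_state_delta({
    "app:global_config": {"version": 2},
    "user:my_user_pref": "dark",
    "my_var": 3,
    "temp:transient_info": "scratch",
})
print(app_state, user_state, session_state)
```

Seen this way, the prefix is effectively a declaration of lifetime: it tells the storage layer how widely a value should be shared and whether it should be kept at all.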
- `google.adk.sessions.BaseSessionService`: An abstract class defining the interface for managing session persistence.
  - `create_session(...)`
  - `get_session(...)`
  - `append_event(...)`
  - Implementations include:
    - `InMemorySessionService`: Keeps sessions in memory (lost when the process ends). Used by `InMemoryRunner`.
    - `DatabaseSessionService`: Persists sessions to a SQL database (e.g., MySQL, PostgreSQL) using SQLAlchemy.
    - `VertexAiSessionService`: Leverages Google Cloud for managed session storage.
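The three operations above can be sketched as a small in-memory service. `SketchSession` and `SketchSessionService` are hypothetical, the method signatures are simplified guesses rather than ADK’s, and events are plain dictionaries whose optional `state_delta` entry is merged into the session state when the event is appended.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class SketchSession:
    """Hypothetical session record with the fields listed above."""
    id: str
    app_name: str
    user_id: str
    events: List[Dict[str, Any]] = field(default_factory=list)
    state: Dict[str, Any] = field(default_factory=dict)
    last_update_time: float = 0.0


class SketchSessionService:
    """Keeps sessions in a dictionary; nothing survives a restart."""

    def __init__(self) -> None:
        self._sessions: Dict[Tuple[str, str, str], SketchSession] = {}

    def create_session(self, app_name: str, user_id: str,
                       session_id: Optional[str] = None) -> SketchSession:
        session = SketchSession(id=session_id or uuid.uuid4().hex,
                                app_name=app_name, user_id=user_id,
                                last_update_time=time.time())
        self._sessions[(app_name, user_id, session.id)] = session
        return session

    def get_session(self, app_name: str, user_id: str,
                    session_id: str) -> Optional[SketchSession]:
        return self._sessions.get((app_name, user_id, session_id))

    def append_event(self, session: SketchSession, event: Dict[str, Any]) -> None:
        session.events.append(event)
        # Apply any state changes that the event carries.
        session.state.update(event.get("state_delta", {}))
        session.last_update_time = time.time()


service = SketchSessionService()
session = service.create_session("MyApp", "user123", "s1")
service.append_event(session, {"author": "user", "text": "hi"})
service.append_event(session, {"author": "agent", "text": "hello",
                               "state_delta": {"greeted": True}})
print(session.state, len(session.events))
```

A database-backed variant would keep the same three methods and replace the dictionary with queries, which is why the interface can stay stable while the storage changes.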
Events: The Communication Protocol (Event, EventActions)
Events are the primary means of communication and data logging within ADK. Every significant occurrence during an agent’s execution is captured as an `Event`.
- `google.adk.events.Event`: Represents a single event in a session. It’s a Pydantic model inheriting from `LlmResponse` and adding more context.
  - `id: str`: Unique ID for the event.
  - `invocation_id: str`: ID of the overall user turn/invocation this event belongs to.
  - `author: str`: Who generated this event (e.g., `"user"`, `"simple_assistant"`).
  - `content: Optional[types.Content]`: The actual payload (text, function call, function response, file data).
  - `timestamp: float`: When the event occurred.
  - `actions: EventActions`: Actions associated with this event (see below).
  - `partial: Optional[bool]`: True if this is part of a streaming response.
  - `branch: Optional[str]`: For multi-agent systems, indicates the agent path.
  - Other fields inherited from `LlmResponse` like `error_code`, `usage_metadata`.
- `google.adk.events.EventActions`: A Pydantic model attached to an `Event`, indicating what actions should be taken or what state changes occurred.
  - `state_delta: dict`: Changes to be applied to the session state.
  - `artifact_delta: dict`: Information about saved artifacts.
  - `transfer_to_agent: Optional[str]`: If set, indicates the agent system should transfer control to the named agent.
  - `skip_summarization: Optional[bool]`: For tool responses, indicates if the LLM should summarize the tool output or use it directly.
  - `escalate: Optional[bool]`: Used by `LoopAgent` to signal exiting the loop.
When a `Runner` executes `run_async`, it yields a stream of these `Event` objects.
Best Practice: Leverage Event Granularity for Debugging
The stream of Event objects provides a fine-grained log of the agent’s activity. When debugging, inspect the sequence of events (especially in the Dev UI’s Trace view) to understand the exact flow of text, tool calls, tool responses, and state changes. This is much more powerful than simple print debugging.
Partial Events in Streaming
When streaming responses from an LLM, you’ll receive multiple Event objects where event.partial is True, followed by a final event where event.partial is False (or None). Your application code consuming these events needs to handle this by accumulating partial text if a continuous stream is desired for the UI.
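Here is a small sketch of the accumulation logic a UI might use for such a stream. `SketchEvent` and `render_stream` are hypothetical, events are reduced to a text field plus the `partial` flag, and the sketch assumes the final event repeats the complete message text; verify that assumption against the events your own setup actually produces.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class SketchEvent:
    """Hypothetical, stripped-down event: text plus the partial flag."""
    text: str
    partial: Optional[bool] = None


def render_stream(events: Iterable[SketchEvent]) -> Iterator[str]:
    """Yields the text a UI should display after each incoming event."""
    buffer = ""
    for event in events:
        if event.partial:
            buffer += event.text
            yield buffer  # show the message as it grows
        else:
            # ASSUMPTION: the final event carries the full text. If it is
            # empty, fall back to what was accumulated from the chunks.
            yield event.text or buffer
            buffer = ""  # reset for the next message


stream = [
    SketchEvent("Hel", partial=True),
    SketchEvent("lo", partial=True),
    SketchEvent("Hello", partial=False),
]
frames = list(render_stream(stream))
print(frames)
```

Resetting the buffer on the final event matters: without it, the chunks of the next streamed message would be appended to the previous one.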
Contexts
ADK uses context objects to pass around necessary information during agent execution.
- `google.adk.agents.invocation_context.InvocationContext`:
  - This object holds the context for a single invocation (one complete turn of user input and agent response). It’s created by the `Runner` and passed down through the agent chain.
  - Contains references to services (`SessionService`, `ArtifactService`, `MemoryService`), the current `Session`, the current `Agent` being run, the initial `user_content` for this invocation, `RunConfig`, and more.
  - It’s the primary way agents access shared services and session data.
- `google.adk.agents.callback_context.CallbackContext`:
  - Used as the argument for various callback functions (e.g., `before_agent_callback`, `after_tool_callback`).
  - Provides a read-only view of most `InvocationContext` attributes but allows modification of `state` (via `state_delta` in `EventActions`) and saving artifacts.
- `google.adk.tools.tool_context.ToolContext`:
  - A subclass of `CallbackContext`, specifically passed to tool execution methods (`tool.run_async`).
  - Includes the `function_call_id` for the current tool invocation.
  - Allows tools to access session state, save artifacts, and request credentials.
These context objects ensure that different parts of the ADK framework have the necessary information to perform their tasks without tightly coupling them.
Context Objects for Decoupling
InvocationContext, CallbackContext, and ToolContext are key to ADK’s modularity. They provide necessary information to components (agents, callbacks, tools) without requiring direct dependencies on the Runner or other high-level orchestrators. This promotes cleaner, more testable code.
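The relationship between these contexts and state tracking can be sketched in plain Python. `DeltaState`, `SketchCallbackContext`, `SketchToolContext`, and `remember_city` are all hypothetical names for this illustration. The sketch shows one way a context could expose state so that writes are recorded as a delta while the committed session state stays untouched until something applies that delta.

```python
from typing import Any, Dict


class DeltaState:
    """Hypothetical delta-aware view: reads fall through, writes are recorded."""

    def __init__(self, committed: Dict[str, Any], delta: Dict[str, Any]):
        self._committed = committed
        self._delta = delta

    def __getitem__(self, key: str) -> Any:
        if key in self._delta:
            return self._delta[key]  # pending writes win over committed values
        return self._committed[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self._delta[key] = value  # the committed state is not modified here


class SketchCallbackContext:
    """Gives callbacks state access while collecting their changes."""

    def __init__(self, session_state: Dict[str, Any]):
        self.state_delta: Dict[str, Any] = {}
        self.state = DeltaState(session_state, self.state_delta)


class SketchToolContext(SketchCallbackContext):
    """Adds the identifier of the tool call currently being handled."""

    def __init__(self, session_state: Dict[str, Any], function_call_id: str):
        super().__init__(session_state)
        self.function_call_id = function_call_id


def remember_city(city: str, tool_context: SketchToolContext) -> str:
    """Hypothetical tool that stores a user-scoped preference."""
    tool_context.state["user:last_city"] = city
    return f"Saved {city} (call {tool_context.function_call_id})"


committed = {"my_var": 1}
ctx = SketchToolContext(committed, function_call_id="call-1")
result = remember_city("London", ctx)
print(result, ctx.state_delta)
```

Notice that the tool never touches a runner or a session service. It only sees the context it was handed, which makes it straightforward to test in isolation.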
Artifacts: Storing and Retrieving Agent-Generated Files (BaseArtifactService)
Agents might need to work with files – reading input files, generating output files (like images, reports, or code). The Artifact Service manages these.
- `google.adk.artifacts.BaseArtifactService`: Abstract interface for artifact storage.
  - `save_artifact(...)`
  - `load_artifact(...)`
- Implementations:
  - `InMemoryArtifactService`: Stores artifacts in memory.
  - `GcsArtifactService`: Stores artifacts in Google Cloud Storage.
- Agents and tools can interact with artifacts via `CallbackContext.save_artifact()` and `CallbackContext.load_artifact()`, or by using the `LoadArtifactsTool`.
Memory: Long-term Knowledge for Agents (BaseMemoryService)
While session state handles short-term memory within a single conversation, the Memory Service allows agents to retain and recall information across different sessions, providing a form of long-term memory.
- `google.adk.memory.BaseMemoryService`: Abstract interface for long-term memory.
  - `add_session_to_memory(session)`: Ingests a session’s events into the memory.
  - `search_memory(app_name, user_id, query)`: Searches the memory for relevant information.
- `google.adk.memory.MemoryEntry`: The structure representing a piece of retrieved memory.
- Implementations:
  - `InMemoryMemoryService`: A simple keyword-based in-memory store.
  - `VertexAiRagMemoryService`: Leverages Vertex AI RAG for powerful semantic search over ingested session data.
- Tools like `LoadMemoryTool` and `PreloadMemoryTool` facilitate agent interaction with this service.
What’s Next?
With a solid understanding of these core ADK building blocks, we are now equipped to start assembling them into functional agents. In the next part of the book, “Part 2: Building and Empowering Single Agents,” we will begin by crafting our first `LlmAgent` in detail, exploring how to give it instructions, interact with the LLM, and handle its responses.