Chapter 16 - ADK Runner and Runtime Configuration
This article is part of my web book series. All of the chapters can be found here and the code is available on GitHub. For any issues around this book, contact me on LinkedIn.
We’ve spent considerable time designing and building agents, equipping them with tools, and even orchestrating them into multi-agent systems. Now, we turn our attention to the engine that brings these agents to life: the ADK Runner. This chapter delves into the `Runner` class, its core execution methods, and how you can customize the runtime behavior of your agents using the `RunConfig` object for features like streaming, speech interaction, and more.
The `Runner` Class in Depth
The `google.adk.runners.Runner` is the central component responsible for executing your ADK agents. It acts as the bridge between your application code (which initiates an agent interaction) and the agent system itself.
Key Responsibilities of the `Runner`:
- Session Management: It interacts with a `BaseSessionService` to create, retrieve, and update agent sessions. This includes loading conversation history and state at the beginning of an interaction and saving new events and state changes.
- Agent Invocation: It identifies the correct root agent to invoke (or the appropriate agent to resume a conversation with) and calls its `run_async` (or `run_live`) method.
- Context Creation: It constructs the `InvocationContext` object, providing the agent with all necessary information (session, services, user input, run configuration).
- Event Streaming: It consumes the asynchronous stream of `Event` objects yielded by the agent and makes them available to your application.
- Input Handling: It processes new user messages, potentially saving input blobs as artifacts before passing them to the agent.
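To make these responsibilities concrete, here is a toy stand-in written with the standard library only. It is not ADK's implementation: `ToyRunner`, `ToySession`, `ToyContext`, `ToyEvent`, and `echo_agent` are hypothetical names invented for this sketch, and the rule of persisting only non-partial events is modeled in the simplest possible way.

```python
import asyncio
from dataclasses import dataclass, field
from typing import AsyncGenerator, Callable


@dataclass
class ToyEvent:
    author: str
    text: str
    partial: bool = False  # partial chunks are streamed but not persisted


@dataclass
class ToySession:
    id: str
    events: list[ToyEvent] = field(default_factory=list)


@dataclass
class ToyContext:
    session: ToySession
    user_text: str


# In this sketch an "agent" is just an async generator function over a context.
ToyAgent = Callable[[ToyContext], AsyncGenerator[ToyEvent, None]]


class ToyRunner:
    def __init__(self, agent: ToyAgent):
        self.agent = agent
        self.sessions: dict[str, ToySession] = {}  # stand-in for a session service

    async def run_async(self, session_id: str, text: str) -> AsyncGenerator[ToyEvent, None]:
        # 1. Session management: load or create the session.
        session = self.sessions.setdefault(session_id, ToySession(id=session_id))
        # 2. Input handling: record the user's message in the history.
        session.events.append(ToyEvent(author="user", text=text))
        # 3. Context creation: bundle what the agent needs for this turn.
        ctx = ToyContext(session=session, user_text=text)
        # 4. Agent invocation and event streaming.
        async for event in self.agent(ctx):
            if not event.partial:
                session.events.append(event)  # persist only complete events
            yield event


async def echo_agent(ctx: ToyContext) -> AsyncGenerator[ToyEvent, None]:
    yield ToyEvent(author="echo", text="You", partial=True)
    yield ToyEvent(author="echo", text=f"You said: {ctx.user_text}")


async def collect(runner: ToyRunner, session_id: str, text: str) -> list[ToyEvent]:
    return [event async for event in runner.run_async(session_id, text)]


if __name__ == "__main__":
    toy_runner = ToyRunner(echo_agent)
    for ev in asyncio.run(collect(toy_runner, "s1", "hi")):
        print(ev.author, ev.partial, ev.text)
```

The caller sees every event, including the partial chunk, while the stored session history keeps only the user message and the complete reply.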
Initializing a `Runner`: You typically initialize a `Runner` by providing:
- `app_name: str`: A name for your application, used for namespacing sessions.
- `agent: BaseAgent`: The root agent of your application.
- `session_service: BaseSessionService`: An instance of a session service (e.g., `InMemorySessionService`, `DatabaseSessionService`).
- `artifact_service: Optional[BaseArtifactService]`: (Optional) An instance of an artifact service.
- `memory_service: Optional[BaseMemoryService]`: (Optional) An instance of a memory service.
import os

from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService, DatabaseSessionService
from google.adk.artifacts import InMemoryArtifactService, GcsArtifactService
from google.adk.memory import InMemoryMemoryService

from building_intelligent_agents.utils import load_environment_variables, DEFAULT_LLM

load_environment_variables()

# Define a simple agent
my_root_agent = Agent(
    name="main_app_agent",
    model=DEFAULT_LLM,
    instruction="You are the main agent for this application."
)

# Option 1: Using all InMemory services (similar to InMemoryRunner)
in_memory_session_svc = InMemorySessionService()
in_memory_artifact_svc = InMemoryArtifactService()
in_memory_memory_svc = InMemoryMemoryService()

runner_in_memory = Runner(
    app_name="MyInMemoryApp",
    agent=my_root_agent,
    session_service=in_memory_session_svc,
    artifact_service=in_memory_artifact_svc,
    memory_service=in_memory_memory_svc
)
print(f"Runner initialized with InMemory services for app: {runner_in_memory.app_name}")

# Option 2: Using persistent services (conceptual, requires setup)
# Ensure GOOGLE_CLOUD_PROJECT and potentially other env vars are set for GCS/Database
GOOGLE_CLOUD_PROJECT = os.getenv("GOOGLE_CLOUD_PROJECT", "my-gcp-project")
GCS_BUCKET_NAME = os.getenv("ADK_ARTIFACT_GCS_BUCKET", "my-adk-artifacts-bucket")
# Example: "mysql+pymysql://user:pass@host/db" or "sqlite:///./my_adk_sessions.db"
DATABASE_URL = os.getenv("ADK_DATABASE_URL", "sqlite:///./adk_sessions_chapter15.db")

if GOOGLE_CLOUD_PROJECT and GCS_BUCKET_NAME and DATABASE_URL:
    try:
        db_session_svc = DatabaseSessionService(db_url=DATABASE_URL)
        gcs_artifact_svc = GcsArtifactService(bucket_name=GCS_BUCKET_NAME, project=GOOGLE_CLOUD_PROJECT)
        # memory_svc_persistent = VertexAiRagMemoryService(...)  # Or other persistent memory

        runner_persistent = Runner(
            app_name="MyPersistentApp",
            agent=my_root_agent,
            session_service=db_session_svc,
            artifact_service=gcs_artifact_svc,
            # memory_service=memory_svc_persistent
            memory_service=InMemoryMemoryService()  # Placeholder for simplicity
        )
        print(f"Runner initialized with persistent services for app: {runner_persistent.app_name}")
    except Exception as e:
        print(f"Could not initialize persistent Runner: {e}")
        print("Ensure Database, GCS bucket, and relevant SDKs/permissions are set up.")
else:
    print("Skipping persistent Runner setup due to missing env vars (GOOGLE_CLOUD_PROJECT, ADK_ARTIFACT_GCS_BUCKET, ADK_DATABASE_URL).")
Decoupling Agent Logic from Persistence
The Runner’s design, requiring explicit service instances, promotes loose coupling. Your core agent logic (LlmAgent definitions, tools) remains independent of how sessions, artifacts, or memory are stored. This makes it easy to switch from local in-memory development to production-grade persistent backends.
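The same decoupling idea can be shown without ADK at all. The sketch below is only an analogy: `HistoryStore`, `InMemoryStore`, `SqliteStore`, and `handle_turn` are hypothetical names, and the "agent logic" is a one-line function. What it illustrates is that code written against an interface keeps working when the storage backend is swapped.

```python
import sqlite3
from typing import Protocol


class HistoryStore(Protocol):
    """Hypothetical interface standing in for a session service."""

    def append(self, session_id: str, text: str) -> None: ...
    def history(self, session_id: str) -> list[str]: ...


class InMemoryStore:
    """Keeps history in a dict; everything is lost when the process exits."""

    def __init__(self) -> None:
        self._data: dict[str, list[str]] = {}

    def append(self, session_id: str, text: str) -> None:
        self._data.setdefault(session_id, []).append(text)

    def history(self, session_id: str) -> list[str]:
        return list(self._data.get(session_id, []))


class SqliteStore:
    """Keeps history in SQLite; pass a file path to persist across runs."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS events (session_id TEXT, text TEXT)"
        )

    def append(self, session_id: str, text: str) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute("INSERT INTO events VALUES (?, ?)", (session_id, text))

    def history(self, session_id: str) -> list[str]:
        rows = self._conn.execute(
            "SELECT text FROM events WHERE session_id = ? ORDER BY rowid", (session_id,)
        )
        return [row[0] for row in rows]


def handle_turn(store: HistoryStore, session_id: str, user_text: str) -> str:
    """'Agent logic' that never needs to know which backend it was given."""
    store.append(session_id, user_text)
    return f"Turn {len(store.history(session_id))}: {user_text}"


if __name__ == "__main__":
    for store in (InMemoryStore(), SqliteStore(":memory:")):
        print(type(store).__name__, handle_turn(store, "s1", "hello"))
```

Swapping `InMemoryStore()` for `SqliteStore("history.db")` changes where data lives without touching `handle_turn`, which mirrors how a `Runner` can be handed different service instances.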
Core Execution Methods:
- `run_async(user_id, session_id, new_message, run_config=RunConfig()) -> AsyncGenerator[Event, None]`:
  - This is the primary asynchronous method for executing an agent turn.
  - It retrieves or creates the session, appends the `new_message` (if any), constructs the `InvocationContext`, invokes the appropriate agent’s `run_async` method, and yields the stream of `Event` objects generated by the agent.
  - It also handles persisting events and state changes via the `SessionService` for non-partial events.
- `run(user_id, session_id, new_message, run_config=RunConfig()) -> Generator[Event, None, None]`:
  - A synchronous wrapper around `run_async`. It’s convenient for simple scripts and local testing where full `asyncio` orchestration isn’t desired.
  - Internally, it runs `run_async` in a separate thread and uses a queue to yield events back to the synchronous caller.
import asyncio

from google.adk.agents import Agent
from google.adk.runners import InMemoryRunner  # InMemoryRunner is a pre-configured Runner
from google.genai.types import Content, Part

from building_intelligent_agents.utils import load_environment_variables, create_session, DEFAULT_LLM

load_environment_variables()  # Load environment variables for ADK configuration

greet_agent = Agent(name="greeter", model=DEFAULT_LLM, instruction="Greet the user warmly.")
runner = InMemoryRunner(agent=greet_agent, app_name="GreetApp")

user_msg = Content(parts=[Part(text="Hello there!")], role="user")  # User message to the agent


async def use_run_async():
    print("\n--- Using run_async ---")
    session_id = "s_async"
    user_id = "async_user"
    create_session(runner, user_id=user_id, session_id=session_id)
    async for event in runner.run_async(user_id=user_id, session_id=session_id, new_message=user_msg):
        if event.content and event.content.parts and event.content.parts[0].text:
            print(f"Async Event from {event.author}: {event.content.parts[0].text.strip()}")


def use_run_sync():
    print("\n--- Using run (sync wrapper) ---")
    session_id = "s_sync"
    user_id = "sync_user"
    create_session(runner, user_id=user_id, session_id=session_id)
    for event in runner.run(user_id=user_id, session_id=session_id, new_message=user_msg):
        if event.content and event.content.parts and event.content.parts[0].text:
            print(f"Sync Event from {event.author}: {event.content.parts[0].text.strip()}")


# Entry point added so the snippet can be run as a script.
if __name__ == "__main__":
    asyncio.run(use_run_async())
    use_run_sync()
- `run_live(user_id, session_id, live_request_queue, run_config=RunConfig(), ...)`:
  - An experimental method for bidirectional streaming interactions, typically involving audio input/output.
  - It takes a `LiveRequestQueue` for sending real-time data (like audio chunks) to the agent and yields `Event` objects as the agent processes and responds.
  - This is used for more advanced scenarios like voice bots. We’ll touch upon this conceptually when discussing `RunConfig`.
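The thread-and-queue arrangement described for `run` above can be sketched with the standard library alone. This is an illustration of the general technique, not ADK's source: `iterate_sync` and `count_to` are hypothetical names, and error handling is reduced to forwarding the first exception to the caller.

```python
import asyncio
import queue
import threading
from typing import AsyncGenerator, Callable, Generator, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking the end of the stream


def iterate_sync(make_agen: Callable[[], AsyncGenerator[T, None]]) -> Generator[T, None, None]:
    """Drive an async generator on a worker thread and yield its items synchronously."""
    items: queue.Queue = queue.Queue()

    async def pump() -> None:
        try:
            async for item in make_agen():
                items.put(item)
        except Exception as exc:  # forward failures to the caller's thread
            items.put(exc)
        finally:
            items.put(_DONE)

    # The worker thread owns its own event loop via asyncio.run().
    worker = threading.Thread(target=lambda: asyncio.run(pump()), daemon=True)
    worker.start()
    while True:
        item = items.get()  # blocks until the worker produces something
        if item is _DONE:
            break
        if isinstance(item, Exception):
            worker.join()
            raise item
        yield item
    worker.join()


async def count_to(n: int) -> AsyncGenerator[int, None]:
    for i in range(1, n + 1):
        await asyncio.sleep(0)  # pretend to wait on I/O
        yield i


if __name__ == "__main__":
    for value in iterate_sync(lambda: count_to(3)):
        print(value)
```

The caller writes an ordinary `for` loop and never touches `asyncio`, which is the convenience a synchronous wrapper offers for scripts and quick tests.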
`InMemoryRunner` for Local Development and Testing
As seen in many examples, `google.adk.runners.InMemoryRunner` is a subclass of `Runner` that simplifies setup for local development:
from google.adk.runners import InMemoryRunner
from google.adk.agents import Agent
my_agent = Agent(name="test_agent", model="gemini-2.0-flash", instruction="Be brief.")
# InMemoryRunner automatically sets up InMemorySessionService, InMemoryArtifactService, InMemoryMemoryService
runner = InMemoryRunner(agent=my_agent, app_name="TestApp")
It’s perfect for:
- Quickly testing agent logic.
- Running examples from this book.
- Unit tests where you don’t need persistent state across test runs.
Best Practice: Start with InMemoryRunner
For new projects or when learning ADK, InMemoryRunner is the easiest way to get started. You can focus on defining your agent’s logic and tools without worrying about database or cloud storage setup. Transition to a Runner with persistent services when you need to save conversation history or state beyond a single execution of your script.
Understanding `InvocationContext` and its Lifecycle
The `google.adk.agents.invocation_context.InvocationContext` is a Pydantic model that acts as a carrier for all relevant information during a single agent invocation (one full processing turn for a `new_message`).
Key Attributes of `InvocationContext`:
- `invocation_id: str`: A unique ID for this specific turn.
- `session: Session`: The current `Session` object (including history and state).
- `agent: BaseAgent`: The current agent being executed within this invocation.
- `user_content: Optional[types.Content]`: The initial user message that triggered this invocation.
- `run_config: Optional[RunConfig]`: The runtime configuration for this invocation.
- References to `artifact_service`, `session_service`, `memory_service`.
- `live_request_queue: Optional[LiveRequestQueue]`: For `run_live`.
- `end_invocation: bool`: A flag that can be set by callbacks or tools to prematurely terminate the current invocation.
- `_invocation_cost_manager`: Tracks metrics like LLM calls.
The `Runner` creates an `InvocationContext` at the start of `run_async` or `run_live`. This same context object (or a copy with the `agent` attribute updated) is passed down if control transfers between agents in a multi-agent system. This ensures all agents in a single turn share the same session view, services, and run configuration.
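The "copy with the agent attribute updated" idea is easy to see in miniature. The sketch below uses a plain dataclass and `dataclasses.replace` as a stand-in for the Pydantic model; `ToyInvocationContext`, `ToySession`, `context_for`, and the agent names are hypothetical, and the code makes no claim about how ADK performs the copy internally.

```python
from dataclasses import dataclass, field, replace


@dataclass
class ToySession:
    state: dict = field(default_factory=dict)


@dataclass
class ToyInvocationContext:
    invocation_id: str
    session: ToySession
    agent_name: str
    end_invocation: bool = False


def context_for(parent: ToyInvocationContext, agent_name: str) -> ToyInvocationContext:
    """Shallow-copy the parent context, changing only which agent is running."""
    return replace(parent, agent_name=agent_name)


root_ctx = ToyInvocationContext(invocation_id="inv-1", session=ToySession(), agent_name="root")
child_ctx = context_for(root_ctx, "billing_agent")

# The copy is shallow, so both contexts point at the same session object:
# a state change made by the child agent is visible to the root agent.
child_ctx.session.state["ticket"] = 42
```

Because only the `agent_name` field differs, every agent taking part in the turn works with the same invocation ID and the same session object.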
Customizing Runtime with `RunConfig`
The `google.adk.agents.run_config.RunConfig` Pydantic model allows you to customize various runtime behaviors of the agent when you call `runner.run_async` or `runner.run`.
import asyncio

from google.adk.agents import Agent
from google.adk.agents.run_config import RunConfig, StreamingMode
from google.adk.runners import InMemoryRunner
# Speech and transcription types, referenced by the commented-out Scenario 6
from google.genai.types import AudioTranscriptionConfig, Content, Part, SpeechConfig

from building_intelligent_agents.utils import load_environment_variables, create_session, DEFAULT_LLM

load_environment_variables()  # Load environment variables for ADK configuration

# This example is conceptual for some features like speech, as they require
# actual audio input/output capabilities not easily shown in a text script.
story_agent = Agent(
    name="story_teller",
    model=DEFAULT_LLM,
    instruction="Tell a very short, one-sentence story."
)
runner = InMemoryRunner(agent=story_agent, app_name="ConfigDemoApp")

session_id = "s_user_rc"  # Session ID for the user
user_id = "user_rc"  # User ID for the session
create_session(runner, user_id=user_id, session_id=session_id)  # Create a session for the user
user_input_message = Content(parts=[Part(text="A story please.")], role="user")  # User's input message to the agent


async def demo_run_configs():
    # Scenario 1: Default RunConfig (no streaming)
    print("\n--- Scenario 1: Default (No Streaming) ---")
    default_config = RunConfig()
    async for event in runner.run_async(user_id=user_id, session_id=session_id, new_message=user_input_message, run_config=default_config):
        if event.content and event.content.parts and event.content.parts[0].text:
            print(event.content.parts[0].text.strip())

    # Scenario 2: SSE Streaming
    print("\n--- Scenario 2: SSE Streaming ---")
    sse_config = RunConfig(streaming_mode=StreamingMode.SSE)
    async for event in runner.run_async(user_id=user_id, session_id=session_id, new_message=user_input_message, run_config=sse_config):
        if event.content and event.content.parts and event.content.parts[0].text:
            print(event.content.parts[0].text, end="", flush=True)  # Print chunks
    print()

    # Scenario 3: Limiting LLM Calls (Conceptual)
    # This agent doesn't make many calls, but shows the config
    print("\n--- Scenario 3: Max LLM Calls (Conceptual) ---")
    # If agent tries more than 1 LLM call, LlmCallsLimitExceededError would be raised by InvocationContext
    # For this simple agent, it will likely make only 1 call.
    limit_config = RunConfig(max_llm_calls=1)
    try:
        async for event in runner.run_async(user_id=user_id, session_id=session_id, new_message=user_input_message, run_config=limit_config):
            if event.content and event.content.parts and event.content.parts[0].text:
                print(event.content.parts[0].text.strip())
    except Exception as e:  # Catching generic Exception for demo
        print(f" Caught expected error due to max_llm_calls: {type(e).__name__} - {e}")

    # Scenario 4: Input Blobs as Artifacts (Conceptual)
    print("\n--- Scenario 4: Save Input Blobs as Artifacts (Conceptual) ---")
    artifact_config = RunConfig(save_input_blobs_as_artifacts=True)
    # If user_input_message contained a Part with inline_data (e.g., an image),
    # the Runner would save it to the ArtifactService before passing to agent.
    # The agent would see a text part like "Uploaded file: artifact_..."
    # This requires an ArtifactService to be configured with the Runner.
    # runner_with_artifacts = Runner(..., artifact_service=InMemoryArtifactService())
    # async for event in runner_with_artifacts.run_async(..., run_config=artifact_config, new_message=message_with_blob): ...
    print(" (This would save input blobs to ArtifactService if message contained them and ArtifactService was active)")

    # Scenario 5: Compositional Function Calling (CFC) - Experimental for SSE
    # Requires a model supporting CFC (e.g., Gemini 2.0+ via LIVE API)
    # and BuiltInCodeExecutor or tools that benefit from it.
    # print("\n--- Scenario 5: Compositional Function Calling (CFC) via SSE ---")
    # cfc_config = RunConfig(
    #     support_cfc=True,
    #     streaming_mode=StreamingMode.SSE  # CFC currently implies SSE via LIVE API usage
    # )
    # An agent using BuiltInCodeExecutor or complex tools would benefit.
    # For this simple agent, it won't show much difference in output.
    # The underlying LLM call mechanism changes to use the LIVE API.
    # print(" (Agent would use LIVE API for potential CFC if tools/code exec were involved)")
    # async for event in runner.run_async(user_id=user_id, session_id=session_id, new_message=user_input_message, run_config=cfc_config):
    #     if event.content and event.content.parts and event.content.parts[0].text:
    #         print(event.content.parts[0].text.strip())

    # Scenario 6: Bidirectional Streaming (BIDI) with Speech & Transcription - Highly Conceptual for CLI
    # This is for `runner.run_live()` and requires actual audio streams.
    # print("\n--- Scenario 6: BIDI Streaming with Speech & Transcription (Conceptual for CLI) ---")
    # bidi_config = RunConfig(
    #     streaming_mode=StreamingMode.BIDI,
    #     speech_config=SpeechConfig(
    #         # See google.genai.types.SpeechConfig for the available options
    #         # (for example, voice and language settings)
    #     ),
    #     response_modalities=["AUDIO", "TEXT"],  # Agent can respond with audio and/or text
    #     output_audio_transcription=AudioTranscriptionConfig(),  # Get text of agent's audio
    #     input_audio_transcription=AudioTranscriptionConfig()  # Transcribe user's audio input
    # )
    # print(f" Configured for BIDI: {bidi_config.model_dump_json(indent=2, exclude_none=True)}")
    # To use this:
    # from google.adk.agents.live_request_queue import LiveRequestQueue
    # live_queue = LiveRequestQueue()
    # # In a real app, you'd feed audio chunks to live_queue.send_realtime(Blob(...))
    # async for event in runner.run_live(..., live_request_queue=live_queue, run_config=bidi_config):
    #     # Process events, which might include audio blobs or transcriptions
    # live_queue.close()
    print(" (Actual run_live with BIDI/speech needs real audio input/output handling)")


if __name__ == "__main__":
    asyncio.run(demo_run_configs())
Key `RunConfig` Attributes:
- `streaming_mode: StreamingMode`:
  - `StreamingMode.NONE` (default): Standard request/response.
  - `StreamingMode.SSE` (Server-Sent Events): Enables unidirectional streaming of text responses from the LLM. The `Runner` uses the LLM’s streaming generation endpoint.
  - `StreamingMode.BIDI` (Bidirectional): Experimental. For live, two-way streaming, typically involving audio. This mode makes the `Runner` use the LLM’s `connect()` method and `run_live()`.
- `speech_config: Optional[types.SpeechConfig]`: (For BIDI streaming) Configures speech-to-text (STT) and text-to-speech (TTS) engines, language codes, etc., if the agent is interacting via voice.
- `response_modalities: Optional[list[str]]`: (For BIDI streaming) Specifies what kind of output the agent can produce (e.g., `["AUDIO", "TEXT"]`).
- `save_input_blobs_as_artifacts: bool`: If `True`, any `Part` in the `new_message` that contains `inline_data` (e.g., an image or audio file uploaded by the user) will be automatically saved to the `ArtifactService` by the `Runner` before the agent processes the message. The `Part` in the message passed to the agent will be replaced with a text placeholder like “Uploaded file: artifact_…”.
- `support_cfc: bool`: Experimental. If `True` (and `streaming_mode` is `SSE`), ADK will attempt to use the LLM’s LIVE API endpoint, which may enable Compositional Function Calling (CFC) for models that support it. CFC allows for more complex, nested, or parallel tool calls in a single LLM turn.
- `output_audio_transcription: Optional[types.AudioTranscriptionConfig]`: (For BIDI streaming with audio output) If set, requests a text transcription of the agent’s spoken audio response.
- `input_audio_transcription: Optional[types.AudioTranscriptionConfig]`: (For BIDI streaming with audio input) If set, instructs the LLM (or ADK’s internal transcriber if the model doesn’t support it directly on input) to provide a text transcription of the user’s spoken audio.
- `max_llm_calls: int`: (Default: 500) A safeguard to prevent runaway loops or excessive LLM interactions within a single `runner.run_async()` invocation. If the number of calls to `llm.generate_content_async()` exceeds this limit, an `LlmCallsLimitExceededError` is raised. Set to `0` or negative to disable the limit.
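The `max_llm_calls` rule described above (raise once the count exceeds the limit, and treat zero or a negative value as "no limit") can be modeled in a few lines. This is a sketch of the rule as stated in the list, not ADK's code: `CallBudget`, `CallLimitExceeded`, and `run_loop` are hypothetical names.

```python
class CallLimitExceeded(Exception):
    """Hypothetical stand-in for ADK's LlmCallsLimitExceededError."""


class CallBudget:
    def __init__(self, max_llm_calls: int = 500):
        self.max_llm_calls = max_llm_calls
        self.calls = 0

    def record_call(self) -> None:
        """Count one LLM call; raise once the count exceeds a positive limit."""
        self.calls += 1
        if self.max_llm_calls > 0 and self.calls > self.max_llm_calls:
            raise CallLimitExceeded(
                f"LLM call {self.calls} exceeds max_llm_calls={self.max_llm_calls}"
            )


def run_loop(budget: CallBudget, wanted_calls: int) -> int:
    """Simulate an agent loop that makes `wanted_calls` LLM calls in one invocation."""
    for _ in range(wanted_calls):
        budget.record_call()
    return budget.calls


if __name__ == "__main__":
    print(run_loop(CallBudget(max_llm_calls=2), 2))      # stays within the limit
    print(run_loop(CallBudget(max_llm_calls=0), 1000))   # limit disabled
    try:
        run_loop(CallBudget(max_llm_calls=2), 3)
    except CallLimitExceeded as exc:
        print(f"Stopped: {exc}")
```

Note that the limit is inclusive: with `max_llm_calls=2`, the second call succeeds and the third one raises.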
Best Practice: Use RunConfig for Runtime Flexibility
RunConfig allows you to change how an agent executes (e.g., streaming vs. non-streaming) without modifying the agent’s core definition. This is useful for adapting the same agent logic to different interaction modalities or performance requirements.
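One practical consequence is that the code consuming events should work in both modes. The sketch below assumes, for illustration, that streamed chunks arrive flagged as partial and are followed by one complete event carrying the full text; `ToyEvent` and `render` are hypothetical stand-ins, so check the events your ADK version actually emits before relying on this pattern.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ToyEvent:
    text: str
    partial: Optional[bool] = None  # True for streamed chunks, falsy for complete events


def render(events: Iterable[ToyEvent]) -> str:
    """Return the text a UI should end up showing for one turn."""
    shown = ""
    for event in events:
        if event.partial:
            shown += event.text  # streaming: append each chunk as it arrives
        else:
            shown = event.text   # complete event: replace with the full text
    return shown


# The same turn as seen without streaming and with streaming.
non_streaming = [ToyEvent("Once upon a time.")]
streaming = [
    ToyEvent("Once ", partial=True),
    ToyEvent("upon a time.", partial=True),
    ToyEvent("Once upon a time."),  # final aggregated event (assumed)
]

if __name__ == "__main__":
    print(render(non_streaming))
    print(render(streaming))
```

Replacing the accumulated text when the complete event arrives avoids showing the story twice, and the same `render` function handles the non-streaming case unchanged.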
Experimental Features in RunConfig
Features like StreamingMode.BIDI and support_cfc are often experimental and their behavior or API might change in future ADK versions. Always check the latest ADK documentation for the status of these features. BIDI streaming, in particular, requires significant application-side logic to handle actual audio input/output.
What’s Next?
We’ve now thoroughly explored the ADK `Runner` and how `RunConfig` allows for fine-grained control over agent execution. This knowledge is essential for moving your agents from simple local scripts to more robust and interactive applications. Next, we’ll focus on “Session Management and State Persistence,” diving deep into how ADK handles conversation history and state using different `SessionService` implementations.