Chapter 19 - Long-Term Memory for Agents
This article is part of my web book series. All of the chapters can be found here and the code is available on GitHub. For any issues around this book, contact me on LinkedIn.
While session state provides agents with memory within a single conversation, and artifacts allow them to work with files, true intelligence often requires the ability to learn and recall information across many conversations and over extended periods. This is the domain of Long-Term Memory. ADK provides a `MemoryService` abstraction to equip agents with this capability, allowing them to store and retrieve knowledge gleaned from past interactions.

This chapter introduces the `BaseMemoryService` interface, its implementations (`InMemoryMemoryService` and `VertexAiRagMemoryService`), and the tools agents use to interact with this persistent knowledge base.
The BaseMemoryService Interface and MemoryEntry

The `google.adk.memory.BaseMemoryService` is an abstract class defining the contract for how agents store and retrieve long-term memories.

Key Abstract Methods of `BaseMemoryService`:

- `add_session_to_memory(session: Session)`: Ingests an entire `Session` object (or relevant parts of it) into the memory store. This is how the agent "learns" from past conversations. The service implementation decides how to chunk, embed, and index this session data.
- `search_memory(app_name: str, user_id: str, query: str) -> SearchMemoryResponse`: Searches the memory for information relevant to the given `query`, scoped to the `app_name` and `user_id`.
`google.adk.memory.SearchMemoryResponse`:

- A Pydantic model that wraps the results of a memory search.
- Its primary attribute is `memories: list[MemoryEntry]`.

`google.adk.memory.MemoryEntry`:

- A Pydantic model representing a single piece of information retrieved from memory.
- Key attributes:
  - `content: types.Content`: The actual content of the memory (e.g., a user's statement, an agent's important response).
  - `author: Optional[str]`: Who originally authored this piece of information.
  - `timestamp: Optional[str]`: An ISO 8601 formatted string indicating when the original content occurred.
The `Runner` is typically configured with a `MemoryService` instance, making it accessible to agents and tools via their contexts.
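As a minimal sketch of that wiring (the app name, agent name, and model here are placeholders, not taken from this chapter's examples):

```python
# Minimal sketch: wiring a MemoryService into a Runner so agents and tools
# can reach it via their contexts. "MyApp" and my_agent are placeholders.
from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.memory import InMemoryMemoryService

my_agent = Agent(name="my_agent", model="gemini-2.0-flash")  # placeholder agent

runner = Runner(
    app_name="MyApp",
    agent=my_agent,
    session_service=InMemorySessionService(),
    memory_service=InMemoryMemoryService(),  # swap in VertexAiRagMemoryService for production
)
```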
Strategies for Populating and Retrieving Memory
- Populating Memory (`add_session_to_memory`):
  - When to call: Typically, you might call `add_session_to_memory` at the end of a user session, or perhaps after significant milestones within a long-running session. This allows the agent to "commit" the learnings from that interaction to its long-term store (a minimal sketch follows this list).
  - What to store: The `MemoryService` implementation decides how to process the `Session` object. It might extract key user statements, agent summaries, important facts identified, or even entire conversational turns. For services like `VertexAiRagMemoryService`, it often involves converting text into vector embeddings for semantic search.
- Retrieving Memory (`search_memory`):
  - Active Retrieval: An agent can explicitly decide to search its memory using a tool like `LoadMemoryTool` (see Section 18.5). The LLM formulates a query based on the current conversation to find relevant past information.
  - Proactive Retrieval: The `PreloadMemoryTool` (see Section 18.5) automatically searches the memory based on the current user's input before the main LLM call for the turn. Relevant retrieved memories are then injected into the LLM's prompt as context.
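To illustrate the "when to call" guidance, here is a minimal sketch of committing a finished session at its end. It assumes a `runner` configured with both a session service and a memory service; the helper name and IDs are hypothetical:

```python
# Sketch (not a prescribed ADK pattern): fetch the completed session and
# "commit" it to the long-term store.
async def commit_session_to_memory(runner, user_id: str, session_id: str) -> None:
    completed_session = await runner.session_service.get_session(
        app_name=runner.app_name, user_id=user_id, session_id=session_id
    )
    await runner.memory_service.add_session_to_memory(completed_session)
```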
InMemoryMemoryService

The `google.adk.memory.InMemoryMemoryService` provides a very basic, keyword-based memory implementation that stores session events directly in memory.

- Pros:
  - No external dependencies, easy to use for local development and testing.
  - Automatically used by `InMemoryRunner` if no other memory service is provided.
- Cons:
  - Memory is lost when the Python process ends.
  - Search is a simple keyword match on the text parts of events, not semantic.
  - Can become slow and memory-intensive with very large histories.
- How it works:
  - `add_session_to_memory`: Stores a reference to the list of `Event` objects from the session, keyed by `app_name/user_id` and `session_id`.
  - `search_memory`: Iterates through all stored events for the user. It splits the query and event text into words and returns events where any query word (case-insensitive) matches any word in the event's text (a simplified sketch of this rule follows).
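The matching rule boils down to something like the following simplification (this is the behavior described above, not the library's actual code):

```python
# Simplified sketch of InMemoryMemoryService's keyword matching: any shared
# word between query and event text, compared case-insensitively, is a hit.
def is_keyword_match(query: str, event_text: str) -> bool:
    query_words = {w.lower() for w in query.split()}
    event_words = {w.lower() for w in event_text.split()}
    return bool(query_words & event_words)

print(is_keyword_match("favorite hobby", "My hobby is hiking"))  # True ("hobby" is shared)
print(is_keyword_match("pastime", "My hobby is hiking"))         # False (no shared word)
```

The full example below puts `InMemoryMemoryService` to work with an agent that explicitly saves and recalls memories across two sessions.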
```python
from google.adk.agents import Agent
from google.adk.tools import FunctionTool, ToolContext  # For saving to memory & searching
from google.adk.runners import InMemoryRunner
from google.adk.sessions.session import Session
from google.adk.events.event import Event
from google.adk.memory import InMemoryMemoryService  # Explicit import for clarity
from google.genai.types import Content, Part
import asyncio

from building_intelligent_agents.utils import load_environment_variables, create_session, DEFAULT_LLM

load_environment_variables()  # Load environment variables for ADK configuration

# --- Agent and Tools ---
async def save_current_session_to_long_term_memory(tool_context: ToolContext) -> str:
    """Explicitly saves the current session's events to long-term memory."""
    if not tool_context._invocation_context.memory_service:
        return "Error: Memory service is not available."
    # Get the full current session object from the context
    current_session = tool_context._invocation_context.session
    await tool_context._invocation_context.memory_service.add_session_to_memory(current_session)
    return f"Session {current_session.id} has been committed to long-term memory."

save_to_memory_tool = FunctionTool(save_current_session_to_long_term_memory)

async def recall_information(query: str, tool_context: ToolContext) -> dict:
    """Searches long-term memory for information related to the query."""
    if not tool_context._invocation_context.memory_service:
        return {"error": "Memory service is not available."}
    search_response = await tool_context.search_memory(query)
    if not search_response.memories:
        return {"found_memories": 0, "results": "No relevant memories found."}
    # For InMemoryMemoryService, results are raw MemoryEntry objects. Let's format them.
    formatted_results = []
    for mem_entry in search_response.memories[:3]:  # Limit for brevity
        text_parts = [p.text for p in mem_entry.content.parts if p.text]
        formatted_results.append(
            f"Author: {mem_entry.author}, Timestamp: {mem_entry.timestamp}, Content: {' '.join(text_parts)}"
        )
    return {"found_memories": len(search_response.memories), "results": formatted_results}

recall_tool = FunctionTool(recall_information)

memory_interactive_agent = Agent(
    name="memory_keeper",
    model=DEFAULT_LLM,
    instruction=(
        "You have access to a shared, long-term memory that persists across our conversations (sessions). "
        "Use 'save_current_session_to_long_term_memory' to save important facts. "
        "When asked to recall information, especially if it seems like something discussed previously (even in a different session), "
        "use 'recall_information' with a relevant search query to check this long-term memory."
    ),
    tools=[save_to_memory_tool, recall_tool]
)

if __name__ == "__main__":
    # Explicitly create an InMemoryMemoryService instance to interact with
    memory_service = InMemoryMemoryService()

    # Initialize runner with our specific memory_service
    runner = InMemoryRunner(agent=memory_interactive_agent, app_name="MemoryDemoApp")
    runner.memory_service = memory_service  # Override the default one from InMemoryRunner

    user_id = "user_mem_test"
    session_id_1 = "memory_session_1"
    session_id_2 = "memory_session_2"
    create_session(runner, user_id=user_id, session_id=session_id_1)  # Create first session
    create_session(runner, user_id=user_id, session_id=session_id_2)  # Create second session

    async def run_turn(uid, sid, prompt_text, expected_tool_call=None):
        print(f"\n--- Running for User: {uid}, Session: {sid} ---")
        print(f"YOU: {prompt_text}")
        user_message = Content(parts=[Part(text=prompt_text)], role="user")
        print("AGENT: ", end="", flush=True)
        async for event in runner.run_async(user_id=uid, session_id=sid, new_message=user_message):
            if event.author == memory_interactive_agent.name and event.content and event.content.parts and not event.get_function_calls() and not event.get_function_responses():
                print(event.content.parts[0].text.strip())
            # For debugging, show tool calls
            if event.get_function_calls():
                for fc in event.get_function_calls():
                    print(f"\n  [TOOL CALL by {event.author}]: {fc.name}({fc.args})")
                    if expected_tool_call and fc.name == expected_tool_call:
                        print(f"    (Matches expected tool call: {expected_tool_call})")
            if event.get_function_responses():
                for fr in event.get_function_responses():
                    print(f"\n  [TOOL RESPONSE to {fr.name}]: {fr.response}")

    async def main():
        # Turn 1 (Session 1): Store some information
        await run_turn(user_id, session_id_1, "Please remember that my favorite hobby is hiking and save this session.",
                       expected_tool_call="save_current_session_to_long_term_memory")
        # Turn 2 (Session 1): Recall information
        await run_turn(user_id, session_id_1, "What did I tell you about my favorite hobby?",
                       expected_tool_call="recall_information")
        # Turn 3 (New Session - Session 2): Try to recall information from Session 1
        await run_turn(user_id, session_id_2, "Do you remember my favorite hobby?",
                       expected_tool_call="recall_information")

        # Verify memory content (InMemoryMemoryService-specific inspection)
        search_res = await memory_service.search_memory(app_name="MemoryDemoApp", user_id=user_id, query="hobby")
        print(f"\n[DEBUG] Direct memory search for 'hobby' found {len(search_res.memories)} entries.")
        for entry in search_res.memories:
            print(f"  - Author: {entry.author}, Content: {entry.content.parts[0].text if entry.content.parts else ''}")

    asyncio.run(main())
```
InMemoryMemoryService is for Prototyping Only
Due to its simple keyword matching and lack of persistence, InMemoryMemoryService should not be used for production applications requiring reliable long-term memory. It’s primarily for understanding the mechanics and for basic local testing.
VertexAiRagMemoryService: Leveraging Vertex AI RAG

For robust, scalable, and semantically rich long-term memory, ADK provides `VertexAiRagMemoryService`. This service integrates with Vertex AI RAG (Retrieval-Augmented Generation), which allows you to build a corpus of your data (including past conversations) and perform semantic searches over it.
Prerequisites:

- A Google Cloud Project with the Vertex AI API enabled.
- Vertex AI RAG set up:
  - A RAG Corpus created (a creation sketch follows this list).
  - Files (representing session data) uploaded to this corpus. `VertexAiRagMemoryService` handles formatting session data into files and uploading them.
- Authentication configured for ADK to access Vertex AI (e.g., `gcloud auth application-default login` or a service account with roles like "Vertex AI User" and permissions for RAG operations).
- The `google-cloud-aiplatform` library installed with RAG support (usually included with recent versions or via `google-cloud-aiplatform[rag]`).
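If you still need to create a corpus, a hedged sketch using the Vertex AI SDK's preview RAG API follows; the project, location, and display name are placeholders, and signatures may differ across SDK versions:

```python
# Sketch: creating a RAG corpus (preview API; verify against your SDK version).
import vertexai
from vertexai.preview import rag

vertexai.init(project="my-gcp-project", location="us-central1")  # placeholders
corpus = rag.create_corpus(display_name="my_adk_sessions_corpus")
print(corpus.name)  # full resource name: projects/.../locations/.../ragCorpora/...
```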
Initialization:
```python
from google.adk.memory import VertexAiRagMemoryService
import os

GCP_PROJECT_RAG = os.getenv("GOOGLE_CLOUD_PROJECT")
# Vertex AI RAG Corpus Name/ID or full resource name
# e.g., "my_adk_sessions_corpus" or
# "projects/<PROJECT_ID>/locations/<LOCATION>/ragCorpora/<CORPUS_ID>"
RAG_CORPUS_ID = os.getenv("ADK_RAG_CORPUS_ID")
# Optional location for the RAG corpus if not part of the full resource name
RAG_CORPUS_LOCATION = os.getenv("ADK_RAG_CORPUS_LOCATION", "us-central1")

if GCP_PROJECT_RAG and RAG_CORPUS_ID:
    try:
        # Construct the full corpus resource name if only the ID is given
        full_rag_corpus_name = RAG_CORPUS_ID
        if not RAG_CORPUS_ID.startswith("projects/"):
            full_rag_corpus_name = f"projects/{GCP_PROJECT_RAG}/locations/{RAG_CORPUS_LOCATION}/ragCorpora/{RAG_CORPUS_ID}"

        rag_memory_service = VertexAiRagMemoryService(
            rag_corpus=full_rag_corpus_name,
            similarity_top_k=5,  # Retrieve the top 5 most similar chunks
            # vector_distance_threshold=0.7  # Optional: filter by similarity score
        )
        print(f"VertexAiRagMemoryService initialized with corpus: {full_rag_corpus_name}")

        # This instance can now be passed to a Runner:
        # from google.adk.runners import Runner
        # my_agent_for_rag = ...
        # runner_with_rag_mem = Runner(
        #     app_name="MyRagApp",  # This app_name is used in file display names in RAG
        #     agent=my_agent_for_rag,
        #     session_service=...,
        #     memory_service=rag_memory_service
        # )
    except ImportError:
        print("Vertex AI SDK with RAG support not found. Ensure 'google-cloud-aiplatform[rag]' is installed or similar.")
    except Exception as e:
        print(f"Failed to initialize VertexAiRagMemoryService: {e}")
        print("Ensure the RAG Corpus exists, the Vertex AI API is enabled, and auth is correct.")
else:
    print("GOOGLE_CLOUD_PROJECT or ADK_RAG_CORPUS_ID environment variables not set. Skipping VertexAiRagMemoryService setup.")
```
How VertexAiRagMemoryService Works:

- `add_session_to_memory(session)`:
  - Converts the `session.events` into a structured text format (JSON lines, each representing an event with author, timestamp, and text; a rough sketch of this conversion follows the list).
  - Saves this structured text to a temporary local file.
  - Uploads this temporary file to the configured Vertex AI RAG Corpus using `vertexai.preview.rag.upload_file()`. The file in RAG is typically given a display name like `{app_name}.{user_id}.{session_id}` for identification.
  - Vertex AI RAG then automatically chunks, embeds, and indexes the content of this file.
- `search_memory(app_name, user_id, query)`:
  - Calls `vertexai.preview.rag.retrieval_query()` with the user's `query` and the configured RAG resources.
  - Vertex AI RAG performs a semantic search across the indexed documents in the corpus.
  - (Currently, filtering by `user_id` or `app_name` within the RAG query itself might be limited. The service primarily searches the whole corpus it's pointed to. Post-retrieval filtering based on `context.source_display_name` might be needed if strict user-scoping is critical directly from this tool without LLM re-filtering.)
  - The retrieved `Contexts` (chunks of text from the ingested session files) are parsed back into `MemoryEntry` objects. The service attempts to reconstruct event-like structures if the ingested format (JSON lines) is found.
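To make the ingestion format concrete, here is a rough sketch of the JSON-lines conversion step; the field names are an assumption based on the description above, not the exact format ADK writes:

```python
# Hypothetical sketch of serializing session events to JSON lines for upload.
import json

def session_events_to_jsonl(session) -> str:
    lines = []
    for event in session.events:
        text = ""
        if event.content and event.content.parts:
            text = " ".join(p.text for p in event.content.parts if p.text)
        # Field names ("author", "timestamp", "text") are illustrative only.
        lines.append(json.dumps({
            "author": event.author,
            "timestamp": event.timestamp,
            "text": text,
        }))
    return "\n".join(lines)
```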
Best Practice: Semantic Search with VertexAiRagMemoryService
For intelligent recall based on meaning rather than just keywords, VertexAiRagMemoryService is the way to go. It allows your agent to find relevant past conversations even if the exact phrasing isn’t used in the current query. This is crucial for building truly knowledgeable and context-aware long-term memory.
RAG Corpus Setup and Costs
- Setting up a Vertex AI RAG Corpus and ensuring your service account has the right permissions (e.g., “Vertex AI User”, “Storage Object User” for the underlying GCS bucket of the RAG corpus) is essential.
- Vertex AI RAG and its underlying services (like Vector Search, GCS) incur costs. Monitor your usage and understand the pricing model.
- Ingestion into RAG is asynchronous. There might be a delay between calling `add_session_to_memory` and the data being fully searchable (a polling sketch follows this callout).
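Because of that ingestion delay, you may want to poll before relying on freshly added memories. A hedged sketch, using only the `search_memory` API shown earlier (the helper name, retry count, and delay are illustrative):

```python
# Sketch: poll until a freshly ingested memory becomes searchable.
import asyncio

async def wait_until_searchable(memory_service, app_name: str, user_id: str,
                                probe_query: str, retries: int = 10,
                                delay_s: float = 5.0) -> bool:
    for _ in range(retries):
        response = await memory_service.search_memory(
            app_name=app_name, user_id=user_id, query=probe_query
        )
        if response.memories:
            return True
        await asyncio.sleep(delay_s)  # ingestion is asynchronous; wait and retry
    return False
```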
Tools for Memory Interaction: LoadMemoryTool and PreloadMemoryTool

As briefly introduced previously and demonstrated in the `InMemoryMemoryService` example, ADK provides two key tools for agents to interact with the configured `MemoryService`:
- `google.adk.tools.load_memory_tool` (LoadMemoryTool):
  - A standard `FunctionTool` that an LLM can explicitly call.
  - Its declaration asks for a `query: str`.
  - When called, its `run_async` method (the `load_memory` function) invokes `tool_context.search_memory(query)`.
  - The `SearchMemoryResponse` (containing a list of `MemoryEntry` objects) is returned to the LLM. The LLM then uses this information.
  - Use Case: When the agent, based on its reasoning, identifies a specific need to look up past information.
- `google.adk.tools.preload_memory_tool` (PreloadMemoryTool):
  - This is a "request processor" `BaseTool` (its `process_llm_request` method does the work).
  - It automatically runs before each main LLM call in a turn.
  - It takes the current `tool_context.user_content` (the user's latest message) as an implicit query to `tool_context.search_memory()`.
  - If relevant memories are found, their text content is formatted and prepended to the `LlmRequest`'s system instruction under a `<PAST_CONVERSATIONS>` tag (a rough illustration follows this list).
  - The LLM receives this past context proactively, without needing to ask for it.
  - Use Case: To give the LLM relevant background from previous conversations automatically, making it seem more contextually aware from the start of its reasoning process for the current turn.
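As a rough illustration only (the exact formatting is internal to ADK and may differ), the context injected by `PreloadMemoryTool` into the system instruction might look something like:

```
<PAST_CONVERSATIONS>
[user]: I'm planning a trip to Japan next year.
[planner_bot]: Okay, Japan trip noted for next year!
</PAST_CONVERSATIONS>
```

The example below uses both tools together.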
```python
from google.adk.agents import Agent
from google.adk.tools import load_memory, preload_memory
from google.adk.runners import InMemoryRunner
from google.adk.events.event import Event
from google.adk.memory import InMemoryMemoryService
from google.genai.types import Content, Part
import time
import asyncio

from building_intelligent_agents.utils import load_environment_variables, create_session, DEFAULT_LLM, DEFAULT_REASONING_LLM

load_environment_variables()  # Load environment variables for ADK configuration

# Set up dummy memory as in the InMemoryMemoryService example
memory_service_instance = InMemoryMemoryService()
past_events_for_memory = [
    Event(author="user", timestamp=time.time()-7200, content=Content(parts=[Part(text="I'm planning a trip to Japan next year.")])),
    Event(author="planner_bot", timestamp=time.time()-7190, content=Content(parts=[Part(text="Okay, Japan trip noted for next year!")])),
    Event(author="user", timestamp=time.time()-3600, content=Content(parts=[Part(text="For my Japan trip, I'm interested in Kyoto.")])),
    Event(author="planner_bot", timestamp=time.time()-3590, content=Content(parts=[Part(text="Kyoto is a great choice for your Japan trip!")])),
]
memory_service_instance._session_events["MemoryToolsApp/mem_tools_user"] = {"past_trip_planning": past_events_for_memory}

# Agent using both preload and allowing explicit load
smart_planner_agent = Agent(
    name="smart_trip_planner",
    model=DEFAULT_REASONING_LLM,  # Needs good reasoning
    instruction="You are a trip planning assistant. Past conversation snippets might be provided under <PAST_CONVERSATIONS> to help you. "
                "If you need more specific details from past conversations not automatically provided, use the 'load_memory' tool with a targeted query. "
                "Remember to be helpful and use retrieved information effectively.",
    tools=[
        preload_memory,  # Will automatically add relevant context to the system instruction
        load_memory      # Allows the LLM to explicitly query memory
    ]
)

if __name__ == "__main__":
    runner = InMemoryRunner(agent=smart_planner_agent, app_name="MemoryToolsApp")
    runner.memory_service = memory_service_instance  # Inject our populated memory service
    user_id = "mem_tools_user"
    session_id = "current_trip_planning"
    create_session(runner, user_id=user_id, session_id=session_id)  # Create a session for the user

    async def run_planner_turn(prompt_text):
        print(f"\nYOU: {prompt_text}")
        user_message = Content(parts=[Part(text=prompt_text)], role="user")  # User message to the agent
        print("SMART_TRIP_PLANNER: ", end="", flush=True)
        # The Dev UI Trace is invaluable here to see:
        # 1. What `preload_memory_tool` adds to the system instruction.
        # 2. If/when `load_memory_tool` is called by the LLM.
        # 3. The results of `load_memory_tool`.
        async for event in runner.run_async(user_id=user_id, session_id=session_id, new_message=user_message):
            if event.content and event.content.parts:
                for part in event.content.parts:
                    if part.text and not (hasattr(part, 'thought') and part.thought):  # Don't print thoughts for cleaner CLI
                        print(part.text, end="", flush=True)
                    elif part.function_call:
                        print(f"\n  [TOOL CALL by {event.author}]: {part.function_call.name}({part.function_call.args})", end="")
        # We might not see tool responses directly if preload_memory_tool is effective
        # or if load_memory_tool's output is summarized by the LLM.
        print()

    async def main():
        # Query that should benefit from preload_memory
        await run_planner_turn("I mentioned a country I want to visit. What was it and which city was I interested in there?")
        # Expected: PreloadMemoryTool finds "Japan" and "Kyoto". Agent answers directly.

        # Query that might require explicit load_memory if preload isn't specific enough
        # (or if we wanted to demonstrate an explicit call)
        # await run_planner_turn("What were the exact first two things I told you about my trip planning?")
        # Expected (if the LLM decides to use load_memory): load_memory(query="first two things about trip planning")

    asyncio.run(main())
```
Combining PreloadMemoryTool and LoadMemoryTool

For many applications, using both tools provides a good balance:

- `PreloadMemoryTool` gives the LLM immediate, relevant context for the current user query.
- `LoadMemoryTool` allows the LLM to dig deeper or search for different aspects of the memory if the preloaded information isn't sufficient or if it needs to answer a very specific historical question.

Your agent's instruction should guide it on how and when to use `load_memory` if `preload_memory` is also active.
Querying Effectiveness of Memory Search

The quality of results from `search_memory` (and thus the effectiveness of both memory tools) heavily depends on:

- For `InMemoryMemoryService`: how well the keywords in the `query` match words in the stored events.
- For `VertexAiRagMemoryService`: the quality of the embeddings, the semantic similarity between the `query` and the stored content, and the RAG configuration (chunking strategy, `similarity_top_k`, etc.).

Crafting effective queries (either by the LLM for `LoadMemoryTool` or by using the user's direct input for `PreloadMemoryTool`) is key. The sketch below contrasts the two search behaviors.
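A small runnable sketch using `InMemoryMemoryService` (the stored text and queries are illustrative; the semantic comparison at the end is stated in comments since it would require a live RAG corpus):

```python
import asyncio
import time
from google.adk.memory import InMemoryMemoryService
from google.adk.sessions.session import Session
from google.adk.events.event import Event
from google.genai.types import Content, Part

async def demo():
    service = InMemoryMemoryService()
    session = Session(
        app_name="App", user_id="u1", id="s1",
        events=[Event(author="user", timestamp=time.time(),
                      content=Content(parts=[Part(text="My favorite hobby is hiking")]))],
    )
    await service.add_session_to_memory(session)
    for query in ("hobby", "pastime"):
        response = await service.search_memory(app_name="App", user_id="u1", query=query)
        print(f"{query!r} -> {len(response.memories)} match(es)")
    # "hobby" shares a word with the stored text, so keyword search finds it.
    # "pastime" shares none, so it returns 0 matches; a semantic service like
    # VertexAiRagMemoryService could still surface the hiking memory for it.

asyncio.run(demo())
```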
What’s Next?
With session management, artifact handling, and now long-term memory, our ADK agents are becoming quite sophisticated in how they manage and utilize information. This ability to remember and learn from past interactions is a cornerstone of building truly intelligent and personalized agents.
Next we’ll shift our focus to a critical aspect of the development lifecycle: how to measure and assess the quality, correctness, and effectiveness of the agents we build.