
Chapter 9 - Enabling Agents with Code Execution

This article is part of my web book series. All of the chapters can be found here, and the code is available on GitHub. For any issues with this book, contact me on LinkedIn.

So far, our agents have learned to use pre-defined tools and APIs. But what if a task requires dynamic computation, data manipulation beyond the scope of existing tools, or the generation and execution of a script to achieve a goal? This is where Code Execution comes in. ADK provides a robust framework for allowing agents, particularly those powered by advanced LLMs like Gemini, to generate code (typically Python) and have it executed in a controlled environment, with the results fed back to the LLM to inform its next steps.

This capability transforms agents from simple tool users into more versatile problem solvers capable of scripting solutions on the fly.

The BaseCodeExecutor and its Importance

The foundation of code execution in ADK is the google.adk.code_executors.BaseCodeExecutor abstract class. Any component that executes code generated by the LLM must implement this interface.

Key aspects managed or defined by a BaseCodeExecutor:

  • execute_code(invocation_context, code_execution_input) -> CodeExecutionResult: The core abstract method that takes a CodeExecutionInput (containing the code string and any input files) and returns a CodeExecutionResult (containing stdout, stderr, and any output files).
  • Environment: The executor defines the environment in which the code runs (e.g., local Python interpreter, Docker container, managed cloud service).
  • Statefulness (stateful: bool): Some executors can maintain state between code executions within the same session (e.g., variables defined in one code block are available in the next).
  • File Handling (optimize_data_file: bool): Advanced executors can automatically manage data files, making them available to the executed code.
  • Error Handling & Retries (error_retry_attempts: int): Defines how many times to retry if code execution fails.
  • Delimiters:
    • code_block_delimiters: Specifies how the LLM should format code blocks in its output (e.g., fenced between ```python and ```). ADK uses these delimiters to extract the code.
    • execution_result_delimiters: Specifies how ADK should format code execution results when feeding them back to the LLM (e.g., fenced between ```tool_output and ```).
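To make the contract concrete, here is a simplified, self-contained sketch of a custom executor. The `CodeExecutionInput`/`CodeExecutionResult` dataclasses and the `SubprocessCodeExecutor` class are stand-ins invented for illustration, not ADK's actual types; the real `BaseCodeExecutor` interface differs in detail.

```python
import subprocess
import sys
from dataclasses import dataclass, field

# Simplified stand-ins for ADK's CodeExecutionInput / CodeExecutionResult,
# used here only to illustrate the shape of the BaseCodeExecutor contract.
@dataclass
class CodeExecutionInput:
    code: str
    input_files: list = field(default_factory=list)

@dataclass
class CodeExecutionResult:
    stdout: str = ""
    stderr: str = ""
    output_files: list = field(default_factory=list)

class SubprocessCodeExecutor:
    """Hypothetical executor: runs each snippet in a fresh Python subprocess."""
    stateful = False          # no variables survive between executions
    error_retry_attempts = 2  # how many times ADK would retry on failure

    def execute_code(self, invocation_context,
                     code_input: CodeExecutionInput) -> CodeExecutionResult:
        # Run the generated code in a separate interpreter and capture output.
        proc = subprocess.run(
            [sys.executable, "-c", code_input.code],
            capture_output=True, text=True, timeout=30,
        )
        return CodeExecutionResult(stdout=proc.stdout, stderr=proc.stderr)

executor = SubprocessCodeExecutor()
result = executor.execute_code(None, CodeExecutionInput(code="print(6 * 7)"))
print(result.stdout.strip())  # → 42
```

A subprocess at least keeps the generated code out of your application's interpreter, but it is still not a sandbox; the built-in executors below offer progressively stronger isolation.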

*Diagram: Hierarchy of `BaseCodeExecutor` and its implementations.*

ADK provides several concrete implementations of BaseCodeExecutor, each suited for different use cases and security considerations. An LlmAgent is configured to use a code executor by setting its code_executor attribute to an instance of one of these classes.

BuiltInCodeExecutor: Leveraging the Model’s Native Capabilities

For certain models, code execution is a native, built-in capability. The model itself can generate, execute (in a sandboxed environment), and reason about Python code.

The google.adk.code_executors.BuiltInCodeExecutor is a special executor that signals to ADK that the model itself will handle code execution.

from google.adk.agents import Agent
from google.adk.code_executors import BuiltInCodeExecutor # Key import
from google.adk.runners import InMemoryRunner
from google.genai.types import Content, Part

from building_intelligent_agents.utils import load_environment_variables, create_session, DEFAULT_LLM
load_environment_variables()

# This agent will use the model's internal code interpreter.
# Ensure you are using a model that supports this.
# For Gemini API, this is often enabled by default.
# For Vertex AI, you might need to ensure the model version supports it.

code_savvy_agent_builtin = Agent(
    name="code_savvy_agent_builtin",
    model=DEFAULT_LLM, # A model with built-in code execution
    instruction="You are a helpful assistant that can write and execute Python code to answer questions, especially for calculations or data analysis. When you write code, it will be automatically executed.",
    code_executor=BuiltInCodeExecutor() # Assign the executor
)

if __name__ == "__main__":
    runner = InMemoryRunner(agent=code_savvy_agent_builtin, app_name="BuiltInCodeApp")
    session_id = "s_builtin_code_test"
    user_id = "builtin_user"
    create_session(runner, session_id, user_id)
    prompts = [
        "What is the factorial of 7?",
        "Calculate the square root of 12345.",
        "Generate a list of the first 10 prime numbers."
    ]

    async def main():
        for prompt_text in prompts:
            print(f"\nYOU: {prompt_text}")
            user_message = Content(parts=[Part(text=prompt_text)], role="user")
            print("ASSISTANT: ", end="", flush=True)
            async for event in runner.run_async(user_id=user_id, session_id=session_id, new_message=user_message):
                # The trace for this would show the model generating code,
                # and then a `code_execution_result` part directly from the model,
                # followed by the model's textual interpretation.
                if event.content and event.content.parts:
                    for part in event.content.parts:
                        if part.text:
                            print(part.text, end="", flush=True)
                        elif part.executable_code: # Code generated by LLM
                            print(f"\n  CODE BLOCK:\n{part.executable_code.code.strip()}\n  END CODE BLOCK", end="")
                        elif part.code_execution_result: # Result from model's interpreter
                            print(f"\n  EXECUTION RESULT: {part.code_execution_result.outcome}\n  OUTPUT:\n{part.code_execution_result.output.strip()}\n  END EXECUTION RESULT", end="")
            print()

    import asyncio
    asyncio.run(main())

How BuiltInCodeExecutor Works with ADK:

  1. When BuiltInCodeExecutor is assigned to an agent, its process_llm_request method (called by the LLM Flow) modifies the LlmRequest to enable the model’s code interpreter tool (e.g., by adding types.Tool(code_execution=types.ToolCodeExecution()) to request.config.tools).
  2. The LLM, when it deems necessary, generates a Part containing executable_code.
  3. The model internally executes this code.
  4. The LLM then includes another Part in its response containing the code_execution_result.
  5. ADK receives these parts within the LlmResponse and yields corresponding Event objects.
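Step 1 boils down to appending the model's native code-execution tool to the outgoing request configuration. A stand-in sketch (plain dicts instead of ADK's `LlmRequest` types, so it runs anywhere):

```python
# Stand-in for step 1: BuiltInCodeExecutor.process_llm_request conceptually
# appends the model's native code-execution tool to the request config.
def enable_builtin_code_execution(request_config: dict) -> dict:
    tools = request_config.setdefault("tools", [])
    # Mirrors types.Tool(code_execution=types.ToolCodeExecution()) in ADK.
    tools.append({"code_execution": {}})
    return request_config

config = enable_builtin_code_execution({"temperature": 0.0})
print(config)  # → {'temperature': 0.0, 'tools': [{'code_execution': {}}]}
```

Everything after that point (steps 2-4) happens inside the model's own sandbox; ADK only observes the resulting parts.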

Seamless and Secure Code Execution

BuiltInCodeExecutor is the most seamless way to enable code execution if your chosen LLM supports it. The execution happens in a sandboxed environment, offering a high degree of security and abstracting away the complexities of setting up an execution environment.

UnsafeLocalCodeExecutor: For Development and Trusted Environments

The google.adk.code_executors.UnsafeLocalCodeExecutor executes Python code directly in the same Python process where your ADK application is running, using Python’s exec() function.

from google.adk.agents import Agent
from google.adk.code_executors import UnsafeLocalCodeExecutor # Key import
from google.adk.runners import InMemoryRunner
from google.genai.types import Content, Part

from building_intelligent_agents.utils import load_environment_variables, create_session, DEFAULT_LLM
load_environment_variables()

# ⚠️ DANGER ⚠️: UnsafeLocalCodeExecutor executes arbitrary code from the LLM
# in your local Python environment. ONLY use this in trusted development
# environments and NEVER in production or with untrusted LLM outputs.

unsafe_code_agent = Agent(
    name="unsafe_code_agent",
    model=DEFAULT_LLM, # Can be any model that generates code
    instruction="You are an assistant that can write Python code to solve problems. I will execute the code you provide in my local environment. Focus on simple calculations that don't require external libraries beyond standard Python.",
    code_executor=UnsafeLocalCodeExecutor() # Assign the executor
)

if __name__ == "__main__":
    print("⚠️ WARNING: Running UnsafeLocalCodeExecutor. This is not recommended for production. ⚠️")
    runner = InMemoryRunner(agent=unsafe_code_agent, app_name="UnsafeCodeApp")
    session_id = "s_unsafe_code_test"
    user_id = "unsafe_user"
    create_session(runner, session_id, user_id)
    prompts = [
        "Define a variable x as 10 and y as 20, then print their sum.",
        "What is 2 to the power of 10?",
    ]

    async def main():
        for prompt_text in prompts:
            print(f"\nYOU: {prompt_text}")
            user_message = Content(parts=[Part(text=prompt_text)], role="user")
            print("ASSISTANT (via UnsafeLocalCodeExecutor): ", end="", flush=True)
            async for event in runner.run_async(user_id=user_id, session_id=session_id, new_message=user_message):
                # Trace: LLM -> code -> UnsafeLocalCodeExecutor.execute_code() -> result -> LLM -> final text
                if event.content and event.content.parts:
                    for part in event.content.parts:
                        if part.text: print(part.text, end="", flush=True)
                        # We might not see executable_code/code_execution_result directly in the final agent output
                        # if the LLM summarizes it, but they'll be in the Trace.
            print()

    import asyncio
    asyncio.run(main())

How UnsafeLocalCodeExecutor Works with ADK:

  1. The LLM generates code, typically fenced between the configured delimiters (e.g., a block opened with ```python and closed with ```).
  2. ADK’s LLM Flow (specifically the _code_execution.response_processor) extracts this code from the LlmResponse.
  3. It creates a CodeExecutionInput with the extracted code.
  4. It calls unsafe_local_executor.execute_code(..., code_input).
  5. UnsafeLocalCodeExecutor uses exec(code_input.code, globals_dict, locals_dict) to run the code. Standard output is captured.
  6. A CodeExecutionResult (with stdout/stderr) is returned.
  7. This result is formatted using the execution_result_delimiters (e.g., wrapped in a ```tool_output fenced block) and sent back to the LLM in the next turn.
  8. The LLM uses this execution result to formulate its final response or decide the next step.
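Step 5 can be illustrated with a stripped-down, stand-alone function. This mirrors the idea behind UnsafeLocalCodeExecutor (the real implementation differs in detail) and is just as unsafe: the snippet has full access to your process.

```python
import io
from contextlib import redirect_stdout

# Stripped-down illustration of step 5: run LLM-generated code with exec()
# and capture whatever it prints. ⚠️ The code runs in-process, unrestricted.
def run_unsafely(code: str) -> str:
    buffer = io.StringIO()
    try:
        with redirect_stdout(buffer):
            exec(code, {}, {})  # arbitrary code executes right here
    except Exception as exc:
        return f"Error: {exc!r}"
    return buffer.getvalue()

print(run_unsafely("x = 10\ny = 20\nprint(x + y)").strip())  # → 30
```

The captured string is what gets wrapped in the execution-result delimiters and fed back to the model in step 7.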

Extreme Security Risk with UnsafeLocalCodeExecutor

The name “Unsafe” is there for a critical reason. This executor runs LLM-generated code directly in your application’s Python environment. A malicious or poorly written piece of code from the LLM could:

  • Access/delete local files.
  • Make arbitrary network calls.
  • Consume excessive resources.
  • Introduce security vulnerabilities.

NEVER use UnsafeLocalCodeExecutor in production environments or with untrusted models/users. It is strictly for isolated, trusted local development and experimentation.

ContainerCodeExecutor: Secure, Isolated Execution via Docker

For a more secure way to execute arbitrary code, ADK provides the ContainerCodeExecutor. This executor runs the LLM-generated Python code inside a Docker container, providing strong isolation from your host system.

Prerequisites:

  • Docker installed and running on your system.
  • The docker Python library installed (pip install docker).

You can either use a pre-built Python image or provide a path to a Dockerfile to build a custom image.

from google.adk.agents import Agent
# Ensure 'docker' is installed: pip install google-adk[extensions] or pip install docker
try:
    from google.adk.code_executors import ContainerCodeExecutor # Key import
    DOCKER_AVAILABLE = True
except ImportError:
    print("Docker SDK not found. Please install it ('pip install docker') to run this example.")
    DOCKER_AVAILABLE = False

from google.adk.runners import InMemoryRunner
from google.genai.types import Content, Part
import os
import atexit # To ensure container cleanup

from building_intelligent_agents.utils import load_environment_variables, create_session, DEFAULT_LLM
load_environment_variables()

container_agent = None
container_executor_instance = None # To manage its lifecycle

if DOCKER_AVAILABLE:
    # Option 1: Use a pre-existing Python image from Docker Hub
    # container_executor_instance = ContainerCodeExecutor(image="python:3.10-slim")

    # Option 2: Build a custom image from a Dockerfile
    # Create a simple Dockerfile in the same directory (e.g., my_python_env/Dockerfile)
    dockerfile_dir = "my_python_env"
    os.makedirs(dockerfile_dir, exist_ok=True)
    with open(os.path.join(dockerfile_dir, "Dockerfile"), "w") as df:
        df.write("FROM python:3.10-slim\n")
        df.write("RUN pip install numpy pandas\n")  # Example: add libraries
        df.write("WORKDIR /app\n")
        df.write("COPY . /app\n")  # Not strictly needed if only executing ephemeral code

    try:
        print("Initializing ContainerCodeExecutor (may take a moment to build/pull image)...")
        container_executor_instance = ContainerCodeExecutor(
            docker_path=dockerfile_dir # Path to the directory containing the Dockerfile
            # image="my-custom-adk-executor:latest" # If you build and tag it manually first
        )
        print("ContainerCodeExecutor initialized.")

        container_agent = Agent(
            name="container_code_agent",
            model=DEFAULT_LLM,
            instruction="You are an assistant that writes Python code. I will execute your code in a sandboxed Docker container. You can use numpy and pandas.",
            code_executor=container_executor_instance
        )

        # Ensure the container is cleaned up on exit - ADK should do it on its own. Provided here only for reference
        # def cleanup_container():
        #     if container_executor_instance and hasattr(container_executor_instance, "_ContainerCodeExecutor__cleanup_container"):
        #         print("Cleaning up Docker container...")
        #         # Note: __cleanup_container is an internal method, direct call is for example clarity.
        #         # Proper resource management would ideally be handled by making ContainerCodeExecutor
        #         # an async context manager if it holds long-lived resources like a running container.
        #         # For now, ADK's MCPToolset shows a pattern with AsyncExitStack for resource cleanup.
        #         # A simpler direct cleanup call if the executor instance itself manages its container:
        #         if hasattr(container_executor_instance, "_container") and container_executor_instance._container:
        #             try:
        #                 container_executor_instance._container.stop()
        #                 container_executor_instance._container.remove()
        #                 print(f"Container {container_executor_instance._container.id} stopped and removed.")
        #             except Exception as e:
        #                 print(f"Error during manual container cleanup: {e}")

        # atexit.register(cleanup_container)

    except Exception as e:
        print(f"Failed to initialize ContainerCodeExecutor. Is Docker running and configured? Error: {e}")
        container_agent = None # Fallback
else:
    print("Skipping ContainerCodeExecutor example as Docker SDK is not available.")


if __name__ == "__main__":
    if not container_agent:
        print("Container Agent not initialized. Exiting.")
    else:
        runner = InMemoryRunner(agent=container_agent, app_name="ContainerCodeApp")
        session_id = "s_container_code_test"
        user_id = "container_user"
        create_session(runner, session_id, user_id)
        prompts = [
            "Import numpy and create a 3x3 matrix of zeros, then print it.",
            "Use pandas to create a DataFrame with two columns, 'Name' and 'Age', and add one row of data. Print the DataFrame."
        ]
        # ... (runner and async main loop as in UnsafeLocalCodeExecutor example) ...
        # The interaction flow is similar, but execution is inside Docker.
        print("Container agent ready. Note: First execution might be slower due to Docker image layers.")
        # Add the async main loop here if you want to run prompts
        async def main():
            for prompt_text in prompts:
                print(f"\nYOU: {prompt_text}")
                user_message = Content(parts=[Part(text=prompt_text)], role="user")
                print("ASSISTANT (via ContainerCodeExecutor): ", end="", flush=True)
                async for event in runner.run_async(user_id=user_id, session_id=session_id, new_message=user_message):
                    if event.content and event.content.parts:
                        for part in event.content.parts:
                            if part.text: print(part.text, end="", flush=True)
                print()
        import asyncio
        asyncio.run(main())

How ContainerCodeExecutor Works with ADK: The flow mirrors UnsafeLocalCodeExecutor, but step 5 differs: ContainerCodeExecutor.execute_code(...) starts a Docker container from the specified image (reusing a running container for stateful sessions, though this example is non-stateful by default), then runs the Python code inside the container via the Docker SDK’s exec mechanism. Stdout and stderr are captured from the container’s execution.

ContainerCodeExecutor for Enhanced Security

For most use cases involving LLM-generated code, ContainerCodeExecutor offers a much better security posture than UnsafeLocalCodeExecutor due to Docker’s isolation. Define a minimal Docker image with only the necessary Python libraries your agent needs.

Docker Overhead and Configuration

  • Running Docker containers introduces some overhead (image pulling/building, container startup time), which might make initial code executions slower.
  • Requires Docker to be properly installed and running on the host machine where the ADK application executes.
  • Managing Docker images and ensuring they have the correct dependencies can add complexity.

VertexAiCodeExecutor: Cloud-Native, Managed Code Execution

For a fully managed and scalable code execution environment in the cloud, ADK integrates with Vertex AI Code Interpreter Extension.

The google.adk.code_executors.VertexAiCodeExecutor uses this Google-managed service.

Prerequisites:

  • Google Cloud Project with Vertex AI API enabled.
  • Authentication configured (e.g., gcloud auth application-default login or service account).
  • The google-cloud-aiplatform library at a version recent enough to include Code Interpreter extension support (e.g., pip install "google-cloud-aiplatform>=1.47.0", or whichever version ADK specifies).
from google.adk.agents import Agent
# Ensure google-cloud-aiplatform is installed
try:
    from google.adk.code_executors import VertexAiCodeExecutor # Key import
    VERTEX_SDK_AVAILABLE = True
except ImportError:
    print("Vertex AI SDK (with preview features for extensions) not found. Please ensure 'google-cloud-aiplatform' is installed and up to date.")
    VERTEX_SDK_AVAILABLE = False

from google.adk.runners import InMemoryRunner
from google.genai.types import Content, Part
import os

from building_intelligent_agents.utils import load_environment_variables, create_session, DEFAULT_LLM
load_environment_variables()

vertex_agent = None
if VERTEX_SDK_AVAILABLE:
    # Ensure GOOGLE_CLOUD_PROJECT is set in your environment
    if not os.getenv("GOOGLE_CLOUD_PROJECT"):
        print("Error: GOOGLE_CLOUD_PROJECT environment variable must be set for VertexAiCodeExecutor.")
    else:
        try:
            print("Initializing VertexAiCodeExecutor...")
            # You can optionally provide a resource_name for an existing Code Interpreter Extension instance
            # vertex_executor = VertexAiCodeExecutor(resource_name="projects/.../locations/.../extensions/...")
            vertex_executor = VertexAiCodeExecutor() # Will create or use an existing one based on env var or default
            print(f"VertexAiCodeExecutor initialized. Using extension: {vertex_executor._code_interpreter_extension.gca_resource.name}")


            vertex_agent = Agent(
                name="vertex_code_agent",
                model=DEFAULT_LLM,
                instruction="You are an advanced AI assistant. Write Python code to perform calculations or data tasks. Your code will be executed in a secure Vertex AI environment. Default libraries like pandas, numpy, matplotlib are available.",
                code_executor=vertex_executor
            )
        except Exception as e:
            print(f"Failed to initialize VertexAiCodeExecutor. Ensure Vertex AI API is enabled and auth is correct. Error: {e}")
else:
    print("Skipping VertexAiCodeExecutor example as Vertex AI SDK is not available/configured.")


if __name__ == "__main__":
    if not vertex_agent:
        print("Vertex Agent not initialized. Exiting.")
    else:
        runner = InMemoryRunner(agent=vertex_agent, app_name="VertexCodeApp")
        session_id = "s_vertex"
        user_id = "vertex_user"
        create_session(runner, user_id=user_id, session_id=session_id)

        prompts = [
            "Plot a simple sine wave using matplotlib and save it as 'sine_wave.png'. Describe the plot.",
            "Create a pandas DataFrame with columns 'City' and 'Population' for three cities, then print the average population."
        ]
        # ... (runner and async main loop as in UnsafeLocalCodeExecutor example) ...
        # The Vertex AI Code Interpreter handles file outputs (like 'sine_wave.png')
        # and makes them available in the CodeExecutionResult.
        # ADK can then save these as artifacts.
        print("Vertex AI Code Interpreter agent ready.")
        # Add async main loop here
        async def main():
            for prompt_text in prompts:
                print(f"\nYOU: {prompt_text}")
                user_message = Content(parts=[Part(text=prompt_text)], role="user")
                print("ASSISTANT (via VertexAiCodeExecutor): ", end="", flush=True)
                # Note: The actual plot image won't be printed to console here.
                # In the Dev UI or a proper app, you'd handle the output_files
                # from the CodeExecutionResult (which are then put into Event.actions.artifact_delta).
                async for event in runner.run_async(user_id=user_id, session_id=session_id, new_message=user_message):
                    if event.content and event.content.parts:
                        for part in event.content.parts:
                            if part.text: print(part.text, end="", flush=True)
                print()
                # To see artifacts:
                if runner.artifact_service:
                    artifacts = await runner.artifact_service.list_artifact_keys(
                        app_name="VertexCodeApp", user_id="vertex_user", session_id="s_vertex"
                    )
                    if artifacts:
                        print(f"  (Artifacts created: {artifacts})")
        import asyncio
        asyncio.run(main())

Managed, Scalable, and Feature-Rich Execution with Vertex AI

VertexAiCodeExecutor is the recommended choice for production cloud deployments.

  • Managed Environment: No need to manage Docker or Python environments.
  • Security: Runs in a Google-managed sandbox.
  • Pre-installed Libraries: Common data science libraries (pandas, numpy, matplotlib, scipy) are typically available.
  • File I/O: Supports generating and returning files (e.g., plots, data files), which ADK can then handle as artifacts.
  • Stateful Execution: The Vertex AI Code Interpreter can be stateful by default (using session_id in execute_code), meaning variables and imports persist across code blocks within the same agent session. ADK’s VertexAiCodeExecutor is also marked as stateful=True by default.

The Code Execution Cycle

Regardless of the executor used (except BuiltInCodeExecutor, where execution happens inside the model itself), the general cycle facilitated by ADK’s LLM Flow (specifically the _code_execution.py processors) is:

  1. LLM Generates Code: The LlmAgent, guided by its instruction, generates a code snippet in response to a user query or as part of a plan. This code is typically embedded in its text response, marked by delimiters (e.g., a fenced ```python block).
  2. ADK Extracts Code: The _code_execution.response_processor in the LLM Flow detects and extracts this code block from the LlmResponse. The original LlmResponse content (up to the code block) is yielded as a partial Event.
  3. ADK Invokes Executor: The processor creates a CodeExecutionInput and calls the agent.code_executor.execute_code() method.
  4. Executor Runs Code: The chosen BaseCodeExecutor implementation runs the code in its specific environment.
  5. Executor Returns Result: A CodeExecutionResult (containing stdout, stderr, and any output files) is returned to ADK. Output files from VertexAiCodeExecutor or a custom executor can be automatically saved as artifacts if an ArtifactService is configured.
  6. ADK Formats Result: The CodeExecutionResult is formatted into a string (e.g., a fenced ```tool_output block) and packaged as a user-role Content object within a new Event.
  7. Result Fed Back to LLM: This new Event (containing the execution result) is appended to the conversation history. The LLM Flow then constructs a new LlmRequest (including this result) and calls the LLM again.
  8. LLM Interprets Result: The LLM uses the code’s output to formulate a final natural language response, generate more code, or decide on its next action.
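Steps 2 and 6 of this cycle can be sketched with a pair of small helpers. These are hypothetical, not ADK's actual functions, and the delimiter strings are illustrative; in ADK they come from the executor's code_block_delimiters and execution_result_delimiters configuration.

```python
import re
from typing import Optional

# Hypothetical helpers mirroring steps 2 and 6: pull the first fenced code
# block out of an LLM response, then wrap an execution result in the
# delimiters the LLM will see on its next turn.
CODE_BLOCK_RE = re.compile(r"```(?:python|tool_code)?\n(.*?)\n```", re.DOTALL)

def extract_code(llm_text: str) -> Optional[str]:
    match = CODE_BLOCK_RE.search(llm_text)
    return match.group(1) if match else None

def format_result(stdout: str) -> str:
    return f"```tool_output\n{stdout}\n```"

response = "Let me compute that.\n```python\nprint(2 ** 10)\n```"
print(extract_code(response))  # → print(2 ** 10)
```

The formatted result string is then appended to the conversation history as a new user-role message, which is why the model "sees" its own code's output on the next turn.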

*Diagram: The general code execution cycle in ADK (for non-built-in executors).*

Managing Code Execution Context (CodeExecutorContext)

For stateful executors or when optimizing data file inputs, ADK uses CodeExecutorContext. This object, typically managed internally by the code execution flow processors, stores information relevant to the code execution process within the session state. This can include:

  • The execution_id for stateful sessions (e.g., for VertexAiCodeExecutor).
  • A list of processed input files to avoid redundant processing (optimize_data_file).
  • Error counts for retry logic.

You generally won’t interact with CodeExecutorContext directly unless you are building a custom code executor or deeply customizing the code execution flow.

Best Practice: Iterative Prompting for Code Generation

Getting an LLM to generate correct and useful code often requires iterative prompting.

  • Be Specific: Clearly state the desired inputs, outputs, and any constraints.
  • Provide Examples: If possible, include examples of desired code snippets in the agent’s instruction or few-shot examples.
  • Error Handling: Instruct the agent on how to interpret error messages from code execution and how to attempt to fix its code. ADK’s error_retry_attempts in code executors helps with this.
  • Start Simple: For complex tasks, ask the LLM to generate code in smaller, verifiable chunks.
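As one illustration of these practices, an agent instruction might look like the following. The wording is invented for this example, not taken from ADK:

```python
# Illustrative instruction string baking in the practices above:
# specific constraints, explicit error-handling guidance, and a bias
# toward small, verifiable steps.
CODE_AGENT_INSTRUCTION = """\
You write Python code to solve the user's task.
- Use only the standard library plus numpy and pandas.
- Always print the final answer; never just compute it silently.
- If an execution result contains a traceback, read the error message,
  fix your code, and submit a corrected version.
- For multi-step tasks, write and verify one small code block at a time.
"""

print(CODE_AGENT_INSTRUCTION)
```

Such an instruction would be passed to the LlmAgent's instruction parameter alongside the chosen code_executor.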

What’s Next?

You’ve now unlocked a very advanced capability for your agents: the ability to write and execute code. This opens doors to solving a much wider range of problems. Next, we’ll return to the core of agent intelligence: the LLMs themselves. We’ll explore how ADK interfaces with different models, how to configure requests, and how to interpret their diverse responses, including streaming and multimodal content.

This post is licensed under CC BY 4.0 by the author.