How Gemini 3 is Revolutionizing NFL Film Study

Posted Dec 10, 2025

By Amulya Bhatia 4 min read

Football is a game of inches, but understanding it is a game of information. For decades, “All-22” film breakdown—analyzing formations, coverages, and blocking schemes—was a skill reserved for professional scouts and obsessive coaches.

By simply pasting a YouTube link, my new application called X-OG acts as your personal AI Assistant, breaking down game tape with a level of visual fidelity and tactical understanding that was previously impossible for AI. The code for X-OG can be found here.

But the magic isn’t just in the model architecture; it’s in the System Instruction. Let’s look under the hood at how Gemini 3’s state-of-the-art video understanding pairs with expert-level prompt engineering to shift the paradigm of sports analytics.

The Engine: Native Video Understanding

The “secret sauce” behind X-OG is the raw power of the Gemini 3 Pro model.

In the past, “analyzing a video” with AI usually meant reading a transcript or analyzing static frames. Neither works for football. You cannot identify a “Cover 3 Zone Blitz” by reading the announcer’s commentary, and you cannot understand a blocking scheme from a single still image. You need to see the motion.

X-OG utilizes Gemini 3’s native multimodal video processing. The app sends the YouTube URL directly to the model. The model effectively “watches” the game, allowing it to perform tasks that require genuine visual cognition:

Formation Recognition: Distinguishing between “Shotgun, Trips Right” and “I-Formation.”
Motion Tracking: Analyzing receiver routes (e.g., “Deep Post” vs. “Seam”).
Line Play: Watching the interaction in the trenches to distinguish run blocking from pass protection.

The Steering Wheel: The “X-OG” Persona Prompt

While Gemini 3 provides the eyes, the System Instruction (found in constants.ts) provides the brain.

1. Establishing Authority via Role-Playing

The prompt begins by explicitly defining the AI’s psychological stance:

“You are an Expert Football Analyst and Film Breakdown Specialist. Your name is ‘X-OG.’ You possess the deep tactical knowledge of a Football offensive and defensive coordinator, the sharp eye of a professional scout…”

This is crucial. By priming the model as a “coordinator,” X-OG shifts the probability distribution of the output. The model is less likely to say “The player ran with the ball” (generic) and more likely to say “The tailback executed a zone-read handoff” (expert).

2. The “Significant Play” Filter

Video models can get overwhelmed by noise (huddles, commercials, crowd shots). The prompt instructs the model to filter the signal from the noise:

“Identify as many significant plays as possible. A significant play is defined as a touchdown, a turnover, a sack, a key 4th down conversion/stop…”

This instructs Gemini to use its reasoning capabilities to judge the value of a moment, not just its visual content.

3. Asking for the “Inference,” Not Just the Observation

The prompt explicitly asks Gemini to look at the preSnapAnalysis (formations) and the executionDetails (routes/blocking) to work backward and deduce what the Offensive Coordinator actually called. It asks Gemini to deduce the intent behind the play, not just the outcome.

4. The Algorithm in the Prompt

The prompt also acts as a code function. It creates deep links to specific moments in the video:

“To create the Timestamped Link, take the EXACT YouTube URL… and append &t=[#]s, where [#] is the total number of seconds corresponding to the play’s start time.”

This requires Gemini to identify the timestamp visually (OCR of the game clock or internal video timing), perform the math to convert minutes to seconds, and string manipulate the URL.

Structured Thinking: The JSON Schema

The prompt doesn’t allow Gemini to ramble. It enforces a strict JSON Schema.

The schema (in constants.ts) forces the model to categorize its thoughts into buckets:

Situation: Score, Down & Distance (Requires OCR/Graphics reading).
Pre-Snap: Formations (Requires object detection/spatial reasoning).
Breakdown: QB, O-Line, Secondary (Requires motion analysis).
Outcome: The result (Requires causality analysis).

This structure ensures that the “Pre-Snap Analysis” is separated from the “Outcome,” forcing Gemini to reason chronologically through the play, just like a coach watching film.

The Developer Experience

The implementation is deceptively simple. The prompt and the video are sent in a single request:

  
const response = await ai.models.generateContent({
  model: 'gemini-3-pro-preview',
  contents: {
    parts: [
      { fileData: { mimeType: 'video/youtube', fileUri: youtubeUrl } },
    ],
  },
  config: {
    systemInstruction: FOOTBALL_FILM_ROOM_SYSTEM_INSTRUCTION, // The Persona
    responseMimeType: 'application/json',
    responseSchema: ANALYSIS_SCHEMA, // The Structure
  },
});

There are no complex computer vision pipelines or frame extraction scripts. The prompt is the logic layer.

Conclusion

X-OG is a case study in modern AI application development. It shows that State-of-the-Art Models (Gemini 3) + Domain-Specific Prompting (The X-OG Persona) = Expert-Level Insight.

We are moving away from passive viewing towards active, AI-assisted understanding. With Gemini 3’s ability to process long-context video with high visual acuity, and prompts that provide domain-specific context, all kinds of videos and images can be analyzed with expert-level insight.

Gen AI, Video Understanding

This post is licensed under CC BY 4.0 by the author.