Media Intelligence and Summarization

F1 Intelligence Hub

A motorsport intelligence product that treats race-week information as a structured signal system rather than a stream of disconnected headlines.

Signal over noise · Race-week explainers

Context

F1 race weeks create a very specific information problem. Official regulations, team updates, practice sessions, qualifying, race results, penalties, technical analysis, and media narratives all move on different clocks. Fans do not just need more content. They need a product that can sort what is official, what is interpretive, what changed recently, and what actually matters.

Problem

F1 content moves fast, and most fans end up piecing together race-week context from scattered sources. It is hard to track what changed since the last race, which regulation notes matter, how team performance is trending, and where the line sits between verified information and speculation.

What I Built

  • An ingestion layer for official regulations, results, press releases, and selected media sources with timestamped source attribution
  • A topic and entity model for drivers, teams, race weekends, regulations, technical themes, and championship context
  • A summarization workflow that separates official updates, factual race developments, and speculative or analytical commentary
  • Reusable output surfaces for race-week explainers, trend cards, regulation updates, and short-form content drafts built from the same structured source set
  • An editorial guardrail layer that preserves links, timestamps, and source labels so users can inspect where a claim came from

Notes

System overview

F1 Intelligence Hub is designed as an information system first and a content product second. The useful part is not just summarizing articles. It is organizing race-week state across regulations, results, team narratives, and technical updates so the user can understand what changed and why.

The underlying flow looks like this:

Official + media sources -> ingestion -> deduplication -> entity tagging -> topic clustering ->
fact / analysis separation -> summary generation -> race-week cards, explainers, and media outputs

That matters because a news-style feed alone is not enough. The product has to decide which reports belong together, which sources are authoritative, how fresh a summary is, and whether a claim should be treated as fact, interpretation, or prediction.

Product guardrails

  • facts and predictions should never share the same label
  • every summary needs source attribution
  • recency should be obvious at a glance
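Those guardrails are simple enough to enforce mechanically before anything is published. A sketch of such a check, assuming a dict-shaped summary with `labels`, `sources`, and `published_at` fields (all names hypothetical):

```python
# Labels the product treats as factual vs. interpretive (illustrative set).
FACT_LABELS = {"official update", "race result"}

def validate_summary(summary: dict) -> list[str]:
    """Return a list of guardrail violations; an empty list means it passes."""
    problems = []
    labels = set(summary.get("labels", []))
    # Guardrail 1: facts and predictions never share the same label set.
    if labels & FACT_LABELS and labels - FACT_LABELS:
        problems.append("factual and interpretive labels mixed in one summary")
    # Guardrail 2: every summary needs source attribution.
    if not summary.get("sources"):
        problems.append("missing source attribution")
    # Guardrail 3: recency must be displayable, so a timestamp is required.
    if not summary.get("published_at"):
        problems.append("missing timestamp for recency display")
    return problems
```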

Event model

The system becomes much more useful once the main content objects are explicit:

  • sources: official FIA, Formula 1, teams, and approved media outlets
  • documents: raw articles, regulations, result pages, press releases, and race reports
  • entities: drivers, teams, circuits, weekends, rule topics, and technical themes
  • clusters: grouped race-week topics such as penalty changes, floor-upgrade discussion, or qualifying performance shifts
  • summary_outputs: source-backed summaries with labels like official update, race result, analysis, or prediction
  • distribution_assets: shorter derived outputs for digest cards, carousel scripts, or social snippets

That schema lets the product behave like a real intelligence layer instead of a collection of copied summaries.
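The content objects above can be made concrete as lightweight record types. This is a sketch of one plausible shape, not the product's real schema; every field name is an assumption:

```python
from dataclasses import dataclass, field

@dataclass
class Source:
    name: str            # e.g. "FIA", a team, an approved outlet
    kind: str            # "official" | "team" | "media"

@dataclass
class Entity:
    name: str
    kind: str            # "driver" | "team" | "circuit" | "weekend" | "rule_topic" | "tech_theme"

@dataclass
class Cluster:
    topic: str           # e.g. "floor-upgrade discussion"
    document_ids: list[str] = field(default_factory=list)

@dataclass
class SummaryOutput:
    cluster_id: str
    label: str           # "official update" | "race result" | "analysis" | "prediction"
    text: str
    source_ids: list[str] = field(default_factory=list)

@dataclass
class DistributionAsset:
    summary_id: str      # derived assets always point back at a summary
    format: str          # "digest_card" | "carousel_script" | "social_snippet"
    text: str
```

The key design point is the chain of foreign keys: a distribution asset references a summary, which references a cluster and its sources, so provenance survives all the way to the shortest output.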

Race-week workflow

The workflow changes depending on where the weekend sits:

Pre-race build-up -> practice signal collection -> qualifying state update ->
race result synthesis -> post-race technical and championship context

Each phase has different content priorities:

  • before the weekend: regulation notes, upgrades, penalties, and narrative setup
  • during practice: pace signals, long-run hints, and team-level anomalies
  • after qualifying: grid context and real performance shifts
  • after the race: results, incidents, penalties, and championship implications

Making that stage explicit keeps the product from treating all F1 content as one undifferentiated stream.
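One way to make the stage explicit in code is a phase enum keyed off the most recent completed session, with each phase carrying its content priorities. The mapping below restates the list above; the function and its input values are hypothetical simplifications:

```python
from enum import Enum

class Phase(Enum):
    PRE_RACE = "pre_race"
    PRACTICE = "practice"
    POST_QUALIFYING = "post_qualifying"
    POST_RACE = "post_race"

# Content priorities per phase, taken directly from the workflow description.
PHASE_PRIORITIES = {
    Phase.PRE_RACE: ["regulation notes", "upgrades", "penalties", "narrative setup"],
    Phase.PRACTICE: ["pace signals", "long-run hints", "team-level anomalies"],
    Phase.POST_QUALIFYING: ["grid context", "performance shifts"],
    Phase.POST_RACE: ["results", "incidents", "penalties", "championship implications"],
}

def phase_for(last_completed_session: str) -> Phase:
    """Map the most recent completed session to a workflow phase (simplified)."""
    return {
        "none": Phase.PRE_RACE,
        "practice": Phase.PRACTICE,
        "qualifying": Phase.POST_QUALIFYING,
        "race": Phase.POST_RACE,
    }[last_completed_session]
```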

Tagging and clustering

The tagging layer is what makes the summarization useful.

The system should not just tag an article as "Ferrari" or "Monaco". It should also be able to say:

  • this item is about an official regulation change
  • this item is about speculative setup analysis
  • this item overlaps with three other reports already in the same cluster
  • this item changes the current state of a race-week topic rather than repeating it

That makes the downstream product much cleaner. Instead of five near-identical summaries about the same regulation story, the user gets one structured topic card with source coverage beneath it.

Summary and distribution logic

The summary layer should produce different output modes from the same clustered data:

  • concise fact summaries for product surfaces
  • deeper explainers for regulations or technical changes
  • race-week digest blocks
  • short-form scripts for faceless media formats

The important part is that each output should retain provenance. The user should be able to see what the summary came from, when it was published, and whether the statement is official or interpretive.
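A single renderer over the clustered data makes the provenance rule hard to skip: every output mode passes through the same function, which always appends source names and timestamps. This is an illustrative sketch; the mode names and summary fields are assumptions:

```python
def render(summary: dict, mode: str) -> str:
    """Render one clustered summary into an output mode, always with provenance."""
    attribution = "; ".join(
        f"{s['name']} ({s['published_at']})" for s in summary["sources"]
    )
    body = {
        "fact_card": summary["text"][:200],
        "explainer": summary["text"],
        "digest_block": f"- {summary['text'][:120]}",
        "short_form_script": f"Here's what changed: {summary['text'][:160]}",
    }[mode]
    # The label (official / result / analysis / prediction) leads every output,
    # and sources are appended unconditionally, so provenance cannot be dropped.
    return f"[{summary['label']}] {body}\nSources: {attribution}"
```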

Why the system is interesting

This project is really about structured information handling:

  • source quality management
  • entity modeling
  • recency and stage awareness
  • clustering and deduplication
  • summary generation with editorial boundaries

That combination is what makes it feel like a full system rather than a simple “AI sports summary” feature.

Research anchors