ML Ops and Workflow Automation
Auto Rating ML Workflow
An applied ML pipeline for media classification, QA automation, policy-mismatch detection, and structured downstream review routing.
Context
This work sat closer to practical ML infrastructure than model research. The important question was not how to train the fanciest classifier. It was how to ingest media reliably, attach the right metadata, run inference and validation consistently, compare outputs against merchant or product signals, and route only the uncertain or high-risk cases into manual review.
Problem
Manual QA and media-review workflows were slow, repetitive, and difficult to scale. Teams needed a systematic way to ingest files, extract or attach metadata, classify content, and validate that the media matched the declared merchant or product category. They also needed to surface likely policy violations or misrepresentation and generate structured outputs for downstream review.
What I Built
- Cloud-based ingestion flow for batches of media files and associated metadata using event-driven serverless processing
- ETL stages for archive extraction, metadata normalization, media preprocessing, classification, validation, and downstream packaging
- Hugging Face model integration for image categorization and policy-sensitive classification validation
- Mismatch-detection logic that compared image-based classification against declared merchant, product, or catalog signals and flagged likely misrepresentation
- Manual-review routing for low-confidence classifications, policy violations, and cross-signal inconsistencies instead of forcing reviewers to inspect every asset
- Automation for CSV and stakeholder-facing reports rather than leaving results trapped in raw model output
Notes
Context
This project is best understood as ML ops for a real workflow. Media review systems often fall into a gap between manual operations and overhyped AI tooling. The practical need is narrower and more useful: get files in, preserve the right metadata, run consistent classification and validation logic, compare outputs against merchant or product context, and deliver only the ambiguous or risky cases to human reviewers.
Technical design
The architecture leaned on cloud-native events and storage rather than a heavyweight platform. Uploaded batches landed in Cloud Storage, which triggered Pub/Sub messages to fan work out across serverless handlers. That kept ingestion, preprocessing, inference, mismatch detection, and reporting loosely coupled enough to retry independently.
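The event-driven fan-out can be sketched as a small routing handler. This is a minimal illustration, assuming Cloud Storage "object finalize" notifications delivered as JSON through Pub/Sub; the field names follow that payload shape, but the stage names and file-type buckets here are illustrative, not the production values.

```python
import json

# Illustrative suffix buckets; the real pipeline's accepted formats may differ.
MEDIA_SUFFIXES = (".jpg", ".jpeg", ".png", ".webp")
ARCHIVE_SUFFIXES = (".zip", ".tar.gz")

def route_event(message_data: bytes) -> dict:
    """Decode a storage notification and decide which pipeline stage handles it."""
    event = json.loads(message_data)
    name = event["name"]      # object path within the bucket
    bucket = event["bucket"]
    if name.endswith(ARCHIVE_SUFFIXES):
        stage = "extract"     # archive extraction stage
    elif name.endswith(MEDIA_SUFFIXES):
        stage = "preprocess"  # direct media preprocessing
    else:
        stage = "metadata"    # sidecar metadata / manifests
    return {"bucket": bucket, "object": name, "stage": stage}
```

Because each handler only consumes a message and emits the next one, a failed stage can be retried from its own queue without re-running the whole batch.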
At a high level, the workflow looked like this:
Upload batch -> extract and normalize metadata -> preprocess media -> run classification and validation ->
compare against merchant / product signals -> auto-pass or flag -> route uncertain cases to manual review -> generate reports
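The flow above can be sketched as a chain of stage functions over a batch record. The stage names mirror the diagram; the function bodies are placeholders standing in for the real extraction, preprocessing, and inference logic.

```python
def extract_metadata(batch: dict) -> dict:
    # placeholder: normalize whatever metadata arrived with the upload
    batch["metadata"] = {"declared_category": batch.get("declared_category", "unknown")}
    return batch

def preprocess_media(batch: dict) -> dict:
    # placeholder: resizing, format conversion, etc.
    batch["preprocessed"] = True
    return batch

def classify(batch: dict) -> dict:
    # placeholder: a real system would run model inference here
    batch["predicted_label"] = "clothing"
    batch["score"] = 0.92
    return batch

STAGES = [extract_metadata, preprocess_media, classify]

def run_pipeline(batch: dict) -> dict:
    for stage in STAGES:
        batch = stage(batch)
    return batch
```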
The classification layer used Hugging Face models as one part of the system, not the entire system. Model outputs were most useful when paired with validation logic:
- does the detected category match the declared merchant category?
- does the image content align with the product type or listing description?
- does the content suggest policy-violating material even if the merchant metadata looks clean?
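Those checks can be sketched as a small validation function over the classifier output. The category-compatibility map, the policy-sensitive set, and the threshold below are illustrative assumptions, not the production rules; the real pipeline's label taxonomy came from the Hugging Face models it used.

```python
# Hypothetical mapping from declared merchant category to compatible labels.
COMPATIBLE = {
    "apparel": {"clothing", "footwear", "accessories"},
    "electronics": {"consumer_electronics", "computer_hardware"},
}
# Hypothetical policy-sensitive labels that always need a human.
POLICY_SENSITIVE = {"weapons", "tobacco"}

def validate(predicted_label: str, score: float, declared: str,
             match_threshold: float = 0.6) -> str:
    """Turn one (label, score) prediction plus merchant context into a status."""
    if predicted_label in POLICY_SENSITIVE:
        return "manual_review"   # route policy-sensitive hits regardless of metadata
    if score < match_threshold:
        return "low_confidence"  # model is unsure; don't auto-decide
    if predicted_label in COMPATIBLE.get(declared, set()):
        return "match"
    return "mismatch"            # likely misrepresentation vs. declared category
```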
That is what turned the pipeline from a simple classifier into a review-acceleration system.
Review routing and policy checks
The most valuable product behavior came from triage:
- high-confidence, matching cases could move through automatically
- low-confidence or cross-signal mismatches could be flagged as likely misrepresentation
- policy-sensitive categories could be routed directly into manual review
That reduced repetitive manual review while still keeping a human in the loop where the cost of a bad decision was high.
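The triage step itself reduces to partitioning validated records into queues. A minimal sketch, assuming each record carries a validation status and a confidence score; the status names and auto-pass threshold are illustrative.

```python
def triage(records: list[dict]) -> dict:
    """Partition validated records into auto-pass, flagged, and manual-review queues."""
    queues = {"auto_pass": [], "flagged": [], "manual_review": []}
    for rec in records:
        if rec["status"] == "match" and rec["score"] >= 0.85:
            queues["auto_pass"].append(rec)        # high-confidence, consistent
        elif rec["status"] in ("mismatch", "low_confidence"):
            queues["flagged"].append(rec)          # likely misrepresentation or unsure
        else:
            queues["manual_review"].append(rec)    # policy-sensitive or borderline
    return queues
```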
I treated the reporting layer as part of the system, not just a final export. That meant shaping outputs around operational usefulness: CSVs, summaries, reason codes, and structured exception records that were easy to review, hand off, or audit later.
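The reporting stage amounts to flattening exception records into a reviewer-facing CSV. A minimal sketch using the standard library; the column names and reason codes are illustrative, not the actual report schema.

```python
import csv
import io

# Hypothetical report columns; the real schema was shaped by reviewer needs.
FIELDS = ["asset_id", "declared_category", "predicted_label", "score", "reason_code"]

def write_report(records: list[dict]) -> str:
    """Render exception records as CSV text, blanking any missing fields."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for rec in records:
        writer.writerow({k: rec.get(k, "") for k in FIELDS})
    return buf.getvalue()
```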
Closing thought
The biggest lesson from this project was that practical ML value comes from workflow design: dependable ingestion, clear validation logic, explicit triage, and outputs that fit real review operations. The model helped, but the system around it is what actually made the automation useful.