ML Ops and Workflow Automation
Auto Rating ML Workflow
An applied ML pipeline for media classification, QA automation, policy-mismatch detection, and structured downstream review routing.
Context
This work sat closer to practical ML infrastructure than model research. The important question was not how to train the fanciest classifier. It was how to ingest media reliably, attach the right metadata, run inference and validation consistently, compare outputs against merchant or product signals, and route only the uncertain or high-risk cases into manual review.
Problem
Manual QA and media-review workflows were slow, repetitive, and difficult to scale. Teams needed a systematic way to ingest files, extract or attach metadata, classify content, and validate that the media matched the declared merchant or product category. They also needed to surface likely policy violations or misrepresentation and generate structured outputs for downstream review.
What I Built
- Cloud-based ingestion flow for batches of media files and associated metadata using event-driven serverless processing
- ETL stages for archive extraction, metadata normalization, media preprocessing, classification, validation, and downstream packaging
- Hugging Face model integration for image categorization and policy-sensitive classification validation
- Mismatch-detection logic that compared image-based classification against declared merchant, product, or catalog signals and flagged likely misrepresentation
- Manual-review routing for low-confidence classifications, policy violations, and cross-signal inconsistencies instead of forcing reviewers to inspect every asset
- Automation for CSV and stakeholder-facing reports rather than leaving results trapped in raw model output
Notes
Context
This project is best understood as ML ops for a real workflow. Media review systems often fall into a gap between manual operations and overhyped AI tooling. The practical need is narrower and more useful: get files in, preserve the right metadata, run consistent classification and validation logic, compare outputs against merchant or product context, and deliver only the ambiguous or risky cases to human reviewers.
Technical design
The architecture leaned on cloud-native events and storage rather than a heavyweight platform. Uploaded batches landed in Cloud Storage, which triggered Pub/Sub messages to fan work out across serverless handlers. That kept ingestion, preprocessing, inference, mismatch detection, and reporting loosely coupled enough to retry independently.
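The event-driven fan-out can be sketched as a small routing handler. This is a minimal illustration, assuming Cloud Storage "object finalize" notifications delivered as JSON through Pub/Sub; the field names follow that payload shape, but the stage names and file-type buckets here are illustrative, not the production values.

```python
import json

# Illustrative suffix buckets; the real pipeline's accepted formats may differ.
MEDIA_SUFFIXES = (".jpg", ".jpeg", ".png", ".webp")
ARCHIVE_SUFFIXES = (".zip", ".tar.gz")

def route_event(message_data: bytes) -> dict:
    """Decode a storage notification and decide which pipeline stage handles it."""
    event = json.loads(message_data)
    name = event["name"]      # object path within the bucket
    bucket = event["bucket"]
    if name.endswith(ARCHIVE_SUFFIXES):
        stage = "extract"     # archive extraction stage
    elif name.endswith(MEDIA_SUFFIXES):
        stage = "preprocess"  # direct media preprocessing
    else:
        stage = "metadata"    # sidecar metadata / manifests
    return {"bucket": bucket, "object": name, "stage": stage}
```

Because each handler only consumes a message and emits the next one, a failed stage can be retried from its own queue without re-running the whole batch.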
At a high level, the workflow looked like this:
Upload batch -> extract and normalize metadata -> preprocess media -> run classification and validation ->
compare against merchant / product signals -> auto-pass or flag -> route uncertain cases to manual review -> generate reports
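The flow above can be sketched as a chain of stage functions over a batch record. The stage names mirror the diagram; the function bodies are placeholders standing in for the real extraction, preprocessing, and inference logic.

```python
def extract_metadata(batch: dict) -> dict:
    # placeholder: normalize whatever metadata arrived with the upload
    batch["metadata"] = {"declared_category": batch.get("declared_category", "unknown")}
    return batch

def preprocess_media(batch: dict) -> dict:
    # placeholder: resizing, format conversion, etc.
    batch["preprocessed"] = True
    return batch

def classify(batch: dict) -> dict:
    # placeholder: a real system would run model inference here
    batch["predicted_label"] = "clothing"
    batch["score"] = 0.92
    return batch

STAGES = [extract_metadata, preprocess_media, classify]

def run_pipeline(batch: dict) -> dict:
    for stage in STAGES:
        batch = stage(batch)
    return batch
```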
The classification layer used Hugging Face models as one part of the system, not the entire system. Model outputs were most useful when paired with validation logic:
- does the detected category match the declared merchant category?
- does the image content align with the product type or listing description?
- does the content suggest policy-violating material even if the merchant metadata looks clean?
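Those checks can be sketched as a small validation function over the classifier output. The category-compatibility map, the policy-sensitive set, and the threshold below are illustrative assumptions, not the production rules; the real pipeline's label taxonomy came from the Hugging Face models it used.

```python
# Hypothetical mapping from declared merchant category to compatible labels.
COMPATIBLE = {
    "apparel": {"clothing", "footwear", "accessories"},
    "electronics": {"consumer_electronics", "computer_hardware"},
}
# Hypothetical policy-sensitive labels that always need a human.
POLICY_SENSITIVE = {"weapons", "tobacco"}

def validate(predicted_label: str, score: float, declared: str,
             match_threshold: float = 0.6) -> str:
    """Turn one (label, score) prediction plus merchant context into a status."""
    if predicted_label in POLICY_SENSITIVE:
        return "manual_review"   # route policy-sensitive hits regardless of metadata
    if score < match_threshold:
        return "low_confidence"  # model is unsure; don't auto-decide
    if predicted_label in COMPATIBLE.get(declared, set()):
        return "match"
    return "mismatch"            # likely misrepresentation vs. declared category
```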
That is what turned the pipeline from a simple classifier into a review-acceleration system.
Review routing and policy checks
The most valuable product behavior came from triage:
- high-confidence, matching cases could move through automatically
- low-confidence or cross-signal mismatches could be flagged as likely misrepresentation
- policy-sensitive categories could be routed directly into manual review
That reduced repetitive manual review while still keeping a human in the loop where the cost of a bad decision was high.
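The triage step itself reduces to partitioning validated records into queues. A minimal sketch, assuming each record carries a validation status and a confidence score; the status names and auto-pass threshold are illustrative.

```python
def triage(records: list[dict]) -> dict:
    """Partition validated records into auto-pass, flagged, and manual-review queues."""
    queues = {"auto_pass": [], "flagged": [], "manual_review": []}
    for rec in records:
        if rec["status"] == "match" and rec["score"] >= 0.85:
            queues["auto_pass"].append(rec)        # high-confidence, consistent
        elif rec["status"] in ("mismatch", "low_confidence"):
            queues["flagged"].append(rec)          # likely misrepresentation or unsure
        else:
            queues["manual_review"].append(rec)    # policy-sensitive or borderline
    return queues
```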
I treated the reporting layer as part of the system, not just a final export. That meant shaping outputs around operational usefulness: CSVs, summaries, reason codes, and structured exception records that were easy to review, hand off, or audit later.
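The reporting stage amounts to flattening exception records into a reviewer-facing CSV. A minimal sketch using the standard library; the column names and reason codes are illustrative, not the actual report schema.

```python
import csv
import io

# Hypothetical report columns; the real schema was shaped by reviewer needs.
FIELDS = ["asset_id", "declared_category", "predicted_label", "score", "reason_code"]

def write_report(records: list[dict]) -> str:
    """Render exception records as CSV text, blanking any missing fields."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for rec in records:
        writer.writerow({k: rec.get(k, "") for k in FIELDS})
    return buf.getvalue()
```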
Closing thought
The biggest lesson from this project was that practical ML value comes from workflow design: dependable ingestion, clear validation logic, explicit triage, and outputs that fit real review operations. The model helped, but the system around it is what actually made the automation useful.