Trust, Fraud, and Risk Intelligence

Merchant Vetting Intelligence

A trust and safety workflow that combines policy signals, merchant classification, shoppability checks, pricing signals, and review-friendly decision support.

Earlier intervention · Lifecycle risk detection

Context

This work sits in the space between trust engineering, analytics, and operator tooling. Merchant ecosystems are noisy and ambiguous, so the goal was not a single fraud score. The goal was to surface the right evidence early enough that review teams could act before problems spread downstream, while still preserving enough context to explain why a merchant looked suspicious or out of policy.

Problem

Merchant review teams often juggle policy docs, website signals, account behavior, prior incidents, product data, pricing signals, and regional context across too many tools. Without a better workflow, risky merchants are found late, manual effort grows, and decisions become inconsistent across reviewers and markets.

What I Built

  • Data workflows and review logic to bring merchant signals into a more structured lifecycle view
  • Decision-support patterns that combined rule-based, behavioral, shoppability, pricing, and policy-aware evidence instead of relying on a single score
  • Merchant categorization flows that used structured signals and LLM-assisted APIs to summarize site content, classify merchant intent, and surface likely policy domains
  • Review-friendly summaries that made risky patterns, likely violations, and evidence gaps easier to inspect and escalate
  • Operational thinking around regional nuance, exception handling, and cross-team feedback between policy, analysts, product, and engineering
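The combination pattern in the second bullet can be sketched in code. This is a hypothetical shape, not the actual system: each evidence family keeps its own named findings so reviewers can see why a merchant was flagged, and escalation comes from findings across families rather than one composite number. All class and field names here are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    source: str    # e.g. "rules", "behavioral", "shoppability", "pricing", "policy"
    signal: str    # short machine-readable name
    detail: str    # human-readable explanation for reviewers
    severity: str  # "info" | "weak" | "strong"

@dataclass
class EvidencePackage:
    merchant_id: str
    findings: list[Finding] = field(default_factory=list)

    def add(self, source: str, signal: str, detail: str, severity: str = "weak") -> None:
        self.findings.append(Finding(source, signal, detail, severity))

    def needs_review(self) -> bool:
        # Escalate on one strong finding, or weak findings in two or more
        # distinct evidence families -- never on a single opaque score.
        strong = any(f.severity == "strong" for f in self.findings)
        weak_families = {f.source for f in self.findings if f.severity == "weak"}
        return strong or len(weak_families) >= 2

# Usage sketch: two weak findings in different families trigger review.
pkg = EvidencePackage("m_123")
pkg.add("pricing", "no_visible_prices", "Catalog pages hide prices until checkout")
pkg.add("shoppability", "dead_checkout", "Checkout flow fails on two test SKUs")
print(pkg.needs_review())  # -> True
```

The point of keeping findings structured rather than collapsing them is that the same package later becomes the review surface: each `Finding.detail` is something a human can inspect, dispute, or escalate.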

Notes

Context

Merchant vetting is one of those domains where the hardest part is rarely the absence of data. It is the presence of too much uneven, ambiguous, and operationally messy data. Website signals, account behavior, prior incidents, policy nuance, regional patterns, and reviewer judgment all show up at once.

The product challenge is to turn that mess into a workflow that helps people make better trust decisions earlier.

What I built

The core contribution here was not a single model or one dashboard. It was a more structured merchant intelligence workflow:

  • bring lifecycle signals into one review path
  • separate raw observations from interpreted risk factors
  • classify merchant type, surface shoppability issues, and summarize price signals in a more consistent way
  • support earlier intervention instead of waiting for downstream failure
  • reduce repeated manual investigation steps
  • keep enough evidence visible that reviewers can still make defensible decisions
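The second bullet, separating raw observations from interpreted risk factors, can be made concrete with a small sketch. The facts, thresholds, and factor names below are invented for illustration; the design point is that every derived risk factor keeps a pointer back to the raw observation it was interpreted from.

```python
# Raw observations are recorded as-is, with no judgment attached.
RAW_OBSERVATIONS = [
    {"kind": "site", "fact": "ssl_cert_age_days", "value": 3},
    {"kind": "account", "fact": "payout_country_changed", "value": True},
    {"kind": "catalog", "fact": "sku_count", "value": 2},
]

def interpret(observations):
    """Map raw facts into named risk factors, keeping the source fact attached
    so reviewers can challenge the interpretation, not just the verdict."""
    factors = []
    for obs in observations:
        if obs["fact"] == "ssl_cert_age_days" and obs["value"] < 30:
            factors.append({"factor": "very_new_site", "evidence": obs})
        if obs["fact"] == "payout_country_changed" and obs["value"]:
            factors.append({"factor": "payout_instability", "evidence": obs})
    return factors

for f in interpret(RAW_OBSERVATIONS):
    print(f["factor"], "<-", f["evidence"]["fact"])
```

Keeping the two layers apart also means an interpretation rule can be tuned or retired without rewriting the observation history.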

System design themes

Lifecycle modeling

Merchant risk should be treated as a lifecycle system, not a one-time intake score. Earlier signals matter because bad actors often reveal themselves through combinations of weak indicators before there is one obvious event.
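The lifecycle framing above can be sketched as an accumulator over staged events: no single weak indicator is decisive, but their combination crosses a review threshold before the obvious event arrives. Weights and thresholds here are made up for illustration.

```python
WEAK_SIGNAL_WEIGHT = 1
STRONG_SIGNAL_WEIGHT = 3
REVIEW_THRESHOLD = 3

def lifecycle_review_point(events):
    """Return the index of the first event at which accumulated evidence
    warrants review, or None if the threshold is never reached."""
    score = 0
    for i, (stage, strength) in enumerate(events):
        score += STRONG_SIGNAL_WEIGHT if strength == "strong" else WEAK_SIGNAL_WEIGHT
        if score >= REVIEW_THRESHOLD:
            return i
    return None

events = [
    ("onboarding", "weak"),   # mismatched business category
    ("first_sales", "weak"),  # prices hidden until checkout
    ("first_sales", "weak"),  # chargeback in the first week
    ("scaling", "strong"),    # confirmed policy violation
]
print(lifecycle_review_point(events))  # -> 2: flagged before the strong event
```

A one-time intake score would only see the first event; the lifecycle view is what lets three weak indicators outrank a clean onboarding snapshot.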

Evidence before automation

The system works best when it accelerates human judgment rather than hiding it. In ambiguous trust domains, evidence quality matters more than aggressive auto-decisioning.

LLMs as a bounded layer

It is reasonable to use LLM APIs in this kind of system, but only for bounded tasks:

  • summarize merchant website content
  • classify merchant category when the website is messy or weakly structured
  • extract likely policy-relevant themes from product descriptions or landing pages
  • identify obvious gaps in shoppability, pricing transparency, or catalog coherence

The important boundary is that the LLM should not make the final trust decision. It should help compress evidence, normalize messy inputs, and propose likely categories or concerns that the rest of the system can validate.

That means the safer shape looks like:

Merchant signals -> rules and heuristics -> LLM-assisted classification / summarization ->
evidence package -> human review or policy-aware workflow decision
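The flow above can be sketched as a small pipeline. The LLM calls are injected as plain functions so any provider could sit behind them; `run_pipeline`, the rule shapes, and the field names are all hypothetical. The structural point is that the LLM layer only compresses and classifies, and the package always routes to review logic rather than auto-deciding.

```python
def run_pipeline(merchant, rules, llm_summarize, llm_classify):
    evidence = {"merchant_id": merchant["id"], "rule_hits": [], "llm": {}}

    # 1. Deterministic rules and heuristics run first.
    for rule in rules:
        hit = rule(merchant)
        if hit:
            evidence["rule_hits"].append(hit)

    # 2. The LLM layer handles only bounded tasks: summarize and propose.
    evidence["llm"]["site_summary"] = llm_summarize(merchant["site_text"])
    evidence["llm"]["proposed_category"] = llm_classify(merchant["site_text"])

    # 3. No final trust decision here: the package routes to human review.
    evidence["disposition"] = (
        "needs_human_review" if evidence["rule_hits"] else "routine_review"
    )
    return evidence

# Usage with stubs standing in for real rules and a real model:
merchant = {"id": "m_9", "site_text": "Luxury watches, prices on request."}
rules = [lambda m: "hidden_pricing" if "prices on request" in m["site_text"] else None]
pkg = run_pipeline(
    merchant, rules,
    llm_summarize=lambda t: t[:40],
    llm_classify=lambda t: "jewelry_watches",
)
print(pkg["disposition"])  # -> needs_human_review
```

Because the model is behind a plain function boundary, its proposals land in the evidence package next to the rule hits, where the rest of the system can validate or discard them.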

How this could evolve into a RAG workflow

The next useful step would be to turn the bounded LLM layer into a retrieval-backed system rather than relying only on the model prompt plus current merchant signals.

In a stronger RAG version, the workflow could retrieve:

  • the most relevant policy documents and restricted-business rules
  • prior merchant cases with similar evidence patterns
  • reviewer decisions and override history for the same policy area
  • merchant website snapshots, catalog extracts, and pricing evidence
  • region-specific guidance or enforcement notes

That would change the flow into something more grounded:

Merchant signals -> retrieve policy, history, and evidence -> LLM synthesizes grounded findings ->
evidence package with citations -> human review or policy-aware workflow decision
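The grounded flow can be sketched as well. The corpus, scoring, and field names below are invented, and the keyword-overlap retriever is a toy stand-in for a real retrieval system; the part worth keeping is that every finding carries a citation back to the policy text or prior case it was retrieved from.

```python
POLICY_CORPUS = [
    {"doc_id": "policy/restricted-goods", "text": "weapons ammunition firearms"},
    {"doc_id": "policy/pricing-transparency", "text": "prices must be visible before checkout"},
    {"doc_id": "case/prior-hidden-pricing", "text": "hidden pricing merchant suspended checkout"},
]

def retrieve(query, corpus, k=2):
    """Toy keyword-overlap retrieval standing in for a vector search."""
    q = set(query.lower().split())
    scored = [(len(q & set(d["text"].split())), d) for d in corpus]
    scored.sort(key=lambda pair: -pair[0])
    return [d for score, d in scored[:k] if score > 0]

def grounded_findings(merchant_signals):
    """Attach retrieved policy/case citations to each merchant signal."""
    findings = []
    for signal in merchant_signals:
        for doc in retrieve(signal, POLICY_CORPUS):
            findings.append({"signal": signal, "cites": doc["doc_id"]})
    return findings

for f in grounded_findings(["prices hidden before checkout"]):
    print(f["signal"], "->", f["cites"])
```

The `cites` field is what makes the package auditable: a reviewer can open the exact policy clause or prior case, rather than trusting a free-floating model claim.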

That kind of retrieval layer would improve accuracy in a few ways:

  • reduce hallucinated or overly generic classifications
  • anchor recommendations to the actual policy language being enforced
  • make it easier to explain why a merchant was flagged
  • preserve consistency across reviewers by reusing prior decisions and policy interpretations

The important part is that retrieval would not replace the current rules and signals. It would make the evidence package more grounded, more auditable, and easier for reviewers to trust.

Shoppability and pricing signals

One useful part of the workflow was treating merchant quality signals as risk inputs:

  • is the site actually shoppable?
  • are prices visible and internally consistent?
  • do product titles, images, and descriptions line up?
  • does the declared merchant category match what the site appears to sell?

Those signals are not just UX concerns. They often become early indicators of weak quality, evasiveness, policy violations, or merchant misrepresentation.
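The four checks above can be expressed as a small signal extractor. The crawled fields are hypothetical, and the 20x price-spread threshold is invented; the design point is that each check emits a named, inspectable signal instead of folding into an opaque score.

```python
def shoppability_signals(site):
    """Turn crawled site quality checks into named risk inputs."""
    signals = []
    if not site.get("prices_visible", False):
        signals.append("prices_not_visible")
    prices = site.get("listed_prices", [])
    if prices and max(prices) > 20 * min(prices):
        # An extreme spread inside a tiny catalog often signals placeholder
        # or bait listings rather than a real assortment.
        signals.append("internally_inconsistent_pricing")
    if site.get("declared_category") != site.get("inferred_category"):
        signals.append("category_mismatch")
    if not site.get("checkout_reachable", False):
        signals.append("checkout_unreachable")
    return signals

site = {
    "prices_visible": True,
    "listed_prices": [4.99, 150.00],     # 30x spread within one small catalog
    "declared_category": "books",
    "inferred_category": "supplements",  # what the site appears to sell
    "checkout_reachable": True,
}
print(shoppability_signals(site))
```

Emitted this way, a quality signal like `category_mismatch` can feed the same evidence package as behavioral and policy signals, which is what lets UX-looking checks act as early misrepresentation indicators.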

Regional nuance

Merchant behavior differs across markets. Good risk tooling has to preserve local nuance instead of assuming every signal generalizes cleanly.
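One way regional nuance shows up in tooling is as per-market overrides on signal weights rather than one global table. The region codes and weights below are invented for the sketch; the shape is the point: a signal that is routine in one market should not carry the same risk weight everywhere.

```python
# Global defaults, overridable per market. All values are illustrative.
DEFAULT_WEIGHTS = {"cash_on_delivery": 2, "no_business_registration": 3}

REGION_OVERRIDES = {
    # Cash on delivery is a mainstream payment method in some markets,
    # so it should carry little or no risk weight there.
    "IN": {"cash_on_delivery": 0},
    # Some markets have strict business-registration requirements,
    # so a missing registration is weighted more heavily.
    "DE": {"no_business_registration": 5},
}

def signal_weight(signal, region):
    """Resolve a signal's weight: regional override, else global default."""
    return REGION_OVERRIDES.get(region, {}).get(signal, DEFAULT_WEIGHTS.get(signal, 1))

print(signal_weight("cash_on_delivery", "IN"))  # -> 0
print(signal_weight("cash_on_delivery", "US"))  # -> 2
```

Keeping overrides as data rather than code also gives policy and regional teams a surface they can maintain without engineering changes.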

Closing thought

The biggest takeaway from merchant vetting is that the hard part is not generating more signals. It is turning messy evidence into a review system that helps people make consistent decisions faster. LLMs can help with classification and evidence compression, but the real value still comes from structured signals, policy-aware logic, and human-review surfaces that keep uncertainty visible.

Research anchors