Knowledge Graph and Discovery Product

YourStory Startup Graph

Built a graph-based startup intelligence system to map founders, startups, investors, domains, and funding relationships.

Structured discovery · Graph-backed product

Context

This was a three-month internship during my undergrad and one of the earliest projects that changed how I thought about systems. I worked directly with the CEO on a startup intelligence product that treated startup coverage as structured discovery data rather than static articles. It was also the period that pushed me fully into coding and product-building as a real discipline.

Problem

Startup media is useful to read, but difficult to navigate systematically when users want to understand who invested where, which founders cluster together, or what adjacent companies exist within a domain.

What I Built

  • A graph database in Neo4j for startups, founders, investors, domains, and funding rounds
  • Relationship-driven discovery logic that made investor-style exploration more useful than flat search
  • Java service logic for ingestion, graph writes, and repeatable structured updates
  • Cypher query patterns for domain-level investor discovery, founder-investor traversal, and funding-pattern exploration
  • An Inshorts-style daily product concept for startup news that generated compact updates using funding stage, amount raised, investor names, and related startup signals

Notes

Overview

During a three-month internship at YourStory in undergrad, I worked on a problem that still feels current: how do you turn a stream of startup stories into something users can explore, not just read?

The answer I pursued was a graph-backed product. Rather than treating funding news as isolated text, I modeled startups, founders, investors, rounds, domains, and stories as connected entities. That shift turned the product from search into discovery.

Why a graph made sense

The startup ecosystem is naturally relational. Investors participate in rounds. Founders connect companies over time. Domains create clusters. Stories mention multiple entities at once. Once I framed the product that way, a graph was the cleanest representation of the actual problem.

At a practical level, the system needed to support questions like:

  • which investors are most active in a specific domain?
  • which founder or company clusters sit close to this startup?
  • what rounds or entities connect to a recent story?

Those are relationship questions, not just text-search questions.

System shape

Conceptually, the pipeline looked like this:

Story input -> entity normalization -> graph upsert -> traversal/query layer -> editorial or discovery surface

The second step, normalization, mattered more than anything else. Without canonical entity handling, a graph quietly becomes misleading: different spellings of the same company or investor create duplicate nodes and distort the network.

The engineering lesson was that the graph itself was not the hard part. Trustworthy entity resolution was.
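To make the duplicate-node problem concrete, here is a minimal sketch of canonical entity handling. It is not the original service code. The class names (EntityKeys, NodeRegistry), the suffix list, and the example company name are all hypothetical, and the in-memory map only stands in for a real graph upsert.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: collapses spelling variants of an entity name into one key. */
final class EntityKeys {
    private EntityKeys() {}

    // Assumed list of legal suffixes to drop; a real list would be curated by hand.
    private static final String SUFFIXES = "\\b(pvt|private|ltd|limited|inc|llp|llc)\\b";

    static String canonicalKey(String rawName) {
        // Decompose accented characters and drop the combining marks.
        String s = Normalizer.normalize(rawName, Normalizer.Form.NFKD)
                             .replaceAll("\\p{M}", "");
        s = s.toLowerCase(Locale.ROOT);
        // Turn punctuation into spaces, then remove legal suffixes as whole words.
        s = s.replaceAll("[^a-z0-9]+", " ");
        s = s.replaceAll(SUFFIXES, " ");
        // Collapse whitespace so spacing differences do not create new keys.
        return s.trim().replaceAll("\\s+", " ");
    }
}

/** In-memory stand-in for a graph upsert keyed by canonical name. */
final class NodeRegistry {
    private final Map<String, String> displayNameByKey = new LinkedHashMap<>();

    /** Returns true if a new node was created, false if an existing one was reused. */
    boolean upsert(String rawName) {
        String key = EntityKeys.canonicalKey(rawName);
        return displayNameByKey.putIfAbsent(key, rawName.trim()) == null;
    }

    int size() {
        return displayNameByKey.size();
    }
}
```

With this sketch, "Acme Technologies Pvt. Ltd." and "  ACME Technologies " both map to the key "acme technologies", so the second upsert reuses the existing node instead of creating a duplicate. String rules like these only catch the easy cases. Abbreviations, rebrands, and two different companies sharing a name still need aliases or manual review.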

At the storage layer, the graph was organized around a simple but expressive property-graph model:

  • Startup
  • Founder
  • Investor
  • Domain
  • Round

with relationship types like:

  • FOUNDED_BY
  • FUNDED_BY
  • OPERATES_IN
  • RAISED_IN

That model made it possible to move through the ecosystem the way users naturally think about it. An investor does not want only “articles mentioning fintech.” They want to traverse from a startup to its founders, from a founder to adjacent companies, from a round to participating investors, and from there into domain clusters or follow-on patterns.
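The traversal idea can be sketched with a tiny in-memory graph. This is an illustration, not the Neo4j setup itself. The StartupGraph class and its method names are hypothetical, and the edge direction (a startup points to its investor through FUNDED_BY) is an assumption, since only the relationship names are given above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Tiny in-memory property graph; a stand-in for Neo4j, for illustration only. */
final class StartupGraph {
    enum Rel { FOUNDED_BY, FUNDED_BY, OPERATES_IN, RAISED_IN }

    private static final class Edge {
        final String from;
        final Rel type;
        final String to;

        Edge(String from, Rel type, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    void relate(String from, Rel type, String to) {
        edges.add(new Edge(from, type, to));
    }

    /** Targets reachable from a node over outgoing edges of one type. */
    Set<String> out(String from, Rel type) {
        Set<String> targets = new LinkedHashSet<>();
        for (Edge e : edges) {
            if (e.type == type && e.from.equals(from)) {
                targets.add(e.to);
            }
        }
        return targets;
    }

    /** Sources that point at a node over edges of one type. */
    Set<String> in(String to, Rel type) {
        Set<String> sources = new LinkedHashSet<>();
        for (Edge e : edges) {
            if (e.type == type && e.to.equals(to)) {
                sources.add(e.from);
            }
        }
        return sources;
    }

    /** Two hops: startup -> its investors -> other startups those investors funded. */
    Set<String> coFundedStartups(String startup) {
        Set<String> result = new LinkedHashSet<>();
        for (String investor : out(startup, Rel.FUNDED_BY)) {
            result.addAll(in(investor, Rel.FUNDED_BY));
        }
        result.remove(startup); // the starting node is not its own neighbor
        return result;
    }
}
```

If StartupA and StartupB are both FUNDED_BY InvestorX, then coFundedStartups("StartupA") returns StartupB. The same out/in pattern extends to founders and domains. A graph database does this over indexed relationships rather than the linear scan of edges used here.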

On top of that graph, I used Cypher queries for the actual discovery layer. The key value was not just storing connected data, but making multi-hop questions cheap to express:

  • which investors are repeatedly showing up in a domain
  • which founders connect otherwise separate startup clusters
  • which recent rounds create interesting adjacency between companies
  • which entities should appear together in a compact daily update

That was one of the first times I saw clearly that query design is really product design in another form.
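As an illustration of how compact a multi-hop question becomes, here is a sketch of the "investors repeatedly showing up in a domain" query, held as a Java constant. The property name (name), the parameter names ($domain, $limit), the relationship directions, and the DiscoveryQueries class are all assumptions made for this example, not the original schema.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for discovery queries; labels and directions are assumed. */
final class DiscoveryQueries {
    private DiscoveryQueries() {}

    // Ranks investors by how many distinct startups they funded in one domain.
    static final String TOP_INVESTORS_IN_DOMAIN =
          "MATCH (d:Domain {name: $domain})<-[:OPERATES_IN]-(s:Startup)-[:FUNDED_BY]->(i:Investor)\n"
        + "RETURN i.name AS investor, count(DISTINCT s) AS startups\n"
        + "ORDER BY startups DESC\n"
        + "LIMIT $limit";

    /** Builds the parameter map for TOP_INVESTORS_IN_DOMAIN. */
    static Map<String, Object> topInvestorsParams(String domain, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        Map<String, Object> params = new HashMap<>();
        params.put("domain", domain);
        params.put("limit", limit);
        return params;
    }
}
```

The values go in as parameters rather than being concatenated into the query text. That keeps user input out of the Cypher string and lets the database reuse the query plan. The query string and the parameter map would then be handed to whichever Neo4j client the service uses.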

Product surface

The graph enabled a more investor-style product surface. A user could move from one company to its founders, to the investors in a round, to other companies that shared those investors or domain patterns. That kind of traversal feels obvious once it exists, but it is hard to fake with traditional article archives.

Alongside the graph work, I also built toward a short-form startup updates product inspired by compact news formats. The interesting part there was not just templating text. It was using structured fields like funding stage, amount raised, participating investors, and startup/domain tags to make daily updates fast, legible, and consistent.
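A rough sketch of how structured fields can drive a compact update is shown below. The FundingUpdate class, its field names, the sentence template, and the choice of US dollars in millions are illustrative assumptions, not the format the product actually used.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical structured record behind one short-form funding update. */
final class FundingUpdate {
    final String startup;
    final String domain;
    final String stage;
    final double amountUsdMillions; // assumed unit; the real product may have used another currency
    final List<String> investors;

    FundingUpdate(String startup, String domain, String stage,
                  double amountUsdMillions, List<String> investors) {
        this.startup = startup;
        this.domain = domain;
        this.stage = stage;
        this.amountUsdMillions = amountUsdMillions;
        this.investors = investors;
    }

    /** Renders the record as a one-sentence card using a fixed template. */
    String toCard() {
        String backers = investors.isEmpty()
            ? "undisclosed investors"
            : String.join(", ", investors);
        // Locale.ROOT keeps the decimal separator stable across machines.
        return String.format(Locale.ROOT, "%s (%s) raised $%.1fM in %s funding from %s.",
            startup, domain, amountUsdMillions, stage, backers);
    }
}
```

For a record with startup "StartupA", domain "fintech", stage "Series A", amount 12.5, and investors InvestorX and InvestorY, toCard() yields "StartupA (fintech) raised $12.5M in Series A funding from InvestorX, InvestorY." Because every update comes from the same fields, the template stays uniform no matter who wrote the underlying story.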

That became my first real exposure to the connection between data pipelines and product output. Once the entity structure is dependable, the same system can power search, traversal, and editorial surfaces without redoing the logic for every new view.

What this project taught me

This was one of the first projects where I was not just coding a component. I was thinking about the full chain:

  • how the world should be modeled
  • how data should be normalized
  • how queries reflect user intent
  • how editorial output can be powered by structure rather than manual repetition

That combination of knowledge modeling, infrastructure, and user-facing product thinking has stayed with me ever since.

It also made the direction feel personal. This was the point in undergrad where coding stopped feeling abstract and started feeling like the way I wanted to think and build.

Closing thought

What stayed with me from this project was not just the graph itself. It was the realization that product usefulness often depends on whether the data model matches how people actually think. Once startup coverage was structured as relationships instead of isolated articles, the product became easier to explore, easier to query, and more aligned with real user intent. That was one of the earliest moments where backend modeling, discovery design, and product thinking all clicked together for me.