Sirui Chen·sirui.dev

Project

Distributed Web Search Engine

Crawler, indexer, PageRank, KVS, and RDD-like Flame layer in Java.

Java · Concurrency · Networking · Distributed Systems · Custom KVS · RDD-style Compute

Overview

End-to-end distributed search platform with a custom crawler, indexer, PageRank, replicated key-value store, multithreaded web server, and an RDD-style compute layer named Flame.

Problem

Building a working search stack from scratch means coordinating crawling, storage, ranking, and serving across multiple machines without leaning on existing distributed frameworks.

What I built

Built every layer in Java: a polite multi-threaded crawler, an indexer, a PageRank implementation, a custom partitioned and replicated key-value store, a multithreaded web server, and a Spark-RDD-style compute layer called Flame that powers the indexing and ranking jobs.
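To give a sense of what the ranking job computes, here is a minimal single-machine sketch of iterative PageRank. It is not the project's code: the class name `PageRankSketch`, the in-memory link map, the damping value passed by the caller, and the even redistribution of rank from pages with no outgoing links are all assumptions made for illustration. The real job runs on Flame across machines.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, in-memory illustration of PageRank; names are not from the project.
final class PageRankSketch {

    // `links` maps each page to the pages it links to.
    // Assumption: every link target also appears as a key in `links`.
    static Map<String, Double> rank(Map<String, List<String>> links,
                                    double damping, int iterations) {
        int n = links.size();
        Map<String, Double> ranks = new HashMap<>();
        for (String page : links.keySet()) {
            ranks.put(page, 1.0 / n); // start from a uniform distribution
        }

        for (int i = 0; i < iterations; i++) {
            Map<String, Double> next = new HashMap<>();
            for (String page : links.keySet()) {
                next.put(page, (1.0 - damping) / n); // teleport term
            }

            double dangling = 0.0; // rank held by pages with no out-links
            for (Map.Entry<String, List<String>> e : links.entrySet()) {
                double r = ranks.get(e.getKey());
                List<String> outs = e.getValue();
                if (outs.isEmpty()) {
                    dangling += r;
                    continue;
                }
                double share = damping * r / outs.size();
                for (String target : outs) {
                    next.merge(target, share, Double::sum);
                }
            }

            // Spread dangling rank evenly so the total stays at 1.
            double extra = damping * dangling / n;
            for (String page : links.keySet()) {
                next.merge(page, extra, Double::sum);
            }
            ranks = next;
        }
        return ranks;
    }
}
```

With a three-page graph where `b` and `c` both link to `a` and `a` links to `b`, calling `PageRankSketch.rank(links, 0.85, 20)` gives `a` a higher score than `c`, and the scores sum to 1.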

Architecture

  • Crawler — multi-threaded fetcher with politeness rules and URL frontier
  • KVS — partitioned, replicated key-value store with custom wire protocol
  • Flame — RDD-like compute layer with map/filter/fold primitives backed by the KVS
  • Indexer + PageRank — Flame jobs that produce searchable indexes and ranking scores
  • Web server — multithreaded request handler that serves the search UI
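The map/filter/fold primitives in the Flame layer can be sketched roughly as follows. This is an illustrative stand-in, not Flame's actual API: `MiniRdd` is a hypothetical name, and it keeps its elements in a local list, whereas Flame backs its datasets with the KVS and runs across workers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical stand-in for a Flame dataset. Assumption: data lives in a
// local immutable list here; the real layer stores it in the KVS.
final class MiniRdd<T> {
    private final List<T> items;

    MiniRdd(List<T> items) {
        this.items = List.copyOf(items); // defensive, immutable copy
    }

    // Applies fn to every element and returns a new dataset.
    <R> MiniRdd<R> map(Function<? super T, ? extends R> fn) {
        List<R> out = new ArrayList<>(items.size());
        for (T item : items) {
            out.add(fn.apply(item));
        }
        return new MiniRdd<>(out);
    }

    // Keeps only the elements for which the predicate holds.
    MiniRdd<T> filter(Predicate<? super T> keep) {
        List<T> out = new ArrayList<>();
        for (T item : items) {
            if (keep.test(item)) {
                out.add(item);
            }
        }
        return new MiniRdd<>(out);
    }

    // Combines all elements into one value, starting from zero.
    T fold(T zero, BinaryOperator<T> combine) {
        T acc = zero;
        for (T item : items) {
            acc = combine.apply(acc, item);
        }
        return acc;
    }

    List<T> collect() {
        return items;
    }
}
```

For example, `new MiniRdd<String>(List.of("a", "bb", "ccc")).map(String::length).filter(n -> n > 1).fold(0, Integer::sum)` evaluates to 5. Each transformation returns a new dataset rather than mutating the old one, which is the property that makes RDD-style pipelines easy to chain.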

Technical highlights

  • Implemented custom networking, concurrent I/O, partitioning, replication, and distributed computation primitives across storage, compute, and serving layers.
  • Debugged crawler, indexer, PageRank, storage, and server interactions end-to-end to improve reliability across the pipeline.
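One common way to combine partitioning and replication is to hash each key to a primary worker and copy it to the next few workers in ring order. The sketch below shows that idea only; the page does not say which placement scheme the KVS uses, so the class name `Placement`, the use of `String.hashCode`, and the ring-successor replica rule are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical key-placement helper; not taken from the project's KVS.
final class Placement {

    // Returns the worker indices that should hold `key`: a primary chosen
    // by hash, followed by the next (replicas - 1) workers in ring order.
    static List<Integer> workersFor(String key, int workerCount, int replicas) {
        if (workerCount <= 0 || replicas <= 0 || replicas > workerCount) {
            throw new IllegalArgumentException(
                "need 1 <= replicas <= workerCount");
        }
        // floorMod keeps the index non-negative even for negative hash codes.
        int primary = Math.floorMod(key.hashCode(), workerCount);
        List<Integer> workers = new ArrayList<>(replicas);
        for (int i = 0; i < replicas; i++) {
            workers.add((primary + i) % workerCount); // wrap around the ring
        }
        return workers;
    }
}
```

With four workers and two replicas, `Placement.workersFor("a", 4, 2)` returns `[1, 2]` and `Placement.workersFor("c", 4, 2)` returns `[3, 0]`, the second case showing the wrap-around. A plain modulo scheme like this reshuffles most keys when the worker count changes; consistent hashing is the usual alternative when that matters.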

Impact

  • Built a working full-stack search platform without relying on existing distributed frameworks, exercising every layer of a search system from raw sockets to ranking.