Project
Distributed Web Search Engine
Crawler, indexer, PageRank, KVS, and RDD-like Flame layer in Java.
Java, Concurrency, Networking, Distributed Systems, Custom KVS, RDD-style Compute
Overview
End-to-end distributed search platform with a custom crawler, indexer, PageRank, replicated key-value store, multithreaded web server, and an RDD-style compute layer named Flame.
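The page does not show how the PageRank job is written. The following is a minimal single-machine sketch of the standard power-iteration formulation, included only to make the ranking step concrete. It assumes a damping factor of 0.85 and that dangling pages (no outlinks) spread their rank uniformly. The class and method names (`PageRankSketch`, `compute`) are hypothetical, and the project's version runs as a Flame job over the KVS rather than over an in-memory map.

```java
import java.util.*;

/**
 * Hypothetical in-memory sketch of iterative PageRank (power iteration).
 * The project runs ranking as a Flame job over the KVS; this is not that code.
 */
final class PageRankSketch {
    /**
     * @param links     page -> outgoing links (targets need not appear as keys)
     * @param damping   probability of following a link, e.g. 0.85 (assumed)
     * @param epsilon   stop when no rank moves by more than this
     * @param maxIters  hard cap on iterations
     */
    static Map<String, Double> compute(Map<String, List<String>> links,
                                       double damping, double epsilon, int maxIters) {
        // Collect every page mentioned as a source or a target.
        Set<String> pages = new TreeSet<>(links.keySet());
        for (List<String> outs : links.values()) pages.addAll(outs);
        int n = pages.size();

        Map<String, Double> rank = new HashMap<>();
        for (String p : pages) rank.put(p, 1.0 / n);

        for (int iter = 0; iter < maxIters; iter++) {
            Map<String, Double> incoming = new HashMap<>();
            for (String p : pages) incoming.put(p, 0.0);
            double dangling = 0.0;

            // Each page splits its current rank evenly across its outlinks.
            for (String p : pages) {
                List<String> outs = links.getOrDefault(p, List.of());
                double r = rank.get(p);
                if (outs.isEmpty()) { dangling += r; continue; }
                double share = r / outs.size();
                for (String q : outs) incoming.merge(q, share, Double::sum);
            }

            // Apply damping; dangling mass is redistributed uniformly so ranks keep summing to 1.
            Map<String, Double> next = new HashMap<>();
            double maxDelta = 0.0;
            for (String p : pages) {
                double v = (1 - damping) / n + damping * (incoming.get(p) + dangling / n);
                maxDelta = Math.max(maxDelta, Math.abs(v - rank.get(p)));
                next.put(p, v);
            }
            rank = next;
            if (maxDelta < epsilon) break;
        }
        return rank;
    }
}
```

In a distributed setting, the per-page "split rank across outlinks" step maps naturally onto a flat-map over link records, and summing `incoming` becomes a fold keyed by target page.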
Problem
Building a working search stack from scratch means coordinating crawling, storage, ranking, and serving across multiple machines without leaning on existing distributed frameworks.
What I built
Built every layer in Java: a polite, multithreaded crawler; an indexer; a PageRank implementation; a custom partitioned and replicated key-value store; a multithreaded web server; and a Spark-RDD-style compute layer called Flame that powers the indexing and ranking jobs.
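"Polite" crawling usually means never hitting the same host faster than some minimum interval. The sketch below shows one way a URL frontier can enforce that, assuming a fixed per-host delay and de-duplication by exact URL string. It is illustrative only: `PoliteFrontier`, `add`, and `next` are hypothetical names, and robots.txt handling, persistence, and fairness across hosts are left out.

```java
import java.net.URI;
import java.util.*;
import java.util.function.LongSupplier;

/**
 * Hypothetical polite URL frontier: URLs are queued per host, and a host is handed out
 * again only after minDelayMillis has elapsed since its previous fetch.
 * robots.txt parsing and persistence are intentionally out of scope.
 */
final class PoliteFrontier {
    private final Map<String, Deque<String>> queuesByHost = new HashMap<>();
    private final Map<String, Long> lastFetchMillis = new HashMap<>();
    private final Set<String> seen = new HashSet<>();
    private final long minDelayMillis;
    private final LongSupplier clock; // injectable so tests can control time

    PoliteFrontier(long minDelayMillis, LongSupplier clock) {
        this.minDelayMillis = minDelayMillis;
        this.clock = clock;
    }

    /**
     * Enqueues a URL the first time it is seen; duplicates and host-less URLs are ignored.
     * URI.create throws IllegalArgumentException for syntactically invalid URLs.
     */
    synchronized boolean add(String url) {
        String host = URI.create(url).getHost();
        if (host == null || !seen.add(url)) return false;
        queuesByHost.computeIfAbsent(host, h -> new ArrayDeque<>()).addLast(url);
        return true;
    }

    /** Returns a URL whose host is off cooldown, or empty if every non-empty host is cooling down. */
    synchronized Optional<String> next() {
        long now = clock.getAsLong();
        for (Map.Entry<String, Deque<String>> e : queuesByHost.entrySet()) {
            Deque<String> queue = e.getValue();
            Long last = lastFetchMillis.get(e.getKey());
            if (!queue.isEmpty() && (last == null || now - last >= minDelayMillis)) {
                lastFetchMillis.put(e.getKey(), now); // start this host's cooldown
                return Optional.of(queue.pollFirst());
            }
        }
        return Optional.empty();
    }
}
```

Because both methods are `synchronized`, several fetcher threads can share one frontier safely. A larger frontier would typically replace the linear scan in `next` with a priority queue ordered by each host's next allowed fetch time.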
Architecture
- Crawler — multi-threaded fetcher with politeness rules and URL frontier
- KVS — partitioned, replicated key-value store with custom wire protocol
- Flame — RDD-like compute layer with map/filter/fold primitives backed by the KVS
- Indexer + PageRank — Flame jobs that produce searchable indexes and ranking scores
- Web server — multithreaded request handler that serves the search UI
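To illustrate what "RDD-like compute with map/filter/fold primitives" means, here is a minimal in-memory stand-in. It assumes data is split into fixed partitions, that each operation runs once per partition on a shared thread pool, and that every operation returns a new collection instead of mutating the old one. `MiniRdd` and its methods are hypothetical names; the real Flame layer stores partitions in the KVS and distributes work across machines, which this sketch does not model.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.*;

/**
 * Hypothetical in-memory RDD-like collection: each operation runs once per partition
 * on a thread pool and yields a new immutable-by-convention collection.
 * Flame's KVS-backed storage and cross-machine scheduling are not modeled.
 */
final class MiniRdd<T> {
    private final List<List<T>> partitions;
    private final ExecutorService pool;

    private MiniRdd(List<List<T>> partitions, ExecutorService pool) {
        this.partitions = partitions;
        this.pool = pool;
    }

    /** Distributes elements round-robin across numPartitions partitions. */
    static <T> MiniRdd<T> parallelize(List<T> data, int numPartitions, ExecutorService pool) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) parts.add(new ArrayList<>());
        for (int i = 0; i < data.size(); i++) parts.get(i % numPartitions).add(data.get(i));
        return new MiniRdd<>(parts, pool);
    }

    /** Runs fn on every partition concurrently and returns the results in partition order. */
    private <U> List<U> perPartition(Function<List<T>, U> fn) {
        List<Future<U>> futures = new ArrayList<>();
        for (List<T> p : partitions) futures.add(pool.submit(() -> fn.apply(p)));
        List<U> out = new ArrayList<>();
        try {
            for (Future<U> f : futures) out.add(f.get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return out;
    }

    <R> MiniRdd<R> map(Function<? super T, ? extends R> f) {
        List<List<R>> out = perPartition(p -> {
            List<R> r = new ArrayList<>(p.size());
            for (T x : p) r.add(f.apply(x));
            return r;
        });
        return new MiniRdd<>(out, pool);
    }

    MiniRdd<T> filter(Predicate<? super T> keep) {
        List<List<T>> out = perPartition(p -> {
            List<T> r = new ArrayList<>();
            for (T x : p) if (keep.test(x)) r.add(x);
            return r;
        });
        return new MiniRdd<>(out, pool);
    }

    /** Folds each partition from zero, then merges the partials; op must be associative with identity zero. */
    T fold(T zero, BinaryOperator<T> op) {
        List<T> partials = perPartition(p -> {
            T acc = zero;
            for (T x : p) acc = op.apply(acc, x);
            return acc;
        });
        T acc = zero;
        for (T x : partials) acc = op.apply(acc, x);
        return acc;
    }

    /** Concatenates all partitions in partition order. */
    List<T> collect() {
        List<T> all = new ArrayList<>();
        for (List<T> p : partitions) all.addAll(p);
        return all;
    }
}
```

For example, `MiniRdd.parallelize(List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 3, pool).map(x -> x * x).filter(x -> x % 2 == 0).fold(0, Integer::sum)` sums the even squares to 220. Because `zero` is applied once per partition and again during the merge, `fold` is only well-defined when `zero` is a true identity for `op`, which is the same restriction Spark places on its `fold`.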
Technical highlights
- Implemented custom networking, concurrent I/O, partitioning, replication, and distributed computation primitives across storage, compute, and serving layers.
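The page does not say how the KVS assigns keys to workers or chooses replicas. One common scheme, shown here purely as an assumption, is a consistent-hashing ring: each worker sits at the hash of its ID, a key belongs to the first worker at or after the key's hash, and its replicas are the next distinct workers clockwise. `HashRing` and `replicasFor` are hypothetical names, and the sketch omits virtual nodes, membership changes, and the actual replication traffic.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

/**
 * Hypothetical consistent-hashing router: a key's primary is the first worker clockwise
 * from hash(key), and its replicas are the following workers, wrapping around the ring.
 */
final class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int replicationFactor;

    HashRing(Collection<String> workerIds, int replicationFactor) {
        for (String id : workerIds) ring.put(hash(id), id);
        if (replicationFactor < 1 || replicationFactor > ring.size()) {
            throw new IllegalArgumentException(
                "replicationFactor must be between 1 and the number of distinct workers");
        }
        this.replicationFactor = replicationFactor;
    }

    /** First 8 bytes of SHA-256 as a long: stable across JVMs and machines, unlike Object.hashCode. */
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xff);
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }

    /** Returns the primary owner first, then replicas, walking clockwise and wrapping at the end. */
    List<String> replicasFor(String key) {
        List<String> out = new ArrayList<>(replicationFactor);
        Long pos = ring.ceilingKey(hash(key));
        if (pos == null) pos = ring.firstKey(); // wrap past the highest position
        while (out.size() < replicationFactor) {
            out.add(ring.get(pos));
            pos = ring.higherKey(pos);
            if (pos == null) pos = ring.firstKey();
        }
        return out;
    }
}
```

The appeal of this layout is that adding or removing one worker only moves the keys on the arcs adjacent to it, instead of reshuffling every key as `hash(key) % numWorkers` would.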
- Debugged crawler, indexer, PageRank, storage, and server interactions end-to-end to improve reliability across the pipeline.
Impact
- Built a working full-stack search platform without relying on existing distributed frameworks, exercising every layer of a search system from raw sockets to ranking.