Project

Distributed Web Search Engine

Crawler, indexer, PageRank, key-value store (KVS), and an RDD-like compute layer (Flame), all written in Java.

Java, Concurrency, Networking, Distributed Systems, Custom KVS, RDD-style Compute

Figure: Distributed Web Search Engine architecture preview

Overview

End-to-end distributed search platform with a custom crawler, indexer, PageRank, replicated key-value store, multithreaded web server, and an RDD-style compute layer named Flame.

Problem

Building a working search stack from scratch means coordinating crawling, storage, ranking, and serving across multiple machines without leaning on existing distributed frameworks.

What I built

Built every layer in Java: a polite multithreaded crawler, an indexer, a PageRank implementation, a custom partitioned and replicated key-value store, a multithreaded web server, and a Spark-RDD-style compute layer called Flame that powers the indexing and ranking jobs.
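
To make "polite" concrete, here is a minimal sketch of a per-host rate gate that a crawler's worker threads could share. It is an illustration under stated assumptions, not the project's actual code: the class name PolitenessGate, the method tryAcquire, and the fixed per-host delay are all hypothetical, and a fuller crawler would also honor robots.txt rules.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical per-host politeness gate; the name and delay policy are assumptions. */
class PolitenessGate {
    private final long minDelayMillis;
    // Earliest time (in ms) at which each host may be fetched again.
    private final Map<String, Long> nextAllowed = new HashMap<>();

    PolitenessGate(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    /**
     * Returns true and reserves the host if it may be fetched at nowMillis;
     * returns false if the caller should put the URL back on the frontier.
     * Synchronized so several fetcher threads can share one gate.
     */
    synchronized boolean tryAcquire(String url, long nowMillis) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return false; // no host component, so nothing to rate-limit against
        }
        host = host.toLowerCase(Locale.ROOT);
        long allowedAt = nextAllowed.getOrDefault(host, Long.MIN_VALUE);
        if (nowMillis < allowedAt) {
            return false; // this host was contacted too recently
        }
        nextAllowed.put(host, nowMillis + minDelayMillis);
        return true;
    }

    public static void main(String[] args) {
        PolitenessGate gate = new PolitenessGate(1000);
        long now = System.currentTimeMillis();
        System.out.println(gate.tryAcquire("http://example.com/a", now));      // true
        System.out.println(gate.tryAcquire("http://example.com/b", now + 10)); // false
        System.out.println(gate.tryAcquire("http://example.org/", now + 10));  // true
    }
}
```

Passing the clock in as a parameter keeps the gate deterministic and easy to test; a worker thread would call it with System.currentTimeMillis() and re-queue any URL that is rejected.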

Architecture

Crawler: multithreaded fetcher with politeness rules and a URL frontier
KVS: partitioned, replicated key-value store with a custom wire protocol
Flame: RDD-like compute layer with map/filter/fold primitives backed by the KVS
Indexer + PageRank: Flame jobs that produce searchable indexes and ranking scores
Web server: multithreaded request handler that serves the search UI
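
The map/filter/fold primitives in the Flame layer can be illustrated with a small, single-process sketch. Everything here is an assumption made for illustration: MiniRdd is a hypothetical name, the rows live in an in-memory list rather than in the KVS, and nothing is distributed across workers. It only shows the shape of the three operations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical in-memory stand-in for an RDD-like collection; not Flame's actual API. */
final class MiniRdd<T> {
    private final List<T> rows;

    MiniRdd(List<T> rows) {
        // Defensive copy so each MiniRdd behaves as an immutable snapshot.
        this.rows = Collections.unmodifiableList(new ArrayList<>(rows));
    }

    /** Applies fn to every row and returns a new collection of the results. */
    <R> MiniRdd<R> map(Function<? super T, ? extends R> fn) {
        List<R> out = new ArrayList<>(rows.size());
        for (T row : rows) {
            out.add(fn.apply(row));
        }
        return new MiniRdd<>(out);
    }

    /** Keeps only the rows for which the predicate returns true. */
    MiniRdd<T> filter(Predicate<? super T> keep) {
        List<T> out = new ArrayList<>();
        for (T row : rows) {
            if (keep.test(row)) {
                out.add(row);
            }
        }
        return new MiniRdd<>(out);
    }

    /** Reduces all rows to one value, starting from zero and combining left to right. */
    T fold(T zero, BinaryOperator<T> combine) {
        T acc = zero;
        for (T row : rows) {
            acc = combine.apply(acc, row);
        }
        return acc;
    }

    List<T> collect() {
        return rows;
    }

    public static void main(String[] args) {
        MiniRdd<String> pages = new MiniRdd<>(Arrays.asList("a b", "c", "d e f"));
        int total = pages
                .map(p -> p.split(" ").length) // word count per page: 2, 1, 3
                .filter(n -> n > 1)            // drop one-word pages: 2, 3
                .fold(0, Integer::sum);        // add up the rest
        System.out.println(total);             // prints 5
    }
}
```

In a distributed version, each worker would typically fold its own partition and a coordinator would combine the partial results, which is why the combine function should be associative.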

Technical highlights

Implemented custom networking, concurrent I/O, partitioning, replication, and distributed computation primitives across storage, compute, and serving layers.
Debugged crawler, indexer, PageRank, storage, and server interactions end-to-end to improve reliability across the pipeline.
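
As one illustration of how partitioning and replication can fit together, the sketch below assigns each key to a primary worker by key range and places replicas on the next workers around a ring. The assignment rule, the class name KeyRing, and the method ownersOf are assumptions chosen for the example; the project's actual scheme may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical key-range partitioner with ring-style replica placement (an assumed scheme). */
final class KeyRing {
    // Worker IDs kept in sorted order so range lookups are O(log n).
    private final TreeSet<String> workerIds = new TreeSet<>();

    void addWorker(String id) {
        workerIds.add(id);
    }

    /** Returns the primary owner first, followed by up to `replicas` successors on the ring. */
    List<String> ownersOf(String key, int replicas) {
        List<String> owners = new ArrayList<>();
        if (workerIds.isEmpty()) {
            return owners;
        }
        // Primary: the largest worker ID <= key. Keys below every ID wrap to the last worker.
        String current = workerIds.floor(key);
        if (current == null) {
            current = workerIds.last();
        }
        // Never list the same worker twice, even if more replicas are requested than exist.
        int copies = Math.min(replicas + 1, workerIds.size());
        for (int i = 0; i < copies; i++) {
            owners.add(current);
            String next = workerIds.higher(current);
            current = (next != null) ? next : workerIds.first();
        }
        return owners;
    }

    public static void main(String[] args) {
        KeyRing ring = new KeyRing();
        ring.addWorker("f");
        ring.addWorker("m");
        ring.addWorker("t");
        System.out.println(ring.ownersOf("hello", 1)); // [f, m]
        System.out.println(ring.ownersOf("apple", 1)); // [t, f]
    }
}
```

A client would send a write to the first worker in the list, which can then forward it to the remaining owners; reads can fall back to a replica if the primary is unreachable.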

Impact

Built a working full-stack search platform without relying on existing distributed frameworks, exercising every layer of a search system from raw sockets to ranking.