U.C. Berkeley Technical Report UCB//CSD-02-1215, November 2002.
As new and interesting peer-to-peer applications combine with advancements
in networking technology, they are reaching millions of users across the
globe. Numerous studies have shown, however, that loss of connectivity is
common on the wide-area network, due to hardware failures, software
failures, and network misconfigurations. Despite the natural redundancy present in
underlying network links, the current IP layer fails to recognize and
recover from these frequent failures in a timely fashion. This paper
presents fault-tolerant routing on the Tapestry overlay network, which
exploits existing network redundancy by dynamically switching traffic onto
precomputed alternate routes. Furthermore, messages in our system can be
duplicated and multicast "around" network congestion and failure hotspots,
with rapid reconvergence that drops the duplicates. Our simulations show
fault-tolerant Tapestry to be highly effective at circumventing link and
node failures, at a reasonable cost in additional routing latency and
bandwidth.
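A minimal Python sketch of the two mechanisms named above follows. It rests on several assumptions. Each routing-table entry is modeled as an ordered list of precomputed next hops, with the primary first and the backups after it. Failure detection is reduced to a set of nodes currently believed to be down. Every message is assumed to carry a unique ID. The names `RouteEntry`, `DuplicateFilter`, and `forward_copies` are hypothetical illustrations, not Tapestry's actual interfaces.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RouteEntry:
    """Hypothetical routing-table slot: primary next hop first, then backups.

    The preference ordering is an assumption of this sketch.
    """
    hops: list[str]

    def next_hop(self, failed: set[str]) -> str | None:
        # Fail over by picking the first precomputed hop not believed down;
        # no new route is computed at failure time.
        for hop in self.hops:
            if hop not in failed:
                return hop
        return None  # every precomputed alternative is down


def forward_copies(entry: RouteEntry, failed: set[str], k: int) -> list[str]:
    """Hypothetical helper: up to k distinct live next hops for duplicate copies."""
    return [hop for hop in entry.hops if hop not in failed][:k]


@dataclass
class DuplicateFilter:
    """Drops later copies of a message whose ID has already been accepted.

    Assumes each message carries a unique ID (an assumption of this sketch).
    """
    seen: set[str] = field(default_factory=set)

    def accept(self, msg_id: str) -> bool:
        if msg_id in self.seen:
            return False  # a copy that arrived over another path: drop it
        self.seen.add(msg_id)
        return True


if __name__ == "__main__":
    entry = RouteEntry(hops=["A", "B", "C"])
    print(entry.next_hop(failed=set()))     # A
    print(entry.next_hop(failed={"A"}))     # B
    print(forward_copies(entry, {"B"}, 2))  # ['A', 'C']
    dedup = DuplicateFilter()
    print(dedup.accept("m1"), dedup.accept("m1"))  # True False
```

Because the alternates in this sketch are computed ahead of time, failover is a local lookup rather than a wait for the routing protocol to reconverge. The price is the state held for the backup entries, plus extra bandwidth whenever duplicate copies are sent.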