Appears in Proceedings of the
ACM/IFIP/USENIX International Middleware Conference (Middleware 2003),
Recent work in P2P overlay networks allows for decentralized object
location and routing (DOLR) across networks based on unique IDs. In this
paper, we propose an extension to DOLR systems to publish objects using
generic feature vectors instead of content-hashed GUIDs, which enables
the systems to locate similar objects. We discuss the design of a distributed
text similarity engine, named Approximate Text Addressing (ATA), built
on top of this extension that locates objects by their text descriptions.
We then outline the design and implementation of a motivating application
on ATA, a decentralized spam-filtering service. We evaluate this system
with 30,000 real spam email messages and 10,000 non-spam messages, and
find a spam identification ratio of over 97% with zero false positives.
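
As an informal illustration of publishing an object under a feature vector rather than a single content-hashed GUID, the following is a minimal sketch under stated assumptions, not the ATA implementation. The fingerprint function (`features`), the in-memory `ToyIndex` standing in for a DOLR layer, and the parameters `width`, `count`, and `threshold` are all hypothetical names and values chosen for this example.

```python
import hashlib
from collections import defaultdict


def features(text, width=8, count=4):
    """Hypothetical fingerprint: the `count` smallest hashes taken over
    all `width`-character substrings of the normalized text."""
    text = " ".join(text.lower().split())
    grams = {text[i:i + width] for i in range(max(1, len(text) - width + 1))}
    hashes = sorted(
        int(hashlib.sha1(g.encode("utf-8")).hexdigest()[:12], 16) for g in grams
    )
    return set(hashes[:count])


class ToyIndex:
    """In-memory stand-in for a DOLR layer (assumption): each feature
    value acts as a key that maps to the IDs published under it."""

    def __init__(self):
        self.pointers = defaultdict(set)

    def publish(self, object_id, text):
        # Publish the object once per feature instead of once per GUID.
        for f in features(text):
            self.pointers[f].add(object_id)

    def locate(self, text, threshold=2):
        # Return objects that share at least `threshold` features with the query.
        votes = defaultdict(int)
        for f in features(text):
            for object_id in self.pointers[f]:
                votes[object_id] += 1
        return {oid for oid, n in votes.items() if n >= threshold}
```

A query for the exact published text matches on every feature, while a query for unrelated text shares none; how well near-duplicates match depends on the fingerprint and threshold chosen, which this sketch does not attempt to tune.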