A system and a method are described for rapidly determining document similarity among a set of documents, such as a set of documents obtained from an information retrieval (IR) system. A ranked list of the most important terms in each document is obtained using a phrase recognizer system. The list is...

http://www.google.co.uk/patents/US20030172066?utm_source=gb-gplus-share
Patent US20030172066 - System and method for detecting duplicate and similar documents