Many companies provide online search facilities that enable users to conduct computerized searches for documents. Unfortunately, these searches frequently return results that include duplicate documents, that is, documents that are completely or substantially identical to each other. This problem is...

Patent US7809695 - Information retrieval systems with duplicate document detection and presentation functions
http://www.google.co.uk/patents/US7809695?utm_source=gb-gplus-share