WO2008019133A3 - Detecting duplicate and near-duplicate files - Google Patents

Detecting duplicate and near-duplicate files

Publication number
WO2008019133A3
Authority
WO
WIPO (PCT)
Prior art keywords
technique
documents
duplicate
determine whether
duplicates
Prior art date
Application number
PCT/US2007/017487
Other languages
French (fr)
Other versions
WO2008019133A2 (en)
WO2008019133A9 (en)
Inventor
Monika H Henzinger
Original Assignee
Google Inc
Monika H Henzinger
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google Inc, Monika H Henzinger
Priority to CN2007800366340A (patent CN101523343B)
Priority to EP20070836544 (patent EP2054797A4)
Priority to CA2660202A (patent CA2660202C)
Publication of WO2008019133A2
Publication of WO2008019133A3
Publication of WO2008019133A9

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Abstract

Near duplicate documents may be identified by processing an accepted set of documents to determine a first set of near duplicate documents using a first technique, and processing the first set to determine a second set of near duplicate documents using a second technique. The first technique might be token order dependent, and the second technique might be order independent. The first technique might be token frequency independent, and the second technique might be frequency dependent. The first technique might determine whether two documents are near duplicates using representations based on a subset of the words or tokens of the documents, and the second technique might determine whether two documents are near duplicates using representations based on all of the words or tokens of the documents. The first technique might use set intersection to determine whether or not documents are near duplicates, and the second technique might use random projections to determine whether or not documents are near duplicates.
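
The two-stage filtering described in the abstract can be illustrated with a minimal Python sketch. This is not the claimed implementation; it only assumes one plausible reading of the abstract. The first stage uses token shingles (order dependent, frequency independent), keeps only a subset of shingle hashes, and compares documents by set intersection. The second stage uses a random-projection style bit fingerprint over all tokens, weighted by frequency and independent of order, in the spirit of the Charikar reference cited in this record. All function names, the shingle size `k`, the sketch size `m`, and the thresholds `t1` and `t2` are hypothetical choices made for illustration.

```python
import hashlib
from collections import Counter
from itertools import combinations


def _h64(text: str) -> int:
    """Stable 64-bit hash (the built-in hash() is salted per process)."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def shingle_sketch(tokens, k=3, m=64):
    """Stage 1 representation: the m smallest hashes of the k-token shingles.

    Order dependent (shingles are contiguous runs), frequency independent
    (a set is used), and based on a subset of the document (only m hashes).
    """
    shingles = {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}
    return set(sorted(_h64(s) for s in shingles)[:m])


def sketch_overlap(a, b):
    """Set-intersection score between two sketches (a rough resemblance proxy)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def simhash(tokens, bits=64):
    """Stage 2 representation: a bit fingerprint from signed, weighted sums.

    Order independent (a bag of tokens), frequency dependent (each token is
    weighted by its count), and based on all tokens of the document.
    """
    totals = [0] * bits
    for token, freq in Counter(tokens).items():
        h = _h64(token)
        for i in range(bits):
            totals[i] += freq if (h >> i) & 1 else -freq
    return sum(1 << i for i, total in enumerate(totals) if total > 0)


def bit_agreement(x, y, bits=64):
    """Fraction of bit positions on which two fingerprints agree."""
    return 1.0 - bin(x ^ y).count("1") / bits


def near_duplicates(docs, t1=0.5, t2=0.9):
    """Return pairs that pass the first technique and then the second."""
    tokens = {name: text.lower().split() for name, text in docs.items()}
    sketches = {name: shingle_sketch(toks) for name, toks in tokens.items()}
    # First set: candidate pairs whose shingle sketches overlap enough.
    first = [
        (a, b)
        for a, b in combinations(sorted(docs), 2)
        if sketch_overlap(sketches[a], sketches[b]) >= t1
    ]
    # Second set: candidates whose bit fingerprints also agree enough.
    prints = {name: simhash(toks) for name, toks in tokens.items()}
    return [(a, b) for a, b in first if bit_agreement(prints[a], prints[b]) >= t2]
```

Calling `near_duplicates({"a": text_a, "b": text_b})` returns the pairs that survive both stages. The all-pairs comparison is only suitable for small inputs, and the thresholds would need tuning on real data.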
PCT/US2007/017487 2006-08-04 2007-08-03 Detecting duplicate and near-duplicate files WO2008019133A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN2007800366340A CN101523343B (en) 2006-08-04 2007-08-03 Detecting duplicate and near-duplicate files
EP20070836544 EP2054797A4 (en) 2006-08-04 2007-08-03 Detecting duplicate and near-duplicate files
CA2660202A CA2660202C (en) 2006-08-04 2007-08-03 Detecting duplicate and near-duplicate files

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/499,260 US8015162B2 (en) 2006-08-04 2006-08-04 Detecting duplicate and near-duplicate files
US11/499,260 2006-08-04

Publications (3)

Publication Number Publication Date
WO2008019133A2 WO2008019133A2 (en) 2008-02-14
WO2008019133A3 WO2008019133A3 (en) 2008-11-20
WO2008019133A9 WO2008019133A9 (en) 2009-04-30

Family

ID=39033519

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/017487 WO2008019133A2 (en) 2006-08-04 2007-08-03 Detecting duplicate and near-duplicate files

Country Status (5)

Country Link
US (2) US8015162B2 (en)
EP (1) EP2054797A4 (en)
CN (2) CN102982053B (en)
CA (1) CA2660202C (en)
WO (1) WO2008019133A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102033962B (en) * 2010-12-31 2012-05-30 中国传媒大学 File data replication method for quick deduplication

Families Citing this family (98)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6658423B1 (en) * 2001-01-24 2003-12-02 Google, Inc. Detecting duplicate and near-duplicate files
US7685126B2 (en) 2001-08-03 2010-03-23 Isilon Systems, Inc. System and methods for providing a distributed file system utilizing metadata to track information about data stored throughout the system
US7146524B2 (en) 2001-08-03 2006-12-05 Isilon Systems, Inc. Systems and methods for providing a distributed file system incorporating a virtual hot spare
EP2284735A1 (en) 2002-11-14 2011-02-16 Isilon Systems, Inc. Systems and methods for restriping files in a distributed file system
US7739363B1 (en) * 2003-05-09 2010-06-15 Apple Inc. Configurable offline data store
US8051425B2 (en) 2004-10-29 2011-11-01 Emc Corporation Distributed system with asynchronous execution systems and methods
US8055711B2 (en) * 2004-10-29 2011-11-08 Emc Corporation Non-blocking commit protocol systems and methods
US8238350B2 (en) 2004-10-29 2012-08-07 Emc Corporation Message batching with checkpoints systems and methods
US7797283B2 (en) 2005-10-21 2010-09-14 Isilon Systems, Inc. Systems and methods for maintaining distributed data
US7551572B2 (en) * 2005-10-21 2009-06-23 Isilon Systems, Inc. Systems and methods for providing variable protection
US7917474B2 (en) * 2005-10-21 2011-03-29 Isilon Systems, Inc. Systems and methods for accessing and updating distributed data
US7788303B2 (en) 2005-10-21 2010-08-31 Isilon Systems, Inc. Systems and methods for distributed system scanning
US7848261B2 (en) * 2006-02-17 2010-12-07 Isilon Systems, Inc. Systems and methods for providing a quiescing protocol
US8370455B2 (en) * 2006-03-09 2013-02-05 24/7 Media Systems and methods for mapping media content to web sites
US7756898B2 (en) * 2006-03-31 2010-07-13 Isilon Systems, Inc. Systems and methods for notifying listeners of events
US7676465B2 (en) * 2006-07-05 2010-03-09 Yahoo! Inc. Techniques for clustering structurally similar web pages based on page features
US7941420B2 (en) * 2007-08-14 2011-05-10 Yahoo! Inc. Method for organizing structurally similar web pages from a web site
US8539056B2 (en) * 2006-08-02 2013-09-17 Emc Corporation Systems and methods for configuring multiple network interfaces
US7899800B2 (en) 2006-08-18 2011-03-01 Isilon Systems, Inc. Systems and methods for providing nonlinear journaling
US7752402B2 (en) * 2006-08-18 2010-07-06 Isilon Systems, Inc. Systems and methods for allowing incremental journaling
US7590652B2 (en) * 2006-08-18 2009-09-15 Isilon Systems, Inc. Systems and methods of reverse lookup
US7676691B2 (en) 2006-08-18 2010-03-09 Isilon Systems, Inc. Systems and methods for providing nonlinear journaling
US7680842B2 (en) * 2006-08-18 2010-03-16 Isilon Systems, Inc. Systems and methods for a snapshot of data
US7953704B2 (en) * 2006-08-18 2011-05-31 Emc Corporation Systems and methods for a snapshot of data
US7680836B2 (en) * 2006-08-18 2010-03-16 Isilon Systems, Inc. Systems and methods for a snapshot of data
US7882071B2 (en) 2006-08-18 2011-02-01 Isilon Systems, Inc. Systems and methods for a snapshot of data
US7822932B2 (en) * 2006-08-18 2010-10-26 Isilon Systems, Inc. Systems and methods for providing nonlinear journaling
US8286029B2 (en) 2006-12-21 2012-10-09 Emc Corporation Systems and methods for managing unavailable storage devices
US7593938B2 (en) * 2006-12-22 2009-09-22 Isilon Systems, Inc. Systems and methods of directory entry encodings
US8234277B2 (en) * 2006-12-29 2012-07-31 Intel Corporation Image-based retrieval for high quality visual or acoustic rendering
US7509448B2 (en) 2007-01-05 2009-03-24 Isilon Systems, Inc. Systems and methods for managing semantic locks
US7779048B2 (en) * 2007-04-13 2010-08-17 Isilon Systems, Inc. Systems and methods of providing possible value ranges
US8966080B2 (en) * 2007-04-13 2015-02-24 Emc Corporation Systems and methods of managing resource utilization on a threaded computer system
US7900015B2 (en) * 2007-04-13 2011-03-01 Isilon Systems, Inc. Systems and methods of quota accounting
US7698317B2 (en) * 2007-04-20 2010-04-13 Yahoo! Inc. Techniques for detecting duplicate web pages
US20090012984A1 (en) 2007-07-02 2009-01-08 Equivio Ltd. Method for Organizing Large Numbers of Documents
US7882068B2 (en) 2007-08-21 2011-02-01 Isilon Systems, Inc. Systems and methods for adaptive copy on write
US7949692B2 (en) 2007-08-21 2011-05-24 Emc Corporation Systems and methods for portals into snapshot data
US7966289B2 (en) * 2007-08-21 2011-06-21 Emc Corporation Systems and methods for reading objects in a file system
US7895225B1 (en) * 2007-12-06 2011-02-22 Amazon Technologies, Inc. Identifying potential duplicates of a document in a document corpus
US8131751B1 (en) 2008-01-18 2012-03-06 Google Inc. Algorithms for selecting subsequences
US8184953B1 (en) * 2008-02-22 2012-05-22 Google Inc. Selection of hash lookup keys for efficient retrieval
US8239387B2 (en) * 2008-02-22 2012-08-07 Yahoo! Inc. Structural clustering and template identification for electronic documents
US7870345B2 (en) 2008-03-27 2011-01-11 Isilon Systems, Inc. Systems and methods for managing stalled storage devices
US7953709B2 (en) * 2008-03-27 2011-05-31 Emc Corporation Systems and methods for a read only mode for a portion of a storage system
US7984324B2 (en) 2008-03-27 2011-07-19 Emc Corporation Systems and methods for managing stalled storage devices
US7949636B2 (en) * 2008-03-27 2011-05-24 Emc Corporation Systems and methods for a read only mode for a portion of a storage system
US7962523B2 (en) * 2008-04-11 2011-06-14 Yahoo! Inc. System and method for detecting templates of a website using hyperlink analysis
US7930306B2 (en) * 2008-04-30 2011-04-19 Msc Intellectual Properties B.V. System and method for near and exact de-duplication of documents
US8121991B1 (en) 2008-12-19 2012-02-21 Google Inc. Identifying transient paths within websites
US8086953B1 (en) * 2008-12-19 2011-12-27 Google Inc. Identifying transient portions of web pages
US8862691B2 (en) * 2008-12-22 2014-10-14 Microsoft Corporation Media aggregation and presentation
US20100169311A1 (en) * 2008-12-30 2010-07-01 Ashwin Tengli Approaches for the unsupervised creation of structural templates for electronic documents
US20150010143A1 (en) * 2009-04-30 2015-01-08 HGST Netherlands B.V. Systems and methods for signature computation in a content locality based cache
US9176883B2 (en) 2009-04-30 2015-11-03 HGST Netherlands B.V. Storage of data reference blocks and deltas in different storage devices
US8180773B2 (en) * 2009-05-27 2012-05-15 International Business Machines Corporation Detecting duplicate documents using classification
CN101788976B (en) * 2010-02-10 2012-05-09 北京播思软件技术有限公司 File splitting method based on contents
US8650195B2 (en) * 2010-03-26 2014-02-11 Palle M Pedersen Region based information retrieval system
US8825641B2 (en) 2010-11-09 2014-09-02 Microsoft Corporation Measuring duplication in search results
US8594239B2 (en) * 2011-02-21 2013-11-26 Microsoft Corporation Estimating document similarity using bit-strings
DE212011100098U1 (en) * 2011-04-28 2013-01-10 Google Inc. Present search results for gallery web pages
US20120290678A1 (en) * 2011-05-12 2012-11-15 International Business Machines Corporation Dynamic, user-driven service catalog
US9501455B2 (en) * 2011-06-30 2016-11-22 The Boeing Company Systems and methods for processing data
US9407463B2 (en) * 2011-07-11 2016-08-02 Aol Inc. Systems and methods for providing a spam database and identifying spam communications
US8954458B2 (en) 2011-07-11 2015-02-10 Aol Inc. Systems and methods for providing a content item database and identifying content items
US8521769B2 (en) 2011-07-25 2013-08-27 The Boeing Company Locating ambiguities in data
US8484170B2 (en) * 2011-09-19 2013-07-09 International Business Machines Corporation Scalable deduplication system with small blocks
US20130086083A1 (en) * 2011-09-30 2013-04-04 Microsoft Corporation Transferring ranking signals from equivalent pages
US20130097704A1 (en) * 2011-10-13 2013-04-18 Bitdefender IPR Management Ltd. Handling Noise in Training Data for Malware Detection
US8914668B2 (en) 2012-09-06 2014-12-16 International Business Machines Corporation Asynchronous raid stripe writes to enable response to media errors
US8843784B2 (en) 2012-09-06 2014-09-23 International Business Machines Corporation Remapping disk drive I/O in response to media errors
US8843493B1 (en) * 2012-09-18 2014-09-23 Narus, Inc. Document fingerprint
US20140156624A1 (en) * 2012-12-04 2014-06-05 Microsoft Corporation Producing, Archiving and Searching Social Content
US9563677B2 (en) 2012-12-11 2017-02-07 Melissa Data Corp. Systems and methods for clustered matching of records using geographic proximity
US9111183B2 (en) 2013-01-04 2015-08-18 International Business Machines Corporation Performing a comparison between two images which are scaled to a common resolution
US9218701B2 (en) 2013-05-28 2015-12-22 Bank Of America Corporation Image overlay for duplicate image detection
US9213820B2 (en) * 2013-09-10 2015-12-15 Ebay Inc. Mobile authentication using a wearable device
JP6386089B2 (en) * 2014-06-26 2018-09-05 グーグル エルエルシー Optimized browser rendering process
CN106462582B (en) 2014-06-26 2020-05-15 谷歌有限责任公司 Batch optimized rendering and fetching architecture
JP6211722B2 (en) 2014-06-26 2017-10-11 グーグル インコーポレイテッド Optimized browser rendering process
US9607029B1 (en) * 2014-12-17 2017-03-28 Amazon Technologies, Inc. Optimized mapping of documents to candidate duplicate documents in a document corpus
CN104850574B (en) * 2015-02-15 2018-07-06 博彦科技股份有限公司 A kind of filtering sensitive words method of text-oriented information
US10147107B2 (en) * 2015-06-26 2018-12-04 Microsoft Technology Licensing, Llc Social sketches
US10381108B2 (en) * 2015-09-16 2019-08-13 Charles Jianping Zhou Web search and information aggregation by way of molecular network
CN105760445A (en) * 2016-02-03 2016-07-13 北京光年无限科技有限公司 Junk word filtering method and system
CN105893463B (en) * 2016-03-23 2019-11-05 广州酷狗计算机科技有限公司 Album input method and device
WO2018148591A1 (en) * 2017-02-10 2018-08-16 Secured FTP Hosting, LLC d/b/a SmartFile System for describing and tracking the creation and evolution of digital files
US10417269B2 (en) 2017-03-13 2019-09-17 Lexisnexis, A Division Of Reed Elsevier Inc. Systems and methods for verbatim-text mining
WO2019098732A1 (en) 2017-11-16 2019-05-23 Samsung Electronics Co., Ltd. Method and system for management and operation over image in a computing system
US10261784B1 (en) * 2018-06-20 2019-04-16 Terbium Labs, Inc. Detecting copied computer code using cryptographically hashed overlapping shingles
GB201821327D0 (en) 2018-12-31 2019-02-13 Transversal Ltd A system and method for discriminating removing boilerplate text in documents comprising structured labelled text elements
CN112131340B (en) * 2019-06-25 2024-02-20 杭州萤石软件有限公司 Character string detection method, device and storage medium
JP7400543B2 (en) * 2020-02-28 2023-12-19 富士フイルムビジネスイノベーション株式会社 Information processing device and program
CN111367563B (en) * 2020-03-06 2024-02-23 中国银行股份有限公司 Host version merging method and device
US11526506B2 (en) * 2020-05-14 2022-12-13 Code42 Software, Inc. Related file analysis
US11726779B2 (en) * 2021-11-03 2023-08-15 Sap Se Code simplification system
US11797486B2 (en) 2022-01-03 2023-10-24 Bank Of America Corporation File de-duplication for a distributed database
CN114091428A (en) * 2022-01-20 2022-02-25 北京搜狐互联网信息服务有限公司 Method for determining duplication of information content, related device and computer storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6119124A (en) * 1998-03-26 2000-09-12 Digital Equipment Corporation Method for clustering closely resembling data objects
US6871200B2 (en) * 2002-07-11 2005-03-22 Forensic Eye Ltd. Registration and monitoring system

Family Cites Families (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6850252B1 (en) * 1999-10-05 2005-02-01 Steven M. Hoffberg Intelligent electronic appliance system and method
US5465299A (en) * 1992-12-03 1995-11-07 Hitachi, Ltd. Electronic document processing system and method of forming digital signature
US5850490A (en) * 1993-12-22 1998-12-15 Xerox Corporation Analyzing an image of a document using alternative positionings of a class of segments
US6505160B1 (en) * 1995-07-27 2003-01-07 Digimarc Corporation Connected audio and other media objects
US5778395A (en) * 1995-10-23 1998-07-07 Stac, Inc. System for backing up files from disk volumes on multiple nodes of a computer network
US5909677A (en) * 1996-06-18 1999-06-01 Digital Equipment Corporation Method for determining the resemblance of documents
US6052693A (en) * 1996-07-02 2000-04-18 Harlequin Group Plc System for assembling large databases through information extracted from text sources
US5745900A (en) * 1996-08-09 1998-04-28 Digital Equipment Corporation Method for indexing duplicate database records using a full-record fingerprint
US6285999B1 (en) * 1997-01-10 2001-09-04 The Board Of Trustees Of The Leland Stanford Junior University Method for node ranking in a linked database
US6088707A (en) * 1997-10-06 2000-07-11 International Business Machines Corporation Computer system and method of displaying update status of linked hypertext documents
US6134532A (en) * 1997-11-14 2000-10-17 Aptex Software, Inc. System and method for optimal adaptive matching of users to most relevant entity and information in real-time
US6263348B1 (en) * 1998-07-01 2001-07-17 Serena Software International, Inc. Method and apparatus for identifying the existence of differences between two files
US6363377B1 (en) * 1998-07-30 2002-03-26 Sarnoff Corporation Search data processor
US6240409B1 (en) * 1998-07-31 2001-05-29 The Regents Of The University Of California Method and apparatus for detecting and summarizing document similarity within large document sets
US6317722B1 (en) * 1998-09-18 2001-11-13 Amazon.Com, Inc. Use of electronic shopping carts to generate personal recommendations
US6360215B1 (en) * 1998-11-03 2002-03-19 Inktomi Corporation Method and apparatus for retrieving documents based on information other than document content
JP2000187668A (en) * 1998-12-22 2000-07-04 Hitachi Ltd Grouping method and overlap excluding method
US6873982B1 (en) * 1999-07-16 2005-03-29 International Business Machines Corporation Ordering of database search results based on user feedback
US6718363B1 (en) * 1999-07-30 2004-04-06 Verizon Laboratories, Inc. Page aggregation for web sites
US6665661B1 (en) * 2000-09-29 2003-12-16 Battelle Memorial Institute System and method for use in text analysis of documents and records
US6978419B1 (en) * 2000-11-15 2005-12-20 Justsystem Corporation Method and apparatus for efficient identification of duplicate and near-duplicate documents and text spans using high-discriminability text fragments
US6658423B1 (en) 2001-01-24 2003-12-02 Google, Inc. Detecting duplicate and near-duplicate files
US7203343B2 (en) * 2001-09-21 2007-04-10 Hewlett-Packard Development Company, L.P. System and method for determining likely identity in a biometric database
US20040139072A1 (en) * 2003-01-13 2004-07-15 Broder Andrei Z. System and method for locating similar records in a database
US20040210575A1 (en) * 2003-04-18 2004-10-21 Bean Douglas M. Systems and methods for eliminating duplicate documents
US8296304B2 (en) * 2004-01-26 2012-10-23 International Business Machines Corporation Method, system, and program for handling redirects in a search engine
EP1776629A4 (en) * 2004-07-21 2011-05-04 Equivio Ltd A method for determining near duplicate data objects
US7966327B2 (en) * 2004-11-08 2011-06-21 The Trustees Of Princeton University Similarity search system with compact data structures
US20060149820A1 (en) * 2005-01-04 2006-07-06 International Business Machines Corporation Detecting spam e-mail using similarity calculations
US7739314B2 (en) * 2005-08-15 2010-06-15 Google Inc. Scalable user clustering based on set similarity
US7747614B2 (en) * 2005-10-31 2010-06-29 Yahoo! Inc. Difference control for generating and displaying a difference result set from the result sets of a plurality of search engines
US7472121B2 (en) * 2005-12-15 2008-12-30 International Business Machines Corporation Document comparison using multiple similarity measures

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHARIKAR M.: "Similarity Estimation Techniques from Rounding Algorithms", 2002, pages 380 - 388, XP008103597 *
See also references of EP2054797A4 *

Also Published As

Publication number Publication date
CA2660202C (en) 2013-03-12
CA2660202A1 (en) 2008-02-14
CN101523343A (en) 2009-09-02
WO2008019133A2 (en) 2008-02-14
US20120290597A1 (en) 2012-11-15
US20080044016A1 (en) 2008-02-21
CN101523343B (en) 2012-07-04
WO2008019133A9 (en) 2009-04-30
US8015162B2 (en) 2011-09-06
CN102982053A (en) 2013-03-20
EP2054797A4 (en) 2013-09-04
CN102982053B (en) 2016-08-10
EP2054797A2 (en) 2009-05-06

Similar Documents

Publication Publication Date Title
WO2008019133A3 (en) Detecting duplicate and near-duplicate files
WO2011017658A3 (en) Document layout system
WO2010092423A8 (en) Music profiling
WO2006124952A3 (en) The information nervous system
WO2008115670A3 (en) System and method for identifying content
WO2006096428A3 (en) Data processing systems and methods
WO2008094433A3 (en) Method and apparatus to store data patterns
WO2007120625A3 (en) Secure and granular index for information retrieval
WO2007130675A3 (en) Methods and systems for reporting regions of interest in content files
MX2010000619A (en) Systems and processes for obtaining and managing electronic signatures for real estate transaction documents.
WO2007084836A3 (en) Match-based employment system and method
WO2005086738A3 (en) Data structure with market capitalization breakdown
WO2007100916A3 (en) Systems, methods, and media for outputting a dataset based upon anomaly detection
WO2008044004A3 (en) Improvements relating to the detection of patterns
WO2008068450A3 (en) Improvements in resisting the spread of unwanted code and data
WO2006026733A3 (en) A method of designing a probe card apparatus with desired compliance characteristics
FR2890665B1 (en) SECURE ARTICLE, IN PARTICULAR A DOCUMENT OF SECURITY AND / OR VALUE.
WO2006118896A3 (en) Method and apparatus for detecting the falsification of metadata
WO2006010114A3 (en) Disambiguating ambiguous characters
TW200638335A (en) Audio metadata verification
WO2009036392A3 (en) Multi-modal relevancy matching
WO2006023718A3 (en) Locating electronic instances of documents based on rendered instances, document fragment digest generation, and digest based document fragment determination
GB2484879A (en) Method and apparatus for security validation of input data
WO2009054839A3 (en) Template based matching
GB201312856D0 (en) Malware Detection

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase. Ref document number: 200780036634.0. Country of ref document: CN.
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 07836544. Country of ref document: EP. Kind code of ref document: A2.
WWE WIPO information: entry into national phase. Ref document number: 2660202. Country of ref document: CA.
NENP Non-entry into the national phase. Ref country code: DE.
NENP Non-entry into the national phase. Ref country code: RU.
WWE WIPO information: entry into national phase. Ref document number: 2007836544. Country of ref document: EP.