WO2008019133A3 - Detecting duplicate and near-duplicate files - Google Patents
Detecting duplicate and near-duplicate files
- Publication number
- WO2008019133A3 (application PCT/US2007/017487)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- technique
- documents
- duplicate
- determine whether
- duplicates
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2007800366340A CN101523343B (en) | 2006-08-04 | 2007-08-03 | Detecting duplicate and near-duplicate files |
EP20070836544 EP2054797A4 (en) | 2006-08-04 | 2007-08-03 | Detecting duplicate and near-duplicate files |
CA2660202A CA2660202C (en) | 2006-08-04 | 2007-08-03 | Detecting duplicate and near-duplicate files |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/499,260 US8015162B2 (en) | 2006-08-04 | 2006-08-04 | Detecting duplicate and near-duplicate files |
US11/499,260 | 2006-08-04 |
Publications (3)
Publication Number | Publication Date |
---|---|
WO2008019133A2 (en) | 2008-02-14 |
WO2008019133A3 (en) | 2008-11-20 |
WO2008019133A9 (en) | 2009-04-30 |
Family
ID=39033519
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2007/017487 WO2008019133A2 (en) | 2006-08-04 | 2007-08-03 | Detecting duplicate and near-duplicate files |
Country Status (5)
Country | Link |
---|---|
US (2) | US8015162B2 (en) |
EP (1) | EP2054797A4 (en) |
CN (2) | CN102982053B (en) |
CA (1) | CA2660202C (en) |
WO (1) | WO2008019133A2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102033962B (en) * | 2010-12-31 | 2012-05-30 | 中国传媒大学 | File data replication method for quick deduplication |
Families Citing this family (98)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6658423B1 (en) * | 2001-01-24 | 2003-12-02 | Google, Inc. | Detecting duplicate and near-duplicate files |
US7685126B2 (en) | 2001-08-03 | 2010-03-23 | Isilon Systems, Inc. | System and methods for providing a distributed file system utilizing metadata to track information about data stored throughout the system |
US7146524B2 (en) | 2001-08-03 | 2006-12-05 | Isilon Systems, Inc. | Systems and methods for providing a distributed file system incorporating a virtual hot spare |
EP2284735A1 (en) | 2002-11-14 | 2011-02-16 | Isilon Systems, Inc. | Systems and methods for restriping files in a distributed file system |
US7739363B1 (en) * | 2003-05-09 | 2010-06-15 | Apple Inc. | Configurable offline data store |
US8051425B2 (en) | 2004-10-29 | 2011-11-01 | Emc Corporation | Distributed system with asynchronous execution systems and methods |
US8055711B2 (en) * | 2004-10-29 | 2011-11-08 | Emc Corporation | Non-blocking commit protocol systems and methods |
US8238350B2 (en) | 2004-10-29 | 2012-08-07 | Emc Corporation | Message batching with checkpoints systems and methods |
US7797283B2 (en) | 2005-10-21 | 2010-09-14 | Isilon Systems, Inc. | Systems and methods for maintaining distributed data |
US7551572B2 (en) * | 2005-10-21 | 2009-06-23 | Isilon Systems, Inc. | Systems and methods for providing variable protection |
US7917474B2 (en) * | 2005-10-21 | 2011-03-29 | Isilon Systems, Inc. | Systems and methods for accessing and updating distributed data |
US7788303B2 (en) | 2005-10-21 | 2010-08-31 | Isilon Systems, Inc. | Systems and methods for distributed system scanning |
US7848261B2 (en) * | 2006-02-17 | 2010-12-07 | Isilon Systems, Inc. | Systems and methods for providing a quiescing protocol |
US8370455B2 (en) * | 2006-03-09 | 2013-02-05 | 24/7 Media | Systems and methods for mapping media content to web sites |
US7756898B2 (en) * | 2006-03-31 | 2010-07-13 | Isilon Systems, Inc. | Systems and methods for notifying listeners of events |
US7676465B2 (en) * | 2006-07-05 | 2010-03-09 | Yahoo! Inc. | Techniques for clustering structurally similar web pages based on page features |
US7941420B2 (en) * | 2007-08-14 | 2011-05-10 | Yahoo! Inc. | Method for organizing structurally similar web pages from a web site |
US8539056B2 (en) * | 2006-08-02 | 2013-09-17 | Emc Corporation | Systems and methods for configuring multiple network interfaces |
US7899800B2 (en) | 2006-08-18 | 2011-03-01 | Isilon Systems, Inc. | Systems and methods for providing nonlinear journaling |
US7752402B2 (en) * | 2006-08-18 | 2010-07-06 | Isilon Systems, Inc. | Systems and methods for allowing incremental journaling |
US7590652B2 (en) * | 2006-08-18 | 2009-09-15 | Isilon Systems, Inc. | Systems and methods of reverse lookup |
US7676691B2 (en) | 2006-08-18 | 2010-03-09 | Isilon Systems, Inc. | Systems and methods for providing nonlinear journaling |
US7680842B2 (en) * | 2006-08-18 | 2010-03-16 | Isilon Systems, Inc. | Systems and methods for a snapshot of data |
US7953704B2 (en) * | 2006-08-18 | 2011-05-31 | Emc Corporation | Systems and methods for a snapshot of data |
US7680836B2 (en) * | 2006-08-18 | 2010-03-16 | Isilon Systems, Inc. | Systems and methods for a snapshot of data |
US7882071B2 (en) | 2006-08-18 | 2011-02-01 | Isilon Systems, Inc. | Systems and methods for a snapshot of data |
US7822932B2 (en) * | 2006-08-18 | 2010-10-26 | Isilon Systems, Inc. | Systems and methods for providing nonlinear journaling |
US8286029B2 (en) | 2006-12-21 | 2012-10-09 | Emc Corporation | Systems and methods for managing unavailable storage devices |
US7593938B2 (en) * | 2006-12-22 | 2009-09-22 | Isilon Systems, Inc. | Systems and methods of directory entry encodings |
US8234277B2 (en) * | 2006-12-29 | 2012-07-31 | Intel Corporation | Image-based retrieval for high quality visual or acoustic rendering |
US7509448B2 (en) | 2007-01-05 | 2009-03-24 | Isilon Systems, Inc. | Systems and methods for managing semantic locks |
US7779048B2 (en) * | 2007-04-13 | 2010-08-17 | Isilon Systems, Inc. | Systems and methods of providing possible value ranges |
US8966080B2 (en) * | 2007-04-13 | 2015-02-24 | Emc Corporation | Systems and methods of managing resource utilization on a threaded computer system |
US7900015B2 (en) * | 2007-04-13 | 2011-03-01 | Isilon Systems, Inc. | Systems and methods of quota accounting |
US7698317B2 (en) * | 2007-04-20 | 2010-04-13 | Yahoo! Inc. | Techniques for detecting duplicate web pages |
US20090012984A1 (en) | 2007-07-02 | 2009-01-08 | Equivio Ltd. | Method for Organizing Large Numbers of Documents |
US7882068B2 (en) | 2007-08-21 | 2011-02-01 | Isilon Systems, Inc. | Systems and methods for adaptive copy on write |
US7949692B2 (en) | 2007-08-21 | 2011-05-24 | Emc Corporation | Systems and methods for portals into snapshot data |
US7966289B2 (en) * | 2007-08-21 | 2011-06-21 | Emc Corporation | Systems and methods for reading objects in a file system |
US7895225B1 (en) * | 2007-12-06 | 2011-02-22 | Amazon Technologies, Inc. | Identifying potential duplicates of a document in a document corpus |
US8131751B1 (en) | 2008-01-18 | 2012-03-06 | Google Inc. | Algorithms for selecting subsequences |
US8184953B1 (en) * | 2008-02-22 | 2012-05-22 | Google Inc. | Selection of hash lookup keys for efficient retrieval |
US8239387B2 (en) * | 2008-02-22 | 2012-08-07 | Yahoo! Inc. | Structural clustering and template identification for electronic documents |
US7870345B2 (en) | 2008-03-27 | 2011-01-11 | Isilon Systems, Inc. | Systems and methods for managing stalled storage devices |
US7953709B2 (en) * | 2008-03-27 | 2011-05-31 | Emc Corporation | Systems and methods for a read only mode for a portion of a storage system |
US7984324B2 (en) | 2008-03-27 | 2011-07-19 | Emc Corporation | Systems and methods for managing stalled storage devices |
US7949636B2 (en) * | 2008-03-27 | 2011-05-24 | Emc Corporation | Systems and methods for a read only mode for a portion of a storage system |
US7962523B2 (en) * | 2008-04-11 | 2011-06-14 | Yahoo! Inc. | System and method for detecting templates of a website using hyperlink analysis |
US7930306B2 (en) * | 2008-04-30 | 2011-04-19 | Msc Intellectual Properties B.V. | System and method for near and exact de-duplication of documents |
US8121991B1 (en) | 2008-12-19 | 2012-02-21 | Google Inc. | Identifying transient paths within websites |
US8086953B1 (en) * | 2008-12-19 | 2011-12-27 | Google Inc. | Identifying transient portions of web pages |
US8862691B2 (en) * | 2008-12-22 | 2014-10-14 | Microsoft Corporation | Media aggregation and presentation |
US20100169311A1 (en) * | 2008-12-30 | 2010-07-01 | Ashwin Tengli | Approaches for the unsupervised creation of structural templates for electronic documents |
US20150010143A1 (en) * | 2009-04-30 | 2015-01-08 | HGST Netherlands B.V. | Systems and methods for signature computation in a content locality based cache |
US9176883B2 (en) | 2009-04-30 | 2015-11-03 | HGST Netherlands B.V. | Storage of data reference blocks and deltas in different storage devices |
US8180773B2 (en) * | 2009-05-27 | 2012-05-15 | International Business Machines Corporation | Detecting duplicate documents using classification |
CN101788976B (en) * | 2010-02-10 | 2012-05-09 | 北京播思软件技术有限公司 | File splitting method based on contents |
US8650195B2 (en) * | 2010-03-26 | 2014-02-11 | Palle M Pedersen | Region based information retrieval system |
US8825641B2 (en) | 2010-11-09 | 2014-09-02 | Microsoft Corporation | Measuring duplication in search results |
US8594239B2 (en) * | 2011-02-21 | 2013-11-26 | Microsoft Corporation | Estimating document similarity using bit-strings |
DE212011100098U1 (en) * | 2011-04-28 | 2013-01-10 | Google Inc. | Present search results for gallery web pages |
US20120290678A1 (en) * | 2011-05-12 | 2012-11-15 | International Business Machines Corporation | Dynamic, user-driven service catalog |
US9501455B2 (en) * | 2011-06-30 | 2016-11-22 | The Boeing Company | Systems and methods for processing data |
US9407463B2 (en) * | 2011-07-11 | 2016-08-02 | Aol Inc. | Systems and methods for providing a spam database and identifying spam communications |
US8954458B2 (en) | 2011-07-11 | 2015-02-10 | Aol Inc. | Systems and methods for providing a content item database and identifying content items |
US8521769B2 (en) | 2011-07-25 | 2013-08-27 | The Boeing Company | Locating ambiguities in data |
US8484170B2 (en) * | 2011-09-19 | 2013-07-09 | International Business Machines Corporation | Scalable deduplication system with small blocks |
US20130086083A1 (en) * | 2011-09-30 | 2013-04-04 | Microsoft Corporation | Transferring ranking signals from equivalent pages |
US20130097704A1 (en) * | 2011-10-13 | 2013-04-18 | Bitdefender IPR Management Ltd. | Handling Noise in Training Data for Malware Detection |
US8914668B2 (en) | 2012-09-06 | 2014-12-16 | International Business Machines Corporation | Asynchronous raid stripe writes to enable response to media errors |
US8843784B2 (en) | 2012-09-06 | 2014-09-23 | International Business Machines Corporation | Remapping disk drive I/O in response to media errors |
US8843493B1 (en) * | 2012-09-18 | 2014-09-23 | Narus, Inc. | Document fingerprint |
US20140156624A1 (en) * | 2012-12-04 | 2014-06-05 | Microsoft Corporation | Producing, Archiving and Searching Social Content |
US9563677B2 (en) | 2012-12-11 | 2017-02-07 | Melissa Data Corp. | Systems and methods for clustered matching of records using geographic proximity |
US9111183B2 (en) | 2013-01-04 | 2015-08-18 | International Business Machines Corporation | Performing a comparison between two images which are scaled to a common resolution |
US9218701B2 (en) | 2013-05-28 | 2015-12-22 | Bank Of America Corporation | Image overlay for duplicate image detection |
US9213820B2 (en) * | 2013-09-10 | 2015-12-15 | Ebay Inc. | Mobile authentication using a wearable device |
JP6386089B2 (en) * | 2014-06-26 | 2018-09-05 | グーグル エルエルシー | Optimized browser rendering process |
CN106462582B (en) | 2014-06-26 | 2020-05-15 | 谷歌有限责任公司 | Batch optimized rendering and fetching architecture |
JP6211722B2 (en) | 2014-06-26 | 2017-10-11 | グーグル インコーポレイテッド | Optimized browser rendering process |
US9607029B1 (en) * | 2014-12-17 | 2017-03-28 | Amazon Technologies, Inc. | Optimized mapping of documents to candidate duplicate documents in a document corpus |
CN104850574B (en) * | 2015-02-15 | 2018-07-06 | 博彦科技股份有限公司 | A kind of filtering sensitive words method of text-oriented information |
US10147107B2 (en) * | 2015-06-26 | 2018-12-04 | Microsoft Technology Licensing, Llc | Social sketches |
US10381108B2 (en) * | 2015-09-16 | 2019-08-13 | Charles Jianping Zhou | Web search and information aggregation by way of molecular network |
CN105760445A (en) * | 2016-02-03 | 2016-07-13 | 北京光年无限科技有限公司 | Junk word filtering method and system |
CN105893463B (en) * | 2016-03-23 | 2019-11-05 | 广州酷狗计算机科技有限公司 | Album input method and device |
WO2018148591A1 (en) * | 2017-02-10 | 2018-08-16 | Secured FTP Hosting, LLC d/b/a SmartFile | System for describing and tracking the creation and evolution of digital files |
US10417269B2 (en) | 2017-03-13 | 2019-09-17 | Lexisnexis, A Division Of Reed Elsevier Inc. | Systems and methods for verbatim-text mining |
WO2019098732A1 (en) | 2017-11-16 | 2019-05-23 | Samsung Electronics Co., Ltd. | Method and system for management and operation over image in a computing system |
US10261784B1 (en) * | 2018-06-20 | 2019-04-16 | Terbium Labs, Inc. | Detecting copied computer code using cryptographically hashed overlapping shingles |
GB201821327D0 (en) | 2018-12-31 | 2019-02-13 | Transversal Ltd | A system and method for discriminating removing boilerplate text in documents comprising structured labelled text elements |
CN112131340B (en) * | 2019-06-25 | 2024-02-20 | 杭州萤石软件有限公司 | Character string detection method, device and storage medium |
JP7400543B2 (en) * | 2020-02-28 | 2023-12-19 | 富士フイルムビジネスイノベーション株式会社 | Information processing device and program |
CN111367563B (en) * | 2020-03-06 | 2024-02-23 | 中国银行股份有限公司 | Host version merging method and device |
US11526506B2 (en) * | 2020-05-14 | 2022-12-13 | Code42 Software, Inc. | Related file analysis |
US11726779B2 (en) * | 2021-11-03 | 2023-08-15 | Sap Se | Code simplification system |
US11797486B2 (en) | 2022-01-03 | 2023-10-24 | Bank Of America Corporation | File de-duplication for a distributed database |
CN114091428A (en) * | 2022-01-20 | 2022-02-25 | 北京搜狐互联网信息服务有限公司 | Method for determining duplication of information content, related device and computer storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6119124A (en) * | 1998-03-26 | 2000-09-12 | Digital Equipment Corporation | Method for clustering closely resembling data objects |
US6871200B2 (en) * | 2002-07-11 | 2005-03-22 | Forensic Eye Ltd. | Registration and monitoring system |
Family Cites Families (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6850252B1 (en) * | 1999-10-05 | 2005-02-01 | Steven M. Hoffberg | Intelligent electronic appliance system and method |
US5465299A (en) * | 1992-12-03 | 1995-11-07 | Hitachi, Ltd. | Electronic document processing system and method of forming digital signature |
US5850490A (en) * | 1993-12-22 | 1998-12-15 | Xerox Corporation | Analyzing an image of a document using alternative positionings of a class of segments |
US6505160B1 (en) * | 1995-07-27 | 2003-01-07 | Digimarc Corporation | Connected audio and other media objects |
US5778395A (en) * | 1995-10-23 | 1998-07-07 | Stac, Inc. | System for backing up files from disk volumes on multiple nodes of a computer network |
US5909677A (en) * | 1996-06-18 | 1999-06-01 | Digital Equipment Corporation | Method for determining the resemblance of documents |
US6052693A (en) * | 1996-07-02 | 2000-04-18 | Harlequin Group Plc | System for assembling large databases through information extracted from text sources |
US5745900A (en) * | 1996-08-09 | 1998-04-28 | Digital Equipment Corporation | Method for indexing duplicate database records using a full-record fingerprint |
US6285999B1 (en) * | 1997-01-10 | 2001-09-04 | The Board Of Trustees Of The Leland Stanford Junior University | Method for node ranking in a linked database |
US6088707A (en) * | 1997-10-06 | 2000-07-11 | International Business Machines Corporation | Computer system and method of displaying update status of linked hypertext documents |
US6134532A (en) * | 1997-11-14 | 2000-10-17 | Aptex Software, Inc. | System and method for optimal adaptive matching of users to most relevant entity and information in real-time |
US6263348B1 (en) * | 1998-07-01 | 2001-07-17 | Serena Software International, Inc. | Method and apparatus for identifying the existence of differences between two files |
US6363377B1 (en) * | 1998-07-30 | 2002-03-26 | Sarnoff Corporation | Search data processor |
US6240409B1 (en) * | 1998-07-31 | 2001-05-29 | The Regents Of The University Of California | Method and apparatus for detecting and summarizing document similarity within large document sets |
US6317722B1 (en) * | 1998-09-18 | 2001-11-13 | Amazon.Com, Inc. | Use of electronic shopping carts to generate personal recommendations |
US6360215B1 (en) * | 1998-11-03 | 2002-03-19 | Inktomi Corporation | Method and apparatus for retrieving documents based on information other than document content |
JP2000187668A (en) * | 1998-12-22 | 2000-07-04 | Hitachi Ltd | Grouping method and overlap excluding method |
US6873982B1 (en) * | 1999-07-16 | 2005-03-29 | International Business Machines Corporation | Ordering of database search results based on user feedback |
US6718363B1 (en) * | 1999-07-30 | 2004-04-06 | Verizon Laboratories, Inc. | Page aggregation for web sites |
US6665661B1 (en) * | 2000-09-29 | 2003-12-16 | Battelle Memorial Institute | System and method for use in text analysis of documents and records |
US6978419B1 (en) * | 2000-11-15 | 2005-12-20 | Justsystem Corporation | Method and apparatus for efficient identification of duplicate and near-duplicate documents and text spans using high-discriminability text fragments |
US6658423B1 (en) | 2001-01-24 | 2003-12-02 | Google, Inc. | Detecting duplicate and near-duplicate files |
US7203343B2 (en) * | 2001-09-21 | 2007-04-10 | Hewlett-Packard Development Company, L.P. | System and method for determining likely identity in a biometric database |
US20040139072A1 (en) * | 2003-01-13 | 2004-07-15 | Broder Andrei Z. | System and method for locating similar records in a database |
US20040210575A1 (en) * | 2003-04-18 | 2004-10-21 | Bean Douglas M. | Systems and methods for eliminating duplicate documents |
US8296304B2 (en) * | 2004-01-26 | 2012-10-23 | International Business Machines Corporation | Method, system, and program for handling redirects in a search engine |
EP1776629A4 (en) * | 2004-07-21 | 2011-05-04 | Equivio Ltd | A method for determining near duplicate data objects |
US7966327B2 (en) * | 2004-11-08 | 2011-06-21 | The Trustees Of Princeton University | Similarity search system with compact data structures |
US20060149820A1 (en) * | 2005-01-04 | 2006-07-06 | International Business Machines Corporation | Detecting spam e-mail using similarity calculations |
US7739314B2 (en) * | 2005-08-15 | 2010-06-15 | Google Inc. | Scalable user clustering based on set similarity |
US7747614B2 (en) * | 2005-10-31 | 2010-06-29 | Yahoo! Inc. | Difference control for generating and displaying a difference result set from the result sets of a plurality of search engines |
US7472121B2 (en) * | 2005-12-15 | 2008-12-30 | International Business Machines Corporation | Document comparison using multiple similarity measures |
2006
- 2006-08-04: US application US11/499,260, publication US8015162B2 (en), status: Active
2007
- 2007-08-03: CN application CN201210158895.2A, publication CN102982053B (en), status: Active
- 2007-08-03: CA application CA2660202A, publication CA2660202C (en), status: Expired - Fee Related
- 2007-08-03: CN application CN2007800366340A, publication CN101523343B (en), status: Active
- 2007-08-03: EP application EP20070836544, publication EP2054797A4 (en), status: Withdrawn
- 2007-08-03: WO application PCT/US2007/017487, publication WO2008019133A2 (en), status: Application Filing
2011
- 2011-09-02: US application US13/225,342, publication US20120290597A1 (en), status: Abandoned
Non-Patent Citations (2)
Title |
---|
CHARIKAR M.: "Similarity Estimation Techniques from Rounding Algorithms", 2002, pages 380 - 388, XP008103597 * |
See also references of EP2054797A4 * |
Also Published As
Publication number | Publication date |
---|---|
CA2660202C (en) | 2013-03-12 |
CA2660202A1 (en) | 2008-02-14 |
CN101523343A (en) | 2009-09-02 |
WO2008019133A2 (en) | 2008-02-14 |
US20120290597A1 (en) | 2012-11-15 |
US20080044016A1 (en) | 2008-02-21 |
CN101523343B (en) | 2012-07-04 |
WO2008019133A9 (en) | 2009-04-30 |
US8015162B2 (en) | 2011-09-06 |
CN102982053A (en) | 2013-03-20 |
EP2054797A4 (en) | 2013-09-04 |
CN102982053B (en) | 2016-08-10 |
EP2054797A2 (en) | 2009-05-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2008019133A3 (en) | | Detecting duplicate and near-duplicate files |
WO2011017658A3 (en) | | Document layout system |
WO2010092423A8 (en) | | Music profiling |
WO2006124952A3 (en) | | The information nervous system |
WO2008115670A3 (en) | | System and method for identifying content |
WO2006096428A3 (en) | | Data processing systems and methods |
WO2008094433A3 (en) | | Method and apparatus to store data patterns |
WO2007120625A3 (en) | | Secure and granular index for information retrieval |
WO2007130675A3 (en) | | Methods and systems for reporting regions of interest in content files |
MX2010000619A (en) | | Systems and processes for obtaining and managing electronic signatures for real estate transaction documents. |
WO2007084836A3 (en) | | Match-based employment system and method |
WO2005086738A3 (en) | | Data structure with market capitalization breakdown |
WO2007100916A3 (en) | | Systems, methods, and media for outputting a dataset based upon anomaly detection |
WO2008044004A3 (en) | | Improvements relating to the detection of patterns |
WO2008068450A3 (en) | | Improvements in resisting the spread of unwanted code and data |
WO2006026733A3 (en) | | A method of designing a probe card apparatus with desired compliance characteristics |
FR2890665B1 (en) | | SECURE ARTICLE, IN PARTICULAR A DOCUMENT OF SECURITY AND / OR VALUE. |
WO2006118896A3 (en) | | Method and apparatus for detecting the falsification of metadata |
WO2006010114A3 (en) | | Disambiguating ambiguous characters |
TW200638335A (en) | | Audio metadata verification |
WO2009036392A3 (en) | | Multi-modal relevancy matching |
WO2006023718A3 (en) | | Locating electronic instances of documents based on rendered instances, document fragment digest generation, and digest based document fragment determination |
GB2484879A (en) | | Method and apparatus for security validation of input data |
WO2009054839A3 (en) | | Template based matching |
GB201312856D0 (en) | | Malware Detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | WWE | WIPO information: entry into national phase | Ref document number: 200780036634.0; Country of ref document: CN |
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 07836544; Country of ref document: EP; Kind code of ref document: A2 |
 | WWE | WIPO information: entry into national phase | Ref document number: 2660202; Country of ref document: CA |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | NENP | Non-entry into the national phase | Ref country code: RU |
 | WWE | WIPO information: entry into national phase | Ref document number: 2007836544; Country of ref document: EP |