WO2002013055A3 - Automatic categorization of documents based on textual content
- Publication number
- WO2002013055A3 (PCT/US2001/041669)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- document
- category
- textual content
- documents
- documents based
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99936—Pattern matching access
Priority Applications (1)
Application Number | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|
AU2001285432A | AU2001285432A1 (en) | 2000-08-09 | 2001-08-09 | Automatic categorization of documents based on textual content |
Applications Claiming Priority (2)
Application Number | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|
US09/635,714 | US6621930B1 (en) | 2000-08-09 | 2000-08-09 | Automatic categorization of documents based on textual content |
US09/635,714 | | 2000-08-09 | | |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2002013055A2 (en) | 2002-02-14 |
WO2002013055A3 (en) | 2003-09-18 |
Family
ID=24548813
Family Applications (1)
Application Number | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|
PCT/US2001/041669 | WO2002013055A2 (en) | 2000-08-09 | 2001-08-09 | Automatic categorization of documents based on textual content |
Country Status (3)
Country | Link |
---|---|
US (1) | US6621930B1 (en) |
AU (1) | AU2001285432A1 (en) |
WO (1) | WO2002013055A2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106294506B (en) * | 2015-06-10 | 2020-04-24 | 华中师范大学 | Domain-adaptive viewpoint data classification method and device |
Families Citing this family (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2404337A1 (en) * | 2000-03-27 | 2001-10-04 | Documentum, Inc. | Method and apparatus for generating metadata for a document |
US20070027672A1 (en) * | 2000-07-31 | 2007-02-01 | Michel Decary | Computer method and apparatus for extracting data from web pages |
US6618717B1 (en) * | 2000-07-31 | 2003-09-09 | Eliyon Technologies Corporation | Computer method and apparatus for determining content owner of a website |
US20020091671A1 (en) * | 2000-11-23 | 2002-07-11 | Andreas Prokoph | Method and system for data retrieval in large collections of data |
CN1240011C (en) * | 2001-03-29 | 2006-02-01 | 国际商业机器公司 | File classifying management system and method for operation system |
US20030130993A1 (en) * | 2001-08-08 | 2003-07-10 | Quiver, Inc. | Document categorization engine |
JP3997774B2 (en) * | 2001-12-11 | 2007-10-24 | ソニー株式会社 | Data processing system, data processing method, information processing apparatus, and computer program |
US7024624B2 (en) | 2002-01-07 | 2006-04-04 | Kenneth James Hintz | Lexicon-based new idea detector |
US7409404B2 (en) * | 2002-07-25 | 2008-08-05 | International Business Machines Corporation | Creating taxonomies and training data for document categorization |
US7743061B2 (en) * | 2002-11-12 | 2010-06-22 | Proximate Technologies, Llc | Document search method with interactively employed distance graphics display |
US20040122660A1 (en) * | 2002-12-20 | 2004-06-24 | International Business Machines Corporation | Creating taxonomies and training data in multiple languages |
US20040162824A1 (en) * | 2003-02-13 | 2004-08-19 | Burns Roland John | Method and apparatus for classifying a document with respect to reference corpus |
US8266215B2 (en) | 2003-02-20 | 2012-09-11 | Sonicwall, Inc. | Using distinguishing properties to classify messages |
US7299261B1 (en) * | 2003-02-20 | 2007-11-20 | Mailfrontier, Inc. A Wholly Owned Subsidiary Of Sonicwall, Inc. | Message classification using a summary |
US7146361B2 (en) | 2003-05-30 | 2006-12-05 | International Business Machines Corporation | System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a Weighted AND (WAND) |
US20040243554A1 (en) * | 2003-05-30 | 2004-12-02 | International Business Machines Corporation | System, method and computer program product for performing unstructured information management and automatic text analysis |
US7139752B2 (en) * | 2003-05-30 | 2006-11-21 | International Business Machines Corporation | System, method and computer program product for performing unstructured information management and automatic text analysis, and providing multiple document views derived from different document tokenizations |
US20040243556A1 (en) * | 2003-05-30 | 2004-12-02 | International Business Machines Corporation | System, method and computer program product for performing unstructured information management and automatic text analysis, and including a document common analysis system (CAS) |
US20040243560A1 (en) * | 2003-05-30 | 2004-12-02 | International Business Machines Corporation | System, method and computer program product for performing unstructured information management and automatic text analysis, including an annotation inverted file system facilitating indexing and searching |
WO2004114160A2 (en) * | 2003-06-13 | 2004-12-29 | Equifax, Inc. | Systems and processes for automated criteria and attribute generation, searching, auditing and reporting of data |
US7734627B1 (en) * | 2003-06-17 | 2010-06-08 | Google Inc. | Document similarity detection |
US20090100138A1 (en) * | 2003-07-18 | 2009-04-16 | Harris Scott C | Spam filter |
WO2005010727A2 (en) * | 2003-07-23 | 2005-02-03 | Praedea Solutions, Inc. | Extracting data from semi-structured text documents |
CA2536097A1 (en) * | 2003-08-27 | 2005-03-10 | Equifax, Inc. | Application processing and decision systems and processes |
US11132183B2 (en) | 2003-08-27 | 2021-09-28 | Equifax Inc. | Software development platform for testing and modifying decision algorithms |
US7245765B2 (en) * | 2003-11-11 | 2007-07-17 | Sri International | Method and apparatus for capturing paper-based information on a mobile computing device |
US8693043B2 (en) * | 2003-12-19 | 2014-04-08 | Kofax, Inc. | Automatic document separation |
US7975240B2 (en) * | 2004-01-16 | 2011-07-05 | Microsoft Corporation | Systems and methods for controlling a visible results set |
US7624274B1 (en) * | 2004-02-11 | 2009-11-24 | AOL LLC, a Delaware Limited Company | Decreasing the fragility of duplicate document detecting algorithms |
US7725475B1 (en) | 2004-02-11 | 2010-05-25 | Aol Inc. | Simplifying lexicon creation in hybrid duplicate detection and inductive classifier systems |
US7392262B1 (en) | 2004-02-11 | 2008-06-24 | Aol Llc | Reliability of duplicate document detection algorithms |
US7444380B1 (en) | 2004-07-13 | 2008-10-28 | Marc Diamond | Method and system for dispensing and verification of permissions for delivery of electronic messages |
US7496567B1 (en) | 2004-10-01 | 2009-02-24 | Terril John Steichen | System and method for document categorization |
US10803126B1 (en) * | 2005-01-13 | 2020-10-13 | Robert T. and Virginia T. Jenkins | Method and/or system for sorting digital signal information |
US7266562B2 (en) * | 2005-02-14 | 2007-09-04 | Levine Joel H | System and method for automatically categorizing objects using an empirically based goodness of fit technique |
US7593904B1 (en) * | 2005-06-30 | 2009-09-22 | Hewlett-Packard Development Company, L.P. | Effecting action to address an issue associated with a category based on information that enables ranking of categories |
US8719073B1 (en) | 2005-08-25 | 2014-05-06 | Hewlett-Packard Development Company, L.P. | Producing a measure regarding cases associated with an issue after one or more events have occurred |
US8423908B2 (en) * | 2006-09-08 | 2013-04-16 | Research In Motion Limited | Method for identifying language of text in a handheld electronic device and a handheld electronic device incorporating the same |
US7885466B2 (en) * | 2006-09-19 | 2011-02-08 | Xerox Corporation | Bags of visual context-dependent words for generic visual categorization |
CA2921562C (en) * | 2007-08-07 | 2017-11-21 | Equifax, Inc. | Systems and methods for managing statistical expressions |
US9082080B2 (en) * | 2008-03-05 | 2015-07-14 | Kofax, Inc. | Systems and methods for organizing data sets |
US20100121842A1 (en) * | 2008-11-13 | 2010-05-13 | Dennis Klinkott | Method, apparatus and computer program product for presenting categorized search results |
US20100121790A1 (en) * | 2008-11-13 | 2010-05-13 | Dennis Klinkott | Method, apparatus and computer program product for categorizing web content |
US8392175B2 (en) * | 2010-02-01 | 2013-03-05 | Stratify, Inc. | Phrase-based document clustering with automatic phrase extraction |
US8996350B1 (en) | 2011-11-02 | 2015-03-31 | Dub Software Group, Inc. | System and method for automatic document management |
US9298814B2 (en) | 2013-03-15 | 2016-03-29 | Maritz Holdings Inc. | Systems and methods for classifying electronic documents |
US11928606B2 (en) | 2013-03-15 | 2024-03-12 | TSG Technologies, LLC | Systems and methods for classifying electronic documents |
US9053392B2 (en) * | 2013-08-28 | 2015-06-09 | Adobe Systems Incorporated | Generating a hierarchy of visual pattern classes |
US9881079B2 (en) * | 2014-12-24 | 2018-01-30 | International Business Machines Corporation | Quantification based classifier |
WO2016172288A1 (en) * | 2015-04-21 | 2016-10-27 | Lexisnexis, A Division Of Reed Elsevier Inc. | Systems and methods for generating concepts from a document corpus |
US11783005B2 (en) | 2019-04-26 | 2023-10-10 | Bank Of America Corporation | Classifying and mapping sentences using machine learning |
US11429897B1 (en) | 2019-04-26 | 2022-08-30 | Bank Of America Corporation | Identifying relationships between sentences using machine learning |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000026795A1 (en) * | 1998-10-30 | 2000-05-11 | Justsystem Pittsburgh Research Center, Inc. | Method for content-based filtering of messages by analyzing term characteristics within a message |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0786914B2 (en) * | 1986-11-07 | 1995-09-20 | 株式会社日立製作所 | Change detection method using images |
US5479533A (en) * | 1992-02-28 | 1995-12-26 | Yamatake-Honeywell Co., Ltd. | Pattern recognition apparatus and method using fuzzy logic |
US5581630A (en) * | 1992-12-21 | 1996-12-03 | Texas Instruments Incorporated | Personal identification |
DE69331518T2 (en) * | 1993-02-19 | 2002-09-12 | Ibm | Neural network for comparing features of image patterns |
US5978620A (en) * | 1998-01-08 | 1999-11-02 | Xerox Corporation | Recognizing job separator pages in a document scanning device |
- 2000-08-09: US application US09/635,714 (US6621930B1), not active, Expired - Fee Related
- 2001-08-09: WO application PCT/US2001/041669 (WO2002013055A2), active, Application Filing
- 2001-08-09: AU application AU2001285432A (AU2001285432A1), not active, Abandoned
Non-Patent Citations (4)
Title |
---|
HOCH R: "USING IR TECHNIQUES FOR TEXT CLASSIFICATION IN DOCUMENT ANALYSIS", SIGIR '94. DUBLIN, JULY 3 - 6, 1994, PROCEEDINGS OF THE ANNUAL INTERNATIONAL ACM-SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, BERLIN, SPRINGER, DE, vol. CONF. 17, 3 July 1994 (1994-07-03), pages 31 - 40, XP000475312 * |
MAAREK Y S ET AL: "FULL TEXT INDEXING BASED ON LEXICAL RELATIONS AN APPLICATION: SOFTWARE LIBRARIES", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL. (SIGIR). CAMBRIDGE, MA., JUNE 25 - 28, 1989, READING, ACM, US, vol. CONF. 12, 25 June 1989 (1989-06-25), pages 198 - 206, XP000239149 * |
ROIGER R ET AL: "Selecting training instances for supervised classification", ITESM, XP010509261 * |
SMADJA F: "RETRIEVING COLLOCATIONS FROM TEXT: XTRACT", COMPUTATIONAL LINGUISTICS, CAMBRIDGE, MA, US, vol. 19, no. 1, 1993, pages 143 - 177, XP000905567 * |
Also Published As
Publication number | Publication date |
---|---|
WO2002013055A2 (en) | 2002-02-14 |
AU2001285432A1 (en) | 2002-02-18 |
US6621930B1 (en) | 2003-09-16 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
WO2002013055A3 (en) | Automatic categorization of documents based on textual content | |
WO2002008933A3 (en) | System and method for automated classification of text by time slicing | |
EP1503300A3 (en) | Vision-based document segmentation | |
EP1528486A3 (en) | Classification evaluation system, method, and program | |
WO2004061572A3 (en) | Adaptive classification of network traffic | |
WO2002082321A3 (en) | Method and system for archiving data files | |
EP2293204A3 (en) | Methods and systems for transitioning between thumbnails and documents based upon thumbnail appearance | |
EP0763818A3 (en) | Formant emphasis method and formant emphasis filter device | |
WO2006042265A8 (en) | System and method for facilitating network connectivity based on user characteristics | |
EP1646121A3 (en) | Overcurrent detection method and detection circuit | |
EP1691548A3 (en) | Data slicer, data slicing method, and amplitude evaluation value setting method | |
EP0810535A3 (en) | Document retrieval system | |
WO2005008439A3 (en) | San/storage self-healing/capacity planning system and method | |
EP1469399A3 (en) | Updated data write method using a journaling filesystem | |
EP1416358A3 (en) | Apparatus and method for managing power in computer system | |
WO2004075093A3 (en) | Music feature extraction using wavelet coefficient histograms | |
EP1069739A3 (en) | Removal of a common mode voltage in a differential receiver | |
EP1156587A3 (en) | Method and apparatus for detecting switch closures | |
EP1455299A3 (en) | Device and method for binarizing image | |
EP1599058A3 (en) | Spread communication system and mobile station thereof | |
WO1999035778A3 (en) | Low level content filtering | |
EP0845864A3 (en) | Level converter and semiconductor device | |
EP0898363A3 (en) | Surface acoustic wave device | |
EP0933679A3 (en) | Photographic processing apparatus and method | |
EP0881778A3 (en) | Ternary signal input circuit |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A2; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
122 | EP: PCT application non-entry in European phase | |
NENP | Non-entry into the national phase | Ref country code: JP |