WO1995002221A1 - Case-based organizing and querying of a database
- Publication number
- WO1995002221A1 (PCT/US1994/007569)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- query
- objects
- database
- cluster
- hits
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Definitions
- This invention relates to case-based organizing and querying of a database.
- Prior art methods of retrieving information generally require preparation of a query, in which objects to be searched for are described in some formal manner. This imposes additional effort on the searcher, and generally also requires that the searcher be familiar with the subject matter to be searched, with the organization and indexing of the database, and with a formal query language. Accordingly, it would be advantageous for the searcher to be able to describe the query in a natural and relatively informal or unstructured manner, such as a description in a natural language.
- the response may be organized by quality of match. In another aspect, the response may be organized into clusters of related objects.
- the invention provides a system for case-based organizing and querying of a database.
- the database may comprise a set of objects, such as a set of documents including text.
- the database may be organized by examining each object and associating that object with a set of property values, such as (in the case of text documents) a set of keywords or other indicators of content.
- a document may be associated with those words which appear more frequently in the document than in the database at large, or which appear in early text of the document, or which appear in a title.
- the system may be responsive to a query by associating the query with a similar set of property values and performing case-based matching or other fuzzy associative matching on the objects of the database for objects which are similar.
- the query may be natural-language text and may be associated with keywords or other indicators of its content.
- the system may present matched objects in response to the query, may respond to iterative refinement of the query (in similar manner to iterative case-based methods shown in those co-pending applications which have been incorporated by reference), and may order matched objects by quality of match.
- the system may also examine the collection of matched objects and organize them for presentation; for example, the system may group matched objects into clusters of objects which have similar properties, which relate to similar content, or which have similar likelihood to be of relevance to the query or of interest to an operator posing the query.
- the system may respond to the result of organizing matched objects for presentation with suggestions for iterative refinement of the query.
- the system may therefore be capable of producing improved recall and precision over prior art techniques.
- Figure 1 shows a block diagram of a database explorer and filter system.
- Figure 2 shows a data flow diagram of a method of filtering documents.
- Figure 3 shows a data flow diagram of a method of processing queries.
- Figure 4 shows a data flow diagram of a method of processing hit tables.
- Figure 5 shows a process flow diagram of a method of clustering hit tables.
- Figure 6 shows an example explorer user interface screen as viewed by an operator.
- Figure 7 shows a second example explorer user interface screen, as viewed by an operator, in which clusters are displayed.
- Figure 8 shows an example explorer user interface screen, as viewed by an operator, in which settings may be set by the operator.
- Appendix A shows a table of parts of speech and a set of lexical rules for the English language, which may be used for the tag-and-segment-text process 202 or the tag-and-segment-text process 301 in a preferred embodiment.
- Appendix B shows an output of a test run of an example filter when applied to a portion of an example multimedia encyclopedia used as a database, available as "Microsoft Encarta" from Microsoft Corporation of Redmond, Washington.
- the invention may operate in conjunction with a computing system, including a processor and a memory, generally configured as is well known in the art; the memory may include primary memory for stored programs and for data and secondary memory for extensive storage of large numbers of objects.
- the memory may comprise a sizable database of objects, as is well known in the art of databases, and such objects may comprise various types of computing and data-storage structures.
- the database may be a relational database, an unstructured collection of objects, or some other database format.
- Such other types of objects may include source code, object code, binary values, numeric values, text or other symbolic values, representations of sound and/or picture signals or other signals, multimedia, data structures for rule-based or case-based systems, artificial neural networks, linked data structures such as linked lists, mathematical structures such as equations, polynomials, matrices or tensors, and other data types known in at least one of the many fields of computing.
- Figure 1 shows a block diagram of a database explorer and filter system.
- a system 101 for case-based organizing and querying of a database 102 may comprise a filter 103, for organizing the database 102 so as to be responsive to a query 104, an explorer 105, for selecting a set of objects 106 in the database 102 which are responsive to that query 104, and an object file system 107, for accessing the database 102.
- the database 102 may generally be of a type which is known in the art, such as a collection of text objects supported by Cairo Milestone 4 running under the Windows NT system version 297, available from Microsoft Corporation of Redmond, Washington, and may be accessed in conjunction with the object file system 107 of that product.
- the filter 103 may operate at an initialization time, such as when the processor is first started or before the first query 104 is presented to the explorer 105.
- the filter 103 may also operate in an incremental mode, e.g., by updating its organization of the database 102 periodically, such as upon the passage of a fixed period of time, when a fixed number of objects 106 are changed or added to the database 102, when the operation of the explorer 105 is degraded below some predetermined level, when triggered by an operator 108 in conjunction with a user interface 109 (e.g., when a query is presented, by a specific command to do so, or as a side effect of another operation), or otherwise as determined by the database 102 or an external manager.
- the filter 103 may examine each of the objects 106 (or some predetermined subset of objects 106) in the database 102 and associate each object 106 it examines (or some predetermined subset of those objects 106) with a set of properties.
- those properties may be keywords or phrases which are found in the object 106, but may also comprise other property values, such as the language the text is written in, the length of the text, or the reading level or other measure associated with the text (including measures of complexity, detail, redundancy, writing style, "fog", or other known measures of text, e.g., known in the art of grammar checking and correction).
- the objects 106 with their properties may be treated as a set of cases to be matched by a CBR engine 110 (operating with the object file system 107) with a test case generated from the query 104.
- Each case may generally comprise an object 106 plus the properties that object 106 was associated with, e.g., key words and phrases found in that object.
- these properties may include a lexicon of words and noun phrases found in the object 106, including at least some of these words labelled as a set of "header words" or "relevant words".
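- a minimal sketch of how a case might be represented, assuming a simple in-memory record. The class name `Case`, its field names, and the helper `make_case` are hypothetical and are not taken from the application; they only illustrate an object paired with its header words, relevant words, and noun-phrase lexicon.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Case:
    """One case: an object identifier plus the properties found for it."""
    object_id: str                                        # hypothetical id for an object 106
    header_words: List[str] = field(default_factory=list)
    relevant_words: Set[str] = field(default_factory=set)
    noun_phrases: Set[str] = field(default_factory=set)   # the lexicon


def make_case(object_id, header_words, relevant_words, noun_phrases):
    # Lower-case everything so that later matching is case-insensitive.
    return Case(
        object_id,
        [w.lower() for w in header_words],
        {w.lower() for w in relevant_words},
        {p.lower() for p in noun_phrases},
    )


case = make_case(
    "ethanol.txt",
    ["Ethanol", "Alcohol", "Liquid"],
    {"Ethyl", "acid"},
    {"ethyl alcohol"},
)
```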
- the explorer 105 may generally operate at a question time, such as when one or more queries 104 is presented to the explorer 105.
- the explorer 105 may be invoked by the operator 108 in conjunction with the user interface 109, which user interface 109 may allow the operator to trigger operation of the explorer 105 and to present one or more queries 104 to the explorer 105.
- the user interface 109 may be one such as the user interface presented by the Windows NT system referred to herein.
- the operator 108 may be a human being, but those of ordinary skill will recognize, after perusal of the application, that the operator 108 may comprise a network connection, an external management program, or an AI program.
- the explorer 105 may generate a response 111 including a set of matching cases (i.e., objects 106 with their properties), which may be presented to the operator 108 by means of the user interface 109, such as the user interface presented by the Windows NT system referred to herein, augmented by features described herein.
- the filter 103 and the explorer 105 may operate in conjunction with the object file system 107 (and in particular the CBR engine 110 thereof), which may respond to a set of properties formed into a vector query 112 directed at the database 102, and may return a hit table 113 of those objects 106 in the database 102 which have the indicated properties.
- the CBR engine 110 may use case-based matching and other techniques such as those shown in those co-pending applications which have been incorporated by reference.
- Figure 2 shows a data flow diagram of a method of filtering documents.
- a document 201 (an object 106 which comprises text, such as a pure text document or a text document formatted for a word-processing program) may be input to the filter 103 for examination.
- the filter 103 may process the text by a tag-and-segment-text process 202, which may lexically analyze the document 201, e.g., by means of a known lexical analysis technique.
- the tag-and-segment-text process 202 may extract a set of single terms 203 and generate a set of header words 204 found in the document 201.
- the header words 204 may comprise those words which occur in an initial part of the object 106, or in a title, subject line, topical paragraph, or abstract.
- the header words 204 may comprise the first three things mentioned in the document 201.
- the tag-and-segment-text process 202 may also tag words in the document 201 with their parts of speech and parse them into a set of sentences 205.
- the sentences 205 may be input to an extract-noun-phrases process 206, which may further lexically analyze the document 201, e.g., by means of a known lexical analysis technique, to extract a set of noun phrases 207 and generate a lexicon 208 thereof.
- the tag-and-segment-text process 202 may use a grammar of the English language, but other natural languages, and even formal specification languages such as programming languages, would also be suitable.
- the tag-and-segment-text process 202 may also recognize and generate a set of proper nouns 209.
- the set of proper nouns 209 may be determined by known rules, e.g., that proper nouns generally comprise strings of words each starting with an upper-case letter, or by reference to a dictionary of known proper names.
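- a minimal sketch of the capitalization rule, assuming plain English text and a regular expression in place of the lexical rules of Appendix A. The function name `proper_nouns` and the `known_names` parameter are hypothetical. A single capitalized word at the start of a sentence is ambiguous, so this sketch keeps it only when the dictionary of known names lists it.

```python
import re

# A run of one or more capitalized words, e.g. "Thomas Edison".
PROPER_NOUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def proper_nouns(text, known_names=frozenset()):
    found = set()
    # Split into sentences after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for match in PROPER_NOUN.finditer(sentence):
            phrase = match.group(0)
            sentence_initial = match.start() == 0
            single_word = " " not in phrase
            if sentence_initial and single_word and phrase not in known_names:
                continue  # probably just an ordinary capitalized first word
            found.add(phrase)
    return found


sample = "The lamp was improved by Thomas Edison. Edison worked in Menlo Park."
names = proper_nouns(sample)
names_with_dictionary = proper_nouns(sample, known_names={"Edison"})
```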
- the set of proper nouns 209 may be input, along with at least some of the single terms 203, to a determine-relevant-words process 210, which may extract a set of relevant words 211.
- the set of relevant words 211 may be determined with reference to the frequency of those words in the object 106 (with respect to the entire text found in the object 106) and with reference to the frequency of those words in the database 102, with respect to the text corpus of the database 102.
- the ratio for each word (frequency in the object 106) divided by (frequency in the database 102) may be computed, and the set of relevant words 211 may comprise those words whose relative frequency exceeds a threshold, e.g., a predetermined threshold such as a 1:1 ratio.
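- a minimal sketch of the relative-frequency test, assuming the object and the database have already been reduced to word counts. The function name `relevant_words` and its parameters are hypothetical; the default threshold of 1.0 corresponds to the 1:1 ratio given as an example above.

```python
from collections import Counter


def relevant_words(doc_tokens, corpus_counts, corpus_total, threshold=1.0):
    """Return words whose frequency in the object exceeds their frequency
    in the database by more than the given ratio."""
    doc_counts = Counter(doc_tokens)
    doc_total = len(doc_tokens)
    relevant = set()
    for word, count in doc_counts.items():
        corpus_freq = corpus_counts.get(word, 0) / corpus_total
        if corpus_freq == 0:
            continue  # word missing from the corpus counts; skipped in this sketch
        doc_freq = count / doc_total
        if doc_freq / corpus_freq > threshold:
            relevant.add(word)
    return relevant


corpus_counts = {"the": 60, "of": 30, "lamp": 6, "filament": 4}
doc = ["the", "lamp", "of", "the", "filament"]
words = relevant_words(doc, corpus_counts, corpus_total=100)
```

Here "the" and "of" are rarer in the object than in the database as a whole and are dropped, while "lamp" and "filament" are kept.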
- the filter 103 is described herein for a specific set of properties of the text which may be extracted. However, it would be clear to those of ordinary skill, after perusal of this application, that extraction of other properties could be readily accomplished, and is within the scope and spirit of the invention. Such other properties could include the language the text is written in (or for English-language text, the number of foreign words used), the length of the text, or the reading level or other measure associated with the text (including measures of complexity, detail, redundancy, writing style, "fog", or other known measures of text, e.g., known in the art of grammar checking and correction).
- the extract-noun-phrases process 206 and the determine-relevant-words process 210 may proceed in parallel, e.g., by execution on multiple processors or by multiple tasks or threads in a multitasking or multithreaded environment.
- the filter 103 may mark each object 106 with the properties it determines (or alternatively may create a separate object 106 relating each documentary object 106 to its properties), so that the object 106 and its properties may be treated as a case in a case-base.
- the set of cases may be matched to a test case by a CBR engine 110, using techniques like those described in copending applications (1) Serial No. 07/664,561, filed March 4, 1991 in the name of inventors Bradley P. Allen and S. Daniel Lee, titled "CASE-BASED REASONING SYSTEM"; (2) Serial No. 07/869,935, filed April 15, 1992 in the name of inventor Bradley P.
- Figure 3 shows a data flow diagram of a method of processing queries.
- the query 104 entered in free text by the operator 108, may be input to the explorer 105 for examination.
- the explorer 105 may process the text by a tag-and-segment-text process 301, which may lexically analyze the query 104, e.g., by means of a known lexical analysis technique, similarly to the tag-and-segment-text process 202 of the filter 103.
- the tag-and-segment-text process 301 may extract a set of single terms 302, similarly to the tag-and-segment-text process 202 and the set of single terms 203 of the filter 103.
- the tag-and-segment-text process 301 may also tag words in the query 104 with their parts of speech and parse them into a set of sentences 303, similarly to the tag-and-segment-text process 202 and the sentences 205 of the filter 103.
- the sentences 303 may be input to an extract-noun-phrases process 304, which may further lexically analyze the query 104, e.g., by means of a known lexical analysis technique, to extract a set of noun phrases 305, similarly to the extract-noun-phrases process 206 and the noun phrases 207 of the filter 103.
- the tag-and-segment-text process 301 may also recognize and generate a set of proper nouns 306, similarly to the tag-and-segment-text process 202 and the proper nouns 209 of the filter 103.
- the noun phrases 305, single terms 302, and proper nouns 306, a rank threshold 307, and a set of selected subtopics 308 (subtopics selected by the operator 108 to refine the query 104) may be input to a generate-query process 309, which may generate a set of query terms 310 and a query parse tree 311.
- the tag-and-segment-text process 301, the extract-noun-phrases process 304, and the generate-query process 309 may proceed as asynchronously as possible, e.g., by execution on multiple processors or by multiple tasks or threads in a multitasking or multithreaded environment.
- the query terms 310 and the query parse tree 311 may be input to the CBR engine 110 in the object file system 107, which may perform case-based matching or other fuzzy associative matching on the objects 106 in the database 102 for objects which are similar to the query 104, as described by the query terms 310 and the query parse tree 311, and which have a match quality at least as good as the rank threshold 307. (As noted with regard to the user interface 109, the selected subtopics 308 are added to the text of the query 104.)
- the object file system 107 may generate the hit table 113 of matched objects 106.
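- a minimal sketch of producing a hit table under a rank threshold. The application does not specify how the CBR engine 110 computes match quality, so this sketch assumes a stand-in score: the percentage of query terms found among the properties of a case. The function name `hit_table` and the data layout are hypothetical.

```python
def hit_table(query_terms, cases, rank_threshold):
    """cases maps an object id to the set of property words of that object.
    Returns (rank, object_id) pairs, best match first."""
    query = set(query_terms)
    hits = []
    for object_id, properties in cases.items():
        # Stand-in match quality: percent of query terms present in the case.
        rank = round(100 * len(query & properties) / len(query)) if query else 0
        if rank >= rank_threshold:
            hits.append((rank, object_id))
    # Highest rank first; ties are broken by identifier for a stable order.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits


cases = {
    "edison": {"light", "bulb", "invented", "lamp"},
    "lamp": {"light", "lamp"},
    "wine": {"beverage", "juice"},
}
query = ["invented", "light", "bulb"]
loose = hit_table(query, cases, rank_threshold=30)
strict = hit_table(query, cases, rank_threshold=80)
```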
- Figure 4 shows a data flow diagram of a method of processing hit tables.
- the hit table 113 and the relevant words 211 may be input to a cluster hits process 401, which (if clustering is enabled) collects the matched objects 106 into clusters, and may output a set of clusters 402 in response.
- Each cluster 402 may comprise a set of objects 106, selected for collective closeness with regard to all objects 106 in the hit table 113.
- the cluster hits process 401 is further described with regard to figure 5.
- the hit table 113, the relevant words 211, and the lexicon 208 may be input to a first generate-topics (from relevant words) process 403, while the lexicon 208 and the query terms 310 may be input to a second generate-topics (from query words) process 403. Together the two generate-topics processes 403 may output a set of topics 404 and subtopics 405.
- the generate-topics process 403 may examine the lexicon 208 of noun phrases 207 with a rule-based inference engine (not shown).
- one suitable rule-based inference engine is the ART-IM system, available from Inference Corporation in El Segundo, California.
- the inference engine may detect particular patterns in the noun phrases 207 which indicate semantic relations between the words in those noun phrases 207. For example, the noun phrase
- the generate-topics process 403 may thus construct a phrase lattice, showing each noun phrase 207 as being inclusive of (above), included in (below), or incommensurate with (neither above nor below) each other noun phrase 207.
- the generate-topics (from relevant words) process 403 may restrict the phrase lattice to those noun phrases 207 which include relevant words 211 of the objects 106 in the hit table 113.
- the second generate-topics (from query words) process 403 may operate in similar manner as the first generate-topics (from relevant words) process 403 and may restrict the phrase lattice to those noun phrases 305 which include relevant words 211 of the query.
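- a minimal sketch of a phrase lattice and its restriction to relevant words. The inclusion test used by the inference engine is not spelled out in the text above, so this sketch assumes a simple stand-in: a phrase is "above" another when its words are a proper subset of the other phrase's words (so "alcohol" is above "ethyl alcohol"). The function names are hypothetical.

```python
def compare(a, b):
    """Relation of phrase a to phrase b under the word-subset assumption."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if words_a == words_b:
        return "equal"
    if words_a < words_b:
        return "above"           # a is more general and includes b
    if words_a > words_b:
        return "below"           # a is more specific and is included in b
    return "incommensurate"      # neither above nor below


def phrase_lattice(phrases):
    """Relation for every ordered pair of distinct phrases."""
    return {(a, b): compare(a, b) for a in phrases for b in phrases if a != b}


def restrict(phrases, relevant_words):
    """Keep only phrases that contain at least one relevant word."""
    relevant = {w.lower() for w in relevant_words}
    return [p for p in phrases if relevant & set(p.lower().split())]


phrases = ["alcohol", "ethyl alcohol", "butyl alcohol", "fruit juice"]
kept = restrict(phrases, {"alcohol"})
lattice = phrase_lattice(kept)
```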
- Figure 5 shows a process flow diagram of a method of clustering hit tables.
- the cluster hits process 401 may operate by means of a genetic algorithm, in which an initial configuration and a set of genetic operators are specified, and the set of solutions is formed by simulation of random "evolution" of a population of possible solutions, using the method of steady-state reproduction without duplicates.
- Genetic algorithms are well known in the art, and are described in further detail in "Foundations of Genetic Algorithms", ed. Gregory J.E. Rawlins (Morgan Kaufmann Publishers: San Mateo, California 1991). It would be clear to those of ordinary skill in the art that the parameters of the genetic algorithm, and even the type of genetic algorithm performed could be varied substantially and still remain within the scope and spirit of the invention.
- a number of clusters 402 is selected.
- the number of clusters 402 may vary from a known minimum to a known maximum, settable by the operator 108.
- the genetic algorithm of the following steps is repeated for each permissible number of clusters 402, and the best solution adopted.
- at an initiate-clusters step 502, a set of possible clusters 402 is selected; this is a single "gene". A random population of genes is selected. Each cluster 402 is represented by the centroid of the objects 106 which would comprise that cluster 402. Thus, when a solution of clusters 402 is selected, each object 106 is assigned to the cluster 402 which it best matches.
- the genetic algorithm of the following steps is repeated for a known period of time, settable by the operator 108.
- the best available solution i.e., the gene with the best quality
- Each object 106 is assigned to the cluster 402 to which it is the closest.
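- a minimal sketch of centroid representation and closest-cluster assignment, assuming each object is reduced to a 0/1 vector over a vocabulary of relevant words and that closeness is squared Euclidean distance. Both choices are assumptions of this sketch, as are the function names; the application does not fix the distance measure.

```python
def to_vector(words, vocabulary):
    # 1.0 where the object has the word, 0.0 otherwise.
    return [1.0 if w in words else 0.0 for w in vocabulary]


def centroid(vectors):
    # Component-wise mean of the member vectors.
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def assign(vectors, centroids):
    """For each vector, the index of the closest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(v, centroids[k]))
        for v in vectors
    ]


vocabulary = ["acid", "ethyl", "juice", "wine"]
objects = [{"acid", "ethyl"}, {"ethyl"}, {"wine", "juice"}]
vectors = [to_vector(o, vocabulary) for o in objects]
centroids = [centroid(vectors[:2]), centroid(vectors[2:])]
assignment = assign(vectors, centroids)
```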
- all genes in the population are evaluated for quality, and the gene with the least quality is removed.
- the statistical measure "category utility" is computed; i.e., the utility of each cluster 402 in distinguishing between an object 106 in one cluster 402 from an object in another cluster 402.
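- a minimal sketch of a category utility computation, assuming the commonly used formulation over binary attributes (a relevant word is either present in an object or absent). Whether the application uses exactly this formulation is not stated, and the function name and data layout are hypothetical. For K clusters, the score averages, over clusters, the gain in expected attribute predictability within the cluster relative to the whole hit table, weighted by cluster size.

```python
def category_utility(clusters, vocabulary):
    """clusters is a list of clusters; each cluster is a list of word sets,
    one set per object. Higher scores mean more distinctive clusters."""
    clusters = [c for c in clusters if c]          # ignore empty clusters
    objects = [obj for c in clusters for obj in c]
    n, k = len(objects), len(clusters)
    if n == 0:
        return 0.0

    def predictability(group):
        # Sum over words of P(present)^2 + P(absent)^2 within the group.
        total = 0.0
        for word in vocabulary:
            p = sum(word in obj for obj in group) / len(group)
            total += p * p + (1 - p) * (1 - p)
        return total

    baseline = predictability(objects)
    gain = sum(len(c) / n * (predictability(c) - baseline) for c in clusters)
    return gain / k


vocabulary = ["acid", "wine"]
separated = [[{"acid"}, {"acid"}], [{"wine"}, {"wine"}]]
mixed = [[{"acid"}, {"wine"}], [{"acid"}, {"wine"}]]
good_score = category_utility(separated, vocabulary)
bad_score = category_utility(mixed, vocabulary)
```

A clustering that separates the two kinds of object scores higher than one that mixes them, which is the property the genetic algorithm relies on when ranking genes.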
- while matching for clusters 402 is performed using relevant words 211, it would be clear to those of ordinary skill, after perusal of this application, that other properties of the objects 106 could be used as well, such as the read/write date of the object 106, and that doing so would be within the scope and spirit of the invention.
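- a minimal sketch of the three operators of the genetic-operator step 504, which is described in the paragraph that follows. As a simplifying assumption, a gene is a list of cluster representatives and a "randomly created cluster" is a representative drawn at random from a pool of candidate seeds; the application represents clusters by centroids. All function names are hypothetical.

```python
import random


def random_cluster(seeds, rng):
    # Stand-in for creating a random cluster: pick a random seed.
    return rng.choice(seeds)


def mutation_1(seeds, n, rng):
    """Mutation-1: the new gene is created entirely at random."""
    return [random_cluster(seeds, rng) for _ in range(n)]


def mutation_2(gene, seeds, rng):
    """Mutation-2: copy a gene, replacing one cluster with a random one."""
    child = list(gene)
    child[rng.randrange(len(child))] = random_cluster(seeds, rng)
    return child


def crossover(gene_a, gene_b, rng):
    """Crossover: pair off the clusters and pick one from each pair."""
    return [rng.choice(pair) for pair in zip(gene_a, gene_b)]


def crossover_pooled(gene_a, gene_b, rng):
    """Alternative crossover: pick N clusters at random from the 2N."""
    return rng.sample(list(gene_a) + list(gene_b), len(gene_a))


rng = random.Random(0)
gene_a = ["a1", "a2", "a3"]
gene_b = ["b1", "b2", "b3"]
child = crossover(gene_a, gene_b, rng)
pooled = crossover_pooled(gene_a, gene_b, rng)
mutant = mutation_2(gene_a, ["s1", "s2"], rng)
fresh = mutation_1(["s1", "s2"], 3, rng)
```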
- at a genetic-operator step 504, one of three operators is selected and employed to create a new gene: (1) Mutation-1. The new gene is randomly created. (2) Mutation-2. An existing gene is copied, except that one of its clusters 402 is mutated by replacing it with a randomly created cluster 402. (3) Crossover. Two genes have their n-tuples of clusters 402 paired off and one cluster 402 is selected at random from each pair to form the new gene. Alternatively, a new gene is created by selecting N clusters 402 at random from the 2N clusters 402 specified by the two old genes.
USER INTERFACE
- Figure 6 shows an example explorer user interface screen as viewed by an operator. While the invention is described primarily with regard to a specific user interface, it would be clear to those of ordinary skill in the art that another user interface of equal or greater flexibility would be suitable, and would be within the scope and spirit of the invention.
- the user interface 109 may be combined with a user interface for a generalized file system exploration program, such as in the Windows NT system referred to herein.
- the user interface 109 may comprise a query window 601 in which the operator may enter the query 104 in free text, and a results window 602 in which the system 101 may display a set of matched objects 106 found in response to the query 104.
- the operator 108 may enter the query 104 in the query window 601.
- the query 104 is input to the explorer 105, which processes it as described herein, and generates the vector query 112.
- the vector query 112 is input to the object file system 107, and generates the hit table 113 of matched objects 106.
- the hit table 113 is input to the user interface 109, which displays the matched objects 106.
- the operator may select a displayed matched object 106 to view its contents.
- the user interface 109, the explorer 105, and the object file system 107 may operate as asynchronously as possible.
- the object file system 107 may search the database 102 for matched objects 106 independently, once it has sufficient information from the explorer 105; the user interface 109 may display matched objects 106 from the hit table 113 as they are generated by the object file system 107.
- the operator 108 has entered the query 104 "who invented the light bulb?" in a content field 603 of the query window 601, and the system 101 has responded with a set of matched objects 106 in the results window 602.
- the matched objects are displayed one per line, in columns labelled "rank", "query", "header", and "relevant words".
- a rank field 604 displays the quality of match for each displayed matched object 106.
- the system 101 may order the matched objects 106 by rank. This may occur as the normal procedure, or at the request of the operator 108, e.g., by means of a "sort" command 605 in the query window 601.
- the rank field 604 may also be color-coded by value.
- a query field 606 displays the relevant words of the query which are most related to the displayed matched object 106.
- a header field 607 displays the header words 204 of the displayed matched object 106.
- a relevant words field 608 displays the most common relevant words 211 of the displayed matched object 106.
- a topics field 609 of the query window 601 displays suggested topics for refinement of the query 104 which the system 101 has identified.
- the operator 108 may select a topic in the topics field 609, and the system will display a subtopics window 610 (overlaid on the query window 601 and the results window 602) showing the subtopics which the system 101 has identified for that topic.
- the operator 108 may refine the query 104 in response to the matched objects 106, and the explorer 105 may attempt to match objects 106 using the query 104 as refined. This may occur at the request of the operator 108, e.g., by means of a "refresh" command 611 in the query window 601.
- the operator 108 may select one or more subtopics 405 to refine the query 104. To do so, the operator 108 may identify (e.g., by pointing to with a pointing device such as a mouse) one or more subtopics 405 in the subtopics window 610. The selected subtopics 308 may be "added" to the query 104 and the explorer 105 may attempt to match objects 106 using the query 104 as refined.
- the operator 108 may also select one or more relevant words 211 to refine the query 104. To do so, the operator 108 may identify (e.g. by pointing to) the relevant words field 608 for a particular matched object 106 and "drag" that relevant words field 608 to the content field 603; the system 101 will display a relevance feedback window 612 (overlaid on the query window 601 and the results window 602) showing the relevant words 211 for that matched object 106.
- the operator 108 may select one or more relevant words 211 to refine the query 104. To do so, the operator 108 may identify (e.g., by pointing to) one or more relevant words 211 in the relevance feedback window 612. The selected relevant words 211 may be "added" to the query 104 and the explorer 105 may attempt to match objects 106 using the query 104 as refined.
- the query 104 as refined (like the original query 104) is presented as a vector query 112 to the CBR engine 110.
- when selected subtopics 308 or relevant words 211 are "added" to the query, they are properties which the CBR engine 110 must match to objects 106, as described for methods of iterative refinement of case-based matching shown in those co-pending applications which have been incorporated by reference. (Thus, the CBR engine 110 must match to objects 106 as if the operator 108 had answered a query refining question in a case-based system.)
- a query 104 as refined may be further refined, allowing the operator to iteratively refine the query 104 until desired objects 106 are located.
- Figure 7 shows a second example explorer user interface screen, as viewed by an operator, in which clusters are displayed.
- the operator 108 may select a "cluster" command (figure 6) or "uncluster" command (figure 7) 701 in the query window 601, and the system 101 will display a set of clusters 402, each a set of related matched objects 106, in place of displaying matched objects 106 themselves.
- the operator has selected the "cluster" command 701 for the same query 104 as in the example of figure 6.
- an expand field 702 displays whether the cluster 402 can be expanded (shown by a "+" symbol) to display individual matched objects 106, or can be collapsed (shown by a "-" symbol) to display a single identifier for the cluster 402.
- the rank field 703 displays the best rank for all matched objects 106 in the cluster 402.
- the system 101 may order the clusters 402 by this rank field 703. This may occur as the normal procedure, or at the request of the operator 108, e.g., by means of the "sort" command 605 in the query window 601.
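- a minimal sketch of ranking clusters by the best rank among their members and ordering them for display. The function name `order_clusters` and the mapping layout are hypothetical; empty clusters are skipped in this sketch because they have no best rank.

```python
def order_clusters(clusters):
    """clusters maps a cluster label to a list of (rank, object_id) hits.
    Returns (best_rank, label) pairs, highest rank first."""
    ranked = [
        (max(rank for rank, _ in hits), label)
        for label, hits in clusters.items()
        if hits
    ]
    # Sort by descending rank, then by label for a stable order.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked


clusters = {
    "alcohols": [(85, "ethanol"), (92, "glycerin")],
    "beverages": [(97, "wine")],
    "empty": [],
}
ordered = order_clusters(clusters)
```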
- this rank field 703 may also be color-coded by value.
- the relevant words field 608 displays the most common relevant words 211 in the cluster 402.
- the operator 108 may also choose to cluster all objects 106 in a specific set, e.g., a specific directory in the object file system 107.
- the operator 108 may restrict the scope of the explorer 105 to a specific directory and issue the "cluster" command 701; the system 101 will display the objects 106 in that directory in clusters 402.
- Figure 8 shows an example explorer user interface screen, as viewed by an operator, in which settings may be set by the operator.
- the operator 108 may select settings appropriate for the system 101.
- the operator 108 may select a "properties" command 801 in the query window 601 (figure 6), and the system 101 will display a properties window 802 with a set of property values 803 which may be set.
- a "minimum rank of returned hits" property 804 is a threshold value for including matched objects 106; matched objects 106 whose rank falls below this value are not displayed in the results window 602 and are not used in further processing.
- the rank of a matched object 106 is calculated by the CBR engine 110. In the example, this value is set to 80.
- a "maximum clustered hits" property 805 is a maximum number of matched objects 106 which are included in a single cluster 402. Those matched objects 106 not included in clusters 402 are placed in a special cluster 402 labelled "Other". In the example, this value is set to 400.
- a "clustering time" property 806 is the elapsed real time devoted to clustering. In the example, this value is set to 2500 milliseconds.
- a "minimum number of clusters" property 807 is the lower bound for the number of clusters 402 generated. In the example, this value is set to 2 clusters.
- a "maximum number of clusters" property 808 is the upper bound for the number of clusters 402 generated. In the example, this value is set to 8 clusters. The system 101 attempts to generate a number of clusters 402 between the minimum and maximum number selected.
- a "maximum topics" property 809 is the maximum number of topics displayed in the topics field 609 in the query window 601. In the example, this value is set to 7 topics.
- a "maximum subtopics" property 810 is the maximum number of subtopics displayed in the subtopics window 610. In the example, this value is set to 250 subtopics.
- a "do/don't cluster" property 811 sets whether or not clustering is performed. In the example, this value is set to YES.
- a "do/don't generate query topics" property 812 sets whether or not topics and subtopics are generated in response to query terms 310. In the example, this value is set to YES.
- a "do/don't generate salient topics" property 813 sets whether or not topics and subtopics are generated in response to relevant words 211. In the example, this value is set to YES.
- a "boolean/vector query" property 814 sets whether the object file system 107 performs a boolean query or a vector query in response to the explorer 105. In the example, this value is set to vector queries.
- a boolean query would have boolean connectors (e.g., "AND", "OR") coupling the query terms 310, so that the query 104 would not be as flexibly matched. Search using boolean queries is well known in the art.
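- a minimal sketch contrasting the two query modes, assuming each object is reduced to a set of property words. The boolean form accepts or rejects an object outright, while the vector form returns a graded score; the score used here (fraction of query terms present) is a stand-in, not the measure used by the CBR engine 110. The function names are hypothetical.

```python
def boolean_match(properties, all_of=(), any_of=()):
    """True only if every AND term is present and, when OR terms are
    given, at least one of them is present."""
    has_all = all(term in properties for term in all_of)
    has_any = any(term in properties for term in any_of) if any_of else True
    return has_all and has_any


def vector_match(properties, terms):
    """Graded score between 0.0 and 1.0: fraction of query terms present."""
    terms = set(terms)
    return len(terms & properties) / len(terms) if terms else 0.0


properties = {"light", "lamp", "edison"}
strict = boolean_match(properties, all_of=("light", "bulb"))
either = boolean_match(properties, any_of=("light", "bulb"))
graded = vector_match(properties, ["light", "bulb"])
```

The object lacks "bulb", so the AND query rejects it, whereas the vector query still reports a partial match that can be compared against a rank threshold.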
- Appendix A shows a table of parts of speech and a set of lexical rules for the English language, which may be used for the tag-and-segment-text process or the tag-and-segment-text process in a preferred embodiment.
- Appendix B shows an output of a test run of an example filter when applied to a portion of an example multimedia encyclopedia used as a database, available as "Microsoft Encarta" from Microsoft Corporation of Redmond, Washington.
- LDOCE is basically a dictionary of British English, so we found a lot of words we weren't familiar with, as well as a lot of double entries to account for American spellings (e.g. color and colour).
- the lexical categories we were able to extract out of LDOCE and WordNet were limited to nouns, verbs, adjectives, adverbs, conjunctions, determiners, predeterminers, prepositions, pronouns, and phrases. Since we don't use a phrasal lexicon, we threw the phrases away.
- noun-phrase -> determiner noun-phrase (e.g. "The person")
- noun-phrase -> quantifier noun-phrase (e.g. "Three people")
- noun-phrase -> adverb noun-phrase (e.g. "maddeningly fluffy clouds")
- noun-phrase -> noun-phrase relative-clause (e.g. "The car that hit me")
- noun-phrase -> noun-phrase [, noun-phrase]* [,] or noun-phrase (e.g. "England, France, or Germany")
- the Find Taxonomic Relations process uses ART-IM rules to capture patterns of words which indicate taxonomic relationships between the words. For example, it detects patterns like:
- NP such as {NP,}* {(and | or)} NP
- NP {,} including {NP,}* {(and | or)} NP
- Clustering file afl. txt Non-empty clusters 5 Clusters : 5 I Hits Vals Seed, Value: Count
- Marijuana Mixture, Leave, drugs, alcohols, syndromes, psycho Passes: 334, best pass.- 158, best score: 0.307, worst score: 0.132 Cluster 0, has 15 hits: '(OTHER), bloods, vitaminS, tissues, poisonS, suga
- Thermometer, Instrument, Measure Wine, Beverage, Juice Wood, Substance, Trunk Cluster 1 has 22 hits: 'alcohol I, acid:7, ethyl:7, liquid: , examples, chemi Acetaldehyde, Volatile, Liquid Antifreeze, Chemica1, Substance Azeotropic Mixture, Solution, Ratio Butyl Alcohol, Chemical, Formula Cannizzaro, Stanislao, Italian Disease, Medicine, Health Ester, Chemistry, Compound Ether, Chemistry, Ethyl Fermentation, Chemical, Change Formaldehyde, Compound, Carbon Glycerin, Glycerol, C3h8o3 Gum, Substance, Plant Iodine, Element, Symbol Lipid, Group, Substance Salicylic Acid, White, Solid Solution, Chemistry, Mixture Tannin, Acid, Name Turpentine, Name, Semifluid Vinegar, Condiment, Preservative Wax, Name, Ester Whiskey, Liquor, Mash Zym
- Vodka, Beverage, Known Cluster 3 has 6 hits: 'fuel:5, alcohols, methanolS, combustions, coals, en
- Rocket, Term, Propulsion Cluster 4 has 4 hits: 'drugS, alcohols, syndromes, psychoactive drugs:2, ma
- Cluster 0 has 9 hits: '(OTHER), plants, united statesS, seeds, gardenings,
- Rhizome Stem, Organ.
- Ray, Radiation, Wavelength Cluster 2 has 3 hits: 'lampS, glassS, neonS, arcS, bulbS, argonS, lights
- Neon Lamp, Glass, Bulb Cluster 3 has 5 hits: 'bulb:5, liliaceae:4, herb , lily:3, pistilS, heights.
- Tuberose, Herb, Polianth Cluster 4 has 6 hits: 'temperature:4, atmospheres, points, humidityS, bulb
- Cluster 0 has 4 hits: '(OTHER), century:2'
- Velzquez, Diego, Soldier Cluster 2 has 5 hits: 'spanish:4, island:3, spain:2, de:2, Christopher columbu
- Cluster 1 has 5 hits: 'mind:5, philosophe , philosophy:3, matters, universe
- Clustering file israel.txt Non-empty clusters: 4 Clusters: 4 II Hits Vals Seed, Value:Count
- Cluster 0 has 22 hits: '(OTHER), governments, war:4, centuryS, french revolut Achille Lauro, Italian, Cruise Anti-semi ism, Social, Agitation Asia, Continent, Island Assyria, Ashur, Ashshur Bahai, Persian, Glory Buber, Martin, Religious Cabala, Hebrew, tradition Crusade, Expedition, Undertaken Eschatology, Discourse, Last Espionage, Collection, Information Iran, Islamic Republic, Republic Jewish Art, Architect c Jew Jewish Music, Religic o , Music Nationalalism, History, Movement Portuguese Literature, Literature, Portuguese Refugee, Person, Country Romania, Republic, Europe Saudi Arabia, Monarchy, Southwest Asia
- Clustering file marx.txt Non-empty clusters: 6 Clusters: 6 ⁇ Hits Vals Seed, Value:Count
- Marx Brothers, 20th-century, Comedian Cluster 4 has 4 hits: 'capitalists, class:3, appreciation:2, communist:2, firmly
- Marx, Karl, German Cluster 5 has 6 hits: 'social 3, marx:3, labor:2, world war ii:2, german:2, ce
- Clustering file muslim.txt Non-empty clusters: 4 Clusters: 4 if Hits Vals Seed, Value:Count
- Cluster 0 has 41 hits: '(OTHER), arab:7, bc:5, ibn:4, indian:4, india:4, islam Alfonso Viii, King, Castile Arabia, Desert, Peninsula Arabic Literature, Literature, People Archaeology, Greek, Archaio Averros, Arabic, Abu
- Cluster 0 has 50 hits: "(OTHER), church:12, henry:8, king:7, english:6, roman:6
- Tyndall John, Physicist Ultrasonics, Branch, Physic Ventriloquism, Art, Sound Violin, Instrument, Member Viscount Melville Sound, Arm, Arctic Ocean Voiceprint Identification, Method, Person Warner Brothers, Motion, Picture Xylophone, Greek, Xylon Cluster 2, has 8 hits: 'sound:6, long:3, letter:3, sign:2, atlantic ocean:2, mi Animal Behavior The, Behavior, Animal C, English, Romance-language Diacritic Mark, Sign, Mark Island Sound, Body, Salt Letter, Vowel, Engli-
- Cluster 0 has 6 hits: '(OTHER), electron:2, beam:2, tube:2, television- ⁇ ' Baseball, Game, Skill Cathode-ray Tube, El*- : , Tube
- Warfare,. Use, Force Cluster 1 has 11 hits: 'strike:10, united states:3, presidents, injunctions,
- Cluster 0 has 2 hits: '(OTHER), states'
Abstract
A system for case-based organizing and querying of a database (102). The database (102) may comprise a set of objects (106), such as text documents. The database (102) may be organized by examining each object (106) and associating that object (106) with a set of property values, such as keywords. A document may be associated with those words which appear more frequently in the document than in the database (102) at large, or which appear in the early text of the document, or which appear in the title. The system may be responsive to a query (104) by associating the query with a similar set of property values and performing case-based matching on the objects (106) of the database (102) for similar objects (106). The query (104) may be natural-language text and may be associated with keywords. The system may present matched objects in response to the query (104), may respond to iterative refinement of the query and may order matched objects by quality of match. The system may also respond to the result of organizing matched objects for presentation with suggestions for iterative refinement of the query (104).
Description
CASE-BASED ORGANIZING AND QUERYING OF A DATABASE
1. Field of the Invention
This invention relates to case-based organizing and querying of a database.
2. Description of Related Art
As storage capability grows for computing devices, many databases have become larger, and large databases have become more common. One problem which has become apparent in the art is the difficulty of retrieving information from large databases when the location of that desired information is not already known. For example, a search for information in a large library may be hampered by the size of the library, because of the large number of items which must be examined. This can be exacerbated if the information searched for is not well-described by the searcher, if the searcher is unfamiliar with that subject matter, or if the information searched for is not well indexed.
Large databases of objects may sometimes be generated without the original intent to organize them into a database. For example, newspaper articles may generally be written without the consideration that they may be collected into a single database for later search. When they eventually are collected into a database, the effort required to organize those objects into a database for information retrieval can be formidable. It would be advantageous to provide a system in which a large amount of information may be collected into a database without having to expend a comparable amount of effort on organization and indexing, e.g., where such organization and indexing can be done by an automated process.
Prior art methods of retrieving information generally require preparation of a query, in which objects to be searched for are described in some formal manner. This imposes additional effort on the searcher, and generally also requires that the searcher be familiar with the subject matter to be searched, with the organization and indexing of the database, and with a formal query language. Accordingly, it would be advantageous for the searcher to be able to describe the query in a natural and relatively informal or unstructured manner, such as a description in a natural language.
Work with case-based systems has shown that incremental refinement of problem descriptions can be valuable in improving an automated system's recall (ability to retrieve objects which are related to the query) and precision (ability to rule out objects which are not related to the query). It would be advantageous to be able to incrementally refine the query after a response. But when the query itself is unstructured, the original response may provide so much information that valuable material is lost in the size of the response. Accordingly, it would be advantageous to provide suggestions for incremental refinement. In one aspect of the invention, the response may be organized by quality of match. In another aspect, the response may be organized into clusters of related objects.
SUMMARY OF THE INVENTION
The invention provides a system for case-based organizing and querying of a database. The database may comprise a set of objects, such as a set of documents including text. In a preferred embodiment, the database may be organized by examining each object and associating that object with a set of property values, such as (in the case of text documents) a set of keywords or other indicators of content. For example, a document may be associated with those words which appear more frequently in the document than in the database at large, or which appear in early text of the document, or which appear in a title. The system may be responsive to a query by associating the query with a similar set of property values and performing case-based matching or other fuzzy associative matching on the objects of the database for objects which are similar. In a preferred embodiment, the query may be natural-language text and may be associated with keywords or other indicators of its content.
In a preferred embodiment, the system may present matched objects in response to the query, may respond to iterative refinement of the query (in similar manner to iterative case-based methods shown in those co-pending applications which have been incorporated by reference), and may order matched objects by quality of match. The system may also examine the collection of matched objects and organize them for presentation; for example, the system may group matched objects into clusters of objects which have similar properties, which relate to similar content, or which have similar likelihood to be of relevance to the query or of interest to an operator posing the query. The system may respond to the result of organizing matched objects for presentation with suggestions for iterative refinement of the query.
The system may therefore be capable of producing improved recall and precision over prior art techniques.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows a block diagram of a database explorer and filter system.
Figure 2 shows a data flow diagram of a method of filtering documents.
Figure 3 shows a data flow diagram of a method of processing queries.
Figure 4 shows a data flow diagram of a method of processing hit tables.
Figure 5 shows a process flow diagram of a method of clustering hit tables.
Figure 6 shows an example explorer user interface screen as viewed by an operator.
Figure 7 shows a second example explorer user interface screen, as viewed by an operator, in which clusters are displayed.
Figure 8 shows an example explorer user interface screen, as viewed by an operator, in which settings may be set by the operator.
Appendix A shows a table of parts of speech and a set of lexical rules for the English language, which may be used for the tag-and-segment-text process or the tag-and-segment-text process in a preferred embodiment.
Appendix B shows an output of a test run of an example filter when applied to a portion of an example multimedia encyclopedia used as a database, available as "Microsoft Encarta" from Microsoft Corporation of Redmond, Washington.
DESCRIPTION OF THE PREFERRED EMBODIMENT
An embodiment of this invention may be used together with inventions which are disclosed in a copending application titled "AUTONOMOUS LEARNING AND REASONING AGENT", application Serial No. 07/869,926, filed April 15, 1992 in the name of Bradley P. Allen, hereby incorporated by reference as if fully set forth herein.
In a preferred embodiment, the invention may operate in conjunction with a computing system, including a processor and a memory, generally configured as is well known in the art; the memory may include primary memory for stored programs and for data and secondary memory for extensive storage of large numbers of objects. Preferably, the memory may comprise a sizable database of objects, as is well known in the art of databases, and such objects may comprise various types of computing and data-storage structures. However, no particular structure is required for the database itself; the database may be a relational database, an unstructured collection of objects, or some other database format.
Although the invention is disclosed herein primarily with respect to textual objects, it would be clear to those of ordinary skill in the art, after perusal of the application, that extension of the concepts disclosed to other types of objects is within the scope and spirit of the invention, and would not require undue experimentation. Such other types of objects may include source code, object code, binary values, numeric values, text or other symbolic values, representations of sound and/or picture signals or other signals, multimedia, data structures for rule-based or case-based systems, artificial neural networks, linked data structures such as linked lists, mathematical structures such as equations, polynomials, matrices or tensors, and other data types known in at least one of the many fields of computing. Although when the invention is applied to textual objects, appearance of a text string in an object is considered pertinent, when the invention is applied to other types of objects, other measures of closeness or pertinence, such as numerical closeness, would be workable, and are within the scope and spirit of the invention.
FILTER AND EXPLORER SYSTEM
Figure 1 shows a block diagram of a database explorer and filter system.
In a preferred embodiment, a system 101 for case-based organizing and querying of a database 102 may comprise a filter 103, for organizing the database 102 so as to be responsive to a query 104, an explorer 105, for selecting a set of objects 106 in the database 102 which are responsive to that query 104, and an object file system 107, for accessing the database 102. In a preferred embodiment, the database 102 may generally be of a type which is known in the art, such as a collection of text objects supported by Cairo Milestone 4 running under the Windows NT system version 297, available from Microsoft Corporation of Redmond, Washington, and may be accessed in conjunction with the object file system 107 of that product.
The filter 103 may operate at an initialization time, such as when the processor is first started or before the first query 104 is presented to the explorer 105. The filter 103 may also operate in an incremental mode, e.g., by updating its organization of the database 102 periodically, such as upon the passage of a fixed period of time, when a fixed number of objects 106 are changed or added to the database 102, when the operation of the explorer 105 is degraded below some predetermined level, when triggered by an operator 108 in conjunction with a user interface 109 (e.g., when a query is presented, by a specific command to do so, or as a side effect of another operation), or otherwise as determined by the database 102 or an external manager.
The filter 103 may examine each of the objects 106 (or some predetermined subset of objects 106) in the database 102 and associate each object 106 it examines (or some predetermined subset of those objects 106) with a set of properties. For a textual database 102 as primarily described herein, those properties may be keywords or phrases which are found in the object 106, but may also comprise other property values, such as the language the text is written in, the length of the text, or the reading level or other measure associated with the text (including measures of complexity, detail, redundancy, writing style, "fog", or other known measures of text, e.g., known in the art of grammar checking and correction).
The objects 106 with their properties may be treated as a set of cases to be matched by a CBR engine 110 (operating with the object file system 107) with a test case generated from the query 104. Each case may generally comprise an object 106 plus the properties that object 106 was associated with, e.g., key words and phrases found in that object. In a preferred embodiment, these properties may include a lexicon of words and noun phrases found in the object 106, including at least some of these words labelled as a set of "header words" or "relevant words".
The explorer 105 may generally operate at a question time, such as when one or more queries 104 is presented to the explorer 105. In a preferred embodiment, the explorer 105 may be invoked by the operator 108 in conjunction with the user interface 109, which user interface 109 may allow the operator to trigger operation of the explorer 105 and to present one or more queries 104 to the explorer 105. In a preferred embodiment, the user interface 109 may be one such as the user interface presented by the Windows NT system referred to herein. In a preferred embodiment, the operator 108 may be a human being, but those of ordinary skill will recognize, after perusal of the application, that the operator 108 may comprise a network connection, an external management program, or an AI program.
In a preferred embodiment, the explorer 105 may generate a response 111 including a set of matching cases (i.e., objects 106 with their properties), which may be presented to the operator 108 by means of the user interface 109, such as the user interface presented by the Windows NT system referred to herein, augmented by features described herein.
The filter 103 and the explorer 105 may operate in conjunction with the object file system 107 (and in particular the CBR engine 110 thereof), which may respond to a set of properties formed into a vector query 112 directed at the database 102, and may return a hit table 113 of those objects 106 in the database 102 which have the indicated properties. In a preferred embodiment, the CBR engine 110 may use case-based matching and other techniques such as those shown in those co-pending applications which have been incorporated by reference.
FILTERING DOCUMENTS
Figure 2 shows a data flow diagram of a method of filtering documents.
In a preferred embodiment, a document 201 (an object 106 which comprises text, such as a pure text document or a text document formatted for a word-processing program) may be input to the filter 103 for examination. The filter 103 may process the text by a tag-and-segment-text process 202, which may lexically analyze the document 201, e.g., by means of a known lexical analysis technique.
The tag-and-segment-text process 202 may extract a set of single terms 203 and generate a set of header words 204 found in the document 201. The header words 204 may comprise those words which occur in an initial part of the object 106, or in a title, subject line, topical paragraph, or abstract. In a preferred embodiment, the header words 204 may comprise the first three things mentioned in the document 201.
The tag-and-segment-text process 202 may also tag words in the document 201 with their parts of speech and parse them into a set of sentences 205. The sentences 205 may be input to an extract-noun-phrases process 206, which may further lexically analyze the document 201, e.g., by means of a known lexical analysis technique, to extract a set of noun phrases 207 and generate a lexicon 208 thereof. In a preferred embodiment, the tag-and-segment-text process 202 may use a grammar of the English language, but other natural languages, and even formal specification languages such as programming languages, would also be suitable.
The tag-and-segment-text process 202 may also recognize and generate a set of proper nouns 209. In a preferred embodiment, the set of proper nouns 209 may be determined by known rules, e.g., that proper nouns generally comprise strings of words each starting with an upper-case letter, or by reference to a dictionary of known proper names. The set of proper nouns 209 may be input, along with at least some of the single terms 203, to a determine-relevant-words process 210, which may extract a set of relevant words 211.
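By way of illustration only, the capitalization rule and the dictionary lookup mentioned above might be sketched as follows. The function name `find_proper_nouns`, the naive sentence splitter, and the treatment of a lone capitalized word at the start of a sentence are assumptions made for this sketch and are not taken from the specification.

```python
import re

def find_proper_nouns(text, known_names=frozenset()):
    """Collect runs of capitalized words (hypothetical helper)."""
    found = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
        run, run_start = [], 0
        for i, word in enumerate(words + [""]):  # "" flushes the last run
            if word[:1].isupper():
                if not run:
                    run_start = i
                run.append(word)
                continue
            if run:
                candidate = " ".join(run)
                # A lone capitalized word opening a sentence is ambiguous,
                # so keep it only if the dictionary of known names lists it.
                if len(run) > 1 or run_start > 0 or candidate in known_names:
                    found.append(candidate)
                run = []
    return found

text = "Thomas Edison worked in Menlo Park. The lamp was tested by Edison."
print(find_proper_nouns(text))
```

A production tagger would rely on part-of-speech tags rather than capitalization alone; the sketch only shows how the two stated rules could be combined.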
The set of relevant words 211 may be determined with reference to the frequency of those words in the object 106 (with respect to the entire text found in the object 106) and with reference to the frequency of those words in the database 102, with respect to the text corpus of the database 102. In a preferred embodiment, the ratio for each word (frequency in the object 106) divided by (frequency in the database 102) may be computed, and the set of relevant words 211 may comprise those words whose relative frequency exceeds a threshold, e.g., a predetermined threshold such as a 1:1 ratio. However, it would be clear to those of ordinary skill, after perusal of this application, that other measures (e.g., statistical measures) relating to frequency could be used to determine relevant words, such as clustering of relevant words in paragraphs, correlation with other relevant words, or relative frequency of word pairs or n-tuples, and that such other measures are within the scope and spirit of the invention.
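The frequency ratio described above may be illustrated with a minimal sketch. The names `relative_frequencies` and `relevant_words` are hypothetical, tokens are assumed to be already lower-cased and segmented, and the corpus token list is assumed to include the tokens of the document itself, so that every document word has a non-zero corpus frequency.

```python
from collections import Counter

def relative_frequencies(doc_tokens, corpus_tokens):
    """Map each document word to (frequency in object) / (frequency in database)."""
    doc_counts = Counter(doc_tokens)
    corpus_counts = Counter(corpus_tokens)
    ratios = {}
    for word, count in doc_counts.items():
        corpus_freq = corpus_counts[word] / len(corpus_tokens)
        if corpus_freq > 0:  # skip words the corpus count does not cover
            ratios[word] = (count / len(doc_tokens)) / corpus_freq
    return ratios

def relevant_words(doc_tokens, corpus_tokens, threshold=1.0):
    """Keep words whose relative frequency exceeds the threshold (1:1 by default)."""
    ratios = relative_frequencies(doc_tokens, corpus_tokens)
    return {word for word, ratio in ratios.items() if ratio > threshold}

doc = ["lamp", "bulb", "the", "the"]
corpus = doc + ["the"] * 6 + ["war", "king"]
print(sorted(relevant_words(doc, corpus)))
```

In this toy corpus "the" makes up half of the document but two thirds of the database, so its ratio falls below 1:1 and it is dropped, while "lamp" and "bulb" are kept.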
The filter 103 is described herein for a specific set of properties of the text which may be extracted. However, it would be clear to those of ordinary skill, after perusal of this application, that extraction of other properties could be readily accomplished, and is within the scope and spirit of the invention. Such other properties could include the language the text is written in (or for English-language text, the number of foreign words used), the length of the text, or the reading level or other measure associated with the text (including measures of complexity, detail, redundancy, writing style, "fog", or other known measures of text, e.g., known in the art of grammar checking and correction).
In a preferred embodiment, the extract-noun-phrases process 206 and the determine-relevant-words process 210 may proceed in parallel, e.g., by execution on multiple processors or by multiple tasks or threads in a multitasking or multithreaded environment.
The filter 103 may mark each object 106 with the properties it determines (or alternatively may create a separate object 106 relating each documentary object 106 to its properties), so that the object 106 and its properties may be treated as a case in a case-base. In a preferred embodiment, the set of cases may be matched to a test case by a CBR engine 110, using techniques like those described in copending applications (1) Serial No. 07/664,561, filed March 4, 1991 in the name of inventors Bradley P. Allen and S. Daniel Lee, titled "CASE-BASED REASONING SYSTEM"; (2) Serial No. 07/869,935, filed April 15, 1992 in the name of inventor Bradley P. Allen, titled "MACHINE LEARNING WITH A RELATIONAL DATABASE"; and (3) Serial No. 07/869,926, filed April 15, 1992 in the name of Bradley P. Allen, titled "AUTONOMOUS LEARNING AND REASONING AGENT"; each of which is hereby incorporated by reference as if fully set forth herein, or other case-based reasoning techniques which may be known in the art.
PROCESSING QUERIES
Figure 3 shows a data flow diagram of a method of processing queries.
In a preferred embodiment, the query 104, entered in free text by the operator 108, may be input to the explorer 105 for examination. The explorer 105 may process the text by a tag-and-segment-text process 301, which may lexically analyze the document 201, e.g., by means of a known lexical analysis technique, similarly to the tag-and-segment-text process 202 of the filter 103.
The tag-and-segment-text process 301 may extract a set of single terms 302, similarly to the tag-and-segment-text process 202 and the set of single terms 203 of the filter 103.
The tag-and-segment-text process 301 may also tag words in the document 201 with their parts of speech and parse them into a set of sentences 303, similarly to the tag-and-segment-text process 202 and the sentences 205 of the filter 103. The sentences 303 may be input to an extract-noun-phrases process 304, which may further lexically analyze the document 201, e.g., by means of a known lexical analysis technique, to extract a set of noun phrases 305, similarly to the extract-noun-phrases process 206 and the noun phrases 207 of the filter 103.
The tag-and-segment-text process 301 may also recognize and generate a set of proper nouns 306, similarly to the tag-and-segment-text process 202 and the proper nouns 209 of the filter 103.
The noun phrases 305, single terms 302, and proper nouns 306, a rank threshold 307, and a set of selected subtopics 308 (subtopics selected by the operator 108 to refine the query 104) may be input to a generate-query process 309, which may generate a set of query terms 310 and a query parse tree 311.
In a preferred embodiment, the tag-and-segment-text process 301, the extract-noun-phrases process 304, and the generate-query process 309 may proceed as asynchronously as possible, e.g., by execution on multiple processors or by multiple tasks or threads in a multitasking or multithreaded environment.
The query terms 310 and the query parse tree 311 may be input to the CBR engine 110 in the object file system 107, and may perform case-based matching or other fuzzy associative matching on the objects 106 in the database 102 for objects which are similar to the query 104, as described by the query terms 310 and the query parse tree 311, and which have a match quality at least as good as the rank threshold 307. (As noted with regard to the user interface 109, the selected subtopics 308 are added to the text of the query 104.) The object file system 107 may generate the hit table 113 of matched objects 106.
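The specification does not fix a particular match function for the CBR engine 110. Purely as an assumed stand-in, the following sketch scores each case by the cosine similarity between binary term vectors and keeps only those cases whose match quality is at least the rank threshold 307. The names `match_quality` and `build_hit_table`, and the representation of cases as sets of property terms, are hypothetical.

```python
import math

def match_quality(query_terms, case_terms):
    """Cosine similarity between two term sets treated as binary vectors."""
    if not query_terms or not case_terms:
        return 0.0
    shared = len(query_terms & case_terms)
    return shared / math.sqrt(len(query_terms) * len(case_terms))

def build_hit_table(query_terms, cases, rank_threshold):
    """Return (name, score) pairs at or above the threshold, best match first."""
    hits = []
    for name, terms in cases.items():
        score = match_quality(query_terms, terms)
        if score >= rank_threshold:
            hits.append((score, name))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [(name, score) for score, name in hits]

cases = {
    "Edison": {"edison", "light", "bulb", "inventor"},
    "Neon Lamp": {"neon", "lamp", "glass", "bulb"},
    "Marx": {"marx", "labor"},
}
query = {"light", "bulb", "invented"}
for name, score in build_hit_table(query, cases, rank_threshold=0.25):
    print(f"{name}: {score:.3f}")
```

Because the score is graded rather than all-or-nothing, a case sharing only some of the query terms can still appear in the hit table, which is the flexibility a vector query offers over a boolean query.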
PROCESSING HIT TABLES
Figure 4 shows a data flow diagram of a method of processing hit tables.
The hit table 113 and the relevant words 211 may be input to a cluster hits process 401, which (if clustering is enabled) collects the matched objects 106 into clusters, and may output a set of clusters 402 in response. Each cluster 402 may comprise a set of objects 106, selected for collective closeness with regard to all objects 106 in the hit table 113. The cluster hits process 401 is further described with regard to figure 5.
The hit table 113, the relevant words 211, and the lexicon 208 may be input to a first generate-topics (from relevant words) process 403, while the lexicon 208 and the query terms 310 may be input to a second generate-topics (from query words) process 403. Together the two generate-topics processes 403 may output a set of topics 404 and subtopics 405.
In a preferred embodiment, the generate-topics process 403 may examine the lexicon 208 of noun phrases 207 with a rule- based inference engine (not shown) . (One such inference engine is the ART-IM system, available from Inference Corporation in El Segundo, California.) The inference engine may detect particular patterns in the noun phrases 207 which indicate semantic relations between the words in those noun phrases 207. For example, the noun phrase
"kangaroos, wallabies, and other marsupials"
would be detected and would generate the relations
kangaroo IS-A marsupial
wallaby IS-A marsupial
The generate-topics process 403 may thus construct a phrase lattice, showing each noun phrase 207 as being inclusive of (above), included in (below), or incommensurate with (neither above nor below) each other noun phrase 207.
The generate-topics (from relevant words) process 403 may restrict the phrase lattice to those noun phrases 207 which include relevant words 211 of the objects 106 in the hit table 113. In a preferred embodiment, the second generate-topics (from query words) process 403 may operate in similar manner as the first generate-topics (from relevant words) process 403 and may restrict the phrase lattice to those noun phrases 305 which include relevant words 211 of the query.
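The detection of taxonomic relations in a noun phrase such as "kangaroos, wallabies, and other marsupials" may be sketched with a regular expression. This is a simplified stand-in for the rule-based inference engine, not a reproduction of its rules; the pattern, the crude `singular` helper, and the function name `is_a_relations` are assumptions made for illustration.

```python
import re

def singular(word):
    """Very crude singularization (assumed helper): wallabies -> wallaby."""
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

# Matches "<member>, <member>, and other <parent>" (comma before and/or optional).
OTHER_PATTERN = re.compile(
    r"^(?P<members>.+?),?\s+(?:and|or)\s+other\s+(?P<parent>.+)$"
)

def is_a_relations(noun_phrase):
    """Return (member, "IS-A", parent) triples found in one noun phrase."""
    match = OTHER_PATTERN.match(noun_phrase.strip().lower())
    if not match:
        return []
    parent = singular(match.group("parent"))
    members = [m.strip() for m in match.group("members").split(",") if m.strip()]
    return [(singular(member), "IS-A", parent) for member in members]

for relation in is_a_relations("kangaroos, wallabies, and other marsupials"):
    print(" ".join(relation))
```

Each triple places the member below the parent in the phrase lattice; phrases that yield no relation to one another remain incommensurate.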
Figure 5 shows a process flow diagram of a method of clustering hit tables.
The cluster hits process 401 may operate by means of a genetic algorithm, in which an initial configuration and a set of genetic operators are specified, and the set of solutions is formed by simulation of random "evolution" of a population of possible solutions, using the method of steady-state reproduction without duplicates. Genetic algorithms are well known in the art, and are described in further detail in "Foundations of Genetic Algorithms", ed. Gregory J.E. Rawlins (Morgan Kaufmann Publishers: San Mateo, California 1991). It would be clear to those of ordinary skill in the art that the parameters of the genetic algorithm, and even the type of genetic algorithm performed could be varied substantially and still remain within the scope and spirit of the invention.
In a cluster-count step 501, a number of clusters 402 is selected. The number of clusters 402 may vary from a known minimum to a known maximum, settable by the operator 108. The genetic algorithm of the following steps is repeated for each permissible number of clusters 402, and the best solution adopted.
In an initiate-clusters step 502, a set of possible clusters 402 is selected; this is a single "gene". A random population of genes is selected. Each cluster 402 is represented by the centroid of the objects 106 which would comprise that cluster 402. Thus, when a solution of clusters 402 is selected, each object 106 is assigned to the cluster 402 which it best matches.
After the initiate-clusters step 502, the genetic algorithm of the following steps is repeated for a known period of time, settable by the operator 108. When that time expires, the best available solution (i.e., the gene with the best quality) is selected as the solution and specifies the set of clusters 402. Each object 106 is assigned to the cluster 402 to which it is the closest.
In an evaluation step 503, all genes in the population are evaluated for quality, and the gene with the least quality is removed. In a preferred embodiment, the statistical measure "category utility" is computed; i.e., the utility of each cluster 402 in distinguishing an object 106 in one cluster 402 from an object 106 in another cluster 402. Thus, if the centroid of a cluster 402 has high quality of match for several objects 106, those objects are reasonably clustered together.
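The specification names category utility without giving a formula. As an assumption, the following sketch uses the commonly cited textbook form, in which each relevant word is treated as a binary attribute (present or absent) and the score is the average, over clusters, of the size-weighted gain in expected attribute predictability. The function names and the representation of objects as sets of words are hypothetical.

```python
def _squared_value_probs(objects, vocabulary):
    """Sum over words of P(present)^2 + P(absent)^2 within a group of objects."""
    total = 0.0
    for word in vocabulary:
        p = sum(1 for obj in objects if word in obj) / len(objects)
        total += p * p + (1.0 - p) * (1.0 - p)
    return total

def category_utility(clusters):
    """Category utility of a partition; each cluster is a list of word sets."""
    clusters = [cluster for cluster in clusters if cluster]
    if not clusters:
        return 0.0
    all_objects = [obj for cluster in clusters for obj in cluster]
    vocabulary = set().union(*all_objects)
    baseline = _squared_value_probs(all_objects, vocabulary)
    gain = 0.0
    for cluster in clusters:
        weight = len(cluster) / len(all_objects)
        gain += weight * (_squared_value_probs(cluster, vocabulary) - baseline)
    return gain / len(clusters)

a, b = {"bulb", "lamp"}, {"bulb", "neon"}
c, d = {"marx", "labor"}, {"marx", "class"}
print(category_utility([[a, b], [c, d]]))  # objects grouped by shared words
print(category_utility([[a, c], [b, d]]))  # objects grouped across topics
```

A partition that groups objects sharing relevant words scores higher than one that mixes them, so the gene with the lowest such score would be the one removed from the population.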
Although in a preferred embodiment, matching for clusters 402 is performed using relevant words 211, it would be clear to those of ordinary skill, after perusal of this application, that other properties of the objects 106 could be used as well, such as the read/write date of the object 106, and that doing so would be within the scope and spirit of the invention.
In a genetic-operator step 504, one of three operators is selected and employed to create a new gene: (1) Mutation-1. The new gene is randomly created. (2) Mutation-2. An existing gene is copied, except that one of its clusters 402 is mutated by replacing it with a randomly created cluster 402. (3) Crossover. Two genes have their n-tuples of clusters 402 paired off and one cluster 402 is selected at random from each pair to form the new gene. Alternatively, a new gene is created by selecting N clusters 402 at random from the 2N clusters 402 specified by the two old genes.
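The steady-state loop and the three operators may be sketched as follows, under several loudly stated assumptions: each cluster's centroid is approximated by the word set of a seed object, the match measure is Jaccard overlap, and the quality score is the mean similarity of each object to its assigned seed (a simple stand-in for category utility). All function names, the population size, and the default time budget are hypothetical.

```python
import math
import random
import time

def similarity(a, b):
    """Jaccard overlap between two word sets (assumed match measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def assign(objects, gene):
    """Assign each object index to the seed (centroid stand-in) it best matches."""
    clusters = {seed: [] for seed in gene}
    for index, words in enumerate(objects):
        best = max(gene, key=lambda seed: similarity(words, objects[seed]))
        clusters[best].append(index)
    return clusters

def quality(objects, gene):
    """Mean similarity of objects to their assigned seed (assumed score)."""
    clusters = assign(objects, gene)
    total = sum(similarity(objects[i], objects[seed])
                for seed, members in clusters.items() for i in members)
    return total / len(objects)

def random_gene(objects, k, rng):
    return tuple(sorted(rng.sample(range(len(objects)), k)))

def new_gene(objects, population, k, rng):
    operator = rng.choice(("mutation-1", "mutation-2", "crossover"))
    if operator == "mutation-1":          # wholly random new gene
        return random_gene(objects, k, rng)
    if operator == "mutation-2":          # copy a gene, replace one cluster
        gene = list(rng.choice(population))
        gene[rng.randrange(k)] = rng.randrange(len(objects))
        return tuple(sorted(gene))
    first, second = rng.sample(population, 2)   # crossover: pick one per pair
    return tuple(sorted(rng.choice(pair) for pair in zip(first, second)))

def cluster_hits(objects, k, time_budget_s=0.05, pop_size=6, seed=0):
    assert 1 <= k < len(objects)          # sketch assumes fewer clusters than objects
    rng = random.Random(seed)
    pop_size = max(2, min(pop_size, math.comb(len(objects), k)))
    population = []
    while len(population) < pop_size:     # random initial population, no duplicates
        gene = random_gene(objects, k, rng)
        if gene not in population:
            population.append(gene)
    deadline = time.monotonic() + time_budget_s
    while time.monotonic() < deadline:
        gene = new_gene(objects, population, k, rng)
        if gene in population or len(set(gene)) < k:
            continue                      # reject duplicates and degenerate genes
        population.append(gene)
        population.remove(min(population, key=lambda g: quality(objects, g)))
    best = max(population, key=lambda g: quality(objects, g))
    return list(assign(objects, best).values())

hits = [{"bulb", "lamp", "glass"}, {"bulb", "lamp", "neon"},
        {"marx", "labor", "class"}, {"marx", "labor", "german"}]
print(cluster_hits(hits, k=2))
```

Repeating `cluster_hits` for each permissible number of clusters and keeping the best-scoring result would correspond to the cluster-count step 501; that outer loop is omitted here for brevity.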
USER INTERFACE
Figure 6 shows an example explorer user interface screen as viewed by an operator. While the invention is described primarily with regard to a specific user interface, it would be clear to those of ordinary skill in the art that another user interface of equal or greater flexibility would be suitable, and would be within the scope and spirit of the invention.
In a preferred embodiment, the user interface 109 may be combined with a user interface for a generalized file system exploration program, such as in the Windows NT system referred to herein. The user interface 109 may comprise a query window 601 in which the operator may enter the query 104 in free text, and a results window 602 in which the system 101 may display a set of matched objects 106 found in response to the query 104.
In a preferred embodiment, the operator 108 may enter the query 104 in the query window 601. The query 104 is input to the explorer 105, which processes it as described herein, and generates the vector query 112. The vector query 112 is input to the object file system 107, and generates the hit table 113 of matched objects 106. The hit table 113 is input to the user interface 109, which displays the matched objects 106. The operator may select a displayed matched object 106 to view its contents.
In a preferred embodiment, the user interface 109, the explorer 105, and the object file system 107 may operate as asynchronously as possible. Accordingly, the object file system 107 may search the database 102 for matched objects 106 independently, once it has sufficient information from the explorer 105; the user interface 109 may display matched objects 106 from the hit table 113 as they are generated by the object file system 107.
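The asynchronous operation described above may be illustrated with a producer and a consumer joined by a queue. In this Python sketch the `search` function stands in for the object file system 107 and the `display` function for the user interface 109; the names, the in-memory database, and the word-overlap match are hypothetical simplifications.

```python
import queue
import threading

def search(database, query_words, hits):
    """Producer: scans the database and emits each hit as soon as it is found."""
    for name, words in database.items():
        if query_words & words:
            hits.put(name)
    hits.put(None)  # sentinel: the search has finished

def display(hits):
    """Consumer: shows hits as they arrive rather than waiting for all of them."""
    shown = []
    while (hit := hits.get()) is not None:
        shown.append(hit)
    return shown

# Hypothetical database mapping object names to their relevant words.
database = {
    "Edison, Thomas Alva": {"light", "bulb", "inventor"},
    "Onion": {"bulb", "herb"},
    "Ohio": {"state", "river"},
}
hits = queue.Queue()
worker = threading.Thread(target=search, args=(database, {"light", "bulb"}, hits))
worker.start()
shown = display(hits)
worker.join()
```

The queue plays the role of the hit table 113: the searcher can keep filling it while the display side drains it.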
In the example, the operator 108 has entered the query 104 "who invented the light bulb?" in a content field 603 of the query window 601, and the system 101 has responded with a set of matched objects 106 in the results window 602. The matched objects are displayed one per line, in columns labelled "rank", "query", "header", and "relevant words".
In the example, a rank field 604 displays the quality of match for each displayed matched object 106. In a preferred embodiment, the system 101 may order the matched objects 106 by rank. This may occur as the normal procedure, or at the request of the operator 108, e.g., by means of a "sort" command 605 in the query window 601. In a preferred embodiment, the rank field 604 may also be color-coded by value.
In the example, a query field 606 displays the relevant words of the query which are most related to the displayed matched object 106.
In the example, a header field 607 displays the header words 204 of the displayed matched object 106.
In the example, a relevant words field 608 displays the most common relevant words 211 of the displayed matched object 106.
In the example, a topics field 609 of the query window 601 displays suggested topics for refinement of the query 104 which the system 101 has identified. In a preferred embodiment, the operator 108 may select a topic in the topics field 609, and the system will display a subtopics window 610 (overlaid on the query window 601 and the results window 602) showing the subtopics which the system 101 has identified for that topic.
QUERY REFINEMENT
The operator 108 may refine the query 104 in response to the matched objects 106, and the explorer 105 may attempt to match objects 106 using the query 104 as refined. This may occur at the request of the operator 108, e.g., by means of a "refresh" command 611 in the query window 601.
In a preferred embodiment, the operator 108 may select one or more subtopics 405 to refine the query 104. To do so, the operator 108 may identify (e.g., by pointing to with a pointing device such as a mouse) one or more subtopics 405 in the subtopics window 610. The selected subtopics 308 may be "added" to the query 104 and the explorer 105 may attempt to match objects 106 using the query 104 as refined.
In a preferred embodiment, the operator 108 may also select one or more relevant words 211 to refine the query 104. To do so, the operator 108 may identify (e.g. by pointing to) the relevant words field 608 for a particular matched object 106 and "drag" that relevant words field 608 to the content field 603; the system 101 will display a relevance feedback window 612 (overlaid on the query window 601 and the results window 602) showing the relevant words 211 for that matched object 106.
In a preferred embodiment, the operator 108 may select one or more relevant words 211 to refine the query 104. To do so, the operator 108 may identify (e.g., by pointing to) one or more relevant words 211 in the relevance feedback window 612. The selected relevant words 211 may be "added" to the query 104 and the explorer 105 may attempt to match objects 106 using the query 104 as refined.
The query 104 as refined (like the original query 104) is presented as a vector query 104 to the CBR engine 110. When selected subtopics 308 or relevant words 211 are "added" to the query, they are properties which the CBR engine 110 must match to objects 106, as described for methods of iterative refinement of case-based matching shown in those co-pending applications which have been incorporated by reference. (Thus, the CBR engine 110 must match to objects 106 as if the operator 108 had answered a query refining question in a case-based system.) A query 104 as refined may be further refined, allowing the operator to iteratively refine the query 104 until desired objects 106 are located.
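Iterative refinement of this kind may be sketched as adding terms to a weighted term vector. The Python sketch below assumes, for illustration only, that the vector query is a mapping from terms to weights and that "adding" a term increases its weight by a fixed amount; the actual matching behaviour is defined by the CBR engine 110 and the co-pending applications, not by this sketch.

```python
from collections import Counter

def refine(query_vector, selected_terms, weight=1.0):
    """Return a new query vector with the selected terms added."""
    refined = Counter(query_vector)  # copy, so the original query is preserved
    for term in selected_terms:
        refined[term] += weight
    return refined

# Hypothetical original query and two rounds of refinement.
query = Counter({"invented": 1.0, "light bulb": 1.0})
refined = refine(query, ["edison", "electric lighting"])
refined = refine(refined, ["light bulb"])
```

Because each call returns a new vector, a refined query can itself be refined again, matching the iterative use described above.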
VIEWING CLUSTERS
Figure 7 shows a second example explorer user interface screen, as viewed by an operator, in which clusters are displayed.
The operator 108 may select a "cluster" command 701 (figure 6) or an "uncluster" command 701 (figure 7) in the query window 601, and the system 101 will display a set of clusters 402, each a set of related matched objects 106, in place of displaying matched objects 106 themselves. In the example, the operator has selected the "cluster" command 701 for the same query 104 as in the example of figure 6.
In the example, an expand field 702 displays whether the cluster 402 can be expanded (shown by a "+" symbol) to display individual matched objects 106, or can be collapsed (shown by a "-" symbol) to display a single identifier for the cluster 402.
In the example, the rank field 703 displays the best rank for all matched objects 106 in the cluster 402. In a preferred embodiment, the system 101 may order the clusters 402 by this rank field 703. This may occur as the normal procedure, or at the request of the operator 108, e.g., by means of the "sort" command 605 in the query window 601. In a preferred embodiment, this rank field 703 may also be color-coded by value.
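The ordering of clusters by their best member rank may be sketched as follows. The cluster names and rank values in this Python example are invented for illustration and do not come from the figures.

```python
def cluster_rank(cluster):
    """Rank of a cluster: the best rank among its matched objects."""
    return max(rank for _, rank in cluster)

# Hypothetical clusters of (object name, rank) pairs.
clusters = {
    "lighting": [("Electric Lighting", 91), ("Neon Lamp", 84)],
    "plants": [("Autumn Crocus", 82), ("Hyacinth", 80)],
    "inventors": [("Edison, Thomas Alva", 97)],
}
ordered = sorted(
    clusters, key=lambda name: cluster_rank(clusters[name]), reverse=True
)
```

Using the maximum rather than the average means a cluster containing a single very good hit is listed ahead of a larger cluster of moderate hits.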
In the example, the relevant words field 608 displays the most common relevant words 211 in the cluster 402.
Other fields and windows remain similar to the example of figure 6.
The operator 108 may also choose to cluster all objects 106 in a specific set, e.g., a specific directory in the object file system 107. In a preferred embodiment, the operator 108 may restrict the scope of the explorer 105 to a specific directory and issue the "cluster" command 701; the system 101 will display the objects 106 in that directory in clusters 402.
SETTING PARAMETERS
Figure 8 shows an example explorer user interface screen, as viewed by an operator, in which settings may be set by the operator.
In a preferred embodiment, the operator 108 may select settings appropriate for the system 101. The operator 108 may select a "properties" command 801 in the query window 601 (figure 6), and the system 101 will display a properties window 802 with a set of property values 803 which may be set.
A "minimum rank of returned hits" property 804 is a threshold value for including matched objects 106; matched objects 106 whose rank falls below this value are not displayed in the results window 602 and are not used in further processing. The rank of a matched object 106 is calculated by the CBR engine 110. In the example, this value is set to 80.
A "maximum clustered hits" property 805 is a maximum number of matched objects 106 which are included in a single cluster 402. Those matched objects 106 not included in clusters 402 are placed in a special cluster 402 labelled "Other". In the example, this value is set to 400.
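The two properties just described may be illustrated with a short Python sketch. It reads the "maximum clustered hits" property as a cap on the number of hits passed to clustering, with the overflow placed in the "Other" cluster; that reading, the function name, and the sample ranks are assumptions made for illustration.

```python
def select_hits(hits, min_rank, max_clustered):
    """hits: list of (name, rank). Returns (hits to cluster, hits for 'Other')."""
    kept = sorted(
        (h for h in hits if h[1] >= min_rank),  # drop hits below the threshold
        key=lambda h: h[1],
        reverse=True,
    )
    return kept[:max_clustered], kept[max_clustered:]

# Hypothetical hits; the threshold of 80 matches the example setting.
hits = [("a", 95), ("b", 79), ("c", 80), ("d", 88)]
to_cluster, other = select_hits(hits, min_rank=80, max_clustered=2)
```

Hits whose rank falls below the threshold are discarded before any clustering takes place, so they appear in neither list.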
A "clustering time" property 806 is the elapsed real time devoted to clustering. In the example, this value is set to 2500 milliseconds.
A "minimum number of clusters" property 807 is the lower bound for the number of clusters 402 generated. In the example, this value is set to 2 clusters.
A "maximum number of clusters" property 808 is the upper bound for the number of clusters 402 generated. In the example, this value is set to 8 clusters. The system 101 attempts to generate a number of clusters 402 between the minimum and maximum number selected.
A "maximum topics" property 809 is the maximum number of topics displayed in the topics field 609 in the query window 601. In the example, this value is set to 7 topics.
A "maximum subtopics" property 810 is the maximum number of subtopics displayed in the subtopics window 610. In the example, this value is set to 250 subtopics.
A "do/don't cluster" property 811 sets whether or not clustering is performed. In the example, this value is set to YES.
A "do/don't generate query topics" property 812 sets whether or not topics and subtopics are generated in response to query terms 310. In the example, this value is set to YES.
A "do/don't generate salient topics" property 813 sets whether or not topics and subtopics are generated in response to relevant words 211. In the example, this value is set to YES.
A "boolean/vector query" property 814 sets whether the object file system 107 performs a boolean query or a vector query in response to the explorer 105. In the example, this value is set to vector queries. A boolean query would have boolean connectors (e.g., "AND", "OR") coupling the query terms 310, so that the query 104 would not be as flexibly matched. Search using boolean queries is well known in the art.
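The settings of the example, and the difference in flexibility between a boolean query and a vector query, may be sketched in Python as follows. The class and field names are hypothetical; the default values are those given in the example above. The two match functions are deliberately simplified stand-ins (an "AND" of all terms versus the fraction of terms found) rather than the matching actually performed by the object file system 107.

```python
from dataclasses import dataclass

@dataclass
class ExplorerSettings:
    min_rank: int = 80              # minimum rank of returned hits
    max_clustered_hits: int = 400
    clustering_time_ms: int = 2500
    min_clusters: int = 2
    max_clusters: int = 8
    max_topics: int = 7
    max_subtopics: int = 250
    cluster: bool = True
    query_topics: bool = True
    salient_topics: bool = True
    vector_query: bool = True       # False would select a boolean query

def boolean_and_match(query_terms, object_words):
    """Boolean AND: the object matches only if every term is present."""
    return all(term in object_words for term in query_terms)

def vector_match(query_terms, object_words):
    """Vector-style match: a graded score, here the fraction of terms present."""
    return sum(term in object_words for term in query_terms) / len(query_terms)

settings = ExplorerSettings()
terms = ["invented", "light", "bulb"]
words = {"light", "bulb", "edison"}
strict = boolean_and_match(terms, words)
graded = vector_match(terms, words)
```

An object that contains two of the three query terms is rejected outright by the boolean form but still receives a partial score from the vector form, which is the sense in which the boolean query is less flexibly matched.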
APPENDICES
Appendix A shows a table of parts of speech and a set of lexical rules for the English language, which may be used for the tag-and-segment-text process or the tag-and-segment-text process in a preferred embodiment.
Appendix B shows an output of a test run of an example filter when applied to a portion of an example multimedia encyclopedia used as a database, available as "Microsoft Encarta" from Microsoft Corporation of Redmond, Washington.
Alternative Embodiments
While preferred embodiments are disclosed herein, many variations are possible which remain within the concept and scope of the invention, and these variations would become clear to one of ordinary skill in the art after perusal of the specification, drawings and claims herein.
APPENDIX A
LEX2.TXT
Number of original entries from LDOCE and WordNet:
2466 lines of the form: Ability: skill, faculty, aptitude
11624 total terms on the right (downward relationships)
Terms never have their parents as children (no loops)
Parts of speech represented:
A - Adjective strong, vivid, real
ADV - Adverb weakly, dimly, very
AUX - Auxiliary Verb can, shall, will
AXN - AUX not can't, won't, doesn't
BE - be is, are, be, was
BTH - PQT/Double Conj. both
CLN - Colon
CMA - Comma
CON - Connective and, or, but
CRD - Cardinal three, 3.14, twenty-two
D - Determiner the, a, that
DAT - Date &/or Time friday, 3:00, Christmas
DDC - D/Double Conj. either, neither
DO - Do (aux) do, did, does
ENS - End Of Sentence ., ?, !
ETC - "And Others" ..., etc., et al.
GEN - Genitive his, her, their
HAV - Have (aux) have, had, has, having
IJ - Interjection Oh, shucks, well
INF - Infinitive marker to
N - Noun frog, pride, year
NEG - Negation not
ORD - Ordinal first, 2nd, last
P - Preposition by, around, with, from
PA - Open Paren (, [, {, <
PD - Post Determiner many, several, next
PN - Proper Noun Zippy, Brad Allen
PQL - Pre-Qualifier quite, rather, such
PQT - Pre-Quantifier nary, many, half, all
PRN - Pronoun him, she, we
PRT - Participial Verb running, thinking
QA - Quantifier/Article that, this
QL - Qualifier some, many, every
QLP - Post-Qualifier enough, 'nuff, indeed
QN - Quantified Noun everybody, nothing
REN - Close Paren ), ], }, >
RP - Relative Pronoun that, which
SOS - Start of Sentence
V - Verb (inf or past) eat, voted, surf
WHD - Wh-Determiner what, which
WHQ - Wh-Qualifier who, why
XT - Existential Term it, there
Total number of phrase recognition rules:
5 for the filter:
CRD|GEN|N|ORD, N, ~N
GEN, PRT
ADV|CRD|GEN|N|ORD, A|CRD|ORD, N, ~N
ADV|CRD|GEN|N|ORD, A|CRD|ORD, A|CRD|N|ORD, N, ~N
CRD|N|ORD, CON, A|CRD|N|ORD, N, ~N
Additional 10 for the Explorer (original 5 used as well):
N, RP, AUX|AXN|COP|DO|HAV, P|PRT|V, N|PN
note: ~X means not X or nothing at all (end of sentence)
Total number of automatically acquired lexicon entries:
For Encarta, including base LDOCE/Wordnet entries:
184904 unique words / base phrases
51623 parents involved in 445025 relationships
151850 children involved in 445025 relationships
Average number of terms per automatically acquired phrase:
445025 / 51623 = 8.6
445025 / 151850 = 2.9
Average number of children phrases from original LDOCE entries:
11624 / 2466 = 4.7
NOTE from Perry:
You asked how many things we got out of WordNet and LDOCE. The number that David responded was the number of taxonyms we extracted from those two sources (mostly WordNet). If you were asking the number of words we extracted, it was initially in the neighborhood of 85,000. The current number of tagged words in the lexicon is 25915.
There are some additional phrase lattice rules that David didn't mention, since they are currently stubbed out. They involve noun phrases where a prepositional phrase or relative clause attaches to the right of a noun:
Queen of England
girl from Ipanema
man who hit Dave Adam
car that didn't stop
The reason we don't use them is the right attachment.
Our current representation in the phrase lattice file is: base-word, ext1, ext2, ..., extn where ext1 through extn all attach to the LEFT of base-word. Bear in mind, of course, that unstubbing the code and fixing the reps of this file will add this form of phrase lattice entry, but it will also increase the size of the phrase lattice file (perhaps double it).
LDOCE is basically a dictionary of British English, so we found a lot of words we weren't familiar with, as well as a lot of double entries to account for American spellings (e.g. color and colour). The lexical categories we were able to extract out of LDOCE and WordNet were limited to nouns, verbs, adjectives, adverbs, conjunctions, determiners, predeterminers, prepositions, pronouns, and phrases. Since we don't use a phrasal lexicon, we threw the phrases away.
All other categories of words (including the different categories of verbs: do, be, have, participial) were hand tagged. This tagging was greatly aided by two books: DeRose's Dissertation and the book by Kucera and Francis. The past tenses for all verbs were also done by hand, which was something of a waste as most of them (the regular ones) were eventually thrown away, once we implemented rules that tag based on word endings.
The following is the current set of rules used for determining noun phrases:
1. noun-phrase -> proper-noun (e.g. "Elvis")
2. noun-phrase -> pronoun (e.g. "he")
3. noun-phrase -> noun (e.g. "cars")
4. noun-phrase -> gerund (e.g. "running")
5. noun-phrase -> determiner noun-phrase (e.g. "The person")
6. noun-phrase -> quantifier noun-phrase (e.g. "Three people")
7. noun-phrase -> adjective noun-phrase (e.g. "fluffy clouds")
8. noun-phrase -> adverb noun-phrase (e.g. "maddeningly fluffy clouds")
9. noun-phrase -> noun noun-phrase (e.g. "printer ribbons")
10. noun-phrase -> noun-phrase relative-clause (e.g. "The car that hit me")
11. noun-phrase -> noun-phrase prepositional-phrase (e.g. "The person with the most toys")
12. noun-phrase -> noun-phrase that sentence (e.g. "The candidate that I will vote for")
13. noun-phrase -> noun-phrase [, noun-phrase]* [,] and noun-phrase (e.g. "Larry, Moe and Curly")
14. noun-phrase -> noun-phrase [, noun-phrase]* [,] or noun-phrase (e.g. "England, France, or Germany")
15. noun-phrase -> comparative noun-phrase than noun-phrase (e.g. "more tea than China")
The Find Taxonomic Relations process (process 2.2 in figure 4) uses ART-IM rules to capture patterns of words which indicate taxonomic relationships between the words. For example, it detects patterns like:
"... kangaroos, wallabies, and other marsupials ..."
From this particular phrase, one could reasonably extract the relations
IS_A(kangaroo,marsupial) and IS_A(wallaby,marsupial)
Other patterns which detect this type of relation, extracted from [14], are:
1. NP such as {NP,}* {(and | or)} NP
2. such NP as {NP,}* {(and | or)} NP
3. NP {, NP}* {,} and other NP
4. NP {, NP}* {,} or other NP
5. NP {,} including {NP,}* {(and | or)} NP
6. NP {,} especially {NP,}* {(and | or)} NP
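As an illustration of how a pattern of the "and other" / "or other" kind can yield IS_A relations, the following Python sketch applies a regular expression to the kangaroo example. It is not the ART-IM rule set referred to above: it treats each noun phrase as a single word, and its singularization is deliberately naive. The names `PATTERN`, `singular`, and `is_a_relations` are hypothetical.

```python
import re

def singular(word):
    # Naive singularization, adequate only for this illustration.
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s"):
        return word[:-1]
    return word

# One or more comma-separated words, an optional comma, then "and/or other" NP.
PATTERN = re.compile(r"((?:\w+, )*\w+),? (?:and|or) other (\w+)")

def is_a_relations(text):
    """Return (child, parent) pairs for each 'X, Y, and other Z' occurrence."""
    relations = []
    for match in PATTERN.finditer(text):
        parent = singular(match.group(2))
        for child in match.group(1).split(", "):
            relations.append((singular(child), parent))
    return relations

relations = is_a_relations("... kangaroos, wallabies, and other marsupials ...")
```

For the example phrase this yields the pairs (kangaroo, marsupial) and (wallaby, marsupial), corresponding to the two IS_A relations given above.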
APPENDIX B
Mar 16 17:39 1993 test.log Emacs buffer Page 1
Clustering file afl.txt
Non-empty clusters: 5
Clusters: 5
# Hits Vals Seed, Value:Count
0 1 0 NONE
1 2 0 Reuther, Walter Philip, Labor, labor:2, presidents, wage:2
2 2 0 Railroad Labor Organizations, Brotherhood, Union, united statesS
3 7 0 Hillman, Sidney, Labor, labor:7, afl:7, union:4, american federat
4 2 0 Kirkland, Lane, Labor, directors
Passes: 1029, best pass: 830, best score: 0.955, worst score: 0.170
Cluster 0, has 1 hits: ''
Football, Type, United States
Cluster 1, has 2 hits: 'labor:2, presidents, wage:2'
Meany, George, Labor
Reuther, Walter Philip, Labor
Cluster 2, has 2 hits: 'united statesS, unionS, managements'
Railroad Labor Organizations, Brotherhood, Union
Teamsters Union, Full, International Brotherhood
Cluster 3, has 7 hits: 'labor:7, afl:7, union:4, american federation^, cio:3,
American Federation, Labor, Congress
Gomper, Samuel, Labor
Green, William, Labor
Hillman, Sidney, Labor
Knight, Labor, Union
Lewi, John L, Labor
Strike, Labor, Relation
Cluster 4, has 2 hits: 'directors'
Kirkland, Lane, Labor
Rozelle, Pete, Full
Clustering file alcohol.txt
Non-empty clusters: 5
Clusters: 5
# Hits Vals Seed, Value:Count
0 15 0 (OTHER), blood , vitamins, tissues, poisons, sugar metabolis
1 22 0 Antifreeze, Chemical, Substance, alcoholSI, acid:7, ethyl:7, li
2 10 0 Vodka, Beverage, Known, alcohol:9, percent:5, beverages, use:3,
3 6 0 Gasohol, Blend, Part, fuel:5, alcohols, methanolS, combustion
4 4 0 Marijuana, Mixture, Leave, drugs, alcohols, syndromes, psycho
Passes: 334, best pass: 158, best score: 0.307, worst score: 0.132
Cluster 0, has 15 hits: '(OTHER), bloods, vitaminS, tissues, poisonS, suga
Birth Defects, Disorder, Structure
Cancer, Medicine, Growth
Corn, Maize, Cereal
Crop Farming, Cultivation, Plant
First Aid, Emergency, Measure
Fungi, Group, Organism
Liver, Organ, Vertebrate
Nutrition, Human, Science
Paint, Varnish, Liquid
Pennsylvania, Full, Commonwealth
Poison, Substance, Produce
Sugar, Term, Number
Thermometer, Instrument, Measure
Wine, Beverage, Juice
Wood, Substance, Trunk
Cluster 1, has 22 hits: 'alcohol I, acid:7, ethyl:7, liquid: , examples, chemi
Acetaldehyde, Volatile, Liquid
Antifreeze, Chemical, Substance
Azeotropic Mixture, Solution, Ratio
Butyl Alcohol, Chemical, Formula
Cannizzaro, Stanislao, Italian
Disease, Medicine, Health
Ester, Chemistry, Compound
Ether, Chemistry, Ethyl
Fermentation, Chemical, Change
Formaldehyde, Compound, Carbon
Glycerin, Glycerol, C3h8o3
Gum, Substance, Plant
Iodine, Element, Symbol
Lipid, Group, Substance
Salicylic Acid, White, Solid
Solution, Chemistry, Mixture
Tannin, Acid, Name
Turpentine, Name, Semifluid
Vinegar, Condiment, Preservative
Wax, Name, Ester
Whiskey, Liquor, Mash
Zymology, Zymurgy, Biochemistry
Cluster 2, has 10 hits: 'alcohol:9, percent:5, beverages, use:3, liquor , dist
Beer, Term, Beverage
Cider, Sweet, Juice
Cosmetic, Term, Preparation
Distillation, Process, Liquid
Distilled Liquors, Beverage, Alcohol
Gin, Liquor, Grain
Liqueur, Beverage, Spirit
Police, Agency, Community
Prohibition, Ban, Manufacture
Vodka, Beverage, Known
Cluster 3, has 6 hits: 'fuel:5, alcohols, methanolS, combustions, coals, en
Alcohol, Arabic, Al-kuhul
Automobile, Greek, Auto
Combustion, Process, Oxidation
Energy Supply, World, Resource
Gasohol, Blend, Part
Rocket, Term, Propulsion
Cluster 4, has 4 hits: 'drugS, alcohols, syndromes, psychoactive drugs:2, ma
Alcoholism, Illness, Ingestion
Drug Dependence, State, Compulsion
Marijuana, Mixture, Leave
Psychoactive Drugs, Chemical, Substance
Clustering file bulb.txt
Non-empty clusters: 5
Clusters: 5
# Hits Vals Seed, Value:Count
0 9 0 (OTHER), plants, united statesS, seeds, gardenings, flowerS
1 10 0 Radiometer, Instrument, Intensity, bulb:7, light:4, tuber:3, stem:
2 3 0 Electric Lighting, Illumination, Mean, lamp:3, glassS, neonS, ar
3 5 0 Autumn Crocus, Name, Herb, bulb:5, liliaceae:4, herb:3, lily:3, pi
4 6 0 Hygrometer, Type, Instrument, temperature:4, atmosphere , points
Passes: 598, best pass: 333, best score: 0.491, worst score: 0.208
Cluster 0, has 9 hits: '(OTHER), plants, united statesS, seeds, gardenings,
Disease, Plant, Deviation
Gardening, Cultivation, Plant
Garlic, Name, Herb
Genetics, Study, Trait
Gopher, French, Gauffre
Horticulture, Latin, Hortu
Peanut Worm, Name, Small
Spice, Flavoring, Part
Technology, Term, Process
Cluster 1, has 10 hits: 'bulb:7, light:4, tuber:3, stem , rhizomeS, electrons
Bulb, Mass, Leave
Edison, Township, Middlesex County
Edison, Thomas Alva, Inventor
Onion, Name, Herb
Photoelectric Cell, Phototube, Electron
Photography, Technique, Permanent
Radiometer, Instrument, Intensity
Rhizome, Stem, Organ
Tuber, Stem, Plant
Ray, Radiation, Wavelength
Cluster 2, has 3 hits: 'lamp:3, glassS, neonS, arcS, bulbS, argonS, lights
Argon, Element, Symbol
Electric Lighting, Illumination, Mean
Neon Lamp, Glass, Bulb
Cluster 3, has 5 hits: 'bulb:5, liliaceae:4, herb:3, lily:3, pistilS, heights.
Autumn Crocus, Name, Herb
Hyacinth, Plant, Genu
Soap Plant, Amole, Native
Star-of-bethlehem, Name, Herb
Tuberose, Herb, Polianth
Cluster 4, has 6 hits: 'temperature:4, atmospheres, points, humidityS, bulb
Blood Pressure, Pressure, Blood
Humidity, Moisture, Content
Hygrometer, Type, Instrument
Meteorology, Study, Atmosphere
Thermometer, Instrument, Measure
Vapor, Physic, Term
Clustering file columbus.txt
Non-empty clusters: 7
Clusters: 7
# Hits Vals Seed, Value:Count
0 4 0 (OTHER), century:2
1 4 0 Pinzn, Name, Family, expedition:3, voyage:2, hispaniola:2, pinta:2
2 5 0 Puerto Rico, Commonwealth, Spanish Estado Libre Asociado, Spanish:
3 2 0 Samana Cay, Island, Bahama, atlantic ocean:2, landfall:2, san sal
4 6 0 Mississippi, East South Central, U.S., state:5, river:3, city:3,
5 5 0 Santiago, Dominican Republic, Name, cacao:3, city:3, Caribbean:2,
6 4 0 South America, Continent, Asia, death valley:2, south:2, slavery:
Passes: 614, best pass: 65, best score: 0.520, worst score: 0.189
Cluster 0, has 4 hits: '(OTHER), century:2'
American Literature, Literature, English
Coin, Geography, City
Europe, Continent, World
Knight, Columbu, Organization
Cluster 1, has 4 hits: 'expedition:3, voyage:2, hispaniola:2, pinta:2, ship:2'
Columbu, Christopher, Italian Cristoforo Colombo
Pinzn, Name, Family
Ship, Type, Construction
Velzquez, Diego, Soldier
Cluster 2, has 5 hits: 'spanish:4, island:3, spain:2, de:2, Christopher columbu
Bobadilla, Francisco, De
Cuba, Island, West Indies
Dsirade, Island, West Indies
Ferdinand V, The Catholic, King
Puerto Rico, Commonwealth, Spanish Estado Libre Asociado
Cluster 3, has 2 hits: 'atlantic ocean:2, landfall:2, san Salvador:2, island:2,
Samana Cay, Island, Bahama
San Salvador, Island, Watling Island
Cluster 4, has 6 hits: 'state:5, river:3, city:3, american civil war:2, ohio:2,
Columbu, Georgia, City
Columbu, Mississippi, City
Columbu, Ohio, City
Georgia, State, South Atlantic
Mississippi, East South Central, U.S.
Ohio, East North Central, U.S.
Cluster 5, has 5 hits: 'cacao:3, city:3, Caribbean:2, dominican:2, Santiago:2,
Columbu, Indiana, City
Santiago, Dominican Republic, Name
Santo Domingo, Trujillo, City
Spanish Town, City, Jamaica
Tobago, Republic, Commonwealth
Cluster 6, has 4 hits: 'death valley:2, south:2, slavery:2, brazil:2, continen
Black, America, Immigration
North America, Continent, Canada
South America, Continent, Asia
United States, America, Republic
Clustering file dualism.txt
Non-empty clusters: 5
Clusters: 5
# Hits Vals Seed, Value:Count
0 2 0 NONE
1 5 0 Dualism, Philosophy, Theory, mind:5, philosophers, philosophy ,
2 3 0 Devil, Hebrew, Belief, evil:3, god:3, good:2, human:2, middle age
3 3 0 Paulician, Church, History, dualisms, sects, bogomilsS, old te
4 2 0 Docetism, Christian, Heresy, doctrine:2, human:2
Passes: 1050, best pass: 312, best score: 1.003, worst score: 0.397
Cluster 0, has 2 hits: ''
Austria, German, sterreich
Zoroastrianism, Religion, Persia
Cluster 1, has 5 hits: 'mind:5, philosophe , philosophy:3, matters, universe
Dualism, Philosophy, Theory
Metaphysics, Branch, Philosophy
Monism, Greek, Mono
Occasionalism, Term, System
Philosophy, Greek, Philosophia
Cluster 2, has 3 hits: 'evil:3, god:3, good:2, human:2, middle agesS, middle e
Albigens, Follower, Single
Devil, Hebrew, Belief
Evil, Wrong, Harm
Cluster 3, has 3 hits: 'dualisms, sects, bogomilsS, old testaments, century
Basilide, Teacher, Alexandria
Bogomils, Member, Sect
Paulician, Church, History
Cluster 4, has 2 hits: 'doctrine:2, human:2'
Docetism, Christian, Heresy
Neoplatonism, Designation, Doctrine
Clustering file infant.txt
Non-empty clusters: 7
Clusters: 7
# Hits Vals Seed, Value:Count
0 4 0 NONE
1 3 0 Gesell, Arnold Lucius, Psychologist, infants, developments
2 2 0 Incubator, Apparatu, Chamber, growths
3 2 0 Pregnancy, Childbirth, Term, births, pregnancy:2, infants, chi
4 2 0 Hondura, Republic, Central America, country:2, 1980s:2
5 3 0 Baptism, Greek, Baptein, rite:2, baptisms
6 2 0 Japan, Japanese Dai, Great, manchuria:2, government:2, party:2
Passes: 835, best pass: [illegible], best score: 0.795, worst score: [illegible]
Cluster 0, has 4 hits: ''
Free Trade, Interchange, Frontier
Human, Name, Individual
Perception, Process, Stimulation
Scotland, Division, Kingdom
Cluster 1, has 3 hits: 'infants, developments'
Gesell, Arnold Lucius, Psychologist
Infancy, Period, Birth
Sudden Infant Death Syndrome, Sid, Death
Cluster 2, has 2 hits: 'growths'
Incubator, Apparatu, Chamber
Population, Term, Human
Cluster 3, has 2 hits: 'birthS, pregnancy:2, infants, childbirth:2, women:2'
Obstetrics, Branch, Medicine
Pregnancy, Childbirth, Term
Cluster 4, has 2 hits: 'country:2, 1980s:2'
Hondura, Republic, Central America
Sierra Leone, Nation, Africa
Cluster 5, has 3 hits: 'rite:2, baptisms'
Baptism, Greek, Baptein
Circumcision, Removal, Part
Mennonite, Religious, Group
Cluster 6, has 2 hits: 'manchuria:2, government:2, party:2'
China, Chinese Zhonghua Renmin Gongheguo, People Republic
Japan, Japanese Dai, Great
Clustering file israel.txt
Non-empty clusters: 4
Clusters: 4
# Hits Vals Seed, Value:Count
0 22 0 (OTHER), governments, war:4, centuryS, french revolutions, coun
1 66 0 Judah, Old Testament, Name, israel:64, judahSO, old testamentSO,
2 39 0 Nasser, Gamal Abdel, Egyptian, israel:32, arab:26, israeliSO, pal
3 11 0 Song, Solomon, Book, book:10, old testament:9, israel:9, chap:5, b
Passes: 127, best pass: 117, best score: 0.213, worst score: 0.083
Cluster 0, has 22 hits: '(OTHER), governments, war:4, centuryS, french revolut
Achille Lauro, Italian, Cruise
Anti-semitism, Social, Agitation
Asia, Continent, Island
Assyria, Ashur, Ashshur
Bahai, Persian, Glory
Buber, Martin, Religious
Cabala, Hebrew, Tradition
Crusade, Expedition, Undertaken
Eschatology, Discourse, Last
Espionage, Collection, Information
Iran, Islamic Republic, Republic
Jewish Art, Architecture, Jew
Jewish Music, Religious, Music
Nationalism, History, Movement
Portuguese Literature, Literature, Portuguese
Refugee, Person, Country
Romania, Republic, Europe
Saudi Arabia, Monarchy, Southwest Asia
Union, Soviet Socialist Republics, Russian Soyuz Sovyetskikh Sotsialisticheski
United Nations, Organization, Nation-state
United States, America, Republic
Woman Suffrage, Right, Women
Cluster 1, has 66 hits: 'israel:64, judahSO, old testamentSO, king:18, bc:12,
Abner, Old Testament, Cousin
Ahab, King, Israel
Amaziah, Hebrew, King
Ammonite, People, Region
Amo, Book, Old Testament
Angel, Greek, Aggelo
Apostle, Greek, Apostolo
Ashqelon, Town, Palestine
Balaam, Old Testament, Prophet
Kokhba, Simon, Name
Bene Israel, Community, Jew
Ben-zvi, Itzhak, Second
Bethlehem, Jordan, Hebrew
Bible, Holy Bible, Book
Carmel, Mount, Mountain
Diaspora, Greek, Dispersion
David, King, Be
Edom, Old Testament, Times
Elat, Eilat, City
Elia, Century, Be
Elisha, Old Testament, See
Ephraim, Hebrew, Old Testament
Esdraelon, Plain, Jezreel
Ezekiel, Book, Old Testament
Falasha, Sect, Ethiopia
Galilee, Galil, Circle
Gideon, Hebrew, Hewer
Habima Theater, Former, Name
Hebron, City, Israeli-occupied Jordan
Herzog, Chaim, President
High Priest, Hierarchy, Head
Holon, City, Israel
Israel, Kingdom, Hebrew
Jacob, Old Testament, Patriarch
Joash, Name, King
Jehoshaphat, Hebrew, Jehovah
Jehu, Hebrew, Jehovah
Jeremiah, Book, Old Testament
Jeroboam I, Old Testament, See
Jeroboam Ii, King, Israel
Jew, Usage, Hebrews
Jezebel, Tyrian, Princess
Jonathan, Old Testament Books, Samuel
Judah, Old Testament, Name
Judaism, Culture, Jew
Justification, Theology, Way
King, Book, Old Testament
Lost Tribes, History, Tribe
Manasseh, Son, Old Testament
Meir, Golda, Israeli
Michael, Hebrew, God
Moab, Country, Hill
National Jewish Welfare Board, National, Agency
Negeb, Region, Middle East
Philistine, Inhabitant, Region
Putnam, Israel, Soldier
Ramat Gan, City, Central
Rehoboam, King, Judah
Samuel, Book, Old Testament
Saul, King, Israel
Sharon, Plain, Israel
Shema, Hebrew, Word
Solomon, King, Israel
Tiberia, Lake, Sea
Weizmann, Chaim, Long-time
Zangwill, Israel, English
Cluster 2, has 39 hits: 'israel:32, arab:26, israeliSO, palestine:11, egypt:11,
Husein, King, Jordan
Acre, Akko, Seaport
Agnon, Shmuel Yosef, Israeli
Amman, Rabbah Ammon, Philadelphia
Arab League, Name, League
Arafat, Yasir, Palestinian
Aren, Moshe, Israeli
Menachem, Israeli, Prime
Ben-gurion, David, Israeli
Damascu, Arabic Dimashq, Ash-sham
Dayan, Moshe, Israeli
Egypt, Arab Republic, United Arab Republic
Gaza, Arabic Ghazze, City
Golan Heights, Region, Syria
Haifa, City, Seaport
Hebrew Literature, Literature, Jew
Iraq, Irak, Republic
Israel, Republic, Middle East
Jerusalem, Arabic, Al-qud
Jordan, River, Middle East
Jordan, Hashemite Kingdom, Arabic
Kibbutz, Village, Far
Lebanon, Arabic Lubnan, Republic
Libya, Full, Socialist People Libyan Arab Jamahiriyah
Middle East, Region, Geography
Nasser, Gamal Abdel, Egyptian
Palestine, Region, Extent
Palestine Liberation Organization, Plo, Body
Sadat, Egyptian, Military
Six-day War, Conflict, June
Suez Canal, Waterway, Running
Syria, Arabic Suriyah, Al-arabiyah
Tel Aviv-jaffa, Tel Aviv-yafo, City
Terrorism, International, Use
Tunisia, Republic, Africa
West Bank, Area, West
Yom Kippur War, Conflict, Israel
Zionism, Movement, People
Zionist Organization, America, Zoa
Cluster 3, has 11 hits: 'book:10, old testament:9, israel:9, chap:5, be:5, proph
Dead Sea Scrolls, Collection, Hebrew
Hosea, Book, Old Testament
Isaiah, Book, Old Testament
Joshua, Book, Old Testament
Judge, Book, Old Testament
Micah, Book, Old Testament
Number, Book, Old Testament
Obadiah, Book, Old Testament
Song, Solomon, Book
Wisdom, Solomon, Book
Zechariah, Book, Old Testament
Clustering file marx.txt Non-empty clusters: 6 Clusters: 6
# Hits Vals Seed, Value:Count
0 2 0 (OTHER), german:2, germany:2, east:2, baltic sea:2
1 3 0 Hegel, G, W, philosopher:3, philosophy:2
2 4 0 Bolshevism, Doctrine, Theory, communist:4, lenin:4, revolutions,
3 4 0 Marx Brothers, 20th-century, Comedian, marx:4, socialisms, engels
4 4 0 Communist Manifesto, German Manifest, Partei, capitalists, class.-
5 6 0 Ideology, System, Concept, social:3, marx:3, labor:2, world war ii
Passes: 722, best pass: 675, best score: 0.663, worst score: 0.248
Cluster 0, has 2 hits: '(OTHER), german:2, germany:2, east:2, baltic sea:2'
Germany, Country, Europe
Germany, German Democratic Republic, Gdr
Cluster 1, has 3 hits: 'philosopher:3, philosophy:2'
Hegel, G, W
Philosophy, Greek, Philosophia
Political Theory, SuL . ion, Science
Cluster 2, has 4 hits: 'communist:4, lenin:4, revolutions, communism:2, govern
Bolshevism, Doctrine, Theory
Communism, Concept, System
International, Name, Socialist
Socialism, Doctrine, Movement
Cluster 3, has 4 hits: 'marx:4, socialisms, engels:2'
Bernstein, Eduard, German Social Democratic
Economics, Science, Production
Engels, Friedrich, German
Marx Brothers, 20th-century, Comedian
Cluster 4, has 4 hits: 'capitalists, class:3, capitalism:2, communist:2, bourg
Bourgeoisie, Resident, European
Capitalism, System, Individual
Communist Manifesto, German Manifest, Partei
Marx, Karl, German
Cluster 5, has 6 hits: 'social:3, marx:3, labor:2, world war ii:2, german:2, ce
Ideology, System, Concept
Karl-marx-stadt, Former, Name
Kautsky, Karl Johann, German Marxist
Lassalle, Ferdinand, German
Sociology, Science, Deal
Wage, Theory, Labor
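Each cluster header in this log pairs a property value with the number of hits in the cluster that carry it, for example 'philosopher:3, philosophy:2'. The following Python sketch shows one way such a "Value:Count" summary might be produced. The function name `summarize_cluster`, the `min_count` threshold, and the sample property sets are illustrative assumptions; none of them is taken from the log or from the specification.

```python
from collections import Counter

def summarize_cluster(hits, min_count=2):
    """Return a 'value:count' summary string for one cluster.

    hits maps an object title to the set of property values
    associated with that object.
    """
    counts = Counter()
    for values in hits.values():
        counts.update(set(values))  # count each value once per hit
    # Keep values shared by at least min_count hits, most common first;
    # ties are broken alphabetically so the output is deterministic.
    shared = [(v, n) for v, n in counts.items() if n >= min_count]
    shared.sort(key=lambda item: (-item[1], item[0]))
    return ", ".join(f"{v}:{n}" for v, n in shared)

# Hypothetical property sets; the log does not list per-object values.
cluster = {
    "Hegel, G, W": {"philosopher", "philosophy", "german"},
    "Philosophy, Greek, Philosophia": {"philosopher", "philosophy"},
    "Political Theory": {"philosopher", "state"},
}
print(summarize_cluster(cluster))  # prints: philosopher:3, philosophy:2
```

Values carried by a single hit are dropped here, which is consistent with the smallest count visible in the log being 2, although the log itself does not state the threshold.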
Clustering file muslim.txt Non-empty clusters: 4 Clusters: 4
# Hits Vals Seed, Value:Count
0 41 0 (OTHER), arab:7, bc:5, ibn:4, indian:4, india:4, islam:4
1 20 0 Philippine, Republic, Pacific Ocean, 1980s:17, country:8, governm
2 40 0 Kashgar, Kashi, Kaxgar, muslim:38, india:8, muhammad:7, Jerusalem
3 11 0 Mathematics, Study, Relationship, century:11, art:3, france:3, ar
Passes: 146, best pass: 47, best score: 0.210, worst score: 0.124
Cluster 0, has 41 hits: '(OTHER), arab:7, bc:5, ibn:4, indian:4, india:4, islam
Alfonso Viii, King, Castile
Arabia, Desert, Peninsula
Arabic Literature, Literature, People
Archaeology, Greek, Archaio
Averros, Arabic, Abu
Black Muslims, Religious, Organization
Borneo, Island, World
Chess, Game, Skill
Christianity, World, Religion
Chronology, Science, Division
Concubinage, Term, World
Costume, Clothing, People
Demon, Usage, Spirit
Egypt, Arab Republic, United Arab Republic
Gandhi, Mohandas Karamchand, Mahatma Gandhi
Ghana, Kingdom, West African
Hegira, Hejira, Arabic
Iraq, Irak, Republic
Jacobite Church, Christian, Group
Java, Island, Malay Archipelago
Jew, Usage, Hebrews
Jordan, Hashemite Kingdom, Arabic
Judaism, Culture, Jew
Karbala, City, Iraq
Mahdi, Arabic, Mahdiy
Medina, Medinat-en-nabi, City
Middle East, Region, Geography
Nehru, Indian, Nationalist
Orthodox Church, Major, Branch
Philosophy, Greek, Philosophia
Pottery, Clay, Firing
Punjab, Region, River
Saudi Arabia, Monarchy, Southwest Asia
Shiite, Arabic, Partisan
Sikhs, Follower, Religion
Sudan, Republic, Africa
Trigonometry, Branch, Mathematics
Tobago, Republic, Commonwealth
Tunisia, Republic, Africa
Turkey, Republic, Turkish Trkiye Cumhuriyeti
Vijayanagar, Kingdom, India
Cluster 1, has 20 hits: '1980s:17, country:8, government:7, Spanish:5, arab:4, s
Afghanistan, Persian Afghnistn, Republic
Bangladesh, Full, People Republic
Berber, Name, Language
Cameroon, Republic, Africa
Chad, Republic, Central
Ethiopia, Abyssinia, Republic
Gambia, Republic, Commonwealth
Gibraltar, Dependency, Promontory
Indonesia, Republic, Island
Iran, Islamic Republic, Republic
Israel, Republic, Middle East
Kenya, Republic, Africa
Libya, Full, Socialist People Libyan Arab Jamahiriyah
Morocco, Arabic, Al-mamlakah
Nigeria, Federal Republic, Republic
Pakistan, Islamic Republic, Republic
Philippine, Republic, Pacific Ocean
Republic, Europe, Portion
Spain, Spanish Espaa, Monarchy
Syria, Arabic Suriyah, Al-arabiyah
Cluster 2, has 40 hits: 'muslim:38, india:8, muhammad:7, Jerusalem:5, delhi:4, p
Fakhruddin Ali, Fifth, President
Algeria, French Algrie, Popular Republic
Allah, Name, Supreme Being
Almeida, Francisco, De
Almoravid, Berber, Dynasty
Asia, Continent, Island
Babism, Religion, Offshoot
Balewa, Sir Abubakar Tafawa, Minister
Region, Part, Subcontinent
Caliphate, Office, Realm
Crusade, Expedition, Undertaken
Delhi, Old Delhi, City
Delhi Sultanate, Muslim, State
Dervish, Turkish, Darvsh
Fakir, Arabic, Faqir
Farabi, Tarkhan, Al-farabi
Gansu, Kansu, Province
Ghazali, Name, Abu Hamid Muhammad
India, Republic, Hindi Bharat
Sir Muhammad, Pakistani, Philosopher
Islam, World, Religion
Islamic Music, Vocal, Art
Jammu, Kashmir, Known
Jerusalem, Arabic, Al-qud
Jinnah, Muhammad Ali, Leader
Kashgar, Kashi, Kaxgar
Kharijite, Arabic, Kharawrij
Lebanon, Arabic Lubnan, Republic
Malaysia, Monarchy, Commonwealth
Malcolm X, Leader, Omaha
Mufti, Title, Lawyer
Palestine, Region, Extent
Pilgrim, Place, Intent
Relic, Usage, Body
Roger I, Norman, Conqueror
Saladin, Leader, Jerusalem
Shivaji Bhonsle, Founder, India Maratha State
Tughluq, Muhammad, Sultan
Tuni, Tune, City
Umar, Al-hajj, West African
Cluster 3, has 11 hits: 'century:11, art:3, france:3, architecture:2, sculpture:
Africa, Continent, Island
Europe, Continent, World
France, French Rpublique Franaise, Republic
Gypsy, People, Heritage
History, Historiography, Sense
Indian Art, Architecture, Art
Indian Literature, Literature, Language
Islamic Art, Architecture, Art
Library, Repository, Form
Mathematics, Study, Relationship
Portraiture, Representation, Art
Clustering file pope.txt Non-empty clusters: 3 Clusters: 3
# Hits Vals Seed, Value:Count
0 50 0 (OTHER), church:12, henry:8, king:7, english:6, roman:6, governme
1 138 0 Benedict Xiv, Pope, Moderation, pope:138, church:28, rome:26, cou
2 12 0 Angelico, Fra, Italian, florence:10, medici:5, florentine:4, domin
Passes: 86, best pass: 34, best score: 0.149, worst score: 0.082
Cluster 0, has 50 hits: '(OTHER), church:12, henry:8, king:7, english:6, roman:6
Aquina, Saint Thomas, Angelic Doctor
Borgia, Cesare, Italian
Bruno, Saint, Carthusian
Bulgaria, Full, People Republic
Canon Law, Greek, Kanon
Carpini, Giovanni, De
Carroll, John, American Roman Catholic
Christianity, World, Religion
Church, England, Anglican Church
Civil War, Conflict, United States
Conrad Iii, King, Germany
Corsica, French Corse, Island
Counter Reformation, Movement, Roman Catholic
Couplet, Poetry, Term
Cranmer, Thoma, Archbishop
Cyril, Methodiu, Saint
Demarcation, Line, Boundary
Duns Scotus, John, Theologian
Easter, Festival, Resurrection
England, Latin Anglia, Portion
English Literature, Literature, England
Erigena, John Scotus, Scholar
Este, Italian, Family
Europe, Continent, World
Felix V, Last, Antipope
Ferdinand I, Naple, King
Feuillant, French, Organizations-one
Finland, Finnish Suomi, Republic
Fisher, Saint John, English Christian
France, French Rpublique Franaise, Republic
Gardiner, Stephen, English
Germany, Country, Europe
Henry Viii, King, England
Henry Iv, France, Bourbon
Holy Roman Empire, Entity, Europe
Hungary, Hungarian Magyarorszg, Republic
Ireland, Geography, Island
Italian Italia, Republic, Europe
Knight, Saint John, Jerusalem
Lincoln, Abraham, President
Loyola, Saint Ignatius, Spanish Inigo
Lutheranism, Protestant, Denomination
Mary, Virgin Mary, Mother
Mendelssohn, Mos, German
Middle Ages, Period, European
Modernism, Theology, Philosophy
Neri, Saint Philip, Italian
Orthodox Church, Major, Branch
Poland, Republic, Polska Rzeczpospolita
Pole, Reginald, English Roman Catholic
Cluster 1, has 138 hits: 'pope:138, church:28, rome:26, council:23, papacy:23,
Adrian I, Pope, Power
Adrian Iv, Pope, Englishman
Adrian Vi, Pope, Dutchman
Alexander Iii, Pope, Authority
Alexander Vi, Pope, Worldliness
Algardi, Alessandro, Italian
Antonelli, Giacomo, Italian
Arnold, Brescia, 1100-c
Augustinian, Order, Roman Catholic
Bacon, Roger, English Scholastic
Basel, Council, Middle Ages
Bembo, Pietro, Italian
Benedict Viii, Pope, Reformer
Benedict Ix, Pope, 1032- 4
Benedict Xiii, Antipope, Avignon
Benedict Xiv, Pope, Moderation
Benedict Xv, Pope, Church
Bernard, Clairvaux, Saint
Bonaventure, Saint, Theologian
Boniface, Saint, English Benedictine
Boniface Viii, Pope, Power
Boniface Ix, Pope, Papal States
Bossuet, Jacques Bnigne, French Roman Catholic
Bull, Letter, Document
Bull Run, Battle, Manassa
Callistu, Calixtus I, Saint
Callistus Ii, Calixtus Ii, Pope
Callistus Iii, Calixtus Iii, Pope
Canonization, Roman Catholic, Church
Canossa, Village, Reggio
Cardinal, Title, Latin
Catherine, Aragn, Queen
Catherine, Siena, Saint
Cedar Mountain, Battle, Military
Celestine V, Saint, Pope
Celestine Iii, Pope, Born Giacinto Bobo
Censorship, Supervision, Control
Chalcedon, Council, Emperor
Charlemagne, Latin Carolus Magnus, Charle
Charles V, Holy Roman Empire, Holy Roman
Church, State, Relationship
Clement V, Pope, Avignon
Clement Vi, Pope, Church
Clement Vii, Pope, Pontificate
Clement Vii, Antipope, Great Schism
Clement Viii, Last, Pope
Clement Xiv, Pope, Jesuit
Conciliar Theory, Doctrine, Superiority
Conclave, Latin, Cum
Constance, Council, City
Coptic Church, Christian, Church
Council, Assembly, Doctrine
Crusade, Expedition, Undertaken
Damasus I, Saint, Pope
Damian, Saint Peter, Doctor
Doctor, Church, Christian
Dllinger, Johann Joseph Ignaz, Von
Ecumenical Movement, Movement, Cooperation
Edmund, Abingdon, Saint
Elector, German Imperial, German Kurfrsten
Eugene Iii, Pope, Cistercian
Eugene Iv, Pope, Dispute
Formosu, Pope, Trial
Franciscan, Order, Friars Minor
Frederick I, Holy Roman Empire, Frederick Barbarossa
Frederick Ii, Holy Roman Empire, Holy Roman
Gallicanism, History, Combination
Gregory I, Saint, Pope
Gregory Ii, Saint, Pope
Gregory Vii, Saint, Pope
Gregory Ix, Pope, Inquisition
Gregory Xi, Pope, Return
Guiscard, Robert, Norman
Henry Ii, Holy Roman Empire, Henry The Saint
Henry Iv, Holy Roman Empire, Holy Roman
Henry V, Holy Roman Empire, German
Hippolytu, Rome, Saint
Honorius I, Pope, Heretic
Infallibility, Theology, Doctrine
Innocent Iii, Pope, Pop
Innocent Iv, Pope, Dominion
Innocent Xi, Pope, King Louis Xiv
Inquisition, Institution, Papacy
Interdict, Roman Catholic, Church
Investiture Controversy, Dispute, Church
Jesuit, Society, Jesu
Joan, Pope, Female
John Ii, Pope, Born Mercurius
John Viii, Pope, Ablest
John Xii, Pope, Boy Pope
John Xxi, Pope, Pontiff
John Xxii, Pope, Second
John Xxiii, Antipope, Born Baldassare Cossa
John Xxiii, Pope, Era
John, John Lackland, King
John Paul I, Pope, Born Albino Luciani
John Paul Ii, Pope, Non-italian
Jubilee, Jew, Sabbatical
Julius Ii, Pope, Reign
Kulturkampf, German, Culture
Langton, Stephen, English
Lateran Councils, Council, Roman Catholic
Lateran Treaty, Designation, Agreement
Leo Iii, Saint, Pope
Leo Ix, Saint, Pope
Leo X, Pope, Renaissance
Leo Xiii, Pope, Modern
Louis Iv, German, Ludwig Iv
Lyon, Council, Church
Martin I, Saint, Pope
Martin Iv, Pope, Born Simon
Martin V, Pope, Election
Molino, De, Spanish Roman Catholic
Nicholas Iii, Pope, Papal States
Nichola, Cusa, German
Occam, William, 1285-1349
Otto Iii, Holy Roman, Emperor
Otto Iv, Otto, Brunswick
Papacy, Office, Pope
Papal States, Church, Pontifical States
Paschal Ii, Pope, Reign
Paul V, Pope, Born Camillo Borghese
Paul Vi, Pope, Second Vatican Council
Pepin, Short, Mayor
Peter Pence, Offering, Pope
Philip Iv, France, The Fair
Photiu, 820-91, Patriarch
Pico Della Mirandola, Giovanni, Conte
Pius Ii, Pope, Writer
Pius Iv, Pope, Conclusion
Pius V, Saint, Pope
Pius Vi, Pope, Reign
Pius Vii, Pope, Napoleon
Pius Ix, Pope, Pontificate
Pius X, Saint, Pope
Pius Xi, Pope, Path
Pius Xii, Pope, World War Ii
Pope, Latin, Papa
Cluster 2, has 12 hits: 'florence:10, medici:5, florentine:4, dominican:3, chur
Alberti, Leon Battista, Italian
Albertus Magnus, Saint, Albert
Angelico, Fra, Italian
Cellini, Benvenuto, Florentine
Dante Alighieri, Italian, Poet
Dominican, Friars Preachers, Member
Ferrara-florence, Council, Basel-ferrara-florence
Florence, Italian Firenze, Florentia
Guicciardini, Francesco, Italian
Leonardo, Da, Vinci
Medici, Lorenzo, De
Michelangelo, Creator, History
Clustering file sound.txt Non-empty clusters: 5 Clusters: 5
# Hits Vals Seed, Value:Count
0 68 0 (OTHER), music:10, american civil war:6, state:6, bass:5, century:
1 57 0 Mach Number, Aerodynamics, Mechanic, sound:51, instruments, pitch
2 8 0 Letter, Vowel, English, sound:6, long:3, letter:3, sign:2, atlanti
3 19 0 Linguistics, Study, Language, language:14, english:9, speech:6, so
4 11 0 Vowel, English, Alphabet, sound:11, alphabet:9, letter:9, hierogly
Passes: 103, best pass: 74, best score: 0.173, worst score: 0.072
Cluster 0, has 68 hits: '(OTHER), music:10, american civil war:6, state:6, bass:
Amati, Family, Italian
American Indian Languages, Language, People
American Indians, People, America
Audiovisual Education, Planning, Preparation
Band, Ensemble, Brass
Transaction, Service, Consumer
Bird, Name, Member
Bremerton, City, Kitsap County
British Columbia, Province, Canada
Bronx, Borough, New York City
Building Construction, Procedure, Erection
Circulatory System, Anatomy, Physiology
Communication, Method, Receiving
Connecticut, New England, United States
Copyright, Body, Right
Currency, Economics, Term
Deep-sea Exploration, Investigation, Chemical
Bass, Member, Violin
Drama, Dramatic Arts, Form
Edison, Thomas Alva, Inventor
Encyclopedia, Encyclopaedia, Greek
Firework, Device, Material
Floor, Floor Coverings, Ceiling
Folk Dance, Dance, Member
Folk Music, Music, Performance
Frequency, Term, Science
Golden Globe Awards, Motion, Picture
Harmony, Music, Combination
Harpsichord, Italian, Cembalo
Insect, Name, Animal
Jazz, Type, Music
Jet Propulsion, Thrust, Imparting
Mississippi, East South Central, U.S.
Motion Picture Arts, Science, Academy
Music, Vocal, Part
Music, Western, Europe
Musical Form, Arrangement, Element
Mystic, Village, Stonington
Navigation, Science, Position
Haven, City, New Haven County
North Carolina, South Atlantic, U.S.
Ocean, Oceanography, Body
Orchestra, Ensemble, Instrument
Orchestration, Art, Musical
Philosophy, Greek, Philosophia
Pianoforte, Keyboard, Musical
Social Dance, Term, Dance
Radio, System, Communication
Rhode Island, Full, State
Scale, Music, Italian
Scott, Robert Falcon, Officer
Seattle, City, Seat
Seward Peninsula, Peninsula, Alaska
Snake, Reptile, Name
Sonata, Italian, Sonare
Tacoma, City, Seat
Telephone, Communication, Instrument
Television, Tv, Transmission
Theater Production, Mean, Form
United States, America, Republic
Valdez, City, Alaska
Video Recording, Process, Recording
Viol, Instrument, Century
Washington, State, U.S.
Wave Motion, Physic, Mechanism
Whale, Mammal, Order
Yachting, Operation, Boat
Zither, Instrument, String
Cluster 1, has 57 hits: 'sound:51, instruments, pitch:7, string:5, recordings
Acoustics, Greek, Akouein
Aerodynamics, Branch, Mechanic
Airplane, Craft, Action
Albemarle Sound, Inlet, Atlantic Ocean
Bell, Instrument, Percussion
Chaplin, Charlie, Name
Clair, Ren, Name
Digital Audio Tape, Dat, Tape
De Forest, Lee, Inventor
Doppler Effect, Physic, Variation
Ear, Organ, Hearing
Edmond, City, Snohomish County
Electronic Music,
Exxon Valdez, Oil
Fluid Mechanics, Science, Action
Grunt, Name, Fish
Guitar, Instrument, Lute
Harmonic, Vibration, Primary
Harp, Instrument, Run
Hearing, Main, Sense
Hearing Aid, Device, Sound
Mach Number, Aerodynamics, Mechanic
Microphone, Device, Energy
Midi, Acronym, Musical Instrument Digital Interface
Motion Picture, Sequence, Photograph
Motion Pictures, History, Development
Music, Movement, Sound
Musical Instruments, Tool, Scope
Noise, Physic, Signal
Oboe, Wind, Instrument
Organ, Instrument, Air
Petroleum, Oil, Bituminou
Phonograph, Known, Player
Physic, Science, Constituent
Prince William Sound, Inlet, Gulf
Propeller, Device, Force
Puget Sound, Arm, Pacific Ocean
Radiometer, Instrument, Intensity
Reflection, Physic, Phenomenon
Singing, Use, Voice
Sonar, Acronym, Sound Navigation And Ranging
Sound, Phenomenon, Sense
Determination, Depth, Body
Sound Recording, Reproduction, Conversion
Supersonics, Branch, Physic
Synthesizer, Computer, Peripheral
Tone, Music, Sound
Transformer, Device, Coil
Tyndall, John, Physicist
Ultrasonics, Branch, Physic
Ventriloquism, Art, Sound
Violin, Instrument, Member
Viscount Melville Sound, Arm, Arctic Ocean
Voiceprint Identification, Method, Person
Warner Brothers, Motion, Picture
Xylophone, Greek, Xylon
Cluster 2, has 8 hits: 'sound:6, long:3, letter:3, sign:2, atlantic ocean:2, mi
Animal Behavior The, Behavior, Animal
C, English, Romance-language
Diacritic Mark, Sign, Mark
Island Sound, Body, Salt
Letter, Vowel, English
Pamlico Sound, Inlet, Atlantic Ocean
Rhyme, Likeness, Sound
, Letter, English
Cluster 3, has 19 hits: 'language:14, english:9, speech:6, sound:6, word:5, spok
American English, English, Spoken
Celtic Languages, Indo-european, Family
Chinese Language, Language, Chinese
Cuneiform, Latin, Cuneu
Deafness, Inability, Definition
English Language, Medium, Communication
English Literature, Literature, England
Etymology, Branch, Linguistics
Grammar, Branch, Linguistics
Greek Language, Language, People
Hieroglyph, Character, System
Japanese Language, Language, Spoken
Language, Communication, Being
Linguistics, Study, Language
Phonetics, Branch, Linguistics
Poetry, Form, Expression
Semantics, Greek, Semantiko
Versification, Art, Verse
Writing, Method, Intercommunication
Cluster 4, has 11 hits: 'sound:11, alphabet:9, letter:9, hieroglyph:8, english:7
Vowel, English, Alphabet
Alphabet, Alpha, Beta
F, Letter, Consonant
K, Letter, English
L, Letter, English
M, Letter, English
Q, Letter, English
R, Letter, English
U, 21st, Letter
X, Letter, English
Y, Letter, English
Clustering file strike.txt Non-empty clusters: 4 Clusters: 4
# Hits Vals Seed, Value:Count
0 6 0 (OTHER), electron:2, beam:2, tube:2, television:2
1 11 0 Gary, City, Lake County, strike:10, united states:3, presidents,
2 10 0 National Labor Relations Act, Nlra, Law, labor:9, strike:8, union
3 15 0 Poland, Republic, Polska Rzeczpospolita, government:11, 1980s:8,
Passes: 453, best pass: 208, best score: 0.445, worst score: 0.154
Cluster 0, has 6 hits: '(OTHER), electron:2, beam:2, tube:2, television:2'
Baseball, Game, Skill
Cathode-ray Tube, Electron, Tube
Napoleon I, Emperor, French
Russia, History, Empire
Television, Tv, Transmission
Warfare, Use, Force
Cluster 1, has 11 hits: 'strike:10, united states:3, presidents, injunctions,
Chartism, Reform, Movement
Coolidge, John, Calvin
Defense Systems, Defense, Country
Deb, Eugene Victor, American Socialist
Dollfuss, Engelbert, Chancellor
Fault, Geology, Line
Gary, City, Lake County
Homestead Strike, Labor, Strike
Pullman Strike, See, Deb
Sound, Phenomenon, Sense
Ueberroth, Peter Victor, Sport
Cluster 2, has 10 hits: 'labor:9, strike:8, union:7, labor-management relations
Cleveland, Grover, 22d
Industrial Workers, World, Former
International Ladies, Garment Workers, Union
Knight, Labor, Union
Labor Relations, Transaction, Determination
Lockout, Labor, Relation
National Labor Relations Act, Nlra, Law
Labor, Relation, Practice
Strike, Labor, Relation
Trade Unions, United States, Labor
Cluster 3, has 15 hits: 'government:11, 1980s:8, war:6, country:4, soviet:3, pa
Colombia, Republic, South America
France, French Rpublique Franaise, Republic
Ghana, Country, Africa
Britain, United Kingdom, Great Britain
Illinoi, East North Central, U.S.
Italian Italia, Republic, Europe
Japan, Japanese Dai, Great
Northern Ireland, Part, United Kingdom
Poland, Republic, Polska Rzeczpospolita
Russian Revolution, Event, Russia
Spain, Spanish Espaa, Monarchy
Sweden, Konungariket Sverige, Kingdom
Union, Soviet Socialist Republics, Russian Soyuz Sovyetskikh Sotsialisticheski
United States, America, Republic
World War Ii, Military, Conflict
Clustering file utah.txt Non-empty clusters: 5 Clusters: 5
# Hits Vals Seed, Value:Count
0 2 0 (OTHER), state:2
1 3 0 Utah, University, Institution, Utah:3
2 9 0 City, Davis County, Utah, city:8, Utah:8, mormon:5, state:4, name:
3 3 0 Mormonism, World, Religion, mormonism:3, polygamy:3, Smith:3, morm
4 7 0 Green, River, Utah, utah:6, colorado:5, mi:4, km:4, river:2, yampa
Passes: 764, best pass: 515, best score: 0.652, worst score: 0.147
Cluster 0, has 2 hits: '(OTHER), state:2'
United States, America, Republic
State, U.S., North
Cluster 1, has 3 hits: 'Utah:3'
Bushnell, Nolan Kay, Founder-chairman
Orem, City, Utah County
Utah, University, Institution
Cluster 2, has 9 hits: 'city:8, Utah:8, mormon:5, state:4, name:3, lake:3, salt
City, Davis County, Utah
Deseret, State, Name
Logan, City, Seat
Murray, City, Salt Lake County
Nevada, State, U.S.
Provo, City, Seat
Salt Lake City, City, Capital
Utah, State, U.S.
Utah Lake, Freshwater, Lake
Cluster 3, has 3 hits: 'mormonism:3, polygamy:3, smith:3, mormon:3, church , ki
Mormonism, World, Religion
Smith, Joseph, Religious
Brigham, Religious, Leader
Cluster 4, has 7 hits: 'Utah:6, Colorado:5, mi:4, km:4, river:2, yampa:2, ute:2,
Colorado, State, United States
Colorado, River, North America
Salt Lake, Body, Salt
Green, River, Utah
Hovenweep National Monument, Colorado, Utah
Uinta Mountains, Range, Mountain
Ute, North American Indian, Tribe
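Each run in the log above reports a number of passes, the best pass, and the best and worst scores, and each cluster is listed with a seed object and an "(OTHER)" bucket. The Python sketch below illustrates one way a multi-pass, seed-based grouping of this kind could be arranged. It is an assumption-laden illustration only: the random choice of seeds, the Jaccard similarity, the mean-similarity score, and the names `one_pass`, `cluster_hits` and `sample_hits` are all hypothetical and are not taken from the log or the specification.

```python
import random

def jaccard(a, b):
    """Similarity of two property-value sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def one_pass(hits, k, rng):
    """Pick k random seeds and assign every hit to its most similar seed."""
    seeds = rng.sample(sorted(hits), k)
    clusters = {"(OTHER)": []}
    clusters.update({seed: [] for seed in seeds})
    total = 0.0
    for title in sorted(hits):
        best = max(seeds, key=lambda s: jaccard(hits[title], hits[s]))
        sim = jaccard(hits[title], hits[best])
        # Hits that share nothing with any seed fall into (OTHER).
        clusters[best if sim > 0 else "(OTHER)"].append(title)
        total += sim
    return total / len(hits), clusters

def cluster_hits(hits, k, passes=50, seed=0):
    """Run several passes and keep the best-scoring grouping."""
    rng = random.Random(seed)
    results = [one_pass(hits, k, rng) for _ in range(passes)]
    scores = [score for score, _ in results]
    best_pass = max(range(passes), key=scores.__getitem__)
    return {
        "passes": passes,
        "best_pass": best_pass,
        "best_score": scores[best_pass],
        "worst_score": min(scores),
        "clusters": results[best_pass][1],
    }

# Hypothetical property sets for a few hits.
sample_hits = {
    "Utah, State, U.S.": {"utah", "state", "city"},
    "Salt Lake City, City, Capital": {"utah", "city", "mormon"},
    "Mormonism, World, Religion": {"mormon", "polygamy", "smith"},
    "Green, River, Utah": {"utah", "colorado", "river"},
    "Colorado, River, North America": {"colorado", "river"},
}
result = cluster_hits(sample_hits, k=2, passes=20, seed=1)
print(result["best_pass"], round(result["best_score"], 3))
```

Fixing the random seed makes a run repeatable, which is convenient when comparing groupings across queries.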
Claims
1. A system for case-based organizing and querying of a database, said database having a set of objects, said system comprising means for organizing said database, by examining each object in said database and associating that object with a first set of property values; means responsive to a query, by associating said query with a second set of property values and performing matching on the objects of the database for objects which are similar.
2. A system as in claim 1, wherein said objects comprise text.
3. A system as in claim 1, wherein said first set of property values comprise keywords or other indicators of content.
4. A system as in claim 1, wherein said first set of property values comprise those words which appear more frequently in the document than in the database at large.
5. A system as in claim 1, wherein said first set of property values comprise those words which appear in a predetermined section of text of the object.
6. A system as in claim 1, wherein said first set of property values comprise those words which appear in a title of the object.
7. A system as in claim 1, wherein said matching is case-based matching or other fuzzy associative matching.
8. A system as in claim 1, wherein said query comprises text.
9. A system as in claim 1, wherein said means responsive to a query associates said query with keywords or other indicators of its content.
10. A system as in claim 1, comprising means for presenting a set of matched objects in response to said query.
11. A system as in claim 1, comprising means responsive to refinement of said query.
12. A system as in claim 1, comprising means responsive to iterative refinement of said query.
13. A system as in claim 12, wherein said means responsive to iterative refinement uses a case-based technique.
14. A system as in claim 1, comprising means for ordering said set of matched objects in response to quality of match.
15. A system as in claim 1, comprising means for organizing said set of matched objects.
16. A system as in claim 15, wherein said means for organizing comprises means for grouping said set of matched objects into a set of clusters.
17. A system as in claim 15, wherein said means for organizing comprises means for grouping said set of matched objects into a set of clusters of objects which have similar properties, which relate to similar content, which have similar likelihood to be of relevance to the query, or which have similar likelihood to be of interest to an operator posing the query.
18. A system as in claim 15, comprising means for generating suggestions for iterative refinement of said query.
19. A system as in claim 18, wherein said means for generating is responsive to a result of organizing matched objects.
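The organizing, matching and ordering steps recited in claims 1, 4 and 14 can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the claimed implementation: the tokenizer, the Jaccard measure used as "quality of match", the function names `tokenize`, `organize` and `query`, and the three-object database are all hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def organize(database):
    """Associate each object with a first set of property values.

    A word is kept when its relative frequency in the object exceeds
    its relative frequency in the database at large (cf. claim 4).
    """
    tokens = {title: tokenize(text) for title, text in database.items()}
    corpus = Counter(w for words in tokens.values() for w in words)
    corpus_total = sum(corpus.values())
    index = {}
    for title, words in tokens.items():
        local = Counter(words)
        # Cross-multiplied comparison avoids floating-point division.
        index[title] = {w for w, n in local.items()
                        if n * corpus_total > corpus[w] * len(words)}
    return index

def query(index, text):
    """Match a free-text query and order the hits by quality of match."""
    wanted = set(tokenize(text))  # second set of property values
    hits = []
    for title, values in index.items():
        shared = wanted & values
        if shared:
            # Jaccard similarity: an assumed stand-in for match quality.
            hits.append((len(shared) / len(wanted | values), title))
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))

# Hypothetical three-object database.
database = {
    "Marx": "the philosopher marx wrote the communist manifesto",
    "Hegel": "the philosopher hegel wrote the phenomenology",
    "Utah": "the state of utah is the home of mormonism",
}
index = organize(database)
for score, title in query(index, "communist philosopher"):
    print(f"{score:.2f} {title}")
```

In this toy database the query matches "Marx" on two values and "Hegel" on one, so "Marx" is listed first, and "Utah" is not returned because it shares no value with the query.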
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU73236/94A AU7323694A (en) | 1993-07-07 | 1994-07-05 | Case-based organizing and querying of a database |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US8830793A | 1993-07-07 | 1993-07-07 | |
US08/088,307 | 1993-07-07 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1995002221A1 (en) | 1995-01-19 |
Family
ID=22210607
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1994/007569 WO1995002221A1 (en) | 1993-07-07 | 1994-07-05 | Case-based organizing and querying of a database |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU7323694A (en) |
WO (1) | WO1995002221A1 (en) |
US9946706B2 (en) | 2008-06-07 | 2018-04-17 | Apple Inc. | Automatic language identification for dynamic text processing |
US9959870B2 (en) | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US9977779B2 (en) | 2013-03-14 | 2018-05-22 | Apple Inc. | Automatic supplementation of word correction dictionaries |
US10019994B2 (en) | 2012-06-08 | 2018-07-10 | Apple Inc. | Systems and methods for recognizing textual identifiers within a plurality of words |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple Inc. | Intelligent automated assistant for media exploration |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US10078487B2 (en) | 2013-03-15 | 2018-09-18 | Apple Inc. | Context-sensitive handling of interruptions |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10199051B2 (en) | 2013-02-07 | 2019-02-05 | Apple Inc. | Voice trigger for a digital assistant |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10255566B2 (en) | 2011-06-03 | 2019-04-09 | Apple Inc. | Generating and processing task items that represent tasks to perform |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
US10296160B2 (en) | 2013-12-06 | 2019-05-21 | Apple Inc. | Method for extracting salient dialog usage from live data |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US10417037B2 (en) | 2012-05-15 | 2019-09-17 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10515147B2 (en) | 2010-12-22 | 2019-12-24 | Apple Inc. | Using statistical language models for contextual lookup |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
US10540976B2 (en) | 2009-06-05 | 2020-01-21 | Apple Inc. | Contextual voice commands |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US10572476B2 (en) | 2013-03-14 | 2020-02-25 | Apple Inc. | Refining a search based on schedule items |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US10642574B2 (en) | 2013-03-14 | 2020-05-05 | Apple Inc. | Device, method, and graphical user interface for outputting captions |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US10672399B2 (en) | 2011-06-03 | 2020-06-02 | Apple Inc. | Switching between text data and audio data based on a mapping |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10748529B1 (en) | 2013-03-15 | 2020-08-18 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US10791216B2 (en) | 2013-08-06 | 2020-09-29 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
CN112749207A (en) * | 2020-12-29 | 2021-05-04 | 大连海事大学 | Deep sea emergency disposal auxiliary decision making system based on case reasoning |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US11151899B2 (en) | 2013-03-15 | 2021-10-19 | Apple Inc. | User training by intelligent digital assistant |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5062074A (en) * | 1986-12-04 | 1991-10-29 | Tnet, Inc. | Information retrieval system and method |
US5099426A (en) * | 1989-01-19 | 1992-03-24 | International Business Machines Corporation | Method for use of morphological information to cross reference keywords used for information retrieval |
US5201048A (en) * | 1988-12-01 | 1993-04-06 | Axxess Technologies, Inc. | High speed computer system for search and retrieval of data within text and record oriented files |
US5303361A (en) * | 1989-01-18 | 1994-04-12 | Lotus Development Corporation | Search and retrieval system |
1994
- 1994-07-05 WO PCT/US1994/007569 (WO1995002221A1): active, Application Filing
- 1994-07-05 AU AU73236/94A (AU7323694A): not active, Abandoned
Cited By (201)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6336029B1 (en) | 1996-12-02 | 2002-01-01 | Chi Fai Ho | Method and system for providing information in response to questions |
US6480698B2 (en) | 1996-12-02 | 2002-11-12 | Chi Fai Ho | Learning method and system based on questioning |
US6501937B1 (en) | 1996-12-02 | 2002-12-31 | Chi Fai Ho | Learning method and system based on questioning |
US7996398B2 (en) | 1998-07-15 | 2011-08-09 | A9.Com, Inc. | Identifying related search terms based on search behaviors of users |
US6263333B1 (en) | 1998-10-22 | 2001-07-17 | International Business Machines Corporation | Method for searching non-tokenized text and tokenized text for matches against a keyword data structure |
US6498921B1 (en) | 1999-09-01 | 2002-12-24 | Chi Fai Ho | Method and system to answer a natural-language question |
US6571240B1 (en) | 2000-02-02 | 2003-05-27 | Chi Fai Ho | Information processing for searching categorizing information in a document based on a categorization hierarchy and extracted phrases |
US9646614B2 (en) | 2000-03-16 | 2017-05-09 | Apple Inc. | Fast, language-independent method for user authentication by voice |
US7702541B2 (en) | 2000-08-01 | 2010-04-20 | Yahoo! Inc. | Targeted e-commerce system |
EP1260919A3 (en) * | 2001-05-22 | 2004-10-20 | ICMS Group n.v. | A method of storing, retrieving and viewing data |
US7209914B2 (en) | 2001-05-22 | 2007-04-24 | Icms Group N.V. | Method of storing, retrieving and viewing data |
CN1320481C (en) * | 2004-11-22 | 2007-06-06 | 北京北大方正技术研究院有限公司 | Method for conducting title and text logic connection for newspaper pages |
US9501741B2 (en) | 2005-09-08 | 2016-11-22 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US10318871B2 (en) | 2005-09-08 | 2019-06-11 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US8379830B1 (en) | 2006-05-22 | 2013-02-19 | Convergys Customer Management Delaware Llc | System and method for automated customer service with contingent live interaction |
US9549065B1 (en) | 2006-05-22 | 2017-01-17 | Convergys Customer Management Delaware Llc | System and method for automated customer service with contingent live interaction |
US7809663B1 (en) | 2006-05-22 | 2010-10-05 | Convergys Cmg Utah, Inc. | System and method for supporting the utilization of machine language |
US9117447B2 (en) | 2006-09-08 | 2015-08-25 | Apple Inc. | Using event alert text as input to an automated assistant |
US8930191B2 (en) | 2006-09-08 | 2015-01-06 | Apple Inc. | Paraphrasing of user requests and results by automated digital assistant |
US8942986B2 (en) | 2006-09-08 | 2015-01-27 | Apple Inc. | Determining user intent based on ontologies of domains |
US10568032B2 (en) | 2007-04-03 | 2020-02-18 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US9053089B2 (en) | 2007-10-02 | 2015-06-09 | Apple Inc. | Part-of-speech tagging using latent analogy |
US10381016B2 (en) | 2008-01-03 | 2019-08-13 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US9361886B2 (en) | 2008-02-22 | 2016-06-07 | Apple Inc. | Providing text input using speech data and non-speech data |
US9865248B2 (en) | 2008-04-05 | 2018-01-09 | Apple Inc. | Intelligent text-to-speech conversion |
US8996376B2 (en) | 2008-04-05 | 2015-03-31 | Apple Inc. | Intelligent text-to-speech conversion |
US9626955B2 (en) | 2008-04-05 | 2017-04-18 | Apple Inc. | Intelligent text-to-speech conversion |
US9946706B2 (en) | 2008-06-07 | 2018-04-17 | Apple Inc. | Automatic language identification for dynamic text processing |
US9535906B2 (en) | 2008-07-31 | 2017-01-03 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US10108612B2 (en) | 2008-07-31 | 2018-10-23 | Apple Inc. | Mobile device having human language translation capability with positional feedback |
US9691383B2 (en) | 2008-09-05 | 2017-06-27 | Apple Inc. | Multi-tiered voice feedback in an electronic device |
US8898568B2 (en) | 2008-09-09 | 2014-11-25 | Apple Inc. | Audio user interface |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US9412392B2 (en) | 2008-10-02 | 2016-08-09 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11900936B2 (en) | 2008-10-02 | 2024-02-13 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US10643611B2 (en) | 2008-10-02 | 2020-05-05 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US9959870B2 (en) | 2008-12-11 | 2018-05-01 | Apple Inc. | Speech recognition involving a mobile device |
US8862252B2 (en) | 2009-01-30 | 2014-10-14 | Apple Inc. | Audio user interface for displayless electronic device |
US10540976B2 (en) | 2009-06-05 | 2020-01-21 | Apple Inc. | Contextual voice commands |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US10475446B2 (en) | 2009-06-05 | 2019-11-12 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US11080012B2 (en) | 2009-06-05 | 2021-08-03 | Apple Inc. | Interface for a virtual digital assistant |
US10795541B2 (en) | 2009-06-05 | 2020-10-06 | Apple Inc. | Intelligent organization of tasks items |
US10283110B2 (en) | 2009-07-02 | 2019-05-07 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US9311043B2 (en) | 2010-01-13 | 2016-04-12 | Apple Inc. | Adaptive audio feedback system and method |
US10706841B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Task flow identification based on user intent |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US9548050B2 (en) | 2010-01-18 | 2017-01-17 | Apple Inc. | Intelligent automated assistant |
US8892446B2 (en) | 2010-01-18 | 2014-11-18 | Apple Inc. | Service orchestration for intelligent automated assistant |
US8903716B2 (en) | 2010-01-18 | 2014-12-02 | Apple Inc. | Personalized vocabulary for digital assistant |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US9424862B2 (en) | 2010-01-25 | 2016-08-23 | Newvaluexchange Ltd | Apparatuses, methods and systems for a digital conversation management platform |
US8977584B2 (en) | 2010-01-25 | 2015-03-10 | Newvaluexchange Global Ai Llp | Apparatuses, methods and systems for a digital conversation management platform |
US9424861B2 (en) | 2010-01-25 | 2016-08-23 | Newvaluexchange Ltd | Apparatuses, methods and systems for a digital conversation management platform |
US9431028B2 (en) | 2010-01-25 | 2016-08-30 | Newvaluexchange Ltd | Apparatuses, methods and systems for a digital conversation management platform |
US9190062B2 (en) | 2010-02-25 | 2015-11-17 | Apple Inc. | User profiling for voice input processing |
US9633660B2 (en) | 2010-02-25 | 2017-04-25 | Apple Inc. | User profiling for voice input processing |
US9075783B2 (en) | 2010-09-27 | 2015-07-07 | Apple Inc. | Electronic device with text error correction based on voice recognition data |
US10515147B2 (en) | 2010-12-22 | 2019-12-24 | Apple Inc. | Using statistical language models for contextual lookup |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US10102359B2 (en) | 2011-03-21 | 2018-10-16 | Apple Inc. | Device access using voice authentication |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US10255566B2 (en) | 2011-06-03 | 2019-04-09 | Apple Inc. | Generating and processing task items that represent tasks to perform |
US10672399B2 (en) | 2011-06-03 | 2020-06-02 | Apple Inc. | Switching between text data and audio data based on a mapping |
US10706373B2 (en) | 2011-06-03 | 2020-07-07 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US9798393B2 (en) | 2011-08-29 | 2017-10-24 | Apple Inc. | Text correction processing |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9280610B2 (en) | 2012-05-14 | 2016-03-08 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US9953088B2 (en) | 2012-05-14 | 2018-04-24 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US10417037B2 (en) | 2012-05-15 | 2019-09-17 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
US10079014B2 (en) | 2012-06-08 | 2018-09-18 | Apple Inc. | Name recognition system |
US10019994B2 (en) | 2012-06-08 | 2018-07-10 | Apple Inc. | Systems and methods for recognizing textual identifiers within a plurality of words |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9547647B2 (en) | 2012-09-19 | 2017-01-17 | Apple Inc. | Voice-based media searching |
US9971774B2 (en) | 2012-09-19 | 2018-05-15 | Apple Inc. | Voice-based media searching |
US8935167B2 (en) | 2012-09-25 | 2015-01-13 | Apple Inc. | Exemplar-based latent perceptual modeling for automatic speech recognition |
US10199051B2 (en) | 2013-02-07 | 2019-02-05 | Apple Inc. | Voice trigger for a digital assistant |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US10572476B2 (en) | 2013-03-14 | 2020-02-25 | Apple Inc. | Refining a search based on schedule items |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
US9733821B2 (en) | 2013-03-14 | 2017-08-15 | Apple Inc. | Voice control to diagnose inadvertent activation of accessibility features |
US10642574B2 (en) | 2013-03-14 | 2020-05-05 | Apple Inc. | Device, method, and graphical user interface for outputting captions |
US9977779B2 (en) | 2013-03-14 | 2018-05-22 | Apple Inc. | Automatic supplementation of word correction dictionaries |
US10748529B1 (en) | 2013-03-15 | 2020-08-18 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US9697822B1 (en) | 2013-03-15 | 2017-07-04 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US11151899B2 (en) | 2013-03-15 | 2021-10-19 | Apple Inc. | User training by intelligent digital assistant |
US10078487B2 (en) | 2013-03-15 | 2018-09-18 | Apple Inc. | Context-sensitive handling of interruptions |
US9922642B2 (en) | 2013-03-15 | 2018-03-20 | Apple Inc. | Training an at least partial voice command system |
US9966060B2 (en) | 2013-06-07 | 2018-05-08 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9620104B2 (en) | 2013-06-07 | 2017-04-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
US9633674B2 (en) | 2013-06-07 | 2017-04-25 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
US10657961B2 (en) | 2013-06-08 | 2020-05-19 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US9966068B2 (en) | 2013-06-08 | 2018-05-08 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
US10185542B2 (en) | 2013-06-09 | 2019-01-22 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
US9300784B2 (en) | 2013-06-13 | 2016-03-29 | Apple Inc. | System and method for emergency calls initiated by voice command |
US10791216B2 (en) | 2013-08-06 | 2020-09-29 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US10296160B2 (en) | 2013-12-06 | 2019-05-21 | Apple Inc. | Method for extracting salient dialog usage from live data |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9966065B2 (en) | 2014-05-30 | 2018-05-08 | Apple Inc. | Multi-command single utterance input method |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US10497365B2 (en) | 2014-05-30 | 2019-12-03 | Apple Inc. | Multi-command single utterance input method |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US10169329B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Exemplar-based natural language processing |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US10083690B2 (en) | 2014-05-30 | 2018-09-25 | Apple Inc. | Better resolution when referencing to concepts |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US10904611B2 (en) | 2014-06-30 | 2021-01-26 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9668024B2 (en) | 2014-06-30 | 2017-05-30 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10431204B2 (en) | 2014-09-11 | 2019-10-01 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US9986419B2 (en) | 2014-09-30 | 2018-05-29 | Apple Inc. | Social reminders |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US11556230B2 (en) | 2014-12-02 | 2023-01-17 | Apple Inc. | Data detection |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US10311871B2 (en) | 2015-03-08 | 2019-06-04 | Apple Inc. | Competing devices responding to voice triggers |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple Inc. | Intelligent automated assistant for media exploration |
US11069347B2 (en) | 2016-06-08 | 2021-07-20 | Apple Inc. | Intelligent automated assistant for media exploration |
US10354011B2 (en) | 2016-06-09 | 2019-07-16 | Apple Inc. | Intelligent automated assistant in a home environment |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10733993B2 (en) | 2016-06-10 | 2020-08-04 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US10089072B2 (en) | 2016-06-11 | 2018-10-02 | Apple Inc. | Intelligent device arbitration and control |
US10521466B2 (en) | 2016-06-11 | 2019-12-31 | Apple Inc. | Data driven natural language event detection and classification |
US10269345B2 (en) | 2016-06-11 | 2019-04-23 | Apple Inc. | Intelligent task discovery |
US10297253B2 (en) | 2016-06-11 | 2019-05-21 | Apple Inc. | Application integration with a digital assistant |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US10791176B2 (en) | 2017-05-12 | 2020-09-29 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US10810274B2 (en) | 2017-05-15 | 2020-10-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
CN112749207B (en) * | 2020-12-29 | 2023-06-02 | Dalian Maritime University | Case reasoning-based deep sea emergency treatment auxiliary decision-making system |
CN112749207A (en) * | 2020-12-29 | 2021-05-04 | Dalian Maritime University | Deep sea emergency disposal auxiliary decision making system based on case reasoning |
Also Published As
Publication number | Publication date |
---|---|
AU7323694A (en) | 1995-02-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AK | Designated states | Kind code of ref document: A1; Designated state(s): AM AT AU BB BG BR BY CA CH CN CZ DE DK ES FI GB GE HU JP KE KG KP KR KZ LK LT LU LV MD MG MN MW NL NO NZ PL PT RO RU SD SE SI SK TJ TT UA US UZ VN |
 | AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): KE MW SD AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN ML MR NE SN TD TG |
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
 | 122 | EP: PCT application non-entry in European phase | |
 | NENP | Non-entry into the national phase | Ref country code: CA |
 | REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |