WO2002086757A1 - Conversion between data representation formats - Google Patents

Conversion between data representation formats

Info

Publication number
WO2002086757A1
Authority
WO
WIPO (PCT)
Prior art keywords
words
converting
phrases
text
logic
Prior art date
Application number
PCT/IB2001/000731
Other languages
French (fr)
Inventor
Erland Lewin
Original Assignee
Voxi Ab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Voxi Ab
Priority to PCT/IB2001/000731
Publication of WO2002086757A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/55 Rule-based translation
    • G06F40/56 Natural language generation

Abstract

The invention relates to a method for word and sound processing in a means for such processing, and to a sound and word processor arrangement. The processing is accomplished through conversion between data representation formats, the data comprising sound or text information. The data representation formats are text, sound, words, phrases, and logic, combined as conversions between text or sound and words, between words and phrases, and between phrases and logic, in either direction.

Description

Conversion between data representation formats
Technical Field
The present invention pertains to a method and an arrangement for conversion between data representation formats, said data comprising sound or text information. Specifically, it provides a method and an arrangement for word and sound processing.
Background art
Natural language understanding has been a topic of research since the first days of Artificial Intelligence. The present invention is primarily intended for understanding spontaneous utterances, in written or spoken form, within a limited domain. One current approach to this problem is to model a dialog flow for each operation that can be performed in a specific system, dividing each dialog into modes. For each mode, valid inputs and their consequences are listed. For example, the Philips SpeechMania® 99 product has been demonstrated with a pizza ordering application, where a user goes through dialog modes involving, for instance, selecting pizza toppings. A disadvantage of this type of technology is that the system will only understand the utterances expected in a given mode. If a user changes his drink order while he is expected to select pizza toppings, the system may fail to understand this. The degree to which the system 'understands' the utterances in this kind of interaction is limited; each mode and the utterances valid therein must be anticipated by the developers, and directly related to the action the system takes in response to a user input.
Other speech recognition systems, such as those using the Java™ Speech Grammar Format (JSGF), provide tags attached to an (often handwritten) grammar. The tags normally have some semantic meaning, regardless of how that semantic meaning was expressed. Words or phrases without semantic interest to the application (politeness phrases, articles, etc.) are typically left untagged. Such an application then needs only a very simple parsing of the tags in order to act on the speech input. This approach requires manual adaptation of the application and grammar so that they work together, and the system cannot be said to 'understand' the spoken utterances.
More advanced approaches to natural language understanding make use of formalisms developed within the fields of linguistics and computational linguistics. One currently popular formalism is Head-Driven Phrase Structure Grammar, which associates groups of lexical features with words, resulting in a grammar structure which can be used for parsing general natural language. Many of these linguistic formalisms could be used to perform some of the steps described in the present invention, but they require much work to be integrated into a complete language understanding interface to an application, and also substantial adaptation to new domains.
Some speech recognition systems use word spotting. This entails listening for certain key words and ignoring the rest of the spoken utterance. This may simplify the parsing component of a system, but does not give the system any possibility of understanding the details of the user's utterance.
No commercial applications use the same grammar for both natural language generation and natural language understanding. Most current applications either understand a very simple subset of natural language, or require substantial manpower to adapt the natural language understanding system to a given application.
Summary of the disclosed invention
The present invention sets forth a method and an arrangement for word and sound processing. It solves the problem of simple natural language understanding, allowing users to interact with machines (for instance by giving commands and asking questions) using natural language in spoken or written form. Additionally, the method provides language independence by transforming the user's linguistic utterances into a semantic form which is independent of the original language used. This form may later be converted into another human language, which has the effect of a simple form of translation. Furthermore, the present invention can automate the process of adapting domain-specific natural language understanding to an application.
The present invention does not attempt to solve the highly complex problem of general natural language understanding, but rather the understanding of a limited subset of natural language - utterances which can be straightforwardly mapped to the domain of one or several data models or computer applications.
To achieve these aims and objectives, the present invention provides a method for conversion between data representation formats, said data comprising sound or text information. The data representation formats are text, sound, words, phrases, and logic. These are combined as conversions between text or sound and words, between words and phrases, and between phrases and logic, in either direction. The formats comprise the following information: a string of characters in the text format; a digital representation of an acoustic waveform in the sound format; a reference to a data structure containing information about a word in the word format; a tree-like representation of the grammatical structure of a phrase in said phrase format, whereby the leaf nodes of the tree-like representation are references to meanings of constituent words, and conjugation information; and references to functions, objects and attributes in an underlying data model in the logic format.
The method further comprises: converting text to words by using characters which separate words; converting words to text by concatenating the spelling of the constituent words; converting sound to words by providing a continuous speech recognition system; converting words to sound by providing a speech synthesis system; converting words to phrases by parsing; converting phrases to words by traversing said tree-like representation, preferably from left to right, and converting each leaf node to a word; converting phrases to logic by resolving or binding verb phrases to functions and noun phrases to objects in said underlying data model; converting logic to phrases by using knowledge of the grammar of the language used to create a phrase expressing the same semantics as the original logic form; and thus providing a computer word processing and sound processing means.
The present invention further sets forth a word and sound processor arrangement that converts between data representation formats, said data comprising sound or text information. The data representation formats are text, sound, words, phrases, and logic, combined as conversions between text or sound and words, between words and phrases, and between phrases and logic, in either direction. The formats comprise the following information: a string of characters in the text format; a digital representation of an acoustic waveform in the sound format; a reference to a data structure containing information about a word in the word format; a tree-like representation of the grammatical structure of a phrase in said phrase format, whereby the leaf nodes of the tree-like representation are references to meanings of constituent words, and conjugation information; and references to functions, objects and attributes in an underlying data model in the logic format.
The arrangement further comprises: converting means from text to words, which uses characters which separate words; converting means from words to text, which concatenates the spelling of the constituent words; converting means for sound to words, providing a continuous speech recognition system; converting means for words to sound, providing a speech synthesis system; converting means for words to phrases, which uses parsing; converting means for phrases to words, which traverses said tree-like representation, preferably from left to right, and converts each leaf node to a word; converting means for phrases to logic, which resolves or binds verb phrases to functions and noun phrases to objects in said underlying data model; and converting means for logic to phrases, which uses knowledge of the grammar of the language used to create a phrase expressing the same semantics as the original logic form.
Further embodiments of the present invention are set out through the attached dependent claims. Also, said arrangement is able to provide the embodiments relating to said method.
Brief description of the drawings
Henceforth, reference is made to the attached figures for a better understanding of the present invention and its examples and embodiments, wherein:
Fig. 1 illustrates possible conversions between representations according to the present invention; and
Fig. 2 illustrates one embodiment of a tree-like structure used in the present invention.
Detailed description of preferred embodiments
The present invention relates to how information can be converted between the data representation formats used in this invention. The conversions themselves can be done using per se known techniques; the uniqueness lies in the way the present invention is used, and in how the representations and the conversions between them are combined.
Fig. 1 illustrates possible conversions 10 between representations 12 to 20 according to the present invention, whereby the arrows represent said possible conversions.
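The conversions of Fig. 1 can be pictured as a small graph in which the formats are nodes and the arrows are edges. The following is only an illustrative sketch under that reading; the names CONVERSIONS and conversion_path are hypothetical and do not appear in the disclosure. It uses a breadth-first search to find the chain of conversions linking any two formats.

```python
from collections import deque

# Hypothetical adjacency map of the conversions in Fig. 1 (each works both ways).
CONVERSIONS = {
    "text": ["words"],
    "sound": ["words"],
    "words": ["text", "sound", "phrases"],
    "phrases": ["words", "logic"],
    "logic": ["phrases"],
}

def conversion_path(source, target):
    """Return the shortest chain of formats from source to target, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in CONVERSIONS[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(conversion_path("sound", "logic"))
```

Under this view, a spoken command is understood by following the chain from sound through words and phrases to logic, and the reverse chain yields a spoken answer.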
Provided below is an explanation of the representations between which conversions take place. As an example, the phrase "print the file" will be used. The format text 12 depicts a string of characters, such as "print the file", and sound 14 depicts a digital representation of an acoustic waveform. A digital recording of a person or a speech synthesizer saying "print the file" would be such a representation.
The format words 16 depicts a reference to a data structure containing information about a word, such as its spelling, pronunciation and meanings. In the "words" representation, the example phrase is a list of references to data structures representing the words 'print', 'the' and 'file'. No indication exists of which meaning of each word is intended; for instance, 'file' has at least two meanings, as the verb 'to file' and as the noun 'a file'. The format phrases 18 denotes a tree-like representation 30, Fig. 2, of the grammatical structure of a phrase. The leaves or leaf nodes of such a tree-like representation 30 are references to meanings of the constituent words, and conjugation information such as verb tense, noun case etc. In this representation, it is not determined exactly which word is used to express each meaning, because, for instance, any one of several synonyms may be chosen to verbalize a common noun.
Fig. 2 illustrates a phrase representation 18 of the example phrase, "print the file". Supposing that 'document' is a synonym of 'file' in a given domain, then either word can be used in realizing the phrase.
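As an illustration of the words and phrases formats, the sketch below models them with Python dataclasses. The class names, field names, node labels and the pronunciation string are all assumptions made for the example; the actual layout of the tree in Fig. 2 is not reproduced here.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Meaning:
    """One sense of a word, e.g. the verb 'to file' or the noun 'a file'."""
    name: str
    part_of_speech: str

@dataclass
class Word:
    """Entry in the words format: spelling, pronunciation and candidate meanings."""
    spelling: str
    pronunciation: str
    meanings: list

@dataclass
class Node:
    """Node of the phrase tree; leaves carry a meaning plus conjugation info."""
    label: str
    children: list = field(default_factory=list)
    meaning: Meaning = None
    conjugation: dict = field(default_factory=dict)

FILE_NOUN = Meaning("file", "noun")
FILE_VERB = Meaning("to_file", "verb")
# At the word level 'file' is still ambiguous between its two meanings.
file_word = Word("file", "f ay l", [FILE_NOUN, FILE_VERB])

# In the phrase tree the leaf refers to one specific meaning, not to a spelling.
phrase = Node("VP", [
    Node("V", meaning=Meaning("print", "verb"), conjugation={"mood": "imperative"}),
    Node("NP", [
        Node("Det", meaning=Meaning("the", "determiner")),
        Node("N", meaning=FILE_NOUN, conjugation={"number": "singular"}),
    ]),
])
```

Because the leaf holds the meaning FILE_NOUN and not the string 'file', a later step is free to realize it as 'file' or 'document'.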
A logic format 20 representation is similar to predicate logic, a well known basis for the proof of mathematical theorems. In the logic representation 20, all references are to functions, objects and attributes in an underlying data model. This means that all abstract references, such as 'the file', must be resolved to determine which individual file is referred to. The logic form 20 is language independent, as opposed to the other representations, which are specific to a certain human or spoken language. The example phrase could in some context be represented as: print('/etc/passwd') in a computer language. This is a function call to a function called print with the filename '/etc/passwd' as its argument. This assumes that it has been resolved that 'the file' refers to a file with the name '/etc/passwd', and that there is a function called 'print' which corresponds to the verb 'print' as it is used in natural language.
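To make the logic form concrete, the sketch below represents an utterance as a function name plus resolved arguments and acts on it against a toy data model. Everything here is assumed for illustration: the FUNCTIONS registry, the print_file stand-in, and the sample MOODS data are hypothetical. The is_happy predicate anticipates the question-answering example discussed in the next paragraph.

```python
# Hypothetical data model: functions the application exposes.
printed = []

def print_file(path):
    printed.append(path)   # stand-in for sending the file to a printer
    return True

MOODS = {"joe": "happy", "ann": "sad"}   # invented sample data

def is_happy(person):
    return MOODS.get(person) == "happy"

FUNCTIONS = {"print": print_file, "isHappy": is_happy}

def execute(logic):
    """Execute a logic form given as (function_name, [arguments])."""
    name, args = logic
    return FUNCTIONS[name](*args)

def enumerate_variable(name, candidates):
    """Return the candidates for which the one-argument predicate is true."""
    return [c for c in candidates if FUNCTIONS[name](c)]

execute(("print", ["/etc/passwd"]))          # 'print the file', once resolved
print(execute(("isHappy", ["joe"])))         # 'Is Joe happy?'
print(enumerate_variable("isHappy", MOODS))  # 'Who is happy?'
```

Nothing in this representation refers to English words, which is what makes the logic form language independent.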
With an utterance in a logic form 20 representation, it is possible to execute it (by calling the functions with their arguments), evaluate the truth of a statement, or enumerate free variables to answer questions. Performing these operations (execution, evaluation and enumeration) can be done by standard techniques implemented by persons skilled in the art. For instance, answering the utterance "Is Joe happy?" might be performed by generating the logic statement 'isHappy( joe )', evaluating it by calling the isHappy function with the argument joe, and letting the value of the statement be the boolean result of this function call. Another possible logic representation might be 'mood(joe) = happy'. To answer the question 'Who is happy?', the logic representation 'isHappy( X )' might be used (where X is an unbound variable). Answering the question would be equivalent to finding the objects for which the function isHappy has the value true. Another possible logic representation 20 would be 'isHappy( X ) AND isPerson( X )' (interpreting the word 'who' to limit the query to persons).
Converting text 12 to words 16 can be accomplished by first using characters which separate words, such as spaces and punctuation, to extract the spelling of each word. In the example, "print the file" would be converted to "print", "the", and "file". These strings can then be used to look up the representations of the words in a dictionary, lexicon etc., through searching and sorting algorithms well known in the art. Such a conversion is deterministic, meaning that the same text will always yield the same list of words.
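The text-to-words step can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical in-memory LEXICON and assuming that lookup is case-insensitive; neither detail comes from the disclosure.

```python
import re

# Hypothetical lexicon mapping spellings to word records.
LEXICON = {
    "print": {"spelling": "print", "meanings": ["print (verb)", "print (noun)"]},
    "the": {"spelling": "the", "meanings": ["the (determiner)"]},
    "file": {"spelling": "file", "meanings": ["file (noun)", "file (verb)"]},
}

def text_to_words(text):
    """Split on spaces and punctuation, then look each spelling up in the lexicon."""
    spellings = [s for s in re.split(r"[\s.,;:!?]+", text.lower()) if s]
    return [LEXICON[s] for s in spellings]   # raises KeyError for unknown words

words = text_to_words("Print the file.")
print([w["spelling"] for w in words])
```

The result is a list of references to lexicon entries, each still carrying all of its candidate meanings; choosing among them is left to the parsing step.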
Converting words 16 to text 12 consists of concatenating the spelling of the constituent words. In some implementations, words may have several spellings. Examples of this are American and British spelling, color vs. colour. In this case converting words to text may use different methods to choose which spelling to use, varying in complexity from choosing one at random, to more advanced methods.
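One simple policy for choosing among several spellings is to prefer a requested locale. The sketch below assumes hypothetical word records keyed by locale codes; it is one possible policy among the methods mentioned above, not the one prescribed by the disclosure.

```python
# Hypothetical word records; a word may list several spellings by locale.
WORDS = [
    {"spellings": {"en-US": "print", "en-GB": "print"}},
    {"spellings": {"en-US": "the", "en-GB": "the"}},
    {"spellings": {"en-US": "color", "en-GB": "colour"}},
]

def words_to_text(words, locale="en-US"):
    """Concatenate one spelling per word, preferring the requested locale."""
    chosen = []
    for word in words:
        spellings = word["spellings"]
        # Fall back to the first listed spelling if the locale is missing.
        chosen.append(spellings.get(locale, next(iter(spellings.values()))))
    return " ".join(chosen)

print(words_to_text(WORDS, "en-GB"))
```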
Converting sounds 14 to words 16 is what continuous speech recognition systems provide. Such systems are available in both commercial and free implementations, including IBM's ViaVoice®, Dragon Dictate® from Dragon Systems, and the open source ISIP Automatic Speech Recognition system from Mississippi State University, and are often based on well known techniques such as Hidden Markov Models (HMMs).
Speech synthesis systems perform conversions from words 16 to sound 14. Such systems can vary in complexity from simply concatenating stored recordings of each word spoken by a person, to advanced synthesis taking prosody and emotions into account. Speech synthesis systems have been available for many years, and include the freely available Festival Speech Synthesis System from the University of Edinburgh.
Converting words 16 to phrases 18 is known as parsing. Parsing is a classical task in computer science and linguistics. A grammar is used to specify possible word sequences, and a parser matches words to possible sequences using the grammar. Grammars can take many forms. Deterministic grammars written in the common Backus-Naur Form (BNF) notation, probabilistic n-gram grammars, and Head-Driven Phrase Structure Grammars are some examples.
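As a toy illustration of parsing with a deterministic grammar, the sketch below uses a two-rule grammar and a recursive-descent matcher. The grammar, the LEXICON of parts of speech, and the tuple-based tree are all assumptions chosen to fit the example phrase; a real system would use a far richer grammar.

```python
# Hypothetical lexicon: each spelling maps to the parts of speech it can take.
LEXICON = {"print": {"V", "N"}, "the": {"Det"}, "file": {"V", "N"}}

# Toy grammar in BNF-like form:  <command> ::= V <np>    <np> ::= Det N
GRAMMAR = {"command": ["V", "np"], "np": ["Det", "N"]}

def parse(symbol, words, pos=0):
    """Return (tree, next_pos) if symbol matches words[pos:], else None."""
    if symbol in GRAMMAR:                      # non-terminal: match each part in turn
        children = []
        for part in GRAMMAR[symbol]:
            result = parse(part, words, pos)
            if result is None:
                return None
            child, pos = result
            children.append(child)
        return (symbol, children), pos
    # Terminal: the next word must be able to take this part of speech.
    if pos < len(words) and symbol in LEXICON.get(words[pos], set()):
        return (symbol, words[pos]), pos + 1
    return None

def words_to_phrase(words):
    result = parse("command", words)
    if result is None or result[1] != len(words):
        return None                            # no parse, or trailing words left over
    return result[0]

print(words_to_phrase(["print", "the", "file"]))
```

Note that the grammar resolves the ambiguity left open in the words format: 'file' can only be matched as a noun in this position.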
Converting a phrase 18 structure to words 16 consists of traversing the tree 30 from left to right, in this embodiment, and converting each leaf node to a word. To convert a node to a word, look-up tables together with information about conjugation of verbs, noun cases etc can be used to select words which can represent a given meaning.
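The left-to-right traversal with a look-up table might look as follows. The REALIZATIONS table, the meaning labels and the tuple encoding of the tree are hypothetical; the `choose` parameter stands in for whatever policy selects among synonyms.

```python
# Hypothetical look-up table from (meaning, conjugation) to candidate words.
REALIZATIONS = {
    ("PRINT", "imperative"): ["print"],
    ("DEFINITE", None): ["the"],
    ("FILE", "singular"): ["file", "document"],   # synonyms in this domain
    ("FILE", "plural"): ["files", "documents"],
}

# Phrase tree: inner nodes are (label, [children]); leaves are (meaning, conjugation).
PHRASE = ("VP", [("PRINT", "imperative"),
                 ("NP", [("DEFINITE", None), ("FILE", "singular")])])

def phrase_to_words(node, choose=lambda options: options[0]):
    """Traverse the tree left to right and realize each leaf as a word."""
    label, rest = node
    if isinstance(rest, list):                 # inner node: visit children in order
        words = []
        for child in rest:
            words.extend(phrase_to_words(child, choose))
        return words
    return [choose(REALIZATIONS[node])]        # leaf: pick one candidate word

print(phrase_to_words(PHRASE))
print(phrase_to_words(PHRASE, choose=lambda options: options[-1]))
```

The two calls realize the same phrase as "print the file" and "print the document", matching the synonym example given for Fig. 2.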
A primary issue in going from a phrase 18 form to a logic form 20 involves to resolve/bind verb phrases to functions and noun phrases to objects in the underlying data model. Binding verb phrases to functions can be accomplished by using a look-up table relating each verb to one or possibly several functions implementing the functionality of that verb. Different interpretations of the same verb can be disambiguated by the number and type of phrase arguments that it takes. The exact mechanisms for such resolution may be realized in different ways for a person skilled in the art.
Resolving noun phrases to objects may require tracking recently mentioned objects, for example to know which file 'the file' refers to in the example phrase. The exact mechanism for doing this is known to persons skilled in the art.
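A minimal sketch of such tracking, assuming a simple most-recent-first history, is given below. The `Entity` and `DiscourseContext` names and the `kind` attribute are hypothetical; actual systems may use richer salience models.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass(frozen=True)
class Entity:
    kind: str  # e.g. "file" or "folder" (hypothetical type label)
    name: str


class DiscourseContext:
    """Keeps recently mentioned objects, most recent first."""

    def __init__(self, max_items: int = 10) -> None:
        # With maxlen set, appendleft drops the oldest item on the right.
        self._recent: Deque[Entity] = deque(maxlen=max_items)

    def mention(self, obj: Entity) -> None:
        """Record that `obj` was just mentioned."""
        if obj in self._recent:
            self._recent.remove(obj)  # move it to the front instead of duplicating
        self._recent.appendleft(obj)

    def resolve(self, kind: str) -> Optional[Entity]:
        """Resolve a phrase such as 'the file' to the latest object of that kind."""
        for obj in self._recent:
            if obj.kind == kind:
                return obj
        return None
```

With this structure, 'the file' resolves to whichever file was mentioned last, and returns `None` when no object of the requested kind has been mentioned.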
Going from logic form 20 to phrase form 18 requires knowledge of the grammar of the language, word order, etc., to create a phrase expressing the same semantics as the original logic form. As with the resolution of noun phrases, tracking recently mentioned objects can improve logic-to-phrase conversion. For instance, it makes it possible to recognize when the pronoun 'it' can be used instead of generating a long description of the object.
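The pronoun choice can be illustrated with the following sketch, which assumes English verb-then-object word order and a single "last mentioned" object. The `VERBS` table, the `Entity` class, and the function names are hypothetical and serve only to show the idea.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Entity:
    kind: str
    name: str


# Hypothetical mapping from functions in the data model to English verbs.
VERBS: Dict[str, str] = {"delete_file": "delete", "open_file": "open"}


def noun_phrase(obj: Entity, last_mentioned: Optional[Entity]) -> str:
    """Use the pronoun 'it' when the object was the one mentioned last."""
    if obj == last_mentioned:
        return "it"
    return f"the {obj.kind} named {obj.name}"


def logic_to_phrase(function_name: str, obj: Entity,
                    last_mentioned: Optional[Entity] = None) -> str:
    """Build a phrase with English word order: verb first, then object."""
    return f"{VERBS[function_name]} {noun_phrase(obj, last_mentioned)}"
```

Without context, the logic form for deleting a file is rendered with a full description of the object; when the same object was mentioned last, the shorter "delete it" is produced.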
To make use of the present invention, an arrangement is set forth which comprises the following means: converting means from text to words, which uses characters which separate words; converting means from words to text, which concatenates the spelling of the constituent words; converting means for sound to words, providing a continuous speech recognition system; converting means for words to sound, providing a speech synthesis system; converting means for words to phrases, which uses parsing; converting means for phrases to words, which traverses said tree-like representation and converts each leaf node to a word; converting means for phrases to logic, which resolves or binds verb phrases to functions and noun phrases to objects in said underlying data model; and converting means for logic to phrases, which uses knowledge of the grammar of the used language to create a phrase expressing the same semantics as the original logic form.
Means mentioned in the present description can be software means or a combination of hardware and software means. The hardware comprises computer means and peripheral computer means.
The present invention has been described with non-limiting examples and embodiments. It is the attached set of claims that describes all possible embodiments for a person skilled in the art.

Claims

1. A method for conversion between data representation formats, said data comprising sound or text information, wherein said data representation formats are text, sound, words, phrases, and logic combined as conversions between text or sound to words or vice versa, words to phrases or vice versa, and phrases to logic or vice versa, comprising the following information in said formats: a string of characters in the text format; a digital representation of an acoustic waveform in the sound format; a reference to a data structure containing information about a word in the word format; a tree-like representation of the grammatical structure of a phrase in said phrase format, whereby the leaf nodes of the tree-like representation are references to meanings of constituent words, and conjugation information; references to functions, objects and attributes in an underlying data model in the logic format; and the method further comprising: converting text to words by using characters which separate words; converting words to text by concatenating the spelling of the constituent words; converting sound to words by providing a continuous speech recognition system; converting words to sound by providing a speech synthesis system; converting words to phrases by parsing; converting phrases to words by traversing said tree-like representation and converting each leaf node to a word; converting phrases to logic by resolving or binding verb phrases to functions and noun phrases to objects in said underlying data model; converting logic to phrases by using knowledge of the grammar of the used language to create a phrase expressing the same semantics as the original logic form; and thus providing a computer word processing and sound processing means.
2. A method according to claim 1, wherein it provides language independence by transforming linguistic utterances by a user to a semantic.
3. A method according to claim 1, wherein said form is converted into another human language, thus having the effect of simplistic translation.
4. A method according to claim 2, wherein said form is converted into another human language, thus having the effect of simplistic translation.
5. A method according to claim 1, wherein converting text to words is accomplished by first using characters which separate words, such as spaces and punctuation, to extract the spelling of each word, such strings are then used to look up the representations of the words in a dictionary, lexicon etc, through searching and sorting processes.
6. A method according to claim 2, wherein converting text to words is accomplished by first using characters which separate words, such as spaces and punctuation, to extract the spelling of each word, such strings are then used to look up the representations of the words in a dictionary, lexicon etc, through searching and sorting processes.
7. A method according to claim 3, wherein converting text to words is accomplished by first using characters which separate words, such as spaces and punctuation, to extract the spelling of each word, such strings are then used to look up the representations of the words in a dictionary, lexicon etc, through searching and sorting processes.
8. A method according to claim 4, wherein converting text to words is accomplished by first using characters which separate words, such as spaces and punctuation, to extract the spelling of each word, such strings are then used to look up the representations of the words in a dictionary, lexicon etc, through searching and sorting processes.
9. A method according to claim 1, wherein converting words to text consists of concatenating the spelling of the constituent words.
10. A method according to claim 2, wherein converting words to text consists of concatenating the spelling of the constituent words.
11. A method according to claim 3, wherein converting words to text consists of concatenating the spelling of the constituent words.
12. A method according to claim 4, wherein converting words to text consists of concatenating the spelling of the constituent words.
13. A method according to claim 5, wherein converting words to text consists of concatenating the spelling of the constituent words.
14. A method according to claim 6, wherein converting words to text consists of concatenating the spelling of the constituent words.
15. A method according to claim 7, wherein converting words to text consists of concatenating the spelling of the constituent words.
16. A method according to claim 8, wherein converting words to text consists of concatenating the spelling of the constituent words.
17. A method according to claims 1-16, wherein converting a phrase structure to words consists of traversing said tree in one direction, and converting each leaf node to a word.
18. A method according to claims 1-16, wherein going from a phrase form to a logic form involves resolving or binding verb phrases to functions and noun phrases to objects in an underlying data model.
19. A method according to claim 17, wherein going from a phrase form to a logic form involves resolving or binding verb phrases to functions and noun phrases to objects in an underlying data model.
20. A method according to claim 18, wherein resolving noun phrases to objects uses the tracking of recently mentioned objects.
21. A method according to claim 19, wherein resolving noun phrases to objects uses the tracking of recently mentioned objects.
22. A method according to claims 1-16, wherein going from logic to phrase form uses knowledge of the grammar of the language to create a phrase expressing the same semantics as the original logic form.
23. A method according to claim 17, wherein going from logic to phrase form uses knowledge of the grammar of the language to create a phrase expressing the same semantics as the original logic form.
24. A method according to claim 18, wherein going from logic to phrase form uses knowledge of the grammar of the language to create a phrase expressing the same semantics as the original logic form.
25. A method according to claims 1-21, wherein going from logic to phrase form uses knowledge of the grammar of the language to create a phrase expressing the same semantics as the original logic form.
26. A word and sound processor arrangement that converts between data representation formats, said data comprising sound or text information, wherein said data representation formats are text, sound, words, phrases, and logic combined as conversions between text or sound to words or vice versa, words to phrases or vice versa, and phrases to logic or vice versa, comprising the following information in said formats: a string of characters in the text format; a digital representation of an acoustic waveform in the sound format; a reference to a data structure containing information about a word in the word format; a tree-like representation of the grammatical structure of a phrase in said phrase format, whereby the leaf nodes of the tree-like representation are references to meanings of constituent words, and conjugation information; references to functions, objects and attributes in an underlying data model in the logic format; and whereby the arrangement comprises: converting means from text to words, which uses characters which separate words; converting means from words to text, which concatenates the spelling of the constituent words; converting means for sound to words, providing a continuous speech recognition system; converting means for words to sound, providing a speech synthesis system; converting means for words to phrases, which uses parsing; converting means for phrases to words, which traverses said tree-like representation and converts each leaf node to a word; converting means for phrases to logic, which resolves or binds verb phrases to functions and noun phrases to objects in said underlying data model; and converting means for logic to phrases, which uses knowledge of the grammar of the used language to create a phrase expressing the same semantics as the original logic form.
27. An arrangement according to claim 26, providing the features of claims 2-25.
PCT/IB2001/000731 2001-04-20 2001-04-20 Conversion between data representation formats WO2002086757A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2001/000731 WO2002086757A1 (en) 2001-04-20 2001-04-20 Conversion between data representation formats

Publications (1)

Publication Number Publication Date
WO2002086757A1 true WO2002086757A1 (en) 2002-10-31

Family

ID=11004092

Country Status (1)

Country Link
WO (1) WO2002086757A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5200893A (en) * 1989-02-27 1993-04-06 Hitachi, Ltd. Computer aided text generation method and system
US5237502A (en) * 1990-09-04 1993-08-17 International Business Machines Corporation Method and apparatus for paraphrasing information contained in logical forms
US5386556A (en) * 1989-03-06 1995-01-31 International Business Machines Corporation Natural language analyzing apparatus and method
US5530863A (en) * 1989-05-19 1996-06-25 Fujitsu Limited Programming language processing system with program translation performed by term rewriting with pattern matching
DE19626142A1 (en) * 1996-07-01 1998-01-08 Horn Hannes Dr Schulze Computer-aided text design system
US5715368A (en) * 1994-10-19 1998-02-03 International Business Machines Corporation Speech synthesis system and method utilizing phenome information and rhythm imformation

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP