WO2009132871A1 - Method and system for converting speech into text - Google Patents

Method and system for converting speech into text Download PDF

Info

Publication number
WO2009132871A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
digital audio
text
text data
markers
Prior art date
Application number
PCT/EP2009/052092
Other languages
French (fr)
Inventor
Giacomo Olgeni
Mattia Scaricabarozzi
Original Assignee
Colby S.R.L.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Colby S.R.L. filed Critical Colby S.R.L.
Priority to EP09737915A priority Critical patent/EP2283481A1/en
Publication of WO2009132871A1 publication Critical patent/WO2009132871A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4884 Data services, e.g. news ticker for displaying subtitles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/117 Tagging; Marking up; Designating a block; Setting of attributes
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440236 Processing of video elementary streams involving reformatting operations by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/08 Systems for the simultaneous or sequential transmission of more than one television signal, e.g. additional information signals, the signals occupying wholly or partially the same frequency band, e.g. by time division
    • H04N7/087 Systems with signal insertion during the vertical blanking interval only
    • H04N7/088 Systems with signal insertion during the vertical blanking interval only, the inserted signal being digital
    • H04N7/0884 Systems with signal insertion during the vertical blanking interval only, the inserted signal being digital, for the transmission of additional display-information, e.g. menu for programme or channel selection
    • H04N7/0885 Systems with signal insertion during the vertical blanking interval only, the inserted signal being digital, for the transmission of subtitles

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)

Abstract

Method for converting a speech (S) into text (T), which comprises the following operative steps: an analog audio signal (AA) of a speech (S) is converted into a digital audio signal (DS); the digital audio signal (DS) is converted into text data (TD); wherein one or more markers (Mx...My) comprising a digital waveform are inserted into the digital audio signal (DS) before the conversion of the digital audio signal (DS) into text data (TD); and the markers (Mx...My) are converted into one or more commands (Cx...Cy) in the text data (TD) after the conversion of the digital audio signal (DS) into text data (TD). The present invention also relates to a system for carrying out such a method.

Description

METHOD AND SYSTEM FOR CONVERTING SPEECH INTO TEXT
The present invention relates to a method for converting speech into text, in particular a method which can be employed for generating live subtitles in television transmissions. The present invention also relates to a system for carrying out such a method.
Known systems for converting speech into text comprise a sampler module, which converts an analog audio signal of a speech into a digital audio signal, and a voice recognition module, which converts the digital audio signal into text data. Such systems have some disadvantages when the speech is generated by a speaker, generally called a respeaker, who creates television subtitles comprising the text data in real time.
In fact, such known systems require that all punctuation marks, font styles, colors and other control or text formatting functions be dictated by the speaker. The time the speaker needs for this operation inevitably increases the delay between the words pronounced in the television transmission and the words pronounced by the speaker, with a consequent delay of the subtitles and an increased probability that the speaker loses the thread of the transmission.
Furthermore, each word not contained in the system dictionary must be added manually and trained by the speaker, who pronounces it one or more times so that the system can associate it with the corresponding phonemes. However, this operation can be carried out only in advance, namely not during the normal dictation process, so that if the speaker has to pronounce a new word several times during a transmission, the system can never interpret it correctly.
Moreover, known systems convert speech into text with a certain delay, since they use the context of the dictated sentence to resolve the ambiguities which inevitably arise during phoneme processing; as a result, they generate text data only when the speaker pauses in the dictation, which is quite rare when the speaker tries to follow a transmission in real time.
It is therefore an object of the present invention to provide a conversion method and system which are free from said disadvantages. Said object is achieved with a method and a system whose main features are disclosed in claims 1 and 11, respectively, while other features are disclosed in the remaining claims.
Thanks to the particular markers inserted into the digital audio signal and converted into commands in the text data, the method and the system according to the present invention make it possible to insert the desired commands into the speech automatically, without the speaker being forced to pronounce them, thus also avoiding the training phase for new words. Such commands can comprise one or more text characters, in particular symbols, characters, words and/or sentences, and/or text formatting commands, in particular colors, sizes and/or fonts.
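By way of illustration only (the patent prescribes no implementation), the association of markers with labels and commands could be modeled as a simple lookup structure; all names, phoneme strings and byte values in this Python sketch are invented:

```python
from dataclasses import dataclass

@dataclass
class Marker:
    label: str       # label Lx shown to the speaker on interface IO
    phonemes: str    # phoneme text the voice recognition module emits
    command: str     # command Cx substituted into the text data TD
    waveform: bytes  # pre-sampled digital waveform inserted into DS

# Invented example entries for a digital table DT.
digital_table = {
    "M1": Marker("full stop", "stopmark", ". ", b"\x00\x01"),
    "M2": Marker("red text", "redmark", '<font color="red">', b"\x00\x02"),
    "M3": Marker("new line", "linemark", "\n", b"\x00\x03"),
}
```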
Furthermore, the association of the markers with the commands can be modified in real time by a supervisor according to the subject of the speech, without modifying the markers or training new ones. The only training, to be carried out just once for each speaker, is required for acquiring the phonemes used as markers.
According to a particular aspect of the invention, the commands associated with the markers inserted into the digital audio signal are compared with the commands associated with the markers found in the text data, allowing the detection of possible recognition errors for those markers.
The system according to the present invention is preferably made with a particular client-server structure, so that two or more speakers can take turns in real time dictating the same, possibly very long, text. Further advantages and features of the method and the system according to the present invention will become clear to those skilled in the art from the following detailed and non-limiting description of an embodiment thereof with reference to the attached drawings, wherein: figure 1 shows a first block scheme of the system; figure 2 shows a scheme of the insertion of a marker; figure 3 shows a scheme of the correction of a marker series; and figure 4 shows a second block scheme of the system.
Referring to figure 1, it is seen that the system according to the present invention comprises in a known way at least one sampler module SM which converts an analog audio signal AA into a digital audio signal DS. Analog audio signal AA is a speech S of a first speaker S1 picked up by at least one transducer, in particular a microphone MIC.
Analog audio signal AA can be processed by an audio processor AP, for example comprising equalization, gate and compression stages, before it is sampled by sampler module SM. Digital audio signal DS contains at least one sampled waveform SW substantially corresponding to speech S and is transmitted to a voice recognition module VRM which converts digital audio signal DS into a dictated text D substantially corresponding to speech S.
According to the invention, the system also comprises an audio editor AE suitable for automatically inserting into digital audio signal DS at least one marker Mx comprising a digital waveform stored in at least one digital table DT, which comprises one or more markers M1...Mn associated with one or more commands C1...Cn and one or more labels L1...Ln. In particular, markers M1...Mn comprise one or more phonemes pronounced by first speaker S1 and sampled in advance, for example through the same sampler module SM. An input/output interface IO shows first speaker S1 the labels L1...Ln associated with markers M1...Mn. First speaker S1 can select the markers M1...Mn to be inserted into digital audio signal DS by pressing buttons associated with labels L1...Ln. In particular, input/output interface IO is a touchscreen which shows labels L1...Ln, which can be selected by touching the area of the touchscreen showing them. In other embodiments input/output interface IO can comprise a display, a keyboard, a mouse and/or other input/output devices. Referring also to figure 2, it is seen that when first speaker S1 selects label Lx by means of input/output interface IO, marker Mx corresponding to label Lx is immediately inserted into digital audio signal DS by audio editor AE. The latter comprises an audio buffer which temporarily stores and shifts forward the rest of sampled waveform SW, so as to make up for the portion of speech S corresponding to the duration of marker Mx. To avoid or reduce the delays due to the introduction of marker Mx into digital audio signal DS, audio editor AE can remove possible pauses from digital audio signal DS and/or can digitally accelerate digital audio signal DS without varying the pitch of speech S. Digital audio signal DS, comprising sampled waveform SW and marker Mx, is then processed by voice recognition module VRM, which converts digital audio signal DS into text data TD including dictated text D and marker Mx converted into the corresponding phonemes and inserted into dictated text D.
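A minimal sketch of these buffer operations, assuming the digital audio signal is held as a NumPy array of samples (the patent names neither a library nor a frame size; both are assumptions here):

```python
import numpy as np

def insert_marker(speech: np.ndarray, pos: int, marker: np.ndarray) -> np.ndarray:
    # Splice the marker waveform in at sample index `pos`; the rest of the
    # sampled waveform SW is shifted forward by len(marker) samples, as the
    # audio editor AE does with its audio buffer.
    return np.concatenate([speech[:pos], marker, speech[pos:]])

def drop_pauses(speech: np.ndarray, thresh: float = 0.01, frame: int = 480) -> np.ndarray:
    # Discard near-silent frames to recover the time taken by markers.
    frames = [speech[i:i + frame] for i in range(0, len(speech), frame)]
    kept = [f for f in frames if np.abs(f).mean() > thresh]
    return np.concatenate(kept) if kept else speech[:0]
```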
A text converter TC converts the text of the phonemes corresponding to marker Mx into command Cx associated with marker Mx in digital table DT. Command Cx can consist of one or more text characters, in particular symbols, characters, words and/or sentences, and/or text formatting commands, in particular colors, sizes and/or fonts. Text data TD generated by text converter TC thus comprise command Cx included in dictated text D.
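The substitution performed by the text converter could be read as plain string replacement of the recognized marker phonemes, as in this hypothetical sketch (the actual matching mechanism is not detailed in the patent):

```python
import re

def convert_markers(dictated_text: str, commands: dict[str, str]) -> str:
    # `commands` maps the recognized phoneme text of each marker to the
    # command Cx taken from the digital table DT, e.g. {"redmark": "<red>"}.
    pattern = re.compile("|".join(map(re.escape, commands)))
    return pattern.sub(lambda m: commands[m.group(0)], dictated_text)

# convert_markers("breaking news redmark elections", {"redmark": "<red>"})
# returns "breaking news <red> elections"
```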
Referring to figure 3, it is seen that first speaker S1 can insert a plurality of markers Mx...My at various points of sampled waveform SW in digital audio signal DS, in which case text data TD generated by text converter TC comprise a plurality of commands Cx...Cy included at the same points of the corresponding dictated text D. When first speaker S1 selects with input/output interface IO the labels Lx...Ly corresponding to commands Cx...Cy and markers Mx...My, the selected commands Cx...Cy are also stored in a digital memory DM, so that if a marker Mx...My inserted into digital audio signal DS is mistakenly not recognized by voice recognition module VRM, text converter TC can still compare the sequence of commands Cx...Cy stored in digital memory DM with the commands Cx...Cy associated with the markers Mx...My transformed into text data TD, so as to obtain text data TD which include these commands Cx...Cy in their correct sequence.
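One plausible reading of this correction step is a sequence alignment between the stored and the recognized command sequences, sketched below with Python's standard difflib; the patent does not specify the comparison algorithm:

```python
from difflib import SequenceMatcher

def reconcile(selected: list[str], recognized: list[str]) -> list[str]:
    # `selected` is the command sequence stored in digital memory DM when
    # the speaker pressed the labels; `recognized` is the sequence found in
    # the text data TD. Where they disagree (a marker missed by the voice
    # recognition module), the stored selection is trusted.
    out: list[str] = []
    for op, i1, i2, j1, j2 in SequenceMatcher(a=recognized, b=selected).get_opcodes():
        out.extend(recognized[i1:i2] if op == "equal" else selected[j1:j2])
    return out

# reconcile(["C1", "C2", "C3"], ["C1", "C3"]) -> ["C1", "C2", "C3"]
```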
Input/output interface IO, sampler module SM and/or digital table DT, as well as digital memory DM, are components and/or peripherals, also of a known kind, of a client computer CC, while audio editor AE, voice recognition module VRM and/or text converter TC, as well as audio processor AP, are programs, also of a known kind, suitable for execution by client computer CC.
Referring to figure 4, it is seen that a plurality of speakers S1...Sm, each provided with a client computer CC1...CCm, can generate with the above-mentioned method one or more text data sequences TD11...TD1p...TDm1...TDmq, which are sent through a data network to at least one server computer SC, which combines such sequences in an automatic and/or manual manner to generate at least one text T to be sent to a text generator TG, for example to be displayed in a television transmission. Text T can also contain other text data TDx...TDy, which can be created with a method different from the one described above.
A supervisor SV can manually process the contents and/or the order of text data TD11...TD1p...TDm1...TDmq...TDx...TDy. The sequences of text data TD11...TD1p...TDm1...TDmq...TDx...TDy can also be automatically ordered by server computer SC by inserting the first available text data as soon as a pause longer than a determined threshold value is detected in the sequence of the text data which are employed at that time for generating text T. Thus, at least two speakers S1 and S2 can take turns dictating texts, even with their speeches S completely or partially overlapping.
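The pause-based ordering might look like the following sketch, with each speaker's text data arriving on its own queue; the queue-based delivery and the two-second default threshold are assumptions, not taken from the patent:

```python
from queue import Queue, Empty

def merge_text_data(queues: list[Queue], pause_threshold: float = 2.0):
    # Stay on the current speaker's queue; when no text data arrives for
    # longer than `pause_threshold` seconds, switch to the first other
    # queue that already has text data available.
    current = 0
    while True:
        try:
            yield queues[current].get(timeout=pause_threshold)
        except Empty:
            for i, q in enumerate(queues):
                if i != current and not q.empty():
                    current = i
                    break
```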
Supervisor SV can also process with server computer SC, and send through the same data network to client computers CC1...CCm, one or more digital tables DT1...DTz in which markers M1...Mx are associated with particular labels L1...Lx and commands C1...Cx relating to the subject (for example politics, sports, economy, news, etc.) dealt with by speakers S1...Sm, so as to update in real time the commands C1...Cx associated with markers M1...Mx and usable by speakers S1...Sm during the conversion of analog audio signal AA into digital audio signal DS.
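As a purely illustrative sketch of this distribution step, a supervisor process could push an updated table to each client over the data network; JSON over TCP is an assumption, since the patent names no protocol or format:

```python
import json
import socket

def push_table(digital_table: dict[str, dict], clients: list[tuple[str, int]]) -> None:
    # The supervisor SV distributes an updated digital table (markers
    # associated with labels and commands for the current subject) to
    # every client computer CC1...CCm.
    payload = json.dumps(digital_table).encode("utf-8")
    for host, port in clients:
        with socket.create_connection((host, port)) as sock:
            sock.sendall(payload)
```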
Possible modifications and/or additions may be made by those skilled in the art to the hereinabove disclosed and illustrated embodiment while remaining within the scope of the following claims.

Claims

CLAIMS
1. Method for converting a speech (S) into text (T), which comprises the following operative steps: an analog audio signal (AA) of a speech (S) is converted into a digital audio signal (DS); the digital audio signal (DS) is converted into text data (TD); characterized in that one or more markers (Mx...My) comprising a digital waveform are inserted into the digital audio signal (DS) before the conversion of the digital audio signal (DS) into text data (TD); and the markers (Mx...My) are converted into one or more commands (Cx...Cy) in the text data (TD) after the conversion of the digital audio signal (DS) into text data (TD).
2. Method according to the previous claim, characterized in that the markers (Mx...My) are inserted into the digital audio signal (DS) during the conversion of the analog audio signal (AA) into the digital audio signal (DS).
3. Method according to one of the previous claims, characterized in that the markers (M1...Mn) are associated with the commands (C1...Cn) before the conversion of the digital audio signal (DS) into text data (TD).
4. Method according to one of the previous claims, characterized in that the markers (M1...Mn) are selected and inserted into the digital audio signal (DS) by the speaker (S1...Sm) of the speech (S).
5. Method according to one of the previous claims, characterized in that the sampled waveform (SW) of the speech (S) is temporarily stored and shifted forward when a marker (Mx) is inserted into the digital audio signal (DS), so as to make up for the portion of the speech (S) corresponding to the duration of the marker (Mx).
6. Method according to the previous claim, characterized in that the digital audio signal (DS) is digitally accelerated without varying the pitch of the speech (S).
7. Method according to one of the previous claims, characterized in that the commands (Cx...Cy) associated with the markers (Mx...My) inserted into the digital audio signal (DS) are compared with the commands (Cx...Cy) associated with the markers (Mx...My) in the text data (TD).
8. Method according to one of the previous claims, characterized in that one or more speakers (S1...Sm) generate one or more text data sequences (TD11...TD1p...TDm1...TDmq) which are combined in an automatic and/or manual manner for generating at least one text (T).
9. Method according to the previous claim, characterized in that the text data sequences (TD11...TD1p...TDm1...TDmq...TDx...TDy) are automatically ordered by inserting the first available text data as soon as a pause longer than a determined threshold value is detected in the sequence of the text data which are employed at that time for generating the text (T).
10. Method according to one of the previous claims, characterized in that the commands (C1...Cx) associated with the markers (M1...Mx) are updated in real time during the conversion of the analog audio signal (AA) into the digital audio signal (DS).
11. System for converting a speech (S) into a text (T), which comprises at least one sampler module (SM) which converts an analog audio signal (AA) of a speech (S) into a digital audio signal (DS), as well as a voice recognition module (VRM) which converts the digital audio signal (DS) into text data (TD), characterized in that the system also comprises an audio editor (AE) which inserts into the digital audio signal (DS) one or more markers (Mx...My) comprising a digital waveform before the conversion of the digital audio signal (DS) into text data (TD), as well as a text converter (TC) which converts the markers (Mx...My) into one or more commands (Cx...Cy) in the text data (TD) after the conversion of the digital audio signal (DS) into text data (TD).
12. System according to the previous claim, characterized in that one or more digital tables (DT, DT1...DTz) contain the markers (M1...Mn) associated with the commands (C1...Cn).
13. System according to claim 11 or 12, characterized in that an input/output interface (IO), in particular a touchscreen, shows labels (Lx...Ly) which correspond to the commands (Cx...Cy) and to the markers (Mx...My) which can be selected for insertion into the digital audio signal (DS).
14. System according to one of claims 11 to 13, characterized in that the audio editor (AE) temporarily stores and shifts forward the sampled waveform (SW) of the speech (S) when a marker (Mx) is inserted into the digital audio signal (DS), so as to make up for the portion of the speech (S) corresponding to the duration of the marker (Mx).
15. System according to the previous claim, characterized in that the audio editor (AE) digitally accelerates the digital audio signal (DS) without varying the pitch of the speech (S).
16. System according to one of claims 11 to 15, characterized in that the text converter (TC) compares the commands (Cx...Cy) associated with the markers (Mx...My) inserted into the digital audio signal (DS) with the commands (Cx...Cy) associated with the markers (Mx...My) in the text data (TD).
17. System according to one of claims 11 to 16, characterized in that the input/output interface (IO), the sampler module (SM) and/or the digital table (DT) are components and/or peripherals of a client computer (CC), while the audio editor (AE), the voice recognition module (VRM) and/or the text converter (TC) are programs executable by the client computer (CC).
18. System according to the previous claim, characterized in that a plurality of client computers (CC1...CCm) are connected to at least one server computer (SC) for sending through a network one or more text data sequences (TD11...TD1p...TDm1...TDmq), which are combined in an automatic and/or manual manner by the server computer (SC) for generating at least one text (T).
19. System according to the previous claim, characterized in that the server computer (SC) automatically orders the text data sequences (TD11...TD1p...TDm1...TDmq...TDx...TDy) by inserting the first available text data as soon as a pause longer than a determined threshold value is detected in the sequence of the text data which are employed at that time for generating the text (T).
20. System according to claim 18 or 19, characterized in that the server computer (SC) processes and sends through the same data network to the client computers (CC1...CCm) one or more digital tables (DT1...DTz) in which the markers (M1...Mx) are associated with commands (C1...Cx).
21. Method or system according to one of the previous claims, characterized in that the commands (Cx...Cy) consist of one or more text characters, in particular symbols, characters, words and/or sentences, and/or text formatting commands, in particular colors, sizes and/or fonts.
22. Method or system according to one of the previous claims, characterized in that the markers (M1...Mn) comprise one or more phonemes pronounced by the speaker (S1...Sm) of the speech (S) and sampled for being converted into a digital waveform.
PCT/EP2009/052092 2008-04-30 2009-02-20 Method and system for converting speech into text WO2009132871A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP09737915A EP2283481A1 (en) 2008-04-30 2009-02-20 Method and system for converting speech into text

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
ITMI2008A000794 2008-04-30
IT000794A ITMI20080794A1 (en) 2008-04-30 2008-04-30 METHOD AND SYSTEM FOR CONVERTING SPEECH INTO TEXT

Publications (1)

Publication Number Publication Date
WO2009132871A1 true WO2009132871A1 (en) 2009-11-05

Family

ID=40297044

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2009/052092 WO2009132871A1 (en) 2008-04-30 2009-02-20 Method and system for converting speech into text

Country Status (3)

Country Link
EP (1) EP2283481A1 (en)
IT (1) ITMI20080794A1 (en)
WO (1) WO2009132871A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5960447A (en) 1995-11-13 1999-09-28 Holt; Douglas Word tagging and editing system for speech recognition
US20050086705A1 (en) * 2003-08-26 2005-04-21 Jarman Matthew T. Method and apparatus for controlling play of an audio signal
WO2005116992A1 (en) * 2004-05-27 2005-12-08 Koninklijke Philips Electronics N.V. Method of and system for modifying messages
US20070256016A1 (en) * 2006-04-26 2007-11-01 Bedingfield James C Sr Methods, systems, and computer program products for managing video information

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120316882A1 (en) * 2011-06-10 2012-12-13 Morgan Fiumi System for generating captions for live video broadcasts
US8532469B2 (en) 2011-06-10 2013-09-10 Morgan Fiumi Distributed digital video processing system
US8749618B2 (en) 2011-06-10 2014-06-10 Morgan Fiumi Distributed three-dimensional video conversion system
US9026446B2 (en) * 2011-06-10 2015-05-05 Morgan Fiumi System for generating captions for live video broadcasts

Also Published As

Publication number Publication date
ITMI20080794A1 (en) 2009-11-01
EP2283481A1 (en) 2011-02-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09737915

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2009737915

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE