CA2426523A1 - Method of compensating for beamformer steering delay during handsfree speech recognition - Google Patents


Info

Publication number
CA2426523A1
CA2426523A1 CA002426523A CA2426523A CA2426523A1 CA 2426523 A1 CA2426523 A1 CA 2426523A1 CA 002426523 A CA002426523 A CA 002426523A CA 2426523 A CA2426523 A CA 2426523A CA 2426523 A1 CA2426523 A1 CA 2426523A1
Authority
CA
Canada
Prior art keywords
beamformer
microphone array
steering
speech recognition
talker
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002426523A
Other languages
French (fr)
Inventor
Maziar Amiri
Graham H. Thompson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitel Networks Corp
Original Assignee
Mitel Knowledge Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitel Knowledge Corp
Publication of CA2426523A1


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00 Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/18 Methods or devices for transmitting, conducting or directing sound
    • G10K11/26 Sound-focusing or directing, e.g. scanning
    • G10K11/34 Sound-focusing or directing, e.g. scanning using electrical steering of transducer arrays, e.g. beam steering
    • G10K11/341 Circuits therefor
    • G10K11/346 Circuits therefor using phase variation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K2210/00 Details of active noise control [ANC] covered by G10K11/178 but not provided for in any of its subgroups
    • G10K2210/10 Applications
    • G10K2210/108 Communication systems, e.g. where useful sound is kept and noise is cancelled
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K2210/00 Details of active noise control [ANC] covered by G10K11/178 but not provided for in any of its subgroups
    • G10K2210/30 Means
    • G10K2210/321 Physical
    • G10K2210/3215 Arrays, e.g. for beamforming
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Abstract

A beamformer for outputting an enhanced signal to a speech recognition system, comprising a microphone array for receiving audio signals from a plurality of microphones, a steering module connected to the microphone array for calculating audio parameters related to the location of a talker, the steering module being subject to an inherent initial delay in calculating the parameters, a buffer module connected to the microphone array for delaying the audio signals by at least the inherent initial delay; and a beamforming part connected to the buffer module and the steering module for directing the microphone array toward the talker in response to receiving the audio parameters from the steering part, such that an enhanced signal is output for application to a speech recognition system.

Description

METHOD OF COMPENSATING FOR BEAMFORMER STEERING DELAY
DURING HANDSFREE SPEECH RECOGNITION
FIELD OF THE INVENTION
The present invention relates generally to handsfree telephone systems and in particular to a method and apparatus for improving handsfree speech recognition by compensating for beamformer steering delay.
BACKGROUND OF THE INVENTION
Localization of sound sources is required in many applications, such as hands free telephony or hands free dictation applications on a personal computer, where the source position is used to steer a high quality microphone beam toward a talker. It is known in the art to use electronically steerable arrays of sensors, or an antenna, in combination with localization estimator algorithms to pinpoint the location of a talker in a room. In this regard, high quality and complex beamformers have been used to measure sound levels at different positions. As discussed in greater detail below, there are two types of beamformer: fixed and adaptive.
Estimator algorithms are used to locate the dominant sound source using information received from sound sources via the beamformer(s). This talker localization functionality can be implemented either as a separate module feeding the beamformer with the talker position or as part of an adaptive beamforming algorithm.
The former implementation is set forth in commonly assigned UK patent application no. 0016142.2, entitled Acoustic Talker Localization by Maziar Amiri, Dieter Schulz, Michael Tetelbaum, while the latter implementation is set forth in US Patent No. 4,956,867 entitled Adaptive Beamforming for Noise Reduction.
The performance of speech recognition algorithms is significantly degraded during handsfree telephony. This is due to noise and reverberation, which are captured to a much lesser degree when a handset or headset is used. As discussed above, beamforming improves the quality of handsfree telephony by attenuating reverberation and noise. Consequently, beamforming may also be used to enhance the quality of speech recognition during handsfree operation, but only after the beamsteering parameters have been adjusted to a quasi-stationary environment (i.e. the beam is focused on the active talker).
Fixed beamformers require an initialization time period (approximately 50-250 ms) within which to locate a source of speech. During this time period, the beamformer is said to be in an "initial state" with no useful beam output being available. During this initial state, a default one of the microphones can be selected to provide signal output, without the noise reduction benefit of beamforming, until the source has been localized. The first 50 to 250 ms of an utterance contain very important information from a speech recognition perspective, for example differentiating "Be" from "Pee" or "Dee". It is therefore evident that it is highly desirable to have this initial period benefit from noise reduction in addition to the entire remainder of the talker's utterance.
Adaptive beamformers do not require a localization algorithm, but do also require an initial time period to adjust the adaptive parameters to the given environment. In both fixed and adaptive beamformers, the beam output is non-optimal so long as the parameters are not adjusted to a quasi-stable state for the acoustic environment.
For straightforward hands-free telephony (i.e. use for human to human communication), the transition of the beamformer from the initial state (during which the non-optimal default microphone is selected) to the quasi-stable state imposes no apparent difficulty in conducting conversations. This is due to the redundancy in normal conversation plus the fact that the human ear is arguably significantly better than any current machine at the task of speech recognition. By way of contrast, the initial sub-optimal microphone selection usually results in the first spoken word not being represented properly by a speech recognition algorithm. Therefore, the error rate of recognition rises for the first word. This error also occurs each time the talker moves or the acoustical environment changes in some way.
Accordingly, there is a need to compensate for the transition time from beamformer initial state to quasi-stable state for the purposes of handsfree speech recognition, but not for straightforward handsfree telephony or dictation.
SUMMARY OF THE INVENTION
According to the present invention, the signal from each microphone channel of the beamformer is stored in a FIFO buffer. Signal playback takes place only after the parameters have been adjusted and an enhanced acoustic signal is guaranteed. The introduced delay is constant, and is chosen to be the maximum convergence or "adaptation" time needed for parameter adjustment. In other words, the length of the FIFO buffer depends on the "adaptation" time. Since the parameters are calculated before signal output is provided to the speech recognition algorithm, the output provided is always optimal. Also, the delay imposed by the FIFOs has no important impact on the speech recognition process, the result of which is effectively further delayed by a time equal to the delay added by the FIFO.
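As a rough illustration of how the FIFO length follows from the adaptation time, the sketch below converts a worst-case adaptation time into a number of samples per channel. The 8 kHz sampling rate and the function name `fifo_length_samples` are assumptions made for the example; only the 250 ms upper bound comes from the background section.

```python
def fifo_length_samples(sample_rate_hz: int, max_adaptation_ms: int) -> int:
    """Samples of delay needed to cover the worst-case adaptation time.

    Integer arithmetic with rounding up, so the buffer is never shorter
    than the adaptation time.
    """
    return (sample_rate_hz * max_adaptation_ms + 999) // 1000


# Assumed narrowband telephony rate of 8 kHz with a 250 ms adaptation time.
print(fifo_length_samples(8000, 250))  # 2000
```

Each microphone channel would need its own buffer of this length, since all channels are delayed equally.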
According to the preferred embodiment, the beamformer is split into two parts. The first part is the steering part, which calculates the parameters of the beamformer using the incoming signals from the microphone array. The second part does the actual beamforming using the delayed microphone signals. The FIFO buffer delays the speech signals applied to the second, beamforming, part, whereas the signals are applied directly to the first, steering, part.
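The split between the steering part and the beamforming part can be sketched as a small frame loop. This is an illustrative toy, not the patented implementation: `run_pipeline`, `make_toy_steer`, the string-valued parameters and the frame-based delay are all assumptions introduced here. The steering function sees each frame immediately, while the beamforming function only receives frames that have passed through the FIFO.

```python
from collections import deque


def make_toy_steer(converge_after):
    """Toy steering part: reports 'default' until it has seen enough frames."""
    seen = 0

    def steer(_frame):
        nonlocal seen
        seen += 1
        return "locked" if seen > converge_after else "default"

    return steer


def run_pipeline(frames, steer, beamform, delay_frames):
    """Feed live frames to `steer` and FIFO-delayed frames to `beamform`."""
    fifo = deque()
    out = []
    for frame in frames:
        params = steer(frame)   # steering part works on the undelayed signal
        fifo.append(frame)      # beamforming part works on the delayed signal
        if len(fifo) > delay_frames:
            out.append(beamform(fifo.popleft(), params))
    return out


def tag(frame, params):
    """Stand-in for beamforming: pairs a frame with the parameters used."""
    return (frame, params)


print(run_pipeline(range(6), make_toy_steer(3), tag, delay_frames=3))
# [(0, 'locked'), (1, 'locked'), (2, 'locked')]
print(run_pipeline(range(4), make_toy_steer(3), tag, delay_frames=0))
# [(0, 'default'), (1, 'default'), (2, 'default'), (3, 'locked')]
```

With the delay equal to the convergence time, even the first frame is processed with converged parameters; with no delay, the first frames fall back to the default. Frames still held in the FIFO when the input ends are not flushed in this sketch.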
BRIEF DESCRIPTION OF THE DRAWINGS
A preferred embodiment of the present invention will now be described more fully with reference to Figure 1, which is a block diagram of a delay compensation system according to the present invention, for a fixed beamformer.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Figure 1 is a block diagram of a delay compensator for a steered and fixed beamformer, according to an embodiment of the invention. A plurality (n) of microphone signals from array 1 are applied to the localization algorithm 3, which immediately begins to calculate the position of a person talking. The microphone signals are also fed into FIFO buffers 5, which introduce an equal delay to all channels before the signals are transmitted to the beamformer 7. The FIFO buffers 5 are preferably implemented in DSP software using a circular buffer in RAM.
This well-known method requires that two pointers are provided: one points to the next input sample and the other points to the next output sample. DSP code manages the pointers to ensure that the pointers do not cross, thereby avoiding an overflow or underflow condition, as is well understood in the art. As discussed above, the delay conforms to the maximum amount of time needed by the localization algorithm 3 to find the position of the talker. Thus, the localized signal output from beamformer 7 is enhanced for application to a speech detection algorithm (not shown). As discussed above, this configuration should only be used when speech recognition is being applied to the handsfree telephone output (i.e. the output of beamformer 7).
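The two-pointer circular buffer described above can be sketched as follows. This is a minimal single-channel illustration under stated assumptions: the class name `CircularFifo`, the helper `delay_line`, the use of Python exceptions for overflow and underflow, and the zero pre-fill are choices made for the example and are not taken from the source, which describes DSP code.

```python
class CircularFifo:
    """Fixed-capacity FIFO over a preallocated list, using two indices."""

    def __init__(self, capacity):
        self.buf = [0] * capacity
        self.capacity = capacity
        self.in_ptr = 0    # index where the next input sample is written
        self.out_ptr = 0   # index where the next output sample is read
        self.count = 0     # number of samples currently stored

    def write(self, sample):
        if self.count == self.capacity:
            raise OverflowError("input pointer would overrun output pointer")
        self.buf[self.in_ptr] = sample
        self.in_ptr = (self.in_ptr + 1) % self.capacity
        self.count += 1

    def read(self):
        if self.count == 0:
            raise IndexError("underflow: output pointer caught up with input")
        sample = self.buf[self.out_ptr]
        self.out_ptr = (self.out_ptr + 1) % self.capacity
        self.count -= 1
        return sample


def delay_line(samples, delay):
    """Delay a sample sequence by `delay` samples using the FIFO."""
    fifo = CircularFifo(delay + 1)
    for _ in range(delay):
        fifo.write(0)              # pre-fill with silence
    out = []
    for s in samples:
        fifo.write(s)
        out.append(fifo.read())
    return out


print(delay_line([1, 2, 3, 4, 5], 2))  # [0, 0, 1, 2, 3]
```

In a multi-microphone system one such buffer per channel, all with the same delay, would keep the channels time-aligned for the beamforming part.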
It is desirable that there should be no unnecessary added delay (i.e. the microphone signals should be routed directly to the beamformer 7) during normal (human) handsfree conversation. As discussed below, the FIFO delay can be reduced to zero during periods of silence.

Once localization (or adaptation) has stabilized, the FIFO delay is preferably reduced to zero. This is accomplished via a control signal derived from the Call Controller of the telephone system (i.e. the delay is switched out as soon as the Call State exits the dialing state and enters talking state).
Alternatively, this can be done during periods of silence as determined by a Voice Activity Detector (VAD), which is an inherent component of many localization schemes including that of the preferred embodiment, and is described in co-pending U.K. patent application no. 0120322.3. The speech samples in the circular FIFO buffers 5 can be analyzed using a DSP algorithm to detect periods of "silence".
As the sequence of samples approaches the "output" of the FIFO, the output pointer is simply moved to the beginning of the period of silence, thereby simultaneously removing the silence period and also reducing the delay. The DSP algorithm also checks for underflow conditions within the FIFO buffers 5 (i.e. the delay has effectively been reduced to zero). Further DSP code may be used to reinstate the delay based either on Call State (such as a call exiting the Talking State, Idle, Hold or Transfer etc., or entering Signaling) or on the basis of duration of a silence in excess of a predetermined limit (e.g. > 10 sec). Such silence suppression algorithms are well known in the art and are an inherent part of many VoIP and Voice Compression protocols (e.g. G.729, where silence suppression is used as a method of reducing bandwidth).
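One way to picture the silence-based delay reduction is to discard silent frames at the output end of the FIFO, so that the amount of buffered audio, and hence the delay, shrinks. The sketch below is an assumption-laden simplification: it uses a plain mean-energy threshold in place of a real Voice Activity Detector, works on whole frames rather than individual samples, and the names `frame_energy` and `drop_leading_silence` are hypothetical.

```python
from collections import deque


def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def drop_leading_silence(fifo, threshold):
    """Discard silent frames at the FIFO output; return how many were dropped.

    Stops at the first non-silent frame, or when the FIFO is empty, which
    corresponds to the delay having been reduced to zero.
    """
    dropped = 0
    while fifo and frame_energy(fifo[0]) < threshold:
        fifo.popleft()
        dropped += 1
    return dropped


pending = deque([[0, 0], [0, 1], [5, 5], [0, 0]])
print(drop_leading_silence(pending, threshold=1.0))  # 2
print(list(pending))                                 # [[5, 5], [0, 0]]
```

Silence that follows buffered speech is left in place here, so speech already in the FIFO is never reordered or lost.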
In practice, the silence period that arises inherently during transitions between call states as the caller waits for the called party to answer is usually sufficient to eliminate the FIFO delay. Consequently, the FIFO delay is used only during the transition time in which the beamformer is in the initial state, and therefore does not interfere with normal handsfree conversation.
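The Call State and silence-duration rules for switching the delay in and out could be captured in a small decision function. This is only a sketch of one reading of the description: the `CallState` enumeration, the function `fifo_delay_wanted` and the simplification that every state other than talking keeps the delay in place are assumptions; the 10 second limit is the example value given in the text.

```python
from enum import Enum, auto


class CallState(Enum):
    IDLE = auto()
    DIALING = auto()
    TALKING = auto()
    HOLD = auto()
    TRANSFER = auto()
    SIGNALING = auto()


def fifo_delay_wanted(state, silence_s, silence_limit_s=10.0):
    """Decide whether the FIFO delay should be in place.

    Outside the talking state the delay is kept (or reinstated) so that
    speech recognition receives a fully steered signal. During talking it
    is switched out, unless a long silence suggests re-localization may
    be needed.
    """
    if state is not CallState.TALKING:
        return True
    return silence_s > silence_limit_s


print(fifo_delay_wanted(CallState.DIALING, 0.0))   # True
print(fifo_delay_wanted(CallState.TALKING, 2.0))   # False
print(fifo_delay_wanted(CallState.TALKING, 12.0))  # True
```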
Also, whereas the preferred embodiment is set forth above in the context of handsfree microphone arrays, it is contemplated that the present invention can be applied to any kind of speech recognition application using remote microphones, such as PC dictation (e.g. Dragon Naturally Speaking™, IBM Via Voice™) which use awkward noise canceling headsets. Also a number of vendors have introduced very simple microphone arrays which use non-steerable beamformers similar to low cost, very high performance directional microphones, (i.e. points to a fixed direction). The principles of the present invention may be utilized to address anticipated difficulties of PC users of such directional microphones.
All such embodiments, modifications and applications are believed to be within the sphere and scope of the invention as defined by the claims appended hereto.

Claims (8)

We claim:
1. For use with a beamformer having a steering part for calculating, after an inherent initial delay, audio parameters related to the location of a talker in a handsfree telephony environment, and a beamforming part for directing a microphone array toward said talker in response to receiving said audio parameters from said steering part, such that an enhanced signal is output for application to a speech recognition system, the improvement comprising delaying application of audio signals received from said microphone array to said beamforming part of said beamformer by an amount at least as much as said initial inherent delay, whereby said beamformer outputs an enhanced signal to said speech recognition system.
2. The improvement of claim 1, further comprising the step of gradually reducing said delaying of said application of the audio signals to said beamforming part during periods of silence.
3. The beamformer for outputting an enhanced signal to a speech detection system, comprising:
a microphone array for receiving audio signals from a plurality of microphones;
a steering module connected to said microphone array for calculating audio parameters related to the location of a talker, said steering module being subject to an inherent initial delay in calculating said parameters;
a buffer module connected to said microphone array for delaying said audio signals by at least said inherent initial delay; and
a beamforming part connected to said buffer module and said steering module for directing said microphone array toward said talker in response to receiving said audio parameters from said steering part, such that an enhanced signal is output for application to a speech recognition system.
4. The beamformer of claim 3, wherein said steering part comprises a localization algorithm.
5. The beamformer of claim 3, wherein said steering part comprises an adaptation algorithm.
6. The beamformer of claim 3, wherein said buffer module comprises a plurality of parallel FIFO buffers for receiving respective audio signals from individual microphones of said microphone array.
7. The beamformer of claim 6, wherein said FIFO buffers are circular buffers implemented in RAM.
8. The beamformer of claim 3, wherein said buffer module is variable such that the delaying of said audio signals is gradually reduced during periods of silence.
CA002426523A 2002-04-26 2003-04-22 Method of compensating for beamformer steering delay during handsfree speech recognition Abandoned CA2426523A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0209579A GB2388001A (en) 2002-04-26 2002-04-26 Compensating for beamformer steering delay during handsfree speech recognition
GB0209579.2 2002-04-26

Publications (1)

Publication Number Publication Date
CA2426523A1 (en) 2003-10-26

Family

ID=9935571

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002426523A Abandoned CA2426523A1 (en) 2002-04-26 2003-04-22 Method of compensating for beamformer steering delay during handsfree speech recognition

Country Status (4)

Country Link
US (1) US20030204397A1 (en)
EP (1) EP1357543A3 (en)
CA (1) CA2426523A1 (en)
GB (1) GB2388001A (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2412997A (en) * 2004-04-07 2005-10-12 Mitel Networks Corp Method and apparatus for hands-free speech recognition using a microphone array
US7917356B2 (en) 2004-09-16 2011-03-29 At&T Corporation Operating method for voice activity detection/silence suppression system
EP1736964A1 (en) * 2005-06-24 2006-12-27 Nederlandse Organisatie voor toegepast-natuurwetenschappelijk Onderzoek TNO System and method for extracting acoustic signals from signals emitted by a plurality of sources
US8098842B2 (en) * 2007-03-29 2012-01-17 Microsoft Corp. Enhanced beamforming for arrays of directional microphones
US9203489B2 (en) 2010-05-05 2015-12-01 Google Technology Holdings LLC Method and precoder information feedback in multi-antenna wireless communication systems
US8861756B2 (en) * 2010-09-24 2014-10-14 LI Creative Technologies, Inc. Microphone array system
US9171551B2 (en) * 2011-01-14 2015-10-27 GM Global Technology Operations LLC Unified microphone pre-processing system and method
US9813262B2 (en) 2012-12-03 2017-11-07 Google Technology Holdings LLC Method and apparatus for selectively transmitting data using spatial diversity
US9591508B2 (en) 2012-12-20 2017-03-07 Google Technology Holdings LLC Methods and apparatus for transmitting data between different peer-to-peer communication groups
US9979531B2 (en) 2013-01-03 2018-05-22 Google Technology Holdings LLC Method and apparatus for tuning a communication device for multi band operation
US10229697B2 (en) 2013-03-12 2019-03-12 Google Technology Holdings LLC Apparatus and method for beamforming to obtain voice and noise signals
US9386542B2 (en) 2013-09-19 2016-07-05 Google Technology Holdings, LLC Method and apparatus for estimating transmit power of a wireless device
US9286897B2 (en) 2013-09-27 2016-03-15 Amazon Technologies, Inc. Speech recognizer with multi-directional decoding
US9549290B2 (en) 2013-12-19 2017-01-17 Google Technology Holdings LLC Method and apparatus for determining direction information for a wireless device
US9491007B2 (en) 2014-04-28 2016-11-08 Google Technology Holdings LLC Apparatus and method for antenna matching
US9478847B2 (en) 2014-06-02 2016-10-25 Google Technology Holdings LLC Antenna system and method of assembly for a wearable electronic device
CN107040831A (en) * 2016-02-04 2017-08-11 北京卓锐微技术有限公司 A kind of microphone for having a delay feature
EP3566228B1 (en) * 2017-01-03 2020-06-10 Koninklijke Philips N.V. Audio capture using beamforming
US10586538B2 (en) 2018-04-25 2020-03-10 Comcast Cable Communications, LLC Microphone array beamforming control
CN113884986B (en) * 2021-12-03 2022-05-03 杭州兆华电子股份有限公司 Beam focusing enhanced strong impact signal space-time domain joint detection method and system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5581620A (en) * 1994-04-21 1996-12-03 Brown University Research Foundation Methods and apparatus for adaptive beamforming
US5737485A (en) * 1995-03-07 1998-04-07 Rutgers The State University Of New Jersey Method and apparatus including microphone arrays and neural networks for speech/speaker recognition systems
JP3522954B2 (en) * 1996-03-15 2004-04-26 株式会社東芝 Microphone array input type speech recognition apparatus and method
US6449593B1 (en) * 2000-01-13 2002-09-10 Nokia Mobile Phones Ltd. Method and system for tracking human speakers
WO2001093554A2 (en) * 2000-05-26 2001-12-06 Koninklijke Philips Electronics N.V. Method and device for acoustic echo cancellation combined with adaptive beamforming
GB2364121B (en) * 2000-06-30 2004-11-24 Mitel Corp Method and apparatus for locating a talker
GB2375698A (en) * 2001-02-07 2002-11-20 Canon Kk Audio signal processing apparatus

Also Published As

Publication number Publication date
US20030204397A1 (en) 2003-10-30
EP1357543A2 (en) 2003-10-29
GB0209579D0 (en) 2002-06-05
EP1357543A3 (en) 2005-05-04
GB2388001A (en) 2003-10-29


Legal Events

Date Code Title Description
EEER Examination request
FZDE Discontinued