WO1997007624A1

WO1997007624A1 - Echo cancelling using signal preprocessing in an acoustic environment

Info

Publication number: WO1997007624A1
Application number: PCT/SE1996/001037
Authority: WO
Inventors: Ingvar Claesson; Mattias Dahl; Sven Nordebo
Original assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Priority date: 1995-08-21
Filing date: 1996-08-21
Publication date: 1997-02-27
Also published as: AU6841396A

Abstract

An echo suppression apparatus has a number of sensors, such as microphones, disposed in an environment. Each sensor receives a signal. Each of the signals includes a component derived from an echo source in the environment. The echo source may be one or more loudspeakers which, in combination with the microphones, are part of a hands-free communication device. The echo suppression apparatus further includes a processor for processing the received signals based on spatial and/or temporal information in order to transform the received signals into a processed signal having a reduced component derived from the echo source. In accordance with another aspect of the invention, the echo source is one or more loudspeakers having input signals derived from an electronic signal, and the echo suppression apparatus may further include an echo canceler component for generating an echo replica from the electronic signal, wherein the echo replica approximates the reduced component derived from the echo source. The echo replica is then subtracted from the processed signal in order to further reduce the amount of echo that is perceived by a far-end user. The processor and the echo canceler component may, for example, each be adaptive finite impulse response (FIR) filters whose tap weights are dynamically determined.

Description

ECHO CANCELLING USING SIGNAL PREPROCESSING IN AN ACOUSTIC ENVIRONMENT

BACKGROUND

The present invention relates to echo suppression in a communications system, and more particularly to techniques that employ adaptive filtering to cancel or suppress echoes in a communications system, such as a telephony system.

In conventional land-based as well as in mobile communications systems, one problem that often needs to be addressed is the existence of echoes that can arise, for example, when a signal representing a talker's voice is received at a listener's station and then retransmitted back to the original talker. Because of delays introduced by the system, the talker will hear his or her own voice as an echo that occurs shortly after the words were actually spoken. Circumstances that may give rise to such an echo include the existence of an acoustic coupling between the loudspeaker and the microphone at the listener's side of the connection. This problem is particularly pronounced in "hands-free" communication equipment, such as a speakerphone, because the acoustic coupling between the loudspeaker and the microphone is particularly strong in such equipment.

The problem must also be addressed where hands-free mobile telephone equipment (e.g., cellular telephone equipment) is utilized. In an automobile, for example, a cellular telephone's microphone might be mounted on the sun visor, while the loudspeaker may be a dash-mounted unit, or may alternatively be one that is associated with the car's stereo equipment. With these components mounted in this fashion, a cellular phone user may carry on a conversation without having to hold a cellular unit or its handset. However, sound from the loudspeaker may also be picked up by the microphone, thereby returning the far end speaker's own voice to him after some delay.

The echo problem will now be described in greater detail with reference to FIG. 1a, which shows an apparatus, such as hands-free mobile telephony equipment, for providing two-way (full duplex) communication between a far end user (not shown) and a near end user. The speech signal from the far end user, denoted by "far end speech" signal 101 in the figure, is transmitted towards the near end listener through an electronic channel. The far end speech signal 101 is a digital signal that is supplied to a digital-to-analog (D/A) converter and amplifier 103. The resulting amplified analog signal is then supplied to one or more loudspeakers which generate an acoustic signal 107 that propagates within, and is distorted by, the near-end acoustic environment. For simplicity of explanation, only one of the possibly plural loudspeakers (i.e., loudspeaker 105) is illustrated in the Figures and described throughout the remainder of this disclosure. However, the invention should be considered to be applicable to situations in which more than one loudspeaker (or other type of echo source) is present.

Continuing with the discussion, a distorted version of the acoustic signal, denoted by "acoustic echo" 109 in the figure, is thus received by a single microphone 111 that generates an analog signal that is amplified and converted into a digital signal by the microphone amplifier and analog-to-digital converter 113. If the resulting signal, denoted by "echo signal" 115 in the figure, is transmitted back to the far end speaker, he or she will, due to the time delay in the closed loop, perceive it as an annoying echo.

In order to suppress the echo signal 115, an echo canceler (EC) 117 is introduced into the system. In conventional systems, the echo canceler 117 is a filter that closely approximates the filtering that the near-end acoustic environment performs on the acoustic signal 107. In this way, the echo canceler 117 is able to generate an echo replica 119 directly from the far end speech signal 101. The echo replica 119 is thereafter subtracted from the echo signal 115, resulting in a residual echo signal 121 that is much less annoying for the far end speaker. Typically, the filter characteristics (e.g., the tap weights in a finite impulse response, or FIR, filter) of the echo canceler 117 are dynamically adjusted by, for example, minimizing the energy of the residual echo signal 121, in order to adapt to changes in the near-end acoustic environment. Such changes may occur whenever the physical environment is altered, such as when a window or car door is opened or closed.

A more formal depiction of the acoustic echo problem is shown in FIG. 1b. In this figure, the near-end acoustic environment and the analog electronic components (A/D and D/A converters, amplifiers, etc.) have been combined so that they may be illustrated as the non-linear function $f(s(k))$ 150, where $k$ is a running discrete time index and $s(k)$ denotes the far end speech. Thus, the echo signal, denoted by $e(k)$, can be written as

$$e(k) = f(s(k)) \tag{1}$$

Denoting the output from the echo canceler 117 (i.e., the echo replica) by $g(k)$, the residual echo, $r(k)$, is given by

$$r(k) = e(k) - g(k) \tag{2}$$

In almost every practical echo canceler implementation, $g(k)$ is generated as an output of a finite impulse response (FIR) filter, and is determined by the equation

$$g(k) = \sum_{\ell=1}^{N} g_\ell \, s(k-\ell) \tag{3}$$

where $\{g_1, \ldots, g_N\}$ are filter weights, which may either be fixed or be dynamically changed to adapt to new conditions. Thus, the echo replica is given by a weighted sum of past values of the far end speech. In telephony applications, $N$ is typically on the order of 200-2000. In order to make the echo canceler 117 adaptive, the filter weights are dynamically determined in accordance with any of a number of well-known algorithms, such as the Normalized Least Mean Squares algorithm, which operates by minimizing the energy of the residual echo signal $r(k)$.
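The FIR echo replica of equation (3), the residual of equation (2), and a Normalized Least Mean Squares weight update can be sketched as follows. This is a minimal illustration and not the implementation of the disclosure: the function name `nlms_echo_canceller`, the step size `mu`, the regularization constant `eps` and the default tap count are assumptions chosen for the example.

```python
import numpy as np

def nlms_echo_canceller(s, e, num_taps=64, mu=0.5, eps=1e-8):
    """Adapt FIR weights so that g(k) = sum over l of g_l * s(k - l) tracks e(k).

    s: far end speech samples; e: echo signal samples (same length).
    Returns the residual r(k) = e(k) - g(k) and the final weights.
    mu and eps are illustrative values, not taken from the source text.
    """
    s = np.asarray(s, dtype=float)
    e = np.asarray(e, dtype=float)
    g = np.zeros(num_taps)       # filter weights g_1 .. g_N
    past = np.zeros(num_taps)    # past[l - 1] holds s(k - l)
    r = np.zeros(len(s))
    for k in range(len(s)):
        replica = g @ past                           # echo replica g(k), equation (3)
        r[k] = e[k] - replica                        # residual echo r(k), equation (2)
        g += mu * r[k] * past / (past @ past + eps)  # normalized LMS weight update
        past = np.roll(past, 1)                      # advance the delay line by one sample
        past[0] = s[k]                               # s(k) is s((k + 1) - 1) at the next step
    return r, g
```

With a white-noise `s` and an echo produced by a short linear path, for example `e = np.convolve(s, [0.0, 0.5, -0.3, 0.2])[:len(s)]`, the adapted weights approach the path coefficients. A real acoustic path is far longer and not purely linear, which is the limitation the text goes on to discuss.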

For acoustic echoes originating from hands-free operation of a mobile telephone, the conventional echo cancellation solution reduces the echo level by approximately 10-20 dB. This is far from sufficient for high audible quality communication. Moreover, very long FIR filters are required to achieve even this level of echo cancellation. As an example, something on the order of 4000 FIR taps would be required for certain speakerphone applications. In general, the conventional echo suppression techniques give imperfect performance as a result of the non-linearities in the echo path and in the analog electronics. For example, the speed of sound heavily depends on the absolute temperature. In addition, the loudspeaker is well known to introduce different kinds of distortion.
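To put the 10-20 dB figure in perspective, a worked conversion may help. It assumes that the reduction is measured as the ratio of echo power before cancellation to residual echo power after cancellation; the text does not state the measurement convention, so this is only one plausible reading.

$$\text{reduction in dB} = 10 \log_{10} \frac{E[e^2(k)]}{E[r^2(k)]}$$

$$10\ \text{dB} \Rightarrow \frac{E[e^2]}{E[r^2]} = 10 \ \ (\text{amplitude ratio } \sqrt{10} \approx 3.16), \qquad 20\ \text{dB} \Rightarrow \frac{E[e^2]}{E[r^2]} = 100 \ \ (\text{amplitude ratio } 10)$$

Under that reading, a conventional canceler leaves a residual echo whose amplitude is still between roughly one third and one tenth of the original echo amplitude.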

The above description is an overview of prior art echo cancellation techniques. In practice, more sophisticated algorithms are used. Such algorithms may include features like voice switching and pole-zero modeling (that is, the use of infinite impulse response filters). A more complete description of prior art echo cancelers may be found in M.M. Sondhi & Kellermann, "Adaptive echo cancellation for speech signals", Advances in Speech Signal Processing (S. Furui & M.M. Sondhi eds., 1992), New York. Despite the increased sophistication, however, all of the above-described prior art methods for acoustic echo cancellation lead to a residual echo having one or more severe deficiencies, such as an insufficient reduction of the echo level, or the introduction of artifacts in the residual echo or in the speech of the near end speaker. Furthermore, some of these conventional techniques involve algorithms that exhibit slow convergence and computational problems, and entail high computational complexity.

SUMMARY

It is therefore an object of the present invention to provide an echo suppression technique and apparatus that gives improved performance over conventional techniques.

It is another object of the invention to provide an echo canceler that avoids distortion of the speech from the near end speaker.

It is yet another object of the invention to provide an echo canceler that produces a remaining residual echo without annoying artifacts.

It is still another object of the invention to provide an echo canceler having a computational complexity on the order of that of conventional echo cancellation algorithms.

It is yet another object of the invention to provide an echo canceler that avoids slow convergence and high computational complexity.

In accordance with one aspect of the present invention, the foregoing and other objects are achieved in an echo suppression apparatus comprising a plurality of sensors and processing means. The plurality of sensors, which may be microphones, are disposed in an environment for receiving, at each of the sensors, signals, wherein each of the signals includes a component derived from an echo source in the environment. The processing means utilizes spatial and/or temporal information obtained from the plurality of sensors in order to transform the received signals into a processed signal having a reduced component derived from the echo source.

In accordance with another aspect of the invention, the echo source may be a loudspeaker having input signals derived from an electronic signal, and the echo suppression apparatus further comprises means for generating an echo replica from the electronic signal, wherein the echo replica approximates the reduced component derived from the echo source. Means are also provided for subtracting the echo replica from the processed signal.

In yet other aspects of the invention, the processor and the echo replica generating means may be adaptive filters, such as finite impulse response (FIR) filters whose tap weights are dynamically determined.

In still other aspects of the invention, methods for reducing the echo component are also disclosed, which methods include receiving signals at each of a plurality of sensors that are disposed in the environment. The received signals are then processed to reduce the echo component. The processing may include the use of spatial and/or temporal information to reduce the echo component.

BRIEF DESCRIPTION OF THE DRAWINGS

The objects and advantages of the invention will be understood by reading the following detailed description in conjunction with the drawings in which:

FIGS. 1a and 1b are depictions of a prior art echo canceler for suppressing echoes in an apparatus for providing two-way communication between a far end speaker and a near end speaker;

FIGS. 2a and 2b are block diagrams of an echo canceler in accordance with one embodiment of the present invention;

FIG. 3 is a detailed block diagram of a preferred pre-processor for use in the invention; and

FIG. 4 is a block diagram of a hardware configuration for adaptively determining tap weights for use in a pre-processor in accordance with the invention.

DETAILED DESCRIPTION

The various features of the invention will now be described with respect to the figures, in which like parts are identified with the same reference characters. In the following discussion, the invention is described in the context of reducing echoes from an acoustic environment. However, this context is intended to be merely illustrative. Those having ordinary skill in the art will readily be able to adapt the inventive principles for the purpose of reducing non-acoustic echoes as well.

A block diagram of an echo canceler in accordance with one embodiment of the present invention is shown in FIG. 2a. The inventive echo canceler is based on the observation that the acoustic echo generally originates from a limited set of acoustic sources (e.g., one or a few loudspeakers). Thus, by using several acoustic sensors (e.g., several microphones), echo suppression can be achieved using information about the spatial localization of the sources. Temporal filtering may also be applied to suppress the echo signal being generated by the several acoustic sensors, as will be described in greater detail below.

As shown in FIG. 2a, a far-end speech signal 101 is supplied to a D/A converter and amplifier 103 as described in the BACKGROUND section, in order to produce an analog signal that is in turn supplied to a loudspeaker 105. The loudspeaker generates an acoustic signal which propagates through a number of acoustic echo paths 203 to generate a corresponding number of acoustic echo signals 205. Because of non-uniformities in the near-end environment, each one of the acoustic echo signals 205 is, in general, different in one or more ways (e.g., degree of distortion, delay, etc.) from each of the other acoustic echo signals 205.

In accordance with one aspect of the invention, a plurality of sensors, such as the microphones 201, are disposed at different locations within the near end environment in order to be able to receive different ones of the acoustic echo signals 205. The sensors need not be spaced to conform with any particular geometric pattern, since, in accordance with the invention, the filters (described below) will always perform a best least-squares compromise. In practice, a total width (aperture) of about 0.3-0.6 m has been found to be sufficient. When applied to hands-free telephone equipment in an automobile, there is no need to sample the car interior more densely than once every 0.1 m.

The output from each of the microphones 201 is fed to a corresponding one of a plurality of microphone amplifiers and A/D converters 207, whose digital outputs are each supplied to a corresponding input of a pre-processor 209. As will be explained in greater detail below, the pre-processor 209 makes use of the spatial and temporal information that is inherent in the collection of acoustic echo signals 205 in order to assist in the suppression of the echo. For example, if the location of the loudspeaker is known, then the pre-processor 209 may be designed to filter out (or at least attenuate) all sounds coming from that direction, while allowing sounds from all other directions to pass. That is, the spatial information is utilized to distinguish between the acoustic echo signal (most of which emanates from the direction of the loudspeaker) and all other acoustic signals. Temporal information may also be utilized to filter out those sounds which most likely emanated from the loudspeaker.

The pre-processor generates a single pre-processed echo signal 215 that includes a wanted part (e.g., the near-end user's voice) in addition to having a reduced echo signal component. In order to further reduce the echo component of the pre-processed echo signal 215, the inventive echo canceler may also include a conventional echo canceler (EC) 211, which is preferably a finite impulse response (FIR) filter. The tap weights of the FIR filter are preferably set so that the output of the EC 211 is an echo replica 213 that closely approximates the echo component of the pre-processed echo signal 215. In a preferred embodiment of the invention, the tap weights are adjusted dynamically by means of well-known techniques such as the Normalized Least Mean Squares algorithm. The echo replica 213 is then subtracted from the pre-processed echo signal 215, and the resulting residual echo signal 217 is transmitted to the far-end speaker (not shown).

The various aspects of the invention will now be described in greater detail with reference to FIG. 2b, which is a more formalized depiction of the inventive echo canceler. In the figure, the near-end acoustic environment and the analog electronic components (A/D and D/A converters, amplifiers, etc.) associated with each of $M$ acoustic echo paths 203 have been combined in order to allow their depiction as the non-linear functions $f_\ell(s(k))$ 150, where $1 \le \ell \le M$, $k$ is a running discrete time index and $s(k)$ denotes the far end speech. Thus, the echo signals, denoted by $e_\ell(k)$, can be written as

$$e_\ell(k) = f_\ell(s(k)), \quad \ell = 1, \ldots, M \tag{4}$$

The multi-channel measurements are then pre-processed and reduced to a single channel signal (denoted by $h(k)$ below) by the pre-processor 209. The pre-processor 209 preferably comprises a number, $M$, of FIR filters 309-1, ..., 309-M in one-to-one correspondence with the $M$ microphone signals, as illustrated in FIG. 3. Each of the $M$ FIR filters 309-1, ..., 309-M has $N_\ell$ filter taps, where $\ell = 1, \ldots, M$. Thus, it is not a requirement that all of the $M$ FIR filters 309-1, ..., 309-M have the same number of taps. The pre-processor 209 preferably performs both spatial and temporal filtering in accordance with techniques described in B.D. Van Veen and K.M. Buckley, "Beamforming: A Versatile Approach to Spatial Filtering", IEEE ASSP Magazine, pp. 4-24, April 1988, which is hereby incorporated herein by reference. Consequently, the signal $h(k)$ preferably consists of a filtered sum of measurements, for example, with $M$ filters of equal length $N$,

$$h(k) = \sum_{\ell=1}^{M} \sum_{p=1}^{N} \omega_{\ell,p} \, e_\ell(k-p) \tag{5}$$

where $e_\ell(k-p)$ is the echo measured by channel $\ell$ (that is, the digital signal after amplification and A/D conversion) at time $k-p$. As indicated above, $M$ denotes the number of measurement channels, and is typically in the range from 1 to 8. Also, $N$ denotes the number of filter taps in each of the pre-processor FIR filters 309-1, ..., 309-M. Typical values for $N$ are in the range from 64 to 256. Thus, in a preferred embodiment of the invention, $h(k)$ consists of the temporally and spatially weighted sum of measurements. Note that it is preferable for $M$ to be greater than 1, so that the spatial information associated with the echo may be utilized in the filtering process. That is, the signals from the $M$ sensors are subjected to spatial processing to eliminate those sound components that derive from the direction of the loudspeaker 105, while passing all other sound components. Temporal processing may also be applied to eliminate those sound components having frequency characteristics that match the frequency characteristics of sounds produced by the loudspeaker 105.

The weights $\omega_{\ell,p}$ can be determined by any of a number of beamformer design techniques, and are preferably fixed once they have been determined. Fixed beamformer weights can be obtained, for example, by using a least squares criterion to solve an overdetermined system of linear equations, as described at page 12 of the above-cited Van Veen and Buckley article. However, this approach requires the determination of the array response vectors over a dense set of spatial-frequency points. The array response vectors can be calculated if an accurate mathematical model is available for describing the sound field, array geometry, amplifier characteristics and other pertinent factors. This is an extremely difficult task for a microphone array in an automobile compartment due to nearfield considerations, reflections, channel matching, and the like. The response vectors can also be measured in the actual environment, but this is also a complicated matter, making it difficult to obtain accurate results.
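The filter-and-sum operation of equation (5) can be sketched as below, for the equal-length case in which every channel has the same number of taps. The function name `filter_and_sum`, the array layout (one row per microphone) and the treatment of samples before time zero as zero are assumptions made for illustration; the weights themselves are taken as given.

```python
import numpy as np

def filter_and_sum(e, w):
    """Compute h(k) = sum over l and p of w[l, p - 1] * e[l, k - p], p = 1..N.

    e: array of shape (M, K), one row of samples per microphone channel.
    w: array of shape (M, N), one row of FIR tap weights per channel.
    Samples before k = 0 are treated as zero (an assumption of this sketch).
    """
    e = np.asarray(e, dtype=float)
    w = np.asarray(w, dtype=float)
    num_channels, num_samples = e.shape
    h = np.zeros(num_samples)
    for l in range(num_channels):
        # A leading zero tap delays the filter by one sample,
        # so that tap p acts on e_l(k - p) as in equation (5).
        taps = np.concatenate(([0.0], w[l]))
        h += np.convolve(e[l], taps)[:num_samples]
    return h
```

As a toy illustration of the spatial idea, if two microphones record an identical echo `x`, then `filter_and_sum([x, x], [[1.0], [-1.0]])` is zero at every sample: opposite weights on the two channels cancel a component that arrives identically at both sensors. Real echo paths differ per channel, which is why longer filters and a design procedure for the weights are needed.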

A preferred method for obtaining fixed weights is to use an adaptive modelling technique as depicted in FIG. 4. In addition to being easier to work with, the use of adaptive algorithms also has the advantage of enabling one to more easily design a system that cancels out noise (e.g., road and tire noise) in addition to the echo.

The beamformer design is performed in the actual acoustic environment in which it will be used. All of the electronic equipment that will subsequently be used in the echo cancelling system should be disposed in the acoustic environment during the weight-determination phase of the design. This equipment includes, but is not limited to, the array of microphones 401-1, . . ., 401-M and the telephone's hands-free loudspeaker 405. For purposes of determining the weights, a second loudspeaker 425 is also disposed within the acoustic environment at a location that approximates that from which a telephone user's voice would originate.

A signal, S_speech, which simulates the voice of a typical user, is simultaneously fed to an adaptive algorithm as a desired signal, and to the second loudspeaker 425. At the same time, a typical jamming echo noise signal, S_n, is fed to the hands-free loudspeaker 405. The weights of the pre-processor are then adapted toward the correct filter weights according to any suitable criterion, such as the least mean squares criterion using the well-known Least Mean Squares algorithm. Alternatively, suitable adaptive algorithms are described in the above-referenced Van Veen and Buckley article, beginning at page 17.
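The weight-determination procedure just described can be sketched with a plain Least Mean Squares loop. This is an illustrative sketch only: the function name `lms_design_weights`, the step size `mu`, and the assumption that the desired signal has already been delayed to account for acoustic propagation and the one-sample filter delay are not taken from the source. The rows of `x` stand for the recorded microphone signals while the simulated user voice and the jamming signal are played simultaneously.

```python
import numpy as np

def lms_design_weights(x, d, num_taps=4, mu=0.01):
    """Adapt pre-processor weights so the filtered sum of x follows d.

    x: array of shape (M, K), microphone recordings made during the design phase.
    d: array of length K, the desired signal (assumed suitably delayed).
    Returns the adapted weights, shape (M, num_taps), and the error signal.
    mu is an illustrative step size and must be small relative to the input power.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    num_channels, num_samples = x.shape
    w = np.zeros((num_channels, num_taps))
    past = np.zeros((num_channels, num_taps))  # past[l, p - 1] holds x_l(k - p)
    err = np.zeros(num_samples)
    for k in range(num_samples):
        h = np.sum(w * past)             # pre-processor output h(k)
        err[k] = d[k] - h                # deviation from the desired signal
        w += mu * err[k] * past          # LMS weight update
        past = np.roll(past, 1, axis=1)  # advance every channel's delay line
        past[:, 0] = x[:, k]
    return w, err
```

In a toy case where one microphone records speech plus jammer and the other records speech minus jammer, with the desired signal equal to the speech delayed by one sample, the adapted weights approach 0.5 on the first tap of each channel, so the jammer cancels in the sum while the speech passes. Once adaptation has settled, the weights would be frozen and used as the fixed pre-processor weights.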

Referring back now to FIG. 2a, the pre-processor 209, which receives signals from a plurality of sensors (e.g., the microphones 201), may be used alone to cancel an echo. However, in accordance with another aspect of the invention, the pre-processed echo signal $h(k)$ 215 is then preferably used in order to form the residual echo signal $r(k)$ 217 as

$$r(k) = h(k) - g(k) \tag{6}$$

where $g(k)$ is the output from a conventional echo canceler 211 as previously described.
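Substituting equations (3) and (5) into equation (6) makes the two-stage structure explicit. This is only a restatement, under the assumptions that all pre-processor filters have the same length $N$ and that the echo canceler 211 has $N_g$ taps; $N_g$ and the summation index $t$ are notation introduced here to keep the echo canceler distinct from the pre-processor.

$$r(k) = \sum_{\ell=1}^{M} \sum_{p=1}^{N} \omega_{\ell,p} \, e_\ell(k-p) \; - \; \sum_{t=1}^{N_g} g_t \, s(k-t)$$

The first term depends only on the microphone signals and the pre-processor weights, while the second term is driven directly by the far end speech $s(k)$. The echo canceler therefore only has to model the echo that remains in $h(k)$ after pre-processing.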

The invention has been described with reference to a particular embodiment. However, it will be readily apparent to those skilled in the art that it is possible to embody the invention in specific forms other than those of the preferred embodiment described above. This may be done without departing from the spirit of the invention. For example, application of the invention is not limited only to hands-free telephone equipment in an automobile. Instead, the invention may be applied to cancel echoes in many other situations. An incomplete list of examples includes eliminating echoes that arise in connection with domestic speakerphones, teleconference studios and spoken-commanded computers. Moreover, the use of multiple sensors and pre-processing as taught in this disclosure is not limited to the cancellation of only acoustic echoes. Rather, it may be applied as well in situations where it is desired to remove any type of echo signal, such as those existing in the form of electromagnetic radiation. All that is required is that the sensors be appropriate for the type of signals to be detected.

Thus, the preferred embodiment is merely illustrative and should not be considered restrictive in any way. The scope of the invention is given by the appended claims, rather than the preceding description, and all variations and equivalents which fall within the range of the claims are intended to be embraced therein.

Claims

WHAT IS CLAIMED IS:

1. An echo suppression apparatus comprising: a plurality of sensors disposed in an environment for receiving signals at each of the sensors, wherein each of the signals includes a component derived from an echo source in the environment; and means for processing the received signals in order to transform the received signals into a processed signal having a reduced component derived from the echo source.

2. The echo suppression apparatus of claim 1, wherein the processing means includes means for using spatial information to process the received signals.

3. The echo suppression apparatus of claim 1, wherein the processing means includes means for using temporal information to process the received signals.

4. The echo suppression apparatus of claim 3, wherein the processing means further includes means for using spatial information to process the received signals.

5. The echo suppression apparatus of claim 1, wherein the environment is an acoustic environment, and wherein the signals are acoustic signals.

6. The echo suppression apparatus of claim 5, wherein: the acoustic environment is a passenger compartment of an automobile; each of the sensors is a microphone belonging to hands-free communications equipment; and the echo source is a loudspeaker belonging to the hands-free communications equipment.

7. The echo suppression apparatus of claim 1, wherein: the echo source is a loudspeaker having input signals derived from an electronic signal; and the echo suppression apparatus further comprises: means for generating an echo replica from the electronic signal, wherein the echo replica approximates the reduced component derived from the echo source; and means for subtracting the echo replica from the processed signal.

8. The echo suppression apparatus of claim 7, wherein the echo replica generating means is an adaptive filter.

9. A method for suppressing an echo component in signals that are received from an environment, the method comprising the steps of: receiving the signals at each of a plurality of sensors disposed in the environment, wherein at least one of the signals includes a component derived from an echo source in the environment; and processing the received signals in order to transform the received signals into a processed signal having a reduced component derived from the echo source.

10. The method of claim 9, wherein the step of processing includes using spatial information to process the received signals.

11. The method of claim 9, wherein the step of processing includes using temporal information to process the received signals.

12. The method of claim 11, wherein the step of processing further includes using spatial information to process the received signals.

13. The method of claim 9, wherein the environment is an acoustic environment, and wherein the signals are acoustic signals.

14. The method of claim 13, wherein: the acoustic environment is a passenger compartment of an automobile; each of the sensors is a microphone belonging to hands-free communications equipment; and the echo source is a loudspeaker belonging to the hands-free communications equipment.

15. The method of claim 9, wherein: the echo source is a loudspeaker having input signals derived from an electronic signal; and the method further comprises the steps of: generating an echo replica from the electronic signal, wherein the echo replica approximates the reduced component derived from the echo source; and subtracting the echo replica from the processed signal.

16. The method of claim 15, wherein the step of generating the echo replica comprises using an adaptive filter to generate the echo replica.