WO2001024577A1 - Process for removing voice from stereo recordings - Google Patents

Process for removing voice from stereo recordings

Info

Publication number
WO2001024577A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
instruction
stream
unmixing
processing
Prior art date
Application number
PCT/US2000/026601
Other languages
French (fr)
Inventor
Jean Laroche
Tyler Brown
Alan Peevers
Robert Sussman
Mark Dolson
Original Assignee
Creative Technology, Ltd.
Priority date
Filing date
Publication date
Application filed by Creative Technology, Ltd. filed Critical Creative Technology, Ltd.
Priority to US10/415,770 priority Critical patent/US8767969B1/en
Priority to AU79873/00A priority patent/AU7987300A/en
Publication of WO2001024577A1 publication Critical patent/WO2001024577A1/en


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S5/00Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation 
    • H04S5/005Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation  of the pseudo five- or more-channel type, e.g. virtual surround
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S3/00Systems employing more than two channels, e.g. quadraphonic
    • H04S3/008Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S2400/00Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/05Generation or adaptation of centre channel in multi-channel audio systems

Definitions

  • the present invention relates generally to a system for processing audio and music signals, and more particularly, to a dynamic processing system to produce enhanced audio and music signals.
  • FIG. 1 shows a top view of a listening room 100 containing typical music processing equipment including a music source 102, an amplifier 104 and four speakers 106.
  • the music source 102 is a compact disk (CD) player, but could be another type of source, like a cassette tape player.
  • the music source 102 couples to the amplifier 104 so that music received by the amplifier 104 can be amplified and transmitted over cables 108 to the speakers 106.
  • a listener 110 is located approximately at the center of the listening room so that the four speakers are roughly the same distance away.
  • the speakers are designated as front left (FL), front right (FR), rear left (RL) and rear right (RR). When music is played through the speakers it is possible for the listener 110, who is facing front, to perceive spatial positions relating to sound components within the music.
  • the listener 110 may perceive that a singer's voice 112 is directly in front of him.
  • the listener may also perceive that the sound of a piano 114 is to his front and right, and that the sound of a guitar 116 is behind and to the left.
  • Although FIG. 1 depicts the spatial positions of musical instruments, it is also possible to perceive spatial positions for other sound-generating objects.
  • spatial positions for the sound of an automobile engine or the sound of the ocean may also be perceived using the listening room 100 as shown in FIG. 1.
  • a significant problem exists in that the spatial positions and sound qualities of the sound components in a recording, such as on a CD, are determined when the recording is created.
  • the sound components of a sound signal may be associated with different spatial positions or sound qualities that may be more enjoyable to the listener.
  • the present invention provides a system for processing a sound signal that allows listeners to dynamically customize perceived spatial positions and sound qualities of sound components associated with the sound signal.
  • the listener may configure the system to reposition the perceived position of a singer's voice or may cause the perceived position of the singer's voice to dynamically change in accordance with a preprogrammed script.
  • the listener may also use the system to automatically reposition the perceived spatial positions of the sound components based on events detected within the sound signal itself. For example, the detected beat of a drum may be used to change the perceived spatial position of the singer's voice. It is also possible to use the system to change the sound qualities of the sound components as desired.
  • One embodiment of the present invention includes apparatus for processing a sound signal that comprises an input to receive the sound signal, a sound unmixer coupled to the input to receive the sound signal and unmix at least one sound stream from the sound signal based on at least one unmixing instruction, and an output coupled to the sound unmixer to output the at least one sound stream.
  • FIG. 1 shows a listening room containing prior art music processing components;
  • FIG. 2 shows a block diagram of a sound processing system constructed in accordance with the present invention;
  • FIG. 3 shows a detailed block diagram of the sound processing system of FIG. 2;
  • FIG. 4 is a block diagram depicting the operations of a sound unmixer included in the present invention;
  • FIG. 5 is a block diagram of a computer system for implementing a sound unmixer in accordance with the present invention;
  • FIG. 6 shows an exemplary format for a control script for use in accordance with the present invention;
  • FIG. 7 shows a sound processing method for use with the sound processing system of FIG. 3;
  • FIG. 8 shows an exemplary control script that can be used to process a sound signal in accordance with the present invention;
  • FIG. 9 shows the effects of sound processing a sound signal using the exemplary control script of FIG. 8; and
  • FIG. 10 shows an exemplary portion of a storage medium 1000 that includes a data track with embedded sound data and script data.
  • the present invention provides a system for processing sound signals that allows listeners to dynamically customize perceived spatial positions and/or sound qualities of components associated with the sound signals.
  • FIG. 2 shows a block diagram of a sound processing system 200 constructed in accordance with the present invention.
  • the sound processing system 200 includes a sound source 202, a sound unmixer 204, a stream processor 206, a mixer 208 and an instruction generator 210.
  • the sound source 202 has a sound output 212 that couples to the sound unmixer 204.
  • the sound source may be any type of sound source, such as a CD player, or cassette tape player.
  • the sound source may also be a device that outputs sound data, such as a computer or a musical instrument like an electronic keyboard. Even a microphone picking up a live performance is suitable for use as a sound source in the present system.
  • the sound output 212 includes digital data representative of the sounds to be processed. If the sound source 202 is a CD, then digital data on the CD would be transmitted on the sound output 212. If the sound source is a cassette tape, wherein an analog signal represents the sounds to be processed, an analog to digital (A/D) converter could be included in the sound source to produce digital sound data for transmission on the sound output 212.
  • the sound source 202 is a modified sound source that is capable of operating with modified media, such as modified CDs or cassette tapes that have sound data and control data stored on them.
  • the modified sound source would be able to output both the digital sound data 212 and the control data 226 when playing back the modified media.
  • the sound signal can be a single signal or a combination of signals.
  • the sound source may be a CD player and the sound signal may be two signals representing the left and right channels, or four signals representing left and right channels for both front and back speaker locations.
  • the sound unmixer 204 is coupled to receive the sound output 212.
  • the sound unmixer 204 also receives unmix instructions 214.
  • the sound unmixer unmixes sound streams from the sound signal based on the unmix instructions.
  • the unmix instructions are provided by the instruction generator 210. A later section of this document provides a complete description of the unmix instructions.
  • the sound unmixer uses the unmix instructions to produce one or more sound streams 216, which are also referred to as "voices."
  • Each of the sound streams may represent various portions of the sound signal. For example, one stream may represent high frequency components of the sound signal 212, while a second stream represents low frequency components.
  • the sound unmixer is very flexible in the way that it unmixes sound streams to represent portions of the input sound signal. For example, special processing may be performed on the sound signal to produce an unmixed stream that contains only certain spectral components of the sound signal. It is also possible to output unmixed sound streams directly from the sound unmixer 204 as shown at 232.
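As a concrete illustration of unmixing streams that represent different spectral portions of the input (such as the high- and low-frequency streams mentioned above), the following sketch zeroes DFT bins on either side of a cutoff. This is a toy model, not the patent's unmixer: the naive DFT and the `cutoff_bin` parameter (standing in for an unmixing instruction) are illustrative assumptions.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (O(N^2), for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT matching dft() above; returns the real part."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def unmix_bands(signal, cutoff_bin):
    """Split one signal into a low-frequency stream and a high-frequency
    stream by zeroing spectral bins on either side of a cutoff.
    `cutoff_bin` plays the role of a simple unmixing instruction."""
    n = len(signal)
    X = dft(signal)
    low = [X[k] if (k <= cutoff_bin or k >= n - cutoff_bin) else 0 for k in range(n)]
    high = [0 if (k <= cutoff_bin or k >= n - cutoff_bin) else X[k] for k in range(n)]
    return idft(low), idft(high)
```

Splitting a 32-sample mix of a slow and a fast sinusoid at `cutoff_bin=4` recovers each component in its own stream.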
  • the stream processor 206 is coupled to receive the unmixed streams 216.
  • the stream processor also receives processing instructions 218 from the instruction generator 210.
  • the stream processor processes the unmixed streams from the sound unmixer based on the processing instructions.
  • a later section of this document provides a complete description of the processing instructions.
  • the stream processor produces processed streams 220.
  • the stream processor 206 processes the sound streams 216 in a number of ways. For example, frequency domain processing, like pitch-shifting, may be performed. Other processes include three-dimensional (3D) position processing, wherein the perceived spatial positions of sounds represented by a stream are changed. Other types of processing performed by the stream processor 206, such as time domain processing, are described in greater detail in a later section of this document. It is also possible to output processed streams directly, as shown at 234.
  • the mixer 208 receives the processed streams 220 and combines them to form an output signal 222.
  • the mixer comprises logic to combine the processed streams in accordance with mixing instructions 224 received from the instruction generator 210.
  • the mixer may include delay lines or storage buffers to time synchronize the processed streams when forming the output signal 222.
  • the output signal 222 may then be input to a sound system, such as the sound system of FIG. 1, to reproduce the results of the sound processing system 200 for enjoyment by the listener.
  • Streams output directly from the sound unmixer 204 or the stream processor 206, such as streams 232 and 234, may also be input to the sound system, thereby bypassing the mixer 208.
  • the instruction generator 210 provides unmixing instructions 214, processing instructions 218 and mixing instructions 224. In one embodiment, the instruction generator 210 generates the instructions based on a control script received at control input 228. In another embodiment, the instruction generator generates the instructions based on information received at user input 230. In another embodiment, the instruction generator generates the instructions based on control data 226 received from the sound source 202, wherein the sound source is a modified sound source capable of outputting both sound 212 and control data 226. In another embodiment, the instruction generator generates the instructions based on information detected in the sound signal 212.
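A minimal sketch of how an instruction generator might arbitrate among its four instruction sources follows. The priority order shown is an assumption for illustration only, since the patent does not specify one; the function and parameter names are likewise hypothetical.

```python
def generate_instructions(script=None, control_data=None, user_input=None,
                          event_presets=None, detected_event=None):
    """Return the instruction set the generator would emit.
    Assumed priority (not specified by the patent): an event-triggered
    preset wins, then direct user input, then an external control
    script, then control data embedded with the sound source."""
    if detected_event is not None and event_presets and detected_event in event_presets:
        return event_presets[detected_event]        # preset from memory 322
    for source in (user_input, script, control_data):
        if source is not None:
            return source
    return {}
```

For example, when a drum-beat event is detected, the stored preset overrides any active script.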
  • FIG. 3 shows a detailed block diagram of the processing system 200.
  • the output produced by the processing system is suitable for use in a sound system having four speakers, such as the sound system of FIG. 1.
  • embodiments of the present invention can be constructed having any number of outputs to support a sound system having any number of speakers.
  • the processor 206 is shown comprising a number of subprocessors 304 and a corresponding number of 3D position processors 306.
  • the subprocessors and 3D position processors are used to process the unmixed streams 216.
  • the subprocessors 304 are used to process the unmixed streams in ways that generally do not change their perceived spatial position. For example, a subprocessor may perform pitch-shifting or signal harmonizing on an unmixed stream. While such processes may change audible characteristics of the stream as perceived by a listener, they generally do not change the perceived spatial position; however, the subprocessors could be programmed to do so if desired. Thus, the subprocessors can perform all manner of signal processing on the unmixed streams to produce subprocessed streams 308. When the subprocessing is complete, the subprocessed streams 308 are input to the 3D position processors 306. The 3D position processors 306 operate to reposition the perceived spatial position of the sounds in the unmixed streams.
  • one unmixed stream may represent the singer's voice 112.
  • the singer's voice may be perceived to be directly in front of the listener.
  • the 3D position processors may operate on that stream to change the perceived position of the singer's voice.
  • the singer's voice may be repositioned to be behind the listener.
  • a more detailed example is provided in a later section of this document.
  • the 3D position processors produce positioning outputs 314 utilizing any 3D or 2D positioning technique.
  • the 3D position processors provide a portion of the unmixed stream to each speaker. By changing the portions of the sound stream provided to each speaker, the perceived spatial position of the stream may be repositioned around the listening room.
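One conventional way for a position processor to apportion a stream among four speakers is a constant-power pan law. The sketch below is an illustrative choice, not the patent's positioning technique; the coordinate convention and speaker names are assumptions.

```python
import math

SPEAKERS = ("FL", "FR", "RL", "RR")

def pan_gains(x, y):
    """Constant-power gains for a perceived position in the room.
    x in [-1, 1] pans left to right; y in [-1, 1] pans rear to front.
    (A simple separable pan law, chosen for illustration.)"""
    lr = (x + 1) * math.pi / 4      # 0 = full left, pi/2 = full right
    fb = (y + 1) * math.pi / 4      # 0 = full rear, pi/2 = full front
    left, right = math.cos(lr), math.sin(lr)
    rear, front = math.cos(fb), math.sin(fb)
    return {"FL": front * left, "FR": front * right,
            "RL": rear * left, "RR": rear * right}

def position_stream(samples, x, y):
    """Split one sound stream into four speaker feeds."""
    g = pan_gains(x, y)
    return {spk: [g[spk] * s for s in samples] for spk in SPEAKERS}
```

With `x=0, y=1` the stream is fed equally to the two front speakers; `y=-1` moves it behind the listener, matching the repositioning example above.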
  • the processor instructions 218 determine what processes and positioning to perform on the streams 216.
  • the processor instructions 218 include subprocessor instructions 310 and position processor instructions 312.
  • the subprocessor instructions 310 are used by the subprocessors 304 to determine what signal processing functions are to be performed on the unmixed streams, for example, processes to produce pitch-shifting or echo effects.
  • the position processor instructions 312 are used by the 3D position processors 306 to determine how to change the perceived spatial position of the subprocessed streams 308.
  • the instruction generator 210 is capable of controlling the operation of both the subprocessors 304 and the 3D position processors 306.
  • the processor outputs 220 of the processor 206 are coupled to the mixer 208. Assuming that the sound processing system is designed to produce results for playback on a four-speaker system, each of the 3D processors produces four position signals. The position signals will produce the desired spatial position for the stream when input into a four-speaker sound system. It will be apparent to one with skill in the art that any number of speakers may be located in the listening room, and that based on speaker arrangements, the perceived position of unmixed streams may be changed to virtually any position.
  • the mixer 208 mixes together the processed signals 220 representing all the streams to produce four output signals 222 suitable for use with a four speaker sound system.
  • the mixer 208 receives mixing instructions to determine how to mix together the streams. Thus it is possible to adjust the relative signal level of one stream with respect to another when forming the output signals 222. As a result, when played on a four speaker system, all of the streams will be perceived by a listener to have the desired processing and corresponding spatial positions.
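The mixer's level-controlled combination of processed streams can be sketched as follows. The data layout (stream name mapped to per-speaker sample lists) and the instruction format (a dictionary of relative gains) are hypothetical conveniences, not the patent's representation.

```python
def mix_streams(processed, levels):
    """Combine processed streams into one output signal per speaker.
    `processed` maps stream name -> {speaker: samples}; `levels` maps
    stream name -> relative gain (the mixing instructions).  Streams
    missing a level default to unity gain."""
    speakers = next(iter(processed.values())).keys()
    out = {}
    for spk in speakers:
        length = max(len(st[spk]) for st in processed.values())
        mixed = [0.0] * length
        for name, st in processed.items():
            g = levels.get(name, 1.0)
            for i, s in enumerate(st[spk]):
                mixed[i] += g * s      # sum gain-scaled streams per speaker
        out[spk] = mixed
    return out
```

Adjusting one stream's level relative to another, as described above, is just a change to the `levels` dictionary.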
  • the unmixer 204 creates and outputs the unmixed streams 216 using an unmixing process described in a later section of this document.
  • the unmixer 204 is capable of outputting multiple unmixed streams, wherein each stream may be input to a separate subprocessor included in the processor 206.
  • the instruction generator 210 produces instructions for the sound unmixer 204, the subprocessors 304, the 3D processors 306 and the mixer 208.
  • the instruction generator 210 includes a control sequencer 316, a sound analyzer 318 and a communication interface 350.
  • the script input 228 couples to the communication interface 350.
  • the communication interface 350 receives the script data from an external source and provides it to the control sequencer 316 via script channel 352.
  • the communication interface may include a modem for connecting to other computers or computer networks.
  • the communication interface may also include additional memory for storage of received script data.
  • Other types of communication devices may be contained in the communication interface 350. For example, an infra-red (IR), radio frequency (RF) or other type of communication device may be included in the communication interface 350 so that script data may be received from a variety of sources.
  • the control sequencer is also coupled to receive control data 226 that may be included as part of the sound source, when the sound source is a modified sound source that outputs both sound signals and control data.
  • the control script information may be embedded on a modified CD containing both music and script data. In that case, a single CD would contain music and a control script defining how the music is to be processed to achieve a specific effect on playback.
  • the control sequencer also includes a memory 322 having script presets.
  • the script presets are determined before processing begins and are stored in the memory 322 for future use.
  • the sound analyzer 318 is also part of the instruction generator 210.
  • the sound analyzer 318 is coupled to the sound source 202 to receive the sound signal 212 and to detect selected events within the sound signal. For example, the beat of a drum or a crash of a cymbal may be events that are detected by the sound analyzer.
  • the control sequencer 316 instructs the sound analyzer 318 to detect selected events via an event channel 320.
  • the event channel 320 is also used by the sound analyzer to transmit indications to the control sequencer 316, that the selected events have been detected.
  • the control sequencer uses these detected events to control the generation of instructions to the components of the sound processing system 200.
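A crude, energy-based onset detector illustrates how the sound analyzer could flag events such as drum beats. The patent does not specify a detection algorithm; the frame size, threshold, and smoothing constant below are all illustrative assumptions.

```python
def detect_beats(samples, frame_size=256, threshold=2.0):
    """Flag frame start positions whose short-term energy jumps well
    above a running average -- a stand-in for the sound analyzer's
    event detector."""
    events = []
    avg = None
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(s * s for s in frame) / frame_size
        if avg is not None and avg > 0 and energy > threshold * avg:
            events.append(start)                  # sudden loud onset
        # exponential smoothing of the energy baseline
        avg = energy if avg is None else 0.9 * avg + 0.1 * energy
    return events
```

The control sequencer could then map each reported event to a preset stored in memory 322.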
  • the user input 230 couples to the control sequencer 316 to allow a user to interact with the instruction generator 210 to control operation of the sound processing system 200.
  • the user may use the user input to select whether the external script input 228 or the control data input 226 are used to receive scripts for processing the sound signal 212.
  • the user may also specify operation of any of the other components of the sound processing system by using the user input.
  • the user can instruct the control sequencer 316 to activate the sound analyzer 318 to detect selected events in the sound signal 212. Further, upon detection of the selected events, the control sequencer will use the presets stored in the memory 322 to generate instructions for the components of the sound processing system.
  • the user input 230 may also be used to enter control script information directly into the instruction generator 210.
  • the unmixer 204 and the instruction generator 210 provide unmixed streams 216 and control instructions 214, 310, 312, 224 to an external system (not shown) that may include subprocessors, 3D position processors and mixers.
  • the external system may be another computer program or computer system including hardware and software.
  • the external system may also be located at a different location from the components of the system 200.
  • it is possible to distribute the processing of the unmixed streams to one or more systems.
  • merely distributing the processing does not deviate from the scope of the invention, which includes ways to produce unmixed streams which may be processed in accordance with instructions based on a control script.
  • the sound is processed using events detected within the sound itself.
  • sound is processed using script information embedded with the sound at the sound source.
  • the script information is independent from the sound source, for example, a separate data file, that can be input to the control sequencer 316 to control how the sounds are processed.
  • the invention is related to the use of the sound processing system 200 for dynamic sound processing.
  • dynamic sound processing is provided by the sound processing system 200 in response to the control sequencer 316 executing one or more sequences of one or more instructions. Such instructions may be read into the control sequencer 316 from another computer-readable medium, such as the sound source 202.
  • Non-volatile media include, for example, optical or magnetic disks, such as those that may be used in conjunction with the sound source 202.
  • Volatile media include dynamic memory, such as dynamic memory that may be associated with the presets 322.
  • Transmission media include coaxial cables, copper wire, and fiber optics, including the wires that comprise the script input 228. Transmission media can also take the form of radio or light waves, such as those generated during radio frequency (RF) and infra-red (IR) data communications.
  • Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape or any other magnetic medium, a CD-ROM, a DVD or any other optical medium, punch cards, paper tape or any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, a computer data storage structure or any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
  • Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to the control sequencer 316 for execution.
  • the instructions may initially be borne on a magnetic disk of a remote computer.
  • the remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
  • a modem local to the sound processing system 200 can receive the data on the telephone line via the script input 228.
  • the communication interface 350 receives the data and forwards the data over the channel 352 to the control sequencer 316 which executes instructions included in the data.
  • the instructions received by the control sequencer 316 may optionally be stored in an internal memory within the control sequencer either before or after execution by the control sequencer 316.
  • the communication interface 350 provides a two-way data communication coupling to a script input 228 that may be connected to a local network (not shown).
  • the communication interface 350 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line.
  • the communication interface 350 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented.
  • the communication interface 350 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information.
  • a connection may be established through a local network (not shown) to a host computer or to data equipment operated by an Internet Service Provider (ISP).
  • the ISP provides data communication services through the worldwide packet data communication network, now commonly referred to as the "Internet."
  • the local network and the Internet both use electrical, electromagnetic or optical signals that carry digital data streams.
  • the signal through the various networks and the signals on the script input 228 and through the communication interface 350, which carry the digital data to and from the sound processing system 200, are exemplary forms of carrier waves transporting the information.
  • the sound processing system 200 can send messages and receive data, including program codes, through the network(s), the script input 228 and the communication interface 350.
  • an Internet server might transmit code for an application program through the Internet, ISP, local network, and communication interface 350.
  • one such downloaded application provides for dynamic sound processing as described herein.
  • the received code may be executed by the control sequencer 316 as it is received, and/or stored in memory 322 as presets, or in other non-volatile storage for later execution. In this manner, the sound processing system 200 obtains application code in the form of a carrier wave.
  • FIG. 4 shows a block diagram depicting the functionality of one embodiment of the unmixer 204, including various internal operations and corresponding signals.
  • In FIG. 4, the input sound signal 212 includes left (L) and right (R) stereo channels; however, it will be obvious to one skilled in the art that minor modifications can be made to process more sound channels without deviating from the scope of the invention.
  • the left and right stereo channels are input to discrete Fourier transform (DFT) blocks 402L and 402R, respectively.
  • the stereo channels will be in the form of digital signals.
  • the channels can be digitized using techniques well-known in the art.
  • the outputs of the DFT blocks 402L and 402R are the frequency domain spectra of the left and right stereo channels. Peak detection blocks 404L and 404R detect the frequencies at which peaks occur in the frequency domain spectra.
  • This information is then passed to a subtraction block 406, which generates a difference spectra signal having values equal to the difference of the left and right frequency domain spectra at each peak frequency. If voice signals are panned to center, then the magnitudes and phases of the frequency domain spectra for each channel at voice frequencies will be almost identical. Accordingly, the magnitude of the difference spectra at those frequencies will be small.
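To see why the subtraction block suppresses center-panned content, consider two synthetic channels sharing a centered "voice" sinusoid while a "guitar" sinusoid is panned toward the left. This is a toy construction using a naive DFT; the bin numbers and pan amounts are arbitrary choices for illustration.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(N^2), for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

n = 64
# A "voice" panned to center appears identically in both channels;
# a "guitar" is panned left (0.9 left / 0.1 right).
voice = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
guitar = [math.sin(2 * math.pi * 12 * t / n) for t in range(n)]
left = [v + 0.9 * g for v, g in zip(voice, guitar)]
right = [v + 0.1 * g for v, g in zip(voice, guitar)]

# Difference spectrum, as formed by the subtraction block.
XL, XR = dft(left), dft(right)
D = [a - b for a, b in zip(XL, XR)]

# The centered voice (bin 5) cancels; the off-center guitar (bin 12) survives.
assert abs(D[5]) < 1e-6
assert abs(D[12]) > 1.0
```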
  • the difference signal as well as the left and right peak frequencies and frequency domain spectra are input to an amplitude adjustment block 410.
  • the amplitude adjustment block utilizes the magnitudes of the difference spectra and frequency domain spectra of each channel to modify the magnitudes of the frequency domain spectra of each channel and output a modified spectra.
  • the magnitude of the modified spectra depends on the magnitude of the difference spectra. Accordingly, the magnitude of the modified frequency domain spectra will be low for frequencies corresponding to voice.
  • the modified frequency domain spectra for each channel are input to inverse discrete Fourier transform (IDFT) blocks 412L and 412R, which output time domain signals based on the modified spectra.
  • the modified stereo channels (L' and R') output by the IDFT blocks 412L and 412R will have the voice removed. However, the instruments and other sounds not panned to the center will remain in the original stereo channels so that the stereo quality of the recording will be preserved.
  • a center output containing the unmixed spectra is input to IDFT block 412C, which outputs time domain signals (C) based on the unmixed spectra.
  • the time domain signals L', C and R' are input to mixer 414, which combines the received signals to produce seven "voices." Each voice represents some combination of the L', C and R' signals. Therefore, it is possible that V0 represents only the C signal and that V1 comprises some proportion of L' and C, for example.
  • the unmixing instructions 214 are received by the unmixer 204 and used to determine how to unmix the input signal 212 to form the output voices (V0-V6). For example, the unmixing instructions specify how to combine the L', C and R' outputs to form the voice outputs.
  • the unmixing instructions also provide unmixing parameters that can be used by the subtractor 406 and the amplitude adjustor 410 to select a portion of input signal 212 to be unmixed and provided to the IDFT block 412C. For example, the unmixing parameters are used to select a center portion of the input signal 212 to be unmixed. Thus, equal amplitudes of frequency peaks that occur in both the left and right stereo channels would be unmixed.
  • the unmixing parameters include amplitude weighting parameters that may be used to unmix signals that do not appear equally in both left and right channels.
  • the singer's voice used in the above example may be spatially positioned off center, and thus, more toward either the left or right channel.
  • the frequency peaks representing the singer's voice would have greater amplitude corresponding to the side where the singer's voice is located.
  • the amplitude weighting parameters are used by the subtractor 406 and the amplitude adjustor 410 to unmix the singer's voice by compensating for the greater amplitude of the frequency peaks representing the singer's voice that appear in one channel (either left or right).
  • the above described unmixing process can be used to unmix virtually any part of the input signal to produce one or more of the voice outputs.
  • the unmixing is performed by hardware and/or software that receives the unmixing instructions and performs the above defined functions accordingly. The various operations performed by the blocks of FIG. 4 will now be described in greater detail.
  • a frequency-domain representation of the input signal 212 can be obtained by use of a phase-vocoder, a process in which the incoming signal is split into overlapping, windowed, short-term frames which are then processed by a Fourier Transform, resulting in a series of short-term frequency domain spectra representing the spectral content of the input signal in each short-term frame.
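The analysis/resynthesis round trip described here can be sketched with Hann-windowed frames at 50% overlap. The frame and hop sizes are arbitrary choices, and the naive DFT stands in for an FFT; this is an illustration of the phase-vocoder structure, not the patent's implementation.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def analyze(signal, frame=16, hop=8):
    """Split into overlapping Hann-windowed short-term frames and return
    their spectra (the phase-vocoder analysis stage)."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame) for t in range(frame)]
    spectra = []
    for start in range(0, len(signal) - frame + 1, hop):
        chunk = [signal[start + t] * win[t] for t in range(frame)]
        spectra.append((start, dft(chunk)))
    return spectra

def synthesize(spectra, length, frame=16, hop=8):
    """Overlap-add inverse transforms to rebuild the time-domain signal.
    With a periodic Hann window and 50% overlap the windows sum to 1,
    so interior samples are reconstructed exactly."""
    out = [0.0] * length
    for start, X in spectra:
        chunk = idft(X)
        for t in range(frame):
            out[start + t] += chunk[t]
    return out
```

Any spectral modification (such as the gain adjustment described later) would be applied to each frame's spectrum between `analyze` and `synthesize`.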
  • the frequency-domain representation can then be altered and a modified time-domain signal reconstructed by use of overlapping windowed inverse Fourier transforms.
  • the phase vocoder is a very standard and well-known tool that has been used for years in many contexts (voice coding, high-quality time-scaling, frequency-domain effects, and so on).
  • X_L(ω_k, t): the short-term spectrum of the left signal
  • ω_k: the frequency channel
  • t: the time corresponding to the short-time frame
  • X_R(ω_k, t): the short-term spectrum of the right signal
  • X_L(ω_k, t) and X_R(ω_k, t) are arrays of complex numbers with amplitudes and phases.
  • the first step consists of identifying peaks in the magnitudes of the short-term spectra. These peaks indicate sinusoidal components that can belong either to the singer's voice or to background instruments. To find the peaks, one calculates the magnitude of X_L(ω_k, t), or of X_R(ω_k, t), or of X_L(ω_k, t) + X_R(ω_k, t), and one performs a peak detection process.
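The peak-detection step might be sketched as follows, using the two-neighbor rule and mid-way regions of influence detailed below. The function names are illustrative, and the mid-way split is one of the two region-boundary choices the text allows.

```python
def find_peaks(mag):
    """Declare bin k a peak when its magnitude exceeds the two
    neighbouring bins on each side."""
    peaks = []
    for k in range(2, len(mag) - 2):
        if (mag[k] > mag[k - 1] and mag[k] > mag[k - 2]
                and mag[k] > mag[k + 1] and mag[k] > mag[k + 2]):
            peaks.append(k)
    return peaks

def regions_of_influence(peaks, n):
    """Contiguous bin ranges around each peak, split mid-way between
    consecutive peaks (the alternative is the bin of smallest
    amplitude between peaks)."""
    regions = []
    for i, p in enumerate(peaks):
        lo = 0 if i == 0 else (peaks[i - 1] + p) // 2 + 1
        hi = n - 1 if i == len(peaks) - 1 else (p + peaks[i + 1]) // 2
        regions.append((lo, hi))
    return regions
```

Each region's bins later receive the gain computed at its peak.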
  • One such peak detection scheme consists of declaring as peaks those channels whose amplitude is larger than the two neighbor channels on the left and the two neighbor channels on the right. Associated with each peak is a so-called region of influence composed of all the frequency channels around the peak. Consecutive regions of influence are contiguous, and the limit between two adjacent regions can be set to be exactly mid-way between two consecutive peaks, or to be located at the channel of smallest amplitude between the two consecutive peaks.

Difference Calculation and Gain Estimation
  • the Left-Right difference signal in the frequency domain is obtained next by calculating the difference between the left and right spectra using:
  • D(ω_k, t) = X_L(ω_k, t) − X_R(ω_k, t)   (1) for each peak frequency ω_k.
  • the key idea is to calculate how much of a gain reduction it takes to bring X_L(ω_k, t) and X_R(ω_k, t) down to the level of D(ω_k, t), and to apply this gain in the frequency domain, leaving the phases unchanged.
  • the gains G_L,R(ω_k, t) are real, and therefore the modified channels Y_L,R(ω_k, t) have the same phase as the original channels X_L,R(ω_k, t), but their magnitudes have been modified.
  • for voice removal, G_L,R(ω_k, t) should be small whenever Γ_L,R(ω_k, t) is small and should be close to 1 whenever Γ_L,R(ω_k, t) is close to 1, where Γ_L,R(ω_k, t) = |D(ω_k, t)| / |X_L,R(ω_k, t)| is the ratio of the difference magnitude to the channel magnitude.
  • One choice is to define G_L,R(ω_k, t) = (Γ_L,R(ω_k, t))^α with α > 0, and to set the modified channels to Y_L,R(ω_k, t) = G_L,R(ω_k, t) X_L,R(ω_k, t).
  • Setting α = 1 gives the modified channels Y_L,R(ω_k, t) the same magnitude as the difference spectrum, and removes exactly the same amount as the standard Left-Right technique.
  • the gain function is a function based on the magnitude of the difference spectra.
  • Conversely, to extract the voice, the gains G_L,R(ω_k, t) should be chosen to be close to 1 for small Γ_L,R(ω_k, t) and close to 0 for Γ_L,R(ω_k, t) close to 1.
  • Because G_L,R(ω_k, t) is then small for channels that belong to background instruments (for which Γ_L,R(ω_k, t) is close to 1), background instruments are attenuated while the voice is left unchanged. Thus, it is possible to unmix the voice components from the sound signal.
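The difference-and-gain computation can be sketched as follows, where Γ denotes the ratio of the difference magnitude |D| to the channel magnitude |X|. The extraction gain 1 − Γ^α is an illustrative choice satisfying the stated shape (close to 1 for small Γ, close to 0 for Γ near 1), not a formula given in the text:

```python
import numpy as np

def unmix_gains(XL, XR, alpha=1.0, extract_voice=False):
    """Compute real per-channel gains from the Left-Right difference
    spectrum D = XL - XR.  Gamma = |D| / |X| is near 0 for center-panned
    (voice) components and near 1 for side-panned instruments.
    Gamma**alpha removes the voice; 1 - Gamma**alpha (an illustrative
    extraction gain) keeps it.  Real gains leave phases unchanged."""
    D = XL - XR
    eps = 1e-12
    gL = np.minimum(np.abs(D) / (np.abs(XL) + eps), 1.0)
    gR = np.minimum(np.abs(D) / (np.abs(XR) + eps), 1.0)
    if extract_voice:
        return 1.0 - gL ** alpha, 1.0 - gR ** alpha
    return gL ** alpha, gR ** alpha
```

Applying the gains per frame is then simply YL = GL * XL and YR = GR * XR.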
Gain Smoothing
  • It is often useful to perform time-domain smoothing of the gain values to avoid erratic gain variations that can be perceived as a degradation of the signal quality. Any type of smoothing can be used to prevent such erratic variations. For example, one can generate a smoothed gain by setting
  • Ĝ_L,R(ω_k, t) = (1 − ε) Ĝ_L,R(ω_k, t − 1) + ε G_L,R(ω_k, t), where ε is a smoothing parameter between 0 (a lot of smoothing) and 1 (no smoothing), (t − 1) denotes the time at the previous frame, and Ĝ is the smoothed version of G.
  • Other types of linear or non-linear smoothing can be used.
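A one-pole recursive smoother is one common choice consistent with the described parameter (ε = 1 means no smoothing, ε near 0 means heavy smoothing); this sketch operates on one gain value per call:

```python
def smooth_gain(g_smoothed_prev, g_current, eps=0.3):
    """One-pole recursive smoothing of a gain value across frames: the
    smoothed gain moves a fraction eps toward the newly computed gain
    each frame.  eps = 1 disables smoothing; small eps smooths heavily,
    suppressing erratic frame-to-frame gain variations."""
    return (1.0 - eps) * g_smoothed_prev + eps * g_current
```

The smoother is applied independently to each frequency channel's gain track.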
  • components belonging to an instrument panned in the center, such as a bass guitar or a kick drum, whose spectral content does not overlap that of the voice will not be attenuated as they would be with the standard method.
  • in addition, one can set G_L,R(ω_k, t) = 0 for ω_k < ω_min or ω_k > ω_max, so that instruments falling outside the voice range are removed automatically regardless of where they are panned.
  • FIG. 5 is a block diagram of one embodiment of the unmixer 204 that includes a CPU, memory, input and output system, and peripherals, suitable for use to unmix selected sound components of an input signal.
  • the unmixer 204 is capable of receiving the unmixing instructions 214 and executing unmixing software that interprets the unmixing instructions 214 to perform unmixing operations to produce the desired voices (V0-7).
  • the unmixer 204 includes a digital signal processor (DSP) (not shown) under control of the CPU.
  • FIG. 6 shows an exemplary embodiment of sound processing instructions 600 constructed in accordance with the present invention.
  • the sound processing instructions contain name and value pairs that can be used by the components of the sound processing system 200 to process sounds in accordance with the present invention. The following is a description of the name and value pairs of the sound processing instructions 600. However, the following list is exemplary and not intended to provide an exhaustive list of all possible processing instructions.
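Such name/value-pair instructions might be parsed along the following lines; the 'name=value; ...' syntax and the field names in the usage example are hypothetical stand-ins, since the exact encoding of FIG. 6 is not reproduced here:

```python
def parse_instruction(line):
    """Parse one name/value-pair sound processing instruction into a
    dict.  The semicolon-separated 'name=value' syntax is an assumed
    serialization, not the patent's actual encoding."""
    fields = {}
    for pair in line.split(";"):
        if "=" in pair:
            name, value = pair.split("=", 1)
            fields[name.strip()] = value.strip()
    return fields
```

A component of the sound processing system 200 would look up the pairs it understands and ignore the rest.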
  • FIG. 7 shows a sound processing method 700 for use with the sound processing system 200 of FIG. 3.
  • the sound processing method 700 can be used to process an input sound source to reposition perceived spatial positions of sound components within the sound signal.
  • a sound source provides a sound signal to the sound processing system of the present invention.
  • the sound source 202 provides the sound signal 210 for processing.
  • a control script for processing the sound signal is determined.
  • the user instructs the control sequencer where to find the control script.
  • the user indicates via the user input 230 that an external script is to be received from the script input 228 or that a script accompanying the sound signal at script data input 226 is to be used.
  • control sequencer 316 begins obtaining script instructions from the selected script input.
  • control sequencer decodes the script and generates unmixing instructions to the sound unmixer 204.
  • the unmixing instructions provide coefficients for forming one or more voices 216 output from the unmixer.
  • one or more voices 216 are output from the unmixer 204 in response to the unmixing instructions.
  • FIG. 3 depicts three voices 216, any number of voices may be produced by the unmixer 204.
  • the control sequencer 316 generates processing instructions 310 to transmit to the subprocessors 304 for processing the voices 216 created by the unmixer 204.
  • the processing instructions instruct the subprocessors 304 to perform, for example, frequency based processing, such as pitch-shifting or signal harmonizing.
  • the processing may also include time based processing, such as signal filtering.
  • the control sequencer 316 generates positioning instructions 312 to transmit to the position processors 306 to adjust the perceived spatial positions of the subprocessed voices 308.
  • the position processors output a signal for each of the four speakers to produce a perceived position of the voice to the listener.
  • varying amounts of the voice appear in the 3D processor outputs 220.
  • the control sequencer 316 generates mixing instructions to mix the processed signals 220 together. This is achieved by the mixer 208, which mixes the signals received from the processor 206, according to the mixing instructions 224, to form mixer outputs 222. The mixer outputs are transmitted to the speakers to produce sounds corresponding to the processing and spatial repositioning which can be perceived by the listener.
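As a sketch, the mixing step reduces to a weighted sum of the processed streams into the four speaker outputs; representing the mixing instructions 224 as one gain per stream is an assumption for illustration:

```python
def mix_streams(processed, gains):
    """Weighted sum of N processed streams into four speaker outputs.
    Each stream is a list of four equal-length channel buffers
    (FL, FR, RL, RR); gains[i] scales stream i as directed by the
    mixing instructions."""
    n = len(processed[0][0])
    outputs = [[0.0] * n for _ in range(4)]
    for stream, g in zip(processed, gains):
        for ch in range(4):
            for i in range(n):
                outputs[ch][i] += g * stream[ch][i]
    return outputs
```

The four resulting buffers correspond to the mixer outputs 222 driving the four speakers.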
  • the method continues by processing any remaining script instructions that exist. For example, if the sound signal is a song that lasts three minutes, the script may include a list of instructions to be processed for the three minute duration.
  • time synchronization exists between the components of the processing system 200 and the sound signal. For example, if a sound signal is three minutes in duration, and spatial repositioning is to occur at two minutes into the sound signal, the instruction generator 210, the unmixer 204 and the stream processors 206 are synchronized to achieve this.
  • the sound signal and the control scripts include time stamps.
  • the control sequencer 316 generates instructions to the components of the processing system by reading the time stamps on the control script and sending the instructions at the appropriate time in the processing.
  • the subprocessors 304 and the position processors 306, read the time stamps on the instructions they receive and match those time stamps with time stamps accompanying the sound signal. Thus, it is possible to know exactly when processing is to be applied to a particular stream.
  • the mixer 208 also receives time stamp information with its instructions from the control sequencer 316.
  • the mixer uses the time stamp information to determine when to apply selected mixing functions.
  • the mixer can also obtain time stamp information from each received stream and align the received streams based on the time stamps before combining them, so that no distortion is introduced by combining mis-aligned streams.
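Time-stamp alignment before combining might look like this sketch, where each stream carries a start-time stamp; the (start_time, samples) representation is an assumption:

```python
def align_streams(stamped_streams, sample_rate=44100):
    """Align streams on their time stamps before combining: pad each
    stream with leading silence so that all start at the earliest
    stamp, then trim to a common length so mis-aligned streams are
    never summed.  Each entry is a (start_time_seconds, samples) pair."""
    t0 = min(t for t, _ in stamped_streams)
    padded = [[0.0] * int(round((t - t0) * sample_rate)) + list(s)
              for t, s in stamped_streams]
    n = min(len(p) for p in padded)
    return [p[:n] for p in padded]
```

In practice the mixer's delay lines or storage buffers play the role of the leading-silence padding shown here.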
  • a master clock is coupled to the components of the processing system 200, and is used to synchronize the components with the time stamps accompanying the sound signal and script file.
  • a time stamp accompanying the sound signal is used to synchronize the system. In that case, each component reads the time stamp on the sound signal it is to process in order to determine when to apply the script instructions.
  • the sound source provides an analog signal that is converted to a digital signal and tagged with a time stamp which can then be used by the components in the sound processing system 200.
  • FIG. 8 shows an exemplary script 800 for use in processing sounds in accordance with the present invention.
  • the script 800 comprises seven instructions 801-807, which are to be processed to create desired spatial effects on selected sound components of sounds from a sound source.
  • FIG. 9 shows the listening room 100 of FIG. 1 and includes a modified music source 902 coupled to the sound processing system 200 of FIG. 3, which is further coupled to the amplifier 104.
  • the modified music source 902 is modified in accordance with the present invention and has a sound output 904 and a script output 906 coupled to the sound processing system 200.
  • the modified sound source 902 may be a CD player that plays a CD having both a music track and a script file embedded on it. During playback, the music track is transmitted from the sound output 904 and the associated script file is transmitted from the script output 906.
  • the sound processing system 200 has its four outputs 220 coupled to the amplifier 104 that provides sound signals to the four speakers in the listening room 100.
  • the exemplary script 800 will be assumed to be the script embedded on the CD with the music track.
  • the music track contains sounds representative of a singer's voice, a piano and a guitar.
  • the perceived spatial positions relative to the listener 110, of the voice 908, piano 910 and guitar 912 are as shown in FIG. 9.
  • the listener 110 is in the center of the listening room 100, facing front and equidistant from the four speakers.
  • the instruction generator receives the first three script instructions and generates the appropriate instruction for each component of the sound processing system 200.
  • the first instruction 801 commands the sound processing system to execute a create voice command (885), to create voice ID 2 (886) using the center unmixing technique (887).
  • the center unmixing technique uses coefficients 0, 1, and 2 (888), where only the coefficient 0 has a value greater than zero.
  • the command begins at time stamp 0:00 (889) and produces a perceived voice at an angle of 0 degrees (890) at a radius 1 meter (891).
  • the voice becomes active .1 seconds (892) after the time stamp 0:00.
  • the second instruction 802 commands the sound processing system to execute a create voice command (893), to create voice ID 3 (894) using the center unmixing technique (895).
  • the center unmixing technique uses coefficients 0, 1, and 2 (896).
  • the voice becomes active .1 seconds (897) after the time stamp 0:00.
  • This instruction maintains the position of sound components located at the center as provided by the original source.
  • the third instruction 803 commands the sound processing system to execute a create voice command (870), to create voice ID 4 (872) using the center unmixing technique (874).
  • the center unmixing technique uses coefficients 0, 1, and 2 (876).
  • the command begins at time stamp 0:00 (878) and produces a perceived voice at an angle of 0 degrees (880) at a radius 1 meter (882).
  • the voice becomes active .1 seconds (884) after the time stamp 0:00.
  • This instruction maintains the position of sound components located at the left side as provided by the original source.
  • the sound processing system essentially produces sound components having spatial positions corresponding to the spatial positions initially provided by the sound source.
  • the voices having IDs (2, 3 and 4) created by the instructions 801, 802 and 803, include the singer's voice 908, the guitar 912 and the piano 910. Center unmixing was used for each instruction and the three relevant coefficients are set so that portions of the Left, Right and Center of the original sound source are used to create the voices having IDs of 2, 3 and 4. These instructions generally maintain the perceived position of the sound components as provided by the sound source.
  • the fourth instruction 804, embedded on the CD is input to the sound processing system.
  • the fourth instruction commands the sound processing system to execute a create voice command (810), to create voice ID 1 (812) using the center unmixing technique (814).
  • the center unmixing technique uses coefficients 0, 1, and 2 (816).
  • the command begins at time stamp 1:00 (818) and produces a perceived center voice at an angle of 90 degrees (820) at a radius 1 meter (822).
  • the voice becomes active .1 seconds (824) after the time stamp 1:00.
  • the voice created by the instruction 804 includes the center portion of the sound which generally includes the singer's voice. Since center unmixing was used, the coefficients 816 are set so that only the coefficient for the center band has a nonzero value. As a result, the sounds in the center portion, which include the singer's voice, are rotated from an initial spatial position at 908 to the position shown at 914. Two copies of the singer's voice can now be perceived by the listener. The first copy is derived from voice ID 3 and is perceived at position 908. The second copy is derived from voice ID 1 and is perceived 1 meter from the listener at an angle of 90 degrees as shown at 914.
  • the fifth instruction (805) modifies the voice created having voice ID 1 (828), by executing another create voice command (826).
  • the fifth instruction commands the music processing system to execute the create voice command again using the center unmixing technique (830).
  • the center unmixing technique uses coefficients 0,1 and 2 (832).
  • the command begins at time stamp 1:30 (834) and produces a perceived voice at an angle of 180 degrees (836) and a radius 1 meter (838).
  • the voice becomes active .1 seconds (840) after the time stamp 1:30.
  • the voice created by the instruction 805 is shown. Notice the effect of the instruction was to rotate the perceived voice from the position 914 to a new position shown at 916.
  • the perceived positions of the piano 910 and the guitar 912 are not changed by the execution of the instruction 805, since they are not part of the stream unmixed using the center unmixing technique.
  • two copies of the singer's voice are perceived, one at position 908 due to voice ID 3, and one at position 916 due to voice ID 1.
  • the sixth instruction (806) again modifies the voice created having voice ID 1 (842), by executing another create voice command (844).
  • the sixth instruction commands the music processing system to execute the create voice command again using the center unmixing technique (846).
  • the center unmixing technique uses coefficients 0,1 and 2 (848).
  • the command begins at time stamp 2:00 (850) and produces a perceived voice at an angle of 225 degrees (852) and a radius 1 meter (854).
  • the voice becomes active .1 seconds (856) after the time stamp 2:00.
  • the voice created by the instruction 806 is shown. Notice that the effect of the instruction was to again rotate the perceived voice from the position 916 to a new position shown at 918. Thus, two copies of the singer's voice are perceived, one at position 908 due to voice ID 3, and one at position 918 due to voice ID 1.
  • the seventh instruction (807) again modifies the voice created having voice ID 1 (858), by executing another create voice command (860).
  • the seventh instruction commands the music processing system to execute the create voice command again using the center unmixing technique (862).
  • the center unmixing technique uses coefficients 0,1 and 2 (864).
  • the command begins at time stamp 2:30 (866) and produces a perceived voice at an angle of 270 degrees (868) and a radius 1 meter (870).
  • the voice becomes active .1 seconds (872) after the time stamp 2:30.
  • the voice created by the instruction 807 is shown. Notice that the effect of the instruction is to again rotate the perceived voice from the position 918 to a new position shown at 920.
  • two copies of the singer's voice are perceived, one at position 908 due to voice ID 3, and one at position 920 due to voice ID 1.
  • the above example demonstrates that by providing script instructions to the sound processing system 200 included in the present invention, the perceived spatial position of sounds can be manipulated in a variety of ways given a particular speaker arrangement.
  • FIG. 10 shows an exemplary portion of a storage medium 1000 that has a data track 1002 with embedded sound 1004 and script data 1006 and can be used in accordance with the present invention.
  • the storage medium 1000 could be part of a CD, tape, disk or other type of storage medium used to store sound signals.
  • the present invention provides a method and apparatus for processing sound signals to produce enhanced sound signals. It will be apparent to those with skill in the art that modifications to the above methods and embodiments can occur without deviating from the scope of the present invention. Accordingly, the disclosures and descriptions herein are intended to be illustrative, but not limiting, of the scope of the invention which is set forth in the following claims.


Abstract

A system (200) for processing a sound signal (212) that allows dynamic customization of perceived spatial positions and sound qualities of sound components associated with the sound signal (212). The system provides apparatus for processing a sound signal (212) that includes an input to receive the sound signal (212), a sound unmixer (204) coupled to the input to receive the sound signal (212) and unmix at least one sound stream (216) from the sound signal (212) based on at least one unmixing instruction (214), and an output coupled to the sound unmixer (204) to output the at least one sound stream (216).

Description

PROCESS FOR REMOVING VOICE FROM STEREO RECORDINGS
CROSS-REFERENCES TO RELATED APPLICATIONS
This application is a continuation-in-part of U.S. Application No. 09/405,941 filed September 27, 1999, entitled PROCESS FOR REMOVING VOICE FROM STEREO RECORDINGS. This application also claims priority from co-pending U.S. Provisional Patent Application 60/165,058 filed November 12, 1999, entitled DYNAMIC REPROCESSING FOR ENHANCED AUDIO AND MUSIC, the disclosure of which is incorporated in its entirety herein for all purposes.
FIELD OF THE INVENTION
The present invention relates generally to a system for processing audio and music signals, and more particularly, to a dynamic processing system to produce enhanced audio and music signals.
BACKGROUND OF THE INVENTION
A consistent stream of technological developments has changed the way people listen to and enjoy audio and musical performances. For example, sound digitization has provided a way for large volumes of sound information to be stored on a small, light package known as a compact disk (CD). It is now possible for people to have home sound systems that rival even the best theater systems.
FIG. 1 shows a top view of a listening room 100 containing typical music processing equipment including a music source 102, an amplifier 104 and four speakers 106. The music source 102 is a compact disk (CD) player, but could be another type of source, like a cassette tape player. The music source 102 couples to the amplifier 104 so that music received by the amplifier 104 can be amplified and transmitted over cables 108 to the speakers 106. A listener 110 is located approximately at the center of the listening room so that the four speakers are roughly the same distance away. The speakers are designated as front left (FL), front right (FR), rear left (RL) and rear right (RR). When music is played through the speakers it is possible for the listener 110, who is facing front, to perceive spatial positions relating to sound components within the music. For example, the listener 110 may perceive that a singer's voice 112 is directly in front of him. The listener may also perceive that the sound of a piano 114 is to his front and right, and that the sound of a guitar 116 is behind and to the left. Although FIG. 1 depicts the spatial position of musical instruments, it is also possible to perceive spatial positions for other sound generating objects. For example, spatial positions for the sound of an automobile engine or the sound of the ocean may also be perceived using the listening room 100 as shown in FIG. 1. However, a significant problem exists in that the spatial positions and sound qualities of the sound components in a recording, such as on a CD, are determined when the recording is created. Thus, it may not be possible for the sound components of a sound signal to be associated with different spatial positions or sound qualities that may be more enjoyable to the listener.
SUMMARY OF THE INVENTION
The present invention provides a system for processing a sound signal that allows listeners to dynamically customize perceived spatial positions and sound qualities of sound components associated with the sound signal. For example, the listener may configure the system to reposition the perceived position of a singer's voice or may cause the perceived position of the singer's voice to dynamically change in accordance with a preprogrammed script. The listener may also use the system to automatically reposition the perceived spatial positions of the sound components based on events detected within the sound signal itself. For example, the detected beat of a drum may be used to change the perceived spatial position of the singer's voice. It is also possible to use the system to change the sound qualities of the sound components as desired.
One embodiment of the present invention includes apparatus for processing a sound signal that comprises an input to receive the sound signal, a sound unmixer coupled to the input to receive the sound signal and unmix at least one sound stream from the sound signal based on at least one unmixing instruction, and an output coupled to the sound unmixer to output the at least one sound stream.
Another embodiment of the present invention provides a method of processing a sound signal. The method comprises the steps of receiving the sound signal, unmixing at least one sound stream from the sound signal based on at least one unmixing instruction, and outputting the at least one sound stream. A further understanding of the nature and the advantages of the inventions disclosed herein may be realized by reference to the remaining portions of the specification and the attached drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows a listening room containing prior art music processing components;
FIG. 2 shows a block diagram of a sound processing system constructed in accordance with the present invention.
FIG. 3 shows a detailed block diagram of the sound processing system of FIG. 2;
FIG. 4 is a block diagram depicting the operations of a sound unmixer included in the present invention; FIG. 5 is a block diagram of a computer system for implementing a sound unmixer in accordance with the present invention;
FIG. 6 shows an exemplary format for a control script for use in accordance with the present invention;
FIG. 7 shows a sound processing method for use with the sound processing system of FIG. 3;
FIG. 8 shows an exemplary control script that can be used to process a sound signal in accordance with the present invention;
FIG. 9 shows the effects of sound processing a sound signal using the exemplary control script of FIG. 8; and FIG. 10 shows an exemplary portion of a storage medium 1000 that includes a data track with embedded sound data and script data.
DESCRIPTION OF THE SPECIFIC EMBODIMENTS
The present invention provides a system for processing sound signals that allows listeners to dynamically customize perceived spatial positions and/or sound qualities of components associated with the sound signals.
FIG. 2 shows a block diagram of a sound processing system 200 constructed in accordance with the present invention. The sound processing system 200 includes a sound source 202, a sound unmixer 204, a stream processor 206, a mixer 208 and an instruction generator 210.
The sound source 202 has a sound output 212 that couples to the sound unmixer 204. In one embodiment, the sound source may be any type of sound source, such as a CD player, or cassette tape player. The sound source may also be a device that outputs sound data, such as a computer or a musical instrument like an electronic keyboard. Even a microphone picking up a live performance is suitable for use as a sound source in the present system.
The sound output 212 includes digital data representative of the sounds to be processed. If the sound source 202 is a CD, then digital data on the CD would be transmitted on the sound output 212. If the sound source is a cassette tape, wherein an analog signal represents the sounds to be processed, an analog to digital (A/D) converter could be included in the sound source to produce digital sound data for transmission on the sound output 212.
In another embodiment, the sound source 202 is a modified sound source that is capable of operating with modified media, such as modified CDs or cassette tapes that have sound data and control data stored on them. Thus, the modified sound source would be able to output both the digital sound data 212 and the control data 226 when playing back the modified media.
The sound signal can be a single signal or a combination of signals. For example, the sound source may be a CD player and the sound signal may be two signals representing the left and right channels, or four signals representing left and right channels for both front and back speaker locations.
The sound unmixer 204 is coupled to receive the sound output 212. The sound unmixer 204 also receives unmix instructions 214. The sound unmixer unmixes sound streams from the sound signal based on the unmix instructions. The unmix instructions are provided by the instruction generator 210. A later section of this document provides a complete description of the unmix instructions.
Using the unmix instructions, the sound unmixer produces one or more sound streams 216, which are also referred to as "voices." Each of the sound streams may represent various portions of the sound signal. For example, one stream may represent high frequency components of the sound signal 212, while a second stream represents low frequency components. However, the sound unmixer is very flexible in the way that it unmixes sound streams to represent portions of the input sound signal. For example, special processing may be performed on the sound signal to produce an unmixed stream that contains only certain spectral components of the sound signal. It is also possible to output unmixed sound streams directly from the sound unmixer 204 as shown at 232.
The stream processor 206 is coupled to receive the unmixed streams 216. The stream processor also receives processing instructions 218 from the instruction generator 210. The stream processor processes the unmixed streams from the sound unmixer based on the processing instructions. A later section of this document provides a complete description of the processing instructions. Using the processing instructions, the stream processor produces processed streams 220. The stream processor 206 processes the sound streams 216 in a number of ways. For example, frequency domain processing, like pitch-shifting, may be performed. Other processes include three-dimensional (3D) position processing, wherein the perceived spatial positions of sounds represented by a stream are changed. Other types of processing performed by the stream processor 206, such as time domain processing, are described in greater detail in a later section of this document. It is also possible to output processed streams directly, as shown at 234.
The mixer 208 receives the processed streams 220 and combines them to form an output signal 222. The mixer comprises logic to combine the processed streams in accordance with mixing instructions 224 received from the instruction generator 210. The mixer may include delay lines or storage buffers to time synchronize the processed streams when forming the output signal 222. The output signal 222 may then be input to a sound system, such as the sound system of FIG. 1, to reproduce the results of the sound processing system 200 for enjoyment by the listener. Streams output directly from the sound unmixer 204 or the stream processor 206, such as streams 232 and 234, may also be input to the sound system, thereby bypassing the mixer 208.
The instruction generator 210 provides unmixing instructions 214, processing instructions 218 and mixing instructions 224. In one embodiment, the instruction generator 210 generates the instructions based on a control script received at control input 228. In another embodiment, the instruction generator generates the instructions based on information received at user input 230. In another embodiment, the instruction generator generates the instructions based on control data 226 received from the sound source 202, wherein the sound source is a modified sound source capable of outputting both sound 212 and control data 226. In another embodiment, the instruction generator generates the instructions based on information detected in the sound signal 212.
FIG. 3 shows a detailed block diagram of the processing system 200. In the following description it will be assumed that the output produced by the processing system is suitable for use in a sound system having four speakers, such as the sound system of FIG. 1. However, it will be apparent to one with skill in the art that embodiments of the present invention can be constructed having any number of outputs to support a sound system having any number of speakers.
The processor 206 is shown comprising a number of subprocessors 304 and a corresponding number of 3D position processors 306. The subprocessors and 3D position processors are used to process the unmixed streams 216.
The subprocessors 304 are used to process the unmixed streams in ways that generally do not change their perceived spatial position. For example, a subprocessor may perform pitch-shifting or signal harmonizing on an unmixed stream. While such processes may change audible characteristics of the stream as perceived by a listener, they generally do not change the perceived spatial position; however, the subprocessors could be programmed to do so if desired. Thus, the subprocessors can perform all manner of signal processing on the unmixed streams to produce subprocessed streams 308. When the subprocessing is complete, the subprocessed streams 308 are input to the 3D position processors 306. The 3D position processors 306 operate to reposition the perceived spatial position of the sounds in the unmixed streams. For example, assuming the listener is seated in the listening room 100 and facing front, one unmixed stream may represent the singer's voice 112. The singer's voice may be perceived to be directly in front of the listener. The 3D position processors may operate on that stream to change the perceived position of the singer's voice. For example, the singer's voice may be repositioned to be behind the listener. A more detailed example is provided in a later section of this document.
To change the perceived position of a stream, the 3D position processors produce positioning outputs 314 utilizing any 3D or 2D positioning technique. For example, in one embodiment the 3D position processors provide a portion of the unmixed stream to each speaker. By changing the portions of the sound stream provided to each speaker, the perceived spatial position of the stream may be repositioned around the listening room.
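The portion-per-speaker positioning described above can be sketched with a simple constant-power panning law. This is a minimal illustration only: the four-speaker layout at 45/135/225/315 degrees and the cosine gain law are assumptions made for the sketch, not details taken from this disclosure.

```python
import numpy as np

def quad_pan_gains(azimuth_deg):
    """Gains that spread a mono stream over four speakers (FL, FR, RL, RR)
    at 45, 135, 225 and 315 degrees, using constant-power pairwise
    panning. The layout and the cosine law are illustrative assumptions."""
    speaker_angles = np.array([45.0, 135.0, 225.0, 315.0])
    # Angular distance from the source to each speaker, wrapped to [0, 180].
    diff = np.abs((azimuth_deg - speaker_angles + 180.0) % 360.0 - 180.0)
    # Only speakers within 90 degrees of the source receive signal.
    gains = np.where(diff < 90.0, np.cos(np.radians(diff)), 0.0)
    return gains / np.sqrt(np.sum(gains ** 2))  # unit total power
```

Sweeping azimuth_deg over time moves the perceived position of the stream around the listening room.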
The processor instructions 218 determine what processes and positioning to perform on the streams 216. The processor instructions 218 include subprocessor instructions 310 and position processor instructions 312. The subprocessor instructions 310 are used by the subprocessors 304 to determine what signal processing functions are to be performed on the unmixed streams, for example, processes to produce pitch-shifting or echo effects. The position processor instructions 312 are used by the 3D position processors 306 to determine how to change the perceived spatial position of the subprocessed streams 308. Thus, the instruction generator 210 is capable of controlling the operation of both the subprocessors 304 and the 3D position processors 306.
The processor outputs 220 of the processor 206 are coupled to the mixer 208. Assuming that the sound processing system is designed to produce results for playback on a four speaker system, each of the 3D processors produces four position signals. The position signals will produce the desired spatial position for the stream when input into a four speaker sound system. It will be apparent to one with skill in the art that any number of speakers may be located in the listening room, and that based on speaker arrangements, the perceived position of unmixed streams may be changed to virtually any position.
The mixer 208 mixes together the processed signals 220 representing all the streams to produce four output signals 222 suitable for use with a four speaker sound system. The mixer 208 receives mixing instructions to determine how to mix together the streams. Thus it is possible to adjust the relative signal level of one stream with respect to another when forming the output signals 222. As a result, when played on a four speaker system, all of the streams will be perceived by a listener to have the desired processing and corresponding spatial positions.
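The mixing step can be sketched as a weighted sum of the per-stream position signals, with one level per stream standing in for the mixing instructions 224. The (4, n_samples) array layout is an assumption made for this sketch:

```python
import numpy as np

def mix_streams(position_signals, levels):
    """Combine per-stream four-speaker position signals into four output
    channels, scaling each stream by its level from the mixing
    instructions. position_signals: arrays shaped (4, n_samples)."""
    out = np.zeros_like(np.asarray(position_signals[0], dtype=float))
    for sig, level in zip(position_signals, levels):
        out += level * np.asarray(sig, dtype=float)
    return out
```

Adjusting a stream's level changes its loudness relative to the other streams in the final four-channel output.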
The unmixer 204 creates and outputs the unmixed streams 216 using an unmixing process described in a later section of this document. The unmixer 204 is capable of outputting multiple unmixed streams, wherein each stream may be input to a separate subprocessor included in the processor 206. The instruction generator 210 produces instructions for the sound unmixer 204, the subprocessors 304, the 3D processors 306 and the mixer 208. The instruction generator 210 includes a control sequencer 316, a sound analyzer 318 and a communication interface 350.
The script input 228 couples to the communication interface 350. The communication interface 350 receives the script data from an external source and provides it to the control sequencer 316 via script channel 352. The communication interface may include a modem for connecting to other computers or computer networks. The communication interface may also include additional memory for storage of received script data. Other types of communication devices may be contained in the communication interface 350. For example, infra-red (IR), radio frequency (RF) or other types of communication devices may be included in the communication interface 350 so that script data may be received from a variety of sources.
The control sequencer is also coupled to receive control data 226 that may be included as part of the sound source, when the sound source is a modified sound source that outputs both sound signals and control data. For example, the control script information may be embedded on a modified CD containing both music and script data. In that case, a single CD would contain music and a control script defining how the music is to be processed to achieve a specific effect on playback.
The control sequencer also includes a memory 322 having script presets. The script presets are determined before processing begins and are stored in the memory 322 for future use.
The sound analyzer 318 is also part of the instruction generator 210. The sound analyzer 318 is coupled to the sound source 202 to receive the sound signal 212 and to detect selected events within the sound signal. For example, the beat of a drum or a crash of a cymbal may be events that are detected by the sound analyzer. The control sequencer 316 instructs the sound analyzer 318 to detect selected events via an event channel 320. The event channel 320 is also used by the sound analyzer to transmit indications to the control sequencer 316 that the selected events have been detected. The control sequencer uses these detected events to control the generation of instructions to the components of the sound processing system 200.
The user input 230 couples to the control sequencer 316 to allow a user to interact with the instruction generator 210 to control operation of the sound processing system 200. For example, the user may use the user input to select whether the external script input 228 or the control data input 226 are used to receive scripts for processing the sound signal 212. The user may also specify operation of any of the other components of the sound processing system by using the user input. In one embodiment, the user can instruct the control sequencer 316 to activate the sound analyzer 318 to detect selected events in the sound signal 212. Further, upon detection of the selected events, the control sequencer will use the presets stored in the memory 322 to generate instructions for the components of the sound processing system. The user input 230 may also be used to enter control script information directly into the instruction generator 210.
In another embodiment of the present invention, the unmixer 204 and the instruction generator 210 provide unmixed streams 216 and control instructions 214, 310, 312, 224 to an external system (not shown) that may include subprocessors, 3D position processors and mixers. The external system may be another computer program or computer system including hardware and software. The external system may also be located at a different location from the components of the system 200. As a result, it is possible to distribute the processing of the unmixed streams to one or more systems. However, it will be apparent to one with skill in the art that merely distributing the processing does not deviate from the scope of the invention, which includes ways to produce unmixed streams which may be processed in accordance with instructions based on a control script.
Therefore it is possible to process sounds in a variety of ways using the sound processing system 200. In one method, the sound is processed using events detected within the sound itself. In another method, sound is processed using script information embedded with the sound at the sound source. In another method of processing, the script information is independent from the sound source, for example, a separate data file, that can be input to the control sequencer 316 to control how the sounds are processed.

The invention is related to the use of the sound processing system 200 for dynamic sound processing. According to one embodiment of the invention, dynamic sound processing is provided by the sound processing system 200 in response to the control sequencer 316 executing one or more sequences of one or more instructions. Such instructions may be read into the control sequencer 316 from another computer-readable medium, such as the sound source 202. Execution of the sequences of instructions causes the control sequencer to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.

The term "computer-readable medium" as used herein refers to any medium that participates in providing instructions to the control sequencer 316 for execution. Such a medium may take many forms, including, but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as those that may be used in conjunction with the sound source 202. Volatile media include dynamic memory, such as dynamic memory that may be associated with the presets 322.
Transmission media include coaxial cables, copper wire, and fiber optics, including the wires that comprise the script input 228. Transmission media can also take the form of radio or light waves, such as those generated during radio frequency (RF) and infra-red (IR) data communications. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, a computer data storage structure, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read. Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to the control sequencer 316 for execution. For example, the instructions may initially be borne on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to the sound processing system 200 can receive the data on the telephone line via the script input 228. The communication interface 350 receives the data and forwards the data over the channel 352 to the control sequencer 316, which executes instructions included in the data. The instructions received by the control sequencer 316 may optionally be stored in an internal memory within the control sequencer either before or after execution by the control sequencer 316.
The communication interface 350 provides a two-way data communication coupling to a script input 228 that may be connected to a local network (not shown). For example, the communication interface 350 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, the communication interface 350 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, the communication interface 350 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information. If the script input 228 is to be coupled to a data network, a connection may be established through a local network (not shown) to a host computer or to data equipment operated by an Internet Service Provider (ISP). The ISP in turn provides data communication services through the worldwide packet data communication network, now commonly referred to as the "Internet." The local network and the Internet both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on the script input 228 and through the communication interface 350, which carry the digital data to and from the sound processing system 200, are exemplary forms of carrier waves transporting the information.
The sound processing system 200 can send messages and receive data, including program codes, through the network(s), the script input 228 and the communication interface 350. In the Internet example, an Internet server might transmit code for an application program through the Internet, ISP, local network, and communication interface 350. In accordance with the invention, one such downloaded application provides for dynamic sound processing as described herein. The received code may be executed by the control sequencer 316 as it is received, and/or stored in memory 322 as presets, or other non-volatile storage for later execution. In this manner, the sound processing system 200 obtains application code in the form of a carrier wave.

FIG. 4 shows a block diagram depicting the functionality of one embodiment of the unmixer 204 including various internal operations and corresponding signals. In FIG. 4, it will be assumed that the input sound signal 212 includes left (L) and right (R) stereo channels; however, it will be obvious to one skilled in the art that minor modifications can be made to process more sound channels without deviating from the scope of the invention. The left and right stereo channels are input to discrete Fourier transform (DFT) blocks 402L and 402R, respectively. In a preferred embodiment, the stereo channels will be in the form of digital signals. However, for analog stereo channels, the channels can be digitized using techniques well known in the art. The outputs of the DFT blocks 402L and 402R are the frequency domain spectra of the left and right stereo channels. Peak detection blocks 404L and 404R detect the peak frequencies where peaks occur in the frequency domain spectra. This information is then passed to a subtraction block 406, which generates a difference spectra signal having values equal to the difference of the left and right frequency domain spectra at each peak frequency.
If voice signals are panned to center, then the magnitudes and phases of the frequency domain spectra for each channel at voice frequencies will be almost identical. Accordingly, the magnitude of the difference spectra at those frequencies will be small.
The difference signal as well as the left and right peak frequencies and frequency domain spectra are input to an amplitude adjustment block 410. The amplitude adjustment block utilizes the magnitudes of the difference spectra and frequency domain spectra of each channel to modify the magnitudes of the frequency domain spectra of each channel and output a modified spectra. The magnitude of the modified spectra depends on the magnitude of the difference spectra. Accordingly, the magnitude of the modified frequency domain spectra will be low for frequencies corresponding to voice. The modified frequency domain spectra for each channel is input to inverse discrete Fourier transform (IDFT) blocks 412L and 412R, which output time domain signals based on the modified spectra. Since the modified spectra was attenuated at frequencies corresponding to voice, the modified stereo channels (L' and R') output by the IDFT blocks 412L and 412R will have the voice removed. However, the instruments and other sounds not panned to the center will remain in the original stereo channels so that the stereo quality of the recording will be preserved. Additionally, a center output containing the unmixed spectra is input to IDFT block 412C that outputs time domain signals (C) based on the unmixed spectra. The time domain signals L', C and R' are input to mixer 414 that combines the received signals to produce seven "voices." Each voice represents some combination of the L', C and R' signals. Therefore, it is possible that V0 represents only the C signal and that V1 is comprised of some proportion of L' and C, for example.
The unmixing instructions 214 are received by the unmixer 204 and used to determine how to unmix the input signal 212 to form the output voices (V0-7). For example, the unmixing instructions specify how to combine the L', C and R' outputs to form the voice outputs. The unmixing instructions also provide unmixing parameters that can be used by the subtractor 406 and the amplitude adjustor 410 to select a portion of input signal 212 to be unmixed and provided to the IDFT block 412C. For example, the unmixing parameters are used to select a center portion of the input signal 212 to be unmixed. Thus, equal amplitudes of frequency peaks that occur in both the left and right stereo channels would be unmixed. The effect of this operation can be demonstrated by considering a case where a singer's voice is spatially centered between the left and right channels. Since the singer's voice so positioned would result in identical frequency peaks in the left and right channels, equal amounts of these frequency peaks are removed and as a result, the singer's voice would be unmixed from the sound signal.
In another embodiment the unmixing parameters include amplitude weighting parameters that may be used to unmix signals that do not appear equally in both left and right channels. For example, the singer's voice used in the above example may be spatially positioned off center, and thus, more toward either the left or right channel. As a result, the frequency peaks representing the singer's voice would have greater amplitude corresponding to the side where the singer's voice is located. The amplitude weighting parameters are used by the subtractor 406 and the amplitude adjustor 410 to unmix the singer's voice by compensating for the greater amplitude of the frequency peaks representing the singer's voice that appear in one channel (either left or right). As a result, the larger amplitude frequency peaks on that channel would be unmixed while lower amplitude frequency peaks on the other channel would be unmixed. Thus, even if the singer's voice appears to be spatially off-center, given the appropriate unmixing parameters the singer's voice can still be unmixed by the unmixer 204. The above described unmixing process can be used to unmix virtually any part of the input signal to produce one or more of the voice outputs. The unmixing is performed by hardware and/or software that receives the unmixing instructions and performs the above defined functions accordingly. The various operations performed by the blocks of FIG. 4 will now be described in greater detail.

The Phase Vocoder and DFT
A frequency-domain representation of the input signal 212 can be obtained by use of a phase-vocoder, a process in which the incoming signal is split into overlapping, windowed, short-term frames which are then processed by a Fourier transform, resulting in a series of short-term frequency domain spectra representing the spectral content of the input signal in each short-term frame. The frequency-domain representation can then be altered and a modified time-domain signal reconstructed by use of overlapping windowed inverse Fourier transforms. The phase vocoder is a very standard and well known tool that has been used for years in many contexts (voice coding, high-quality time-scaling, frequency-domain effects, and so on).
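A phase-vocoder analysis stage along these lines can be sketched with numpy; the Hann window, 1024-sample frame, and 256-sample hop are illustrative choices, not values taken from this disclosure:

```python
import numpy as np

def stft_frames(x, frame_len=1024, hop=256):
    """Split x into overlapping, windowed short-term frames and take the
    DFT of each, yielding the series of short-term spectra described in
    the text. Window, frame length and hop are illustrative choices."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    spectra = np.empty((n_frames, frame_len // 2 + 1), dtype=complex)
    for m in range(n_frames):
        frame = x[m * hop : m * hop + frame_len] * window
        spectra[m] = np.fft.rfft(frame)  # short-term spectrum of frame m
    return spectra
```

Modifying these spectra and applying overlapping windowed inverse transforms reconstructs a modified time-domain signal, as the text describes.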
Assuming the incoming stereo signal is processed by the phase-vocoder, for each stereo input frame there is a pair of frequency-domain spectra that represent the spectral content of the short-term left and right signals. The short-term spectrum of the left signal is denoted by X_L(Ω_k, t), where Ω_k is the frequency channel and t is the time corresponding to the short-time frame. Similarly, the short-term spectrum of the right signal is denoted by X_R(Ω_k, t). Both X_L(Ω_k, t) and X_R(Ω_k, t) are arrays of complex numbers with amplitudes and phases.
Peak Detection

The first step consists of identifying peaks in the magnitudes of the short-term spectra. These peaks indicate sinusoidal components that can either belong to the singer's voice or to background instruments. To find the peaks, one calculates the magnitude of X_L(Ω_k, t), of X_R(Ω_k, t), or of X_L(Ω_k, t) + X_R(Ω_k, t), and one performs a peak detection process. One such peak detection scheme consists of declaring as peaks those channels where the amplitude of a channel is larger than the two neighbor channels on the left and the two neighbor channels on the right. Associated with each peak is a so-called region of influence composed of all the frequency channels around the peak. The consecutive regions of influence are contiguous, and the limit between two adjacent regions can be set to be exactly mid-way between two consecutive peaks or to be located at the channel of smallest amplitude between the two consecutive peaks.

Difference Calculation and Gain Estimation
The Left-Right difference signal in the frequency domain is obtained next by calculating the difference between the left and right spectra using:

D(Ω_k0, t) = X_L(Ω_k0, t) - X_R(Ω_k0, t)    (1)

for each peak frequency Ω_k0.
For peaks that correspond to components belonging to the voice (or any instrument panned in the center) the magnitude of this difference will be small relative to either X_L(Ω_k, t) or X_R(Ω_k, t), while for peaks that correspond to components belonging to background instruments this difference will not be small. Using D(Ω_k, t) to reconstruct the time-domain signal would result in the exact equivalent of the standard (Left minus Right) algorithm with a mono output.
Rather, the key idea is to calculate how much of a gain reduction it takes to bring X_L(Ω_k, t) and X_R(Ω_k, t) down to the level of D(Ω_k, t) and apply this gain in the frequency domain, leaving the phases unchanged. Specifically, the left and right gains are calculated as follows:

Γ_L(Ω_k0, t) = min(1, |D(Ω_k0, t)| / |X_L(Ω_k0, t)|)
Γ_R(Ω_k0, t) = min(1, |D(Ω_k0, t)| / |X_R(Ω_k0, t)|)

which are the left gain and the right gain for each peak frequency. The min() function assures that these gains are not allowed to become larger than 1. Peaks for which Γ_L(Ω_k, t) is close to 0 are deemed to correspond to the voice while peaks for which Γ_L(Ω_k, t) is close to 1 are deemed to correspond to the background instruments.
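The peak detection and gain estimation just described can be sketched as follows; the eps guard against division by zero is an implementation assumption, not part of the formulas above:

```python
import numpy as np

def detect_peaks(mag):
    """Declare as peaks the channels whose magnitude exceeds the two
    neighbor channels on the left and the two on the right."""
    return [k for k in range(2, len(mag) - 2)
            if mag[k] > max(mag[k-2], mag[k-1], mag[k+1], mag[k+2])]

def peak_gains(XL, XR, peaks, eps=1e-12):
    """Difference spectrum D = X_L - X_R at each peak, and the ratios
    Gamma_L = min(1, |D|/|X_L|), Gamma_R = min(1, |D|/|X_R|)."""
    kk = np.asarray(peaks)
    D = XL[kk] - XR[kk]
    gamma_L = np.minimum(1.0, np.abs(D) / (np.abs(XL[kk]) + eps))
    gamma_R = np.minimum(1.0, np.abs(D) / (np.abs(XR[kk]) + eps))
    return D, gamma_L, gamma_R
```

A centered component yields a ratio near 0 (voice); a hard-panned one yields a ratio near 1 (background instrument).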
Voice Removal
To remove the voice one will apply a real gain G_L,R(Ω_k, t) to all the channels in the region of influence of the peak:

Y_L(Ω_k0, t) = X_L(Ω_k0, t) · G_L(Ω_k0, t)
Y_R(Ω_k0, t) = X_R(Ω_k0, t) · G_R(Ω_k0, t).
The gains G_L,R(Ω_k, t) are real, and therefore the modified channels Y_L,R(Ω_k, t) have the same phase as the original channels X_L,R(Ω_k, t) but their magnitudes have been modified. To remove the voice, G_L,R(Ω_k, t) should be small whenever Γ_L,R(Ω_k, t) is small and should be close to 1 whenever Γ_L,R(Ω_k, t) is close to 1. One choice is to define
G_L,R(Ω_k0, t) = Γ_L,R(Ω_k0, t)

where the modified channels Y_L,R(Ω_k, t) are given the same magnitude as the difference D(Ω_k0, t). As a result, the signal reconstructed from Y_L(Ω_k0, t) and Y_R(Ω_k0, t) will retain the stereo image of the original signal but the voice components will have been significantly reduced.
Another choice is to define G_L,R(Ω_k0, t) = (Γ_L,R(Ω_k0, t))^α with α > 0, where the exponent α controls the amount of reduction brought by the algorithm: α close to 0 does not remove much, large values of α remove more, and α = 1 removes exactly the same amount as the standard Left-Right technique. Using large values of α makes it possible to attain a larger amount of voice removal than possible with the standard technique. In general, the gain function is a function based on the magnitude of the difference spectra.
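The voice-removal gain with the exponent α can be sketched end-to-end at the peak channels (applying the gain across each full region of influence is omitted for brevity; the eps guard is an implementation assumption):

```python
import numpy as np

def remove_center(XL, XR, alpha=1.0, eps=1e-12):
    """Apply the real gain G = Gamma**alpha to each channel, leaving
    phases unchanged. alpha = 1 matches the standard Left-minus-Right
    amount of removal; larger alpha removes more center-panned content."""
    D = XL - XR
    gL = np.minimum(1.0, np.abs(D) / (np.abs(XL) + eps)) ** alpha
    gR = np.minimum(1.0, np.abs(D) / (np.abs(XR) + eps)) ** alpha
    return XL * gL, XR * gR
```

Because the gains are real, the modified spectra keep the original phases, so the stereo image of the side-panned instruments is preserved.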
Voice Amplification
To amplify the voice and attenuate the background instruments the gains G_L,R(Ω_k0, t) should be chosen to be close to 1 for small Γ_L,R(Ω_k0, t) and close to 0 for Γ_L,R(Ω_k, t) close to 1, i.e., an increasing function of the inverse of the magnitude. Examples include:

G_L,R(Ω_k0, t) = 1 - Γ_L,R(Ω_k0, t)

among other decreasing functions of Γ_L,R(Ω_k, t). Because G_L,R(Ω_k, t) is small for channels that belong to background instruments (for which Γ_L,R(Ω_k, t) is close to 1), background instruments are attenuated while the voice is left unchanged. Thus, it is possible to unmix the voice components from the sound signal.
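The complementary choice G = 1 − Γ for voice amplification can be sketched the same way (eps is again an implementation assumption):

```python
import numpy as np

def amplify_center(XL, XR, eps=1e-12):
    """Keep center-panned components and attenuate side-panned ones by
    applying the gain G = 1 - Gamma to each channel."""
    D = XL - XR
    gL = 1.0 - np.minimum(1.0, np.abs(D) / (np.abs(XL) + eps))
    gR = 1.0 - np.minimum(1.0, np.abs(D) / (np.abs(XR) + eps))
    return XL * gL, XR * gR
```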
Gain Smoothing

It is often useful to perform time-domain smoothing of the gain values to avoid erratic gain variations that can be perceived as a degradation of the signal quality. Any type of smoothing can be used to prevent such erratic variations. For example, one can generate a smoothed gain by setting

Ḡ_L,R(Ω_k0, t) = β · G_L,R(Ω_k0, t) + (1 - β) · Ḡ_L,R(Ω_k0, t - 1)

where β is a smoothing parameter between 0 (a lot of smoothing) and 1 (no smoothing), (t - 1) denotes the time at the previous frame, and Ḡ is the smoothed version of G. Other types of linear or non-linear smoothing can be used.
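The recursive smoothing above is a one-pole filter applied per frequency channel; a minimal sketch (the default β is an arbitrary illustrative value):

```python
def smooth_gain(g_curr, g_prev_smoothed, beta=0.3):
    """One-pole smoothing: G_bar(t) = beta*G(t) + (1-beta)*G_bar(t-1).
    beta near 0 smooths heavily; beta = 1 disables smoothing."""
    return beta * g_curr + (1.0 - beta) * g_prev_smoothed
```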
Frequency Selective Processing
Because the voice signal typically lies in a reduced frequency range (for example from 100 Hz to 4 kHz for a male voice) it is possible to set the gains G_L,R(Ω_k, t) to arbitrary values for frequencies outside that range. For example, when removing the voice we can assume that there are no voice components outside of a frequency range ω_min to ω_max and set the gains to 1 for frequencies outside that range: G_L,R(Ω_k0, t) = 1 for Ω_k0 < ω_min or Ω_k0 > ω_max. Thus, components belonging to an instrument panned in the center (such as a bass-guitar or a kick drum) but whose spectral content does not overlap that of the voice, will not be attenuated as they would with the standard method.
For voice amplification one could set those gains to 0: G_L,R(Ω_k0, t) = 0 for Ω_k0 < ω_min or Ω_k0 > ω_max, so that instruments falling outside the voice range would be removed automatically regardless of where they are panned.
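Both frequency-selective variants amount to forcing the gains to a constant outside an assumed voice band; a sketch using the 100 Hz to 4 kHz male-voice range from the text (the bin-frequency array layout is an assumption):

```python
import numpy as np

def apply_voice_band(gains, bin_freqs_hz, f_min=100.0, f_max=4000.0,
                     outside_gain=1.0):
    """Set the gains to a fixed value for channels outside [f_min, f_max].
    outside_gain=1.0 corresponds to voice removal (leave out-of-band
    instruments untouched); outside_gain=0.0 to voice amplification
    (remove out-of-band instruments regardless of panning)."""
    gains = np.array(gains, dtype=float)
    outside = (bin_freqs_hz < f_min) | (bin_freqs_hz > f_max)
    gains[outside] = outside_gain
    return gains
```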
Left/Right Balance
Sometimes the voice is not panned directly in the center but might appear in both channels with a small amplitude difference. This would happen, for example, if both channels were transmitted with slightly different gains. In that case, the gain mismatch can easily be incorporated in Eq. (1):
D'(Ω_k0, t) = δ · X_L(Ω_k0, t) - X_R(Ω_k0, t)

where δ is a gain adjustment factor that represents the gain ratio between the left and right channels. Thus, by using the appropriate δ it is possible to unmix sound components that are not centered between the left and right channels, but are panned to one side or the other. The appropriate δ will result in the frequency components of interest having a very small difference spectra.
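The gain-compensated difference can be sketched directly; choosing δ equal to the right-to-left gain ratio of the target component makes that component cancel:

```python
def balanced_difference(XL, XR, delta):
    """Modified difference D' = delta*X_L - X_R, compensating for a
    left/right gain mismatch so off-center components still cancel."""
    return delta * XL - XR

# A voice reaching the right channel at 0.8x its left-channel level
# cancels when delta = 0.8.
```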
IDFT and Signal Reconstruction

Once Y_L(Ω_k, t) and Y_R(Ω_k, t) have been reconstructed for every frequency channel, the resulting frequency domain representation is used to reconstruct the time-domain signal according to the standard phase-vocoder algorithm.
FIG. 5 is a block diagram of one embodiment of the unmixer 204 that includes a CPU, memory, input and output system, and peripherals, suitable for use to unmix selected sound components of an input signal. The unmixer 204 is capable of receiving the unmixing instructions 214 and executing unmixing software that interprets the unmixing instructions 214 to perform unmixing operations to produce the desired voices (V0-7). In a preferred embodiment, the unmixer 204 includes a digital signal processor (DSP) (not shown) under control of the CPU. FIG. 6 shows an exemplary embodiment of sound processing instructions 600 constructed in accordance with the present invention. The sound processing instructions contain name and value pairs that can be used by the components of the sound processing system 200 to process sounds in accordance with the present invention. The following is a description of the name and value pairs of the sound processing instructions 600. However, the following list is exemplary and not intended to provide an exhaustive list of all possible processing instructions.
The exemplary name and value pairs of the sound processing instructions 600 are depicted in FIG. 6.
FIG. 7 shows a sound processing method 700 for use with the sound processing system 200 of FIG. 3. The sound processing method 700 can be used to process an input sound source to reposition perceived spatial positions of sound components within the sound signal.
At block 702, a sound source provides a sound signal to the sound processing system of the present invention. For example, the sound source 202 provides the sound signal 212 for processing.
At block 704, a control script for processing the sound signal is determined. In one embodiment, the user instructs the control sequencer where to find the control script. For example, the user indicates via the user input 230 that an external script is to be received from the script input 228 or that a script accompanying the sound signal at script data input 226 is to be used.
At block 706, the control sequencer 316 begins obtaining script instructions from the selected script input.
At block 708, the control sequencer decodes the script and generates unmixing instructions to the sound unmixer 204. For example, the unmixing instructions provide coefficients for forming one or more voices 216 output from the unmixer.
At block 710, one or more voices 216 are output from the unmixer 204 in response to the unmixing instructions. Although FIG. 3 depicts three voices 216, any number of voices may be produced by the unmixer 204. At block 712, the control sequencer 316 generates processing instructions 310 to transmit to the subprocessors 304 for processing the voices 216 created by the unmixer 204. The processing instructions instruct the subprocessors 304 to perform, for example, frequency based processing, such as pitch-shifting or signal harmonizing. The processing may also include time based processing, such as signal filtering. At block 714, the control sequencer 316 generates positioning instructions 312 to transmit to the position processors 306 to adjust the perceived spatial positions of the subprocessed voices 308. For example, assuming the sound processing system is to be used with a four speaker system, the position processors output a signal for each of the four speakers to produce a perceived position of the voice to the listener. As a result, varying amounts of the voice appear in the 3D processor outputs 220.
At block 716, the control sequencer 316 generates mixing instructions to mix the processed signals 220 together. This is achieved by the mixer 208, which mixes the signals received from the processor 206, according to the mixing instructions 224, to form mixer outputs 222. The mixer outputs are transmitted to the speakers to produce sounds corresponding to the processing and spatial repositioning which can be perceived by the listener.
At block 718, the method continues by processing any remaining script instructions that exist. For example, if the sound signal is a song that lasts three minutes, the script may include a list of instructions to be processed for the three minute duration.
Time Synchronization
In order to correctly process the sound signals, time synchronization exists between the components of the processing system 200 and the sound signal. For example, if a sound signal is three minutes in duration, and spatial repositioning is to occur at two minutes into the sound signal, the instruction generator 210, the unmixer 204 and the stream processors 206 are synchronized to achieve this.
In one embodiment, the sound signal and the control scripts include time stamps. The control sequencer 316 generates instructions to the components of the processing system by reading the time stamps on the control script and sending the instructions at the appropriate time in the processing. Likewise, the subprocessors 304 and the position processors 306 read the time stamps on the instructions they receive and match those time stamps with time stamps accompanying the sound signal. Thus, it is possible to know exactly when processing is to be applied to a particular stream.
The mixer 208 also receives time stamp information with its instructions from the control sequencer 316. The mixer uses the time stamp information to determine when to apply selected mixing functions. The mixer can also obtain time stamp information from each received stream and align the received streams based on the time stamps before combining them, so that no distortion is introduced by combining mis-aligned streams.
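The mixer's time-stamp alignment can be sketched as trimming each stream to a common start before summing; representing a stream as a (start-time-in-samples, samples) pair is a simplifying assumption made for this sketch:

```python
def align_streams(streams):
    """Trim every stream so they all begin at the latest start time,
    avoiding the distortion that combining mis-aligned streams would
    introduce. streams: list of (start_sample, samples) pairs."""
    latest_start = max(start for start, _ in streams)
    return [samples[latest_start - start:] for start, samples in streams]
```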
In one embodiment of the present invention, a master clock is coupled to the components of the processing system 200, and is used to synchronize the components with the time stamps accompanying the sound signal and script file. In another embodiment of the present invention, a time stamp accompanying the sound signal is used to synchronize the system. In that case, each component reads the time stamp on the sound signal it is to process in order to determine when to apply the script instructions. In another embodiment, the sound source provides an analog signal that is converted to a digital signal and tagged with a time stamp which can then be used by the components in the sound processing system 200.

Sound Processing Example
A sound processing example will now be provided to demonstrate how sounds may be processed by the sound processing system 200 using an exemplary script to achieve desired spatial effects. FIG. 8 shows an exemplary script 800 for use in processing sounds in accordance with the present invention. The script 800 comprises seven instructions 801-807, which are to be processed to create desired spatial effects on selected sound components of sounds from a sound source.
FIG. 9 shows the listening room 100 of FIG. 1 and includes a modified music source 902 coupled to the sound processing system 200 of FIG. 3, which is further coupled to the amplifier 104. The modified music source 902 is modified in accordance with the present invention and has a sound output 904 and a script output 906 coupled to the sound processing system 200. For example, the modified music source 902 may be a CD player that plays a CD having both a music track and a script file embedded on it. During playback, the music track is transmitted from the sound output 904 and the associated script file is transmitted from the script output 906. The sound processing system 200 has its four outputs 220 coupled to the amplifier 104 that provides sound signals to the four speakers in the listening room 100.
For the following discussion, it will be assumed that the music track is approximately three minutes in duration and begins at time 0:00 and ends at time 3:00. The exemplary script 800 will be assumed to be the script embedded on the CD with the music track. Thus, when the CD player is activated and playback of the CD begins, the music track and the script file are output to the sound processing system: the music data is input to the sound unmixer and the script data is input to the instruction generator. The music track contains sounds representative of a singer's voice, a piano and a guitar. As playback begins, it is assumed that the perceived spatial positions, relative to the listener 110, of the voice 908, the piano 910 and the guitar 912 are as shown in FIG. 9. The listener 110 is in the center of the listening room 100, facing front and equidistant from the four speakers.
Referring now to FIG. 8, the first three instructions 801, 802, and 803, which are embedded on the CD, are input to the sound processing system. Thus, the instruction generator receives the first three script instructions and generates the appropriate instruction for each component of the sound processing system 200. The first instruction 801 commands the sound processing system to execute a create voice command (885), to create voice ID 2 (886) using the center unmixing technique (887). The center unmixing technique uses coefficients 0, 1, and 2 (888), where only the coefficient 0 has a value greater than zero. The command begins at time stamp 0:00 (889) and produces a perceived voice at an angle of 0 degrees (890) at a radius 1 meter (891). The voice becomes active .1 seconds (892) after the time stamp 0:00. This instruction maintains the position of sound components located at the right side as provided by the original source. The second instruction 802 commands the sound processing system to execute a create voice command (893), to create voice ID 3 (894) using the center unmixing technique (895). The center unmixing technique uses coefficients 0, 1, and 2 (896). The voice becomes active .1 seconds (897) after the time stamp 0:00. This instruction maintains the position of sound components located at the center as provided by the original source. The third instruction 803 commands the sound processing system to execute a create voice command (870), to create voice ID 4 (872) using the center unmixing technique (874). The center unmixing technique uses coefficients 0, 1, and 2 (876). The command begins at time stamp 0:00 (878) and produces a perceived voice at an angle of 0 degrees (880) at a radius 1 meter (882). The voice becomes active .1 seconds (884) after the time stamp 0:00. This instruction maintains the position of sound components located at the left side as provided by the original source.
Therefore, at the end of the first three instructions 801, 802 and 803, the sound processing system essentially produces sound components having spatial positions corresponding to the spatial positions initially provided by the sound source. Referring now to FIG. 9, the voices having IDs (2, 3 and 4) created by the instructions 801, 802 and 803, include the singer's voice 908, the guitar 912 and the piano 910. Center unmixing was used for each instruction and the three relevant coefficients are set so that portions of the Left, Right and Center of the original sound source are used to create the voices having IDs of 2, 3 and 4. These instructions generally maintain the perceived position of the sound components as provided by the sound source.
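The specification does not spell out the center unmixing arithmetic in this passage. One common construction, offered purely as an illustrative assumption rather than the patented technique, treats the signal common to both channels as the center stream:

```python
# Illustrative center/side unmixing sketch (an assumption, not the
# patented method): the in-phase portion shared by left and right is
# pulled out as the center stream, and the remainders become the left
# and right streams.

def unmix_center(left, right, coeff=1.0):
    """Split a stereo pair into (left_stream, center_stream, right_stream).
    coeff scales how much of the common signal goes to the center,
    loosely mirroring the coefficient-driven unmixing in the text."""
    center = [coeff * 0.5 * (l + r) for l, r in zip(left, right)]
    left_out = [l - c for l, c in zip(left, center)]
    right_out = [r - c for r, c in zip(right, center)]
    return left_out, center, right_out

# A signal present equally in both channels (such as a centered vocal)
# lands entirely in the center stream, leaving the sides silent.
l_s, c_s, r_s = unmix_center([0.5, 0.5], [0.5, 0.5])
```

Under this sketch, setting only one coefficient nonzero, as instruction 801 does, would keep just one of the three resulting streams active.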
Referring now to FIG. 8, the fourth instruction 804, embedded on the CD, is input to the sound processing system. The fourth instruction commands the sound processing system to execute a create voice command (810), to create voice ID 1 (812) using the center unmixing technique (814). The center unmixing technique uses coefficients 0, 1, and 2 (816). The command begins at time stamp 1:00 (818) and produces a perceived center voice at an angle of 90 degrees (820) at a radius 1 meter (822). The voice becomes active .1 seconds (824) after the time stamp 1:00.
Referring now to FIG. 9, the voice created by the instruction 804 includes the center portion of the sound which generally includes the singer's voice. Since center unmixing was used, the coefficients 816 are set so that only the coefficient for the center band has a nonzero value. As a result, the sounds in the center portion, which include the singer's voice, are rotated from an initial spatial position at 908 to the position shown at 914. Two copies of the singer's voice can now be perceived by the listener. The first copy is derived from voice ID 3 and is perceived at position 908. The second copy is derived from voice ID 1 and is perceived 1 meter from the listener at an angle of 90 degrees as shown at 914.
Referring again to FIG. 8, the fifth instruction (805) modifies the voice created having voice ID 1 (828), by executing another create voice command (826). The fifth instruction commands the music processing system to execute the create voice command again using the center unmixing technique (830). The center unmixing technique uses coefficients 0, 1 and 2 (832). The command begins at time stamp 1:30 (834) and produces a perceived voice at an angle of 180 degrees (836) and a radius 1 meter (838). The voice becomes active .1 seconds (840) after the time stamp 1:30.
Referring now to FIG. 9, the voice created by the instruction 805 is shown. Notice the effect of the instruction was to rotate the perceived voice from the position 914 to a new position shown at 916. The perceived positions of the piano 910 and the guitar 912 are not changed by the execution of the instruction 805, since they are not part of the stream unmixed using the center unmixing technique. Thus, two copies of the singer's voice are perceived, one at position 908 due to voice ID 3, and one at position 916 due to voice ID 1. Referring again to FIG. 8, the sixth instruction (806) again modifies the voice created having voice ID 1 (842), by executing another create voice command (844). The sixth instruction commands the music processing system to execute the create voice command again using the center unmixing technique (846). The center unmixing technique uses coefficients 0, 1 and 2 (848). The command begins at time stamp 2:00 (850) and produces a perceived voice at an angle of 225 degrees (852) and a radius 1 meter (854). The voice becomes active .1 seconds (856) after the time stamp 2:00.
Referring now to FIG. 9, the voice created by the instruction 806 is shown. Notice the effect of the instruction was to again rotate the perceived voice from the position 916 to a new position shown at 918. Thus, two copies of the singer's voice are perceived, one at position 908 due to voice ID 3, and one at position 918 due to voice ID 1.
Referring again to FIG. 8, the seventh instruction (807) again modifies the voice created having voice ID 1 (858), by executing another create voice command (860). The seventh instruction commands the music processing system to execute the create voice command again using the center unmixing technique (862). The center unmixing technique uses coefficients 0, 1 and 2 (864). The command begins at time stamp 2:30 (866) and produces a perceived voice at an angle of 270 degrees (868) and a radius 1 meter (870). The voice becomes active .1 seconds (872) after the time stamp 2:30.
Referring now to FIG. 9, the voice created by the instruction 807 is shown. Notice that the effect of the instruction is to again rotate the perceived voice from the position 918 to a new position shown at 920. Thus, two copies of the singer's voice are perceived, one at position 908 due to voice ID 3, and one at position 920 due to voice ID 1.
Therefore, the above example demonstrates that by providing script instructions to the sound processing system 200 included in the present invention, the perceived spatial position of sounds can be manipulated in a variety of ways given a particular speaker arrangement.
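The repositioning demonstrated above can be sketched with pairwise constant-power panning over a square four-speaker layout. The speaker azimuths and the panning law used here are illustrative assumptions; the specification does not commit to this particular method.

```python
import math

# Sketch of placing an unmixed stream at a perceived angle in a square
# four-speaker layout using pairwise constant-power panning. Speaker
# azimuths (clockwise from directly in front of the listener) and the
# sine/cosine law are illustrative assumptions.

SPEAKERS = [45.0, 135.0, 225.0, 315.0]  # corner speaker azimuths in degrees

def pan_gains(angle):
    """Return one gain per speaker, distributing the stream between the
    two speakers that bracket `angle` so total power stays constant."""
    angle %= 360.0
    gains = [0.0] * len(SPEAKERS)
    for i, a0 in enumerate(SPEAKERS):
        a1 = SPEAKERS[(i + 1) % len(SPEAKERS)]
        span = (a1 - a0) % 360.0          # arc between adjacent speakers
        frac = (angle - a0) % 360.0       # position within that arc
        if frac < span:
            theta = math.radians(90.0 * frac / span)
            gains[i] = math.cos(theta)
            gains[(i + 1) % len(SPEAKERS)] = math.sin(theta)
            break
    return gains

# A source at 90 degrees (as produced by instruction 804) lies midway
# between the 45- and 135-degree speakers, so each receives equal gain
# and the squared gains sum to one.
g = pan_gains(90.0)
```

Rotating a voice, as instructions 805 through 807 do, would then amount to recomputing these gains at each new angle while the stream plays.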
FIG. 10 shows an exemplary portion of a storage medium 1000 that has a data track 1002 with embedded sound 1004 and script data 1006 and can be used in accordance with the present invention. The storage medium 1000 could be part of a CD, tape, disk or other type of storage medium used to store sound signals.
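A data track that interleaves sound and script data, as in FIG. 10, could be demultiplexed along the following lines. The frame tagging scheme is an illustrative assumption; the specification does not define a storage layout.

```python
# Sketch of demultiplexing a data track that interleaves sound frames
# and script frames. The tag values and (tag, payload) framing are
# illustrative assumptions, not a format from the specification.

SOUND_TAG, SCRIPT_TAG = 0x01, 0x02

def demux(track):
    """Split a sequence of (tag, payload) frames into sound data and
    script data, preserving the order of frames within each kind."""
    sound, script = [], []
    for tag, payload in track:
        if tag == SOUND_TAG:
            sound.append(payload)
        elif tag == SCRIPT_TAG:
            script.append(payload)
    return sound, script

sound, script = demux([
    (0x01, b"pcm block 0"),
    (0x02, b"create_voice id=2"),
    (0x01, b"pcm block 1"),
])
```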
The present invention provides a method and apparatus for processing sound signals to produce enhanced sound signals. It will be apparent to those with skill in the art that modifications to the above methods and embodiments can occur without deviating from the scope of the present invention. Accordingly, the disclosures and descriptions herein are intended to be illustrative, but not limiting, of the scope of the invention which is set forth in the following claims.

Claims

WHAT IS CLAIMED IS:

1. Apparatus for processing a sound signal comprising:
an input to receive the sound signal;
a sound unmixer coupled to the input to receive the sound signal and unmix at least one sound stream from the sound signal based on at least one unmixing instruction; and
an output coupled to the sound unmixer to output the at least one sound stream.
2. The apparatus of claim 1, further comprising an instruction generator coupled to the sound unmixer and having logic to generate the at least one unmixing instruction.
3. The apparatus of claim 2, wherein the instruction generator generates the at least one unmixing instruction based on a control script.
4. The apparatus of claim 3, wherein the control script is stored in the instruction generator.
5. The apparatus of claim 3, wherein the instruction generator includes a script input to receive the control script.
6. The apparatus of claim 3, wherein the control script is coupled with the sound signal on a transmission media.
7. The apparatus of claim 2, wherein the instruction generator includes a sound analyzer having logic to receive the sound signal and to detect a selected event in the sound signal.
8. The apparatus of claim 7, wherein the instruction generator has logic to generate the at least one unmixing instruction based on detection of the selected event.
9. The apparatus of claim 1, further comprising at least one stream processor coupled to the sound unmixer, the stream processor having logic to receive and process the at least one sound stream based on processing instructions to form at least one processed stream.
10. The apparatus of claim 9, further comprising an instruction generator coupled to the sound unmixer and having logic to generate the at least one unmixing instruction.
11. The apparatus of claim 10, wherein the instruction generator is coupled to the at least one stream processor and the processing instructions are generated by the instruction generator.
12. The apparatus of claim 9, wherein the at least one stream processor includes at least one subprocessor having logic to perform frequency processing on the at least one sound stream based on at least one subprocessor instruction to form at least one frequency processed stream.
13. The apparatus of claim 12, further comprising an instruction generator coupled to the at least one subprocessor having logic to generate the at least one subprocessor instruction.
14. The apparatus of claim 9, wherein the at least one stream processor includes at least one position processor having logic to perform spatial position processing on the at least one sound stream based on at least one positioning instruction to form at least one positioned sound stream.
15. The apparatus of claim 14, further comprising an instruction generator coupled to the at least one position processor having logic to generate the at least one positioning instruction.
16. The apparatus of claim 9, further comprising a mixer coupled to the at least one stream processor and having logic to receive the at least one processed stream to produce at least one output stream.
17. A method of processing a sound signal comprising the steps of:
receiving the sound signal;
unmixing at least one sound stream from the sound signal based on at least one unmixing instruction; and
outputting the at least one sound stream.
18. The method of claim 17, further comprising a step of generating the at least one unmixing instruction.
19. The method of claim 18, wherein the step of generating comprises a step of generating the at least one unmixing instruction based on a control script.
20. The method of claim 19, further comprising a step of receiving the control script at a script input.
21. The method of claim 19, further comprising a step of receiving the control script, wherein the control script is coupled with the sound signal on a transmission media.
22. The method of claim 19, further comprising a step of receiving the control script that is input by a user.
23. The method of claim 19, further comprising the step of detecting a selected event in the sound signal.
24. The method of claim 23, further comprising a step of generating the at least one unmixing instruction based on detection of the selected event.
25. The method of claim 17, further comprising steps of:
receiving the at least one sound stream; and
processing the at least one sound stream based on at least one processing instruction to form at least one processed stream.
26. The method of claim 25, further comprising a step of generating the processing instructions based on a control script.
27. The method of claim 25, further comprising a step of performing frequency processing on the at least one sound stream based on at least one subprocessor instruction to form at least one frequency processed stream.
28. The method of claim 27, further comprising a step of generating the at least one subprocessor instruction based on a control script.
29. The method of claim 25, further comprising a step of performing spatial position processing on the at least one sound stream based on at least one positioning instruction to form at least one positioned sound stream.
30. The method of claim 29, further comprising a step of generating the at least one positioning instruction based on a control script.
31. The method of claim 29, further comprising a step of mixing the at least one positioned sound stream to produce at least one output stream.
32. A computer-readable medium of instructions for processing a sound signal, comprising:
means for unmixing the sound signal based on at least one unmixing instruction to form at least one sound stream; and
means for generating the at least one unmixing instruction.
33. The computer-readable medium of claim 32, further comprising means for subprocessing the at least one sound stream based on at least one subprocessing instruction to form at least one subprocessed sound stream.
34. The computer-readable medium of claim 33, further comprising means for performing 3D positioning on the at least one subprocessed sound stream to form at least one repositioned stream.
35. The computer-readable medium of claim 34, further comprising means for mixing the at least one repositioned stream to form at least one processed sound signal.
36. A computer program product for controlling a sound processing system to process a sound signal, comprising:
a recording medium readable by the sound processing system; and
means recorded on the recording medium for directing the sound processing system to receive the sound signal and unmix at least one sound stream from the sound signal based on at least one unmixing instruction.
37. The computer program product of claim 36, further comprising means recorded on the recording medium for directing the sound processing system to generate the at least one unmixing instruction.
38. The computer program product of claim 36, further comprising means recorded on the recording medium for directing the sound processing system to process the at least one sound stream based on at least one processing instruction to form at least one processed sound stream.
39. The computer program product of claim 36, further comprising means recorded on the recording medium for directing the sound processing system to generate the at least one processing instruction.
40. The computer program product of claim 36, further comprising means recorded on the recording medium for directing the sound processing system to reposition the spatial position of the at least one sound stream to form at least one repositioned stream.
41. A computer data signal embodied in a carrier wave comprising: a compression source code segment comprising:
a first source code segment that includes instructions for unmixing at least one sound stream from a sound signal based on at least one unmixing instruction; and
a second source code segment that includes instructions for generating the at least one unmixing instruction.
42. The computer data signal of claim 41, further comprising a third source code segment comprising instructions for repositioning a spatial position of the at least one sound stream to form at least one repositioned stream.
43. Apparatus for processing a sound signal comprising:
a sound unmixer having logic to receive the sound signal and unmix at least one sound stream based on at least one unmixing instruction;
at least one stream processor coupled to the sound unmixer and having logic to process the at least one sound stream based on at least one processing instruction to produce at least one processed sound stream;
a mixer coupled to the at least one stream processor and having logic to process the at least one processed stream based on at least one mixing instruction to form at least one output stream; and
an instruction generator coupled to the sound unmixer, the at least one stream processor and the mixer, wherein the instruction generator includes logic to generate the at least one unmixing instruction, the at least one processing instruction and the at least one mixing instruction.
44. The apparatus of claim 43, wherein the instruction generator generates the at least one unmixing instruction, processing instruction and mixing instruction based on a control script.
45. The apparatus of claim 44, wherein the control script is combined with the sound signal.
46. The apparatus of claim 44, wherein the control script is received by the instruction generator and the control script is independent from the sound signal.
47. A computer software product that includes a medium readable by a sound processing system that includes a sound unmixer, an instruction generator, a stream processor and a mixer, wherein the sound processing system processes a sound signal, and the medium having stored thereon:
a first sequence of instructions which, when executed by the sound processing system, causes the sound processing system to unmix at least one sound stream from the sound signal.
48. The computer software product of claim 47, further comprising a second sequence of instructions which, when executed by the sound processing system, causes the sound processing system to process the at least one sound signal to form at least one processed sound signal.
PCT/US2000/026601 1999-09-27 2000-09-27 Process for removing voice from stereo recordings WO2001024577A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US10/415,770 US8767969B1 (en) 1999-09-27 2000-09-27 Process for removing voice from stereo recordings
AU79873/00A AU7987300A (en) 1999-09-27 2000-09-27 Process for removing voice from stereo recordings

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US09/405,941 1999-09-27
US09/405,941 US6405163B1 (en) 1999-09-27 1999-09-27 Process for removing voice from stereo recordings
US60/165,058 1999-11-12

Publications (1)

Publication Number Publication Date
WO2001024577A1 (en) 2001-04-05

Family

ID=23605861

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/026601 WO2001024577A1 (en) 1999-09-27 2000-09-27 Process for removing voice from stereo recordings

Country Status (2)

Country Link
US (1) US6405163B1 (en)
WO (1) WO2001024577A1 (en)


Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1206043B1 (en) * 2000-11-08 2009-12-23 Sony Deutschland GmbH Noise reduction in a stereo receiver
US7567845B1 (en) 2002-06-04 2009-07-28 Creative Technology Ltd Ambience generation for stereo signals
WO2004015683A1 (en) * 2002-08-02 2004-02-19 Koninklijke Philips Electronics N.V. Method and apparatus to improve the reproduction of music content
DK1695590T3 (en) * 2003-12-01 2014-06-02 Wolfson Dynamic Hearing Pty Ltd Method and apparatus for producing adaptive directional signals
ATE388599T1 (en) * 2004-04-16 2008-03-15 Dublin Inst Of Technology METHOD AND SYSTEM FOR SOUND SOURCE SEPARATION
US8626494B2 (en) * 2004-04-30 2014-01-07 Auro Technologies Nv Data compression format
US8009837B2 (en) * 2004-04-30 2011-08-30 Auro Technologies Nv Multi-channel compatible stereo recording
JP4594681B2 (en) * 2004-09-08 2010-12-08 ソニー株式会社 Audio signal processing apparatus and audio signal processing method
JP2006100869A (en) * 2004-09-28 2006-04-13 Sony Corp Sound signal processing apparatus and sound signal processing method
JP4580210B2 (en) * 2004-10-19 2010-11-10 ソニー株式会社 Audio signal processing apparatus and audio signal processing method
US20060112812A1 (en) * 2004-11-30 2006-06-01 Anand Venkataraman Method and apparatus for adapting original musical tracks for karaoke use
US7912232B2 (en) * 2005-09-30 2011-03-22 Aaron Master Method and apparatus for removing or isolating voice or instruments on stereo recordings
US20070237341A1 (en) * 2006-04-05 2007-10-11 Creative Technology Ltd Frequency domain noise attenuation utilizing two transducers
US9088855B2 (en) * 2006-05-17 2015-07-21 Creative Technology Ltd Vector-space methods for primary-ambient decomposition of stereo audio signals
US7336220B2 (en) * 2006-06-01 2008-02-26 M/A-Com, Inc. Method and apparatus for equalizing broadband chirped signal
US8335330B2 (en) * 2006-08-22 2012-12-18 Fundacio Barcelona Media Universitat Pompeu Fabra Methods and devices for audio upmixing
JP4827661B2 (en) * 2006-08-30 2011-11-30 富士通株式会社 Signal processing method and apparatus
US7974838B1 (en) * 2007-03-01 2011-07-05 iZotope, Inc. System and method for pitch adjusting vocals
US20080300702A1 (en) * 2007-05-29 2008-12-04 Universitat Pompeu Fabra Music similarity systems and methods using descriptors
US8180062B2 (en) * 2007-05-30 2012-05-15 Nokia Corporation Spatial sound zooming
US8085940B2 (en) * 2007-08-30 2011-12-27 Texas Instruments Incorporated Rebalancing of audio
US8509454B2 (en) * 2007-11-01 2013-08-13 Nokia Corporation Focusing on a portion of an audio scene for an audio signal
JP5365380B2 (en) * 2009-07-07 2013-12-11 ソニー株式会社 Acoustic signal processing apparatus, processing method thereof, and program
DK2696599T3 (en) 2012-08-07 2016-08-29 Starkey Labs Inc Compression of different sources of hearing aids
US9071900B2 (en) * 2012-08-20 2015-06-30 Nokia Technologies Oy Multi-channel recording
DK2747458T3 (en) 2012-12-21 2015-11-09 Starkey Lab Inc Improved dynamic processing of streaming audio at the source separation and remixing
US9473852B2 (en) * 2013-07-12 2016-10-18 Cochlear Limited Pre-processing of a channelized music signal
WO2015179914A1 (en) 2014-05-29 2015-12-03 Wolfson Dynamic Hearing Pty Ltd Microphone mixing for wind noise reduction
CN104053120B (en) * 2014-06-13 2016-03-02 福建星网视易信息系统有限公司 A kind of processing method of stereo audio and device
US9820073B1 (en) 2017-05-10 2017-11-14 Tls Corp. Extracting a common signal from multiple audio signals
CN110278721B (en) * 2018-01-18 2021-10-12 Ask工业有限公司 Method for outputting an audio signal depicting a musical piece into an interior space via an output device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5666424A (en) * 1990-06-08 1997-09-09 Harman International Industries, Inc. Six-axis surround sound processor with automatic balancing and calibration
US5727068A (en) * 1996-03-01 1998-03-10 Cinema Group, Ltd. Matrix decoding method and apparatus
US5890125A (en) * 1997-07-16 1999-03-30 Dolby Laboratories Licensing Corporation Method and apparatus for encoding and decoding multiple audio channels at low bit rates using adaptive selection of encoding method
US5946352A (en) * 1997-05-02 1999-08-31 Texas Instruments Incorporated Method and apparatus for downmixing decoded data streams in the frequency domain prior to conversion to the time domain
US6021386A (en) * 1991-01-08 2000-02-01 Dolby Laboratories Licensing Corporation Coding method and apparatus for multiple channels of audio information representing three-dimensional sound fields

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5400410A (en) * 1992-12-03 1995-03-21 Matsushita Electric Industrial Co., Ltd. Signal separator
JPH0764577A (en) * 1993-08-30 1995-03-10 Mitsubishi Electric Corp Karaoke device
US5511128A (en) 1994-01-21 1996-04-23 Lindemann; Eric Dynamic intensity beamforming system for noise reduction in a binaural hearing aid
JP3568584B2 (en) * 1994-06-28 2004-09-22 ローム株式会社 Audio equipment
US5719344A (en) * 1995-04-18 1998-02-17 Texas Instruments Incorporated Method and system for karaoke scoring
US5778082A (en) * 1996-06-14 1998-07-07 Picturetel Corporation Method and apparatus for localization of an acoustic source
US6148086A (en) * 1997-05-16 2000-11-14 Aureal Semiconductor, Inc. Method and apparatus for replacing a voice with an original lead singer's voice on a karaoke machine
US6311155B1 (en) * 2000-02-04 2001-10-30 Hearing Enhancement Company Llc Use of voice-to-remaining audio (VRA) in consumer applications


Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8767969B1 (en) 1999-09-27 2014-07-01 Creative Technology Ltd Process for removing voice from stereo recordings
US7257231B1 (en) 2002-06-04 2007-08-14 Creative Technology Ltd. Stream segregation for stereo signals
US7315624B2 (en) 2002-06-04 2008-01-01 Creative Technology Ltd. Stream segregation for stereo signals
US8219390B1 (en) 2003-09-16 2012-07-10 Creative Technology Ltd Pitch-based frequency domain voice removal
US7970144B1 (en) 2003-12-17 2011-06-28 Creative Technology Ltd Extracting and modifying a panned source for enhancement and upmix of audio signals
US9031242B2 (en) 2007-11-06 2015-05-12 Starkey Laboratories, Inc. Simulated surround sound hearing aid fitting system
US8705751B2 (en) 2008-06-02 2014-04-22 Starkey Laboratories, Inc. Compression and mixing for hearing assistance devices
US9332360B2 (en) 2008-06-02 2016-05-03 Starkey Laboratories, Inc. Compression and mixing for hearing assistance devices
US9924283B2 (en) 2008-06-02 2018-03-20 Starkey Laboratories, Inc. Enhanced dynamics processing of streaming audio by source separation and remixing
US9485589B2 (en) 2008-06-02 2016-11-01 Starkey Laboratories, Inc. Enhanced dynamics processing of streaming audio by source separation and remixing
US9185500B2 (en) 2008-06-02 2015-11-10 Starkey Laboratories, Inc. Compression of spaced sources for hearing assistance devices
EP2131610A1 (en) * 2008-06-02 2009-12-09 Starkey Laboratories, Inc. Compression and mixing for hearing assistance devices
US8929556B2 (en) * 2009-08-27 2015-01-06 Sony Corporation Audio-signal processing device and method for processing audio signal
US20110051936A1 (en) * 2009-08-27 2011-03-03 Sony Corporation Audio-signal processing device and method for processing audio signal
AU2012257865B2 (en) * 2011-05-13 2015-07-09 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method and computer program for generating a stereo output signal for providing additional output channels
EP2523472A1 (en) * 2011-05-13 2012-11-14 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method and computer program for generating a stereo output signal for providing additional output channels
WO2012156232A1 (en) * 2011-05-13 2012-11-22 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method and computer program for generating a stereo output signal for providing additional output channels
CN103518386A (en) * 2011-05-13 2014-01-15 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method and computer program for generating a stereo output signal for providing additional output channels
CN103518386B (en) * 2011-05-13 2017-11-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and computer-readable storage medium for generating a stereo output signal for providing additional output channels
US9913036B2 (en) 2011-05-13 2018-03-06 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method and computer program for generating a stereo output signal for providing additional output channels
EP2544466A1 (en) * 2011-07-05 2013-01-09 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and apparatus for decomposing a stereo recording using frequency-domain processing employing a spectral subtractor

Also Published As

Publication number Publication date
US6405163B1 (en) 2002-06-11

Similar Documents

Publication Publication Date Title
WO2001024577A1 (en) Process for removing voice from stereo recordings
US8751029B2 (en) System for extraction of reverberant content of an audio signal
JP5467105B2 (en) Apparatus and method for generating an audio output signal using object-based metadata
US7583805B2 (en) Late reverberation-based synthesis of auditory scenes
EP2974010B1 (en) Automatic multi-channel music mix from multiple audio stems
KR101569032B1 (en) A method and an apparatus of decoding an audio signal
KR100458021B1 (en) Multi-channel audio enhancement system for use in recording and playback and methods for providing same
US7567845B1 (en) Ambience generation for stereo signals
US7162045B1 (en) Sound processing method and apparatus
WO2005101898A2 (en) A method and system for sound source separation
CN103650538A (en) Method and apparatus for decomposing a stereo recording using frequency-domain processing employing a spectral weights generator
JP5307770B2 (en) Audio signal processing apparatus, method, program, and recording medium
US8767969B1 (en) Process for removing voice from stereo recordings
JP5058844B2 (en) Audio signal conversion apparatus, audio signal conversion method, control program, and computer-readable recording medium
JP7256164B2 (en) Audio processing device and audio processing method
JP5202021B2 (en) Audio signal conversion apparatus, audio signal conversion method, control program, and computer-readable recording medium
RU2384973C1 (en) Device and method for synthesising three output channels using two input channels
JP2015065551A (en) Voice reproduction system
WO2017188141A1 (en) Audio signal processing device, audio signal processing method, and audio signal processing program
AU2013200578A1 (en) Apparatus and method for generating audio output signals using object based metadata

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
WWE Wipo information: entry into national phase

Ref document number: 10415770

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: JP