WO2001004870A1 - Method of automatic recognition of musical compositions and sound signals - Google Patents

Method of automatic recognition of musical compositions and sound signals

Info

Publication number
WO2001004870A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
vectors
model
unknown
group
Prior art date
Application number
PCT/GR2000/000024
Other languages
French (fr)
Inventor
Constantin Papaodysseus
Constantin Triantafillou
George Roussopoulos
Constantin Alexiou
Athanasios Panagopoulos
Dimitrios Fragoulis
Original Assignee
Constantin Papaodysseus
Constantin Triantafillou
George Roussopoulos
Constantin Alexiou
Athanasios Panagopoulos
Dimitrios Fragoulis
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Constantin Papaodysseus, Constantin Triantafillou, George Roussopoulos, Constantin Alexiou, Athanasios Panagopoulos, Dimitrios Fragoulis
Priority to EP00940675A (EP1147511A1)
Publication of WO2001004870A1


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/02 Means for controlling the tone frequencies, e.g. attack or decay; Means for producing special musical effects, e.g. vibratos or glissandos
    • G10H1/06 Circuits for establishing the harmonic content of tones, or other arrangements for changing the tone colour
    • G10H1/12 Circuits for establishing the harmonic content of tones, or other arrangements for changing the tone colour by filtering complex waveforms
    • G10H1/125 Circuits for establishing the harmonic content of tones, or other arrangements for changing the tone colour by filtering complex waveforms using a digital filter
    • G10H2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/121 Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
    • G10H2240/131 Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
    • G10H2240/141 Library retrieval matching, i.e. any of the steps of matching an inputted segment or phrase with musical database contents, e.g. query by humming, singing or playing; the steps may include, e.g. musical analysis of the input, musical feature extraction, query formulation, or details of the retrieval process
    • G10H2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/131 Mathematical functions for musical analysis, processing, synthesis or composition
    • G10H2250/215 Transforms, i.e. mathematical transforms into domains appropriate for musical signal processing, coding or compression
    • G10H2250/235 Fourier transform; Discrete Fourier Transform [DFT]; Fast Fourier Transform [FFT]


Abstract

The invention refers to a method of automatic recognition of musical compositions and sound signals, which is used for the identification of musical compositions and sound signals played by radio or TV, or performed in public places. According to this invention, a desirably large number of musical compositions and sound signals that we want to identify is selected. To every one of these signals an original procedure is applied, leading to the extraction of a set of characteristics which finally represents a model signal. Subsequently, for the implementation of the recognition, the unknown musical composition or sound signal is received and digitised. To its digitised version the same procedure of extracting a set of characteristics is applied. These characteristics are compared with the corresponding sets of the model signals and, by means of original criteria, it is decided whether there is a model signal that corresponds to the unknown signal under consideration and, if so, which model signal exactly.

Description

Method of Automatic Recognition of Musical Compositions and
Sound Signals
This invention refers to a method of automatic recognition of musical compositions and sound signals, and it is used to identify musical compositions and sound signals transmitted by radio or TV and/or performed in public places.
In the past, efforts to develop methods for the automatic recognition of musical compositions and sound signals have been made, leading to the creation of systems performing this task. However, these methods and the related systems manifest a low percentage of successful recognition, both for the musical compositions and for the sound signals of interest. The introduced method offers a much better percentage of fully automatic recognition, greater than or equal to ninety-eight percent (98%). According to this invention, a desirably large number of musical compositions and sound signals that we want to identify is selected. For easy reference we will refer to these compositions and signals with the term model signals. To every one of these signals an original procedure is applied, leading to the extraction of a set of characteristics which finally represents each model signal. Subsequently, for the implementation of the recognition, the unknown musical composition or sound signal is received and the same procedure of extracting a corresponding set of characteristics is applied to it. These characteristics are compared with the corresponding sets of characteristics of the model signals and, by means of a number of original criteria, it is decided whether one (and which one exactly) of the model signals corresponds to the unknown signal under consideration. This procedure is depicted in figure 1.
It is stressed that there is no reference in the international bibliography to a similar method or a related system. In the world market there are very few comparable systems, and these offer a percentage of successful recognition of less than sixty percent (60%).
The invention is described more thoroughly below:
First, the whole frequency band from 0 to 11025 Hz is divided into sub-bands that are almost exponentially distributed. An implementation of such a division is presented in Table 1. According to this implementation, the whole frequency band from 0 to 11025 Hz is divided into 60 sub-bands.
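Table 1 itself is not reproduced in this text extraction, so the exact sub-band edges are not available here. The sketch below (Python/NumPy) only illustrates what an "almost exponentially distributed" division of the 0 to 11025 Hz band into 60 sub-bands could look like; the geometric spacing, the 50 Hz lower edge and the function name are assumptions made for illustration, not the patent's actual table.

```python
import numpy as np

def make_subband_edges(f_max=11025.0, n_bands=60, f_low=50.0):
    """Illustrative 'almost exponential' band edges: geometric spacing
    from f_low to f_max, plus one extra band covering [0, f_low).
    These edges are an assumption, not the patent's Table 1."""
    geometric = f_low * (f_max / f_low) ** (np.arange(n_bands) / (n_bands - 1))
    return np.concatenate(([0.0], geometric))

edges = make_subband_edges()
assert len(edges) - 1 == 60  # 60 sub-bands, as in the division of Table 1
```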
Subsequently, each model signal is digitised with an arbitrary sampling frequency F_s, preferably greater than or equal to 11025 Hz, and a window of 8192, 16384 or 32768 samples length slides on the obtained digitised signal. In every such window an adaptive Fast Fourier Transform is applied and the absolute value of the Discrete Fourier Transform is obtained. Next, the frequency-domain window is divided into sections according to the aforementioned choice of frequency sub-bands (see Table 1) and then, in every such section, all the peaks of the absolute value of the Fourier transform are spotted and the greatest one is kept. The value of this peak is called the "section representative". Then the L "representatives" with the greatest values are spotted, where the value of L may vary from 13 to 30, the most frequently used value being L = 20. The indices of the sections corresponding to these representatives, sorted in increasing order, form a vector, which constitutes the "representative-vector" of the window. The above procedure is repeated while the window slides over the whole digitised model signal, thus creating all the representative-vectors of the specific model signal. Notice that, while the window slides over the model signal, the generated representative-vectors often remain unchanged in two successive windows, successive in the sense that their starting positions differ by one sample. For this reason, to every representative-vector we assign a number indicating the number of subsequent windows in which the specific vector remained unchanged; for that number we will use the name "number of repetitions" of the representative-vector. For the set of the generated representative-vectors of each model signal we will use the name "the model signal set of representatives". The aforementioned procedure is depicted in figure 2.
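To make the window processing just described concrete, here is a minimal sketch of the extraction of one representative-vector. The simple local-peak rule, the use of NumPy's plain real FFT in place of the patent's "adaptive Fast Fourier Transform", and all names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def representative_vector(window, edges, fs, L=20):
    """Sketch: per-section greatest FFT-magnitude peak (the 'section
    representative'), then the indices of the L sections with the
    greatest representatives, sorted in increasing order."""
    mag = np.abs(np.fft.rfft(window))                # |DFT| of the window
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    # Spot the local peaks of the magnitude spectrum.
    peaks = np.where((mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:]))[0] + 1
    reps = np.full(len(edges) - 1, -np.inf)          # one representative per section
    for p in peaks:
        s = np.searchsorted(edges, freqs[p], side="right") - 1
        if 0 <= s < len(reps):
            reps[s] = max(reps[s], mag[p])           # keep the greatest peak per section
    return np.sort(np.argsort(reps)[-L:])            # sorted indices of the top-L sections
```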
For the identification of the unknown sound signal, which from now on will be called the "unknown signal", the following procedure is used: a part of the unknown signal, of length varying from eight (8) to sixteen (16) seconds, is received, digitised and registered, at least temporarily. At the beginning of that part of the unknown signal a window of length W_e = 8192 or W_e = 16384 or W_e = 32768 samples is obtained; notice that in any case this window will be of the same length as the sliding window which was used for the model signals. In this window a Fast Fourier Transform is applied and its absolute value is obtained. Afterwards, all the peaks of the absolute value of the Fourier transform are spotted and S copies of these peaks are created. For the creation of every copy of the peaks, the positions of the peaks are multiplied by a different coefficient f_i, i = 0, 1, ..., S, which is called the "window shift coefficient". Thus, S+1 different groups of peaks are created. For every one of these groups the following procedure is realised: the section to which each peak corresponds, according to the aforementioned frequency sub-band division, is spotted (see Table 1). For every section to which at least one peak corresponds, the greatest peak is kept. The value of this peak is called the "representative of the section of the unknown signal corresponding to the shift coefficient f_i".
Next, the L representatives with the greatest values are spotted, where the value of L is the same as the one used for the model signals. The indices of the sections corresponding to these representatives, sorted in increasing order, form a vector which constitutes the "first representative-vector of the unknown signal corresponding to the shift coefficient f_i".
Afterwards, the window slides by ℓ_1 samples, where the value of ℓ_1 may vary from 0.55 * F_s to 1.9 * F_s samples, the most frequently used value being ℓ_1 = 1.4 * F_s. For the new window position and for every shift coefficient f_i, i = 0, 1, ..., S, (S+1) vectors are computed in the way described above; each such vector will be called the "second representative-vector of the unknown signal corresponding to the shift coefficient f_i". The above procedure is repeated for M-2 more windows, where each window starts at a sample having a distance of ℓ_i samples from the start of the previous one, i = 2, 3, ..., M-1, and where the value of M may fluctuate between 7 and 13 windows, the most usual value being M = 9. In this way S+1 groups of M representative-vectors are obtained; for each such group we will employ the name "group of unknown signal representative-vectors corresponding to the shift coefficient f_i".
It must be stressed that, for a specific application, the ℓ_i values, i = 1, 2, ..., M-1, are not necessarily equal, but must be kept fixed throughout the whole procedure. The exact number (S+1) of the shift coefficients f_i varies from 1 to 15, while their values are given by the formula:

f_i = 1, if i = 0
f_i = 1 + ((i+1)/2) * STEP, if i is odd
f_i = 1 - (i/2) * STEP, if i is even,

for i = 1, 2, ..., S, where STEP is a parameter expressing the shift step, which usually belongs to the interval [0.005, 0.01], the most frequently used value being 0.0075. The identification procedure described so far is depicted in figure 3.
For the realisation of the unknown signal recognition, each group of unknown signal representatives is compared with elements of the set of representatives of each model signal separately. To fix ideas, each of the S+1 groups of M unknown signal representatives is compared with groups of M model signal representatives by means of the method consisting of the following steps:
E1) If the first representative-vector of one group of the unknown signal is called V1 and the first representative-vector of the model signal is called U1, then, initially, the number of the common elements between these two vectors is calculated. For example, if L = 20 and
V1 = [60 55 52 49 47 43 39 34 33 30 29 22 20 17 14 11 9 5 2 1]
U1 = [60 58 55 49 47 41 39 37 33 30 28 25 20 17 14 11 9 6 4 2]
then the number of the common elements is thirteen (13); this count is reproduced in the sketch following step E(M+1) below. Subsequently, it is checked whether the number of the common elements between the vectors V1 and U1 is greater than or equal to the number 0.51*L, which is called the "requisite similarity threshold". If, indeed, it is greater than or equal to 0.51*L, we proceed to step E2 below. If it is smaller than 0.51*L, then we consider that the set of the tests performed so far did not result in a successful recognition; so, after taking as U1 the next representative-vector of the model signal, we start the comparison procedure again, beginning from the comparison of the vector V1 with the new U1.
E2) If the second representative-vector of the unknown signal, corresponding to the same shift coefficient f_i as V1, is called V2, and the representative-vector of the model signal corresponding to the sample (ℓ_1 * f_i) is called U2, then we calculate the number of the common elements between these two vectors. Afterwards, we check whether the number of the common elements between the vectors V2 and U2 is greater than or equal to the "requisite similarity threshold". If it is greater or equal, we proceed to step E3 below. If it is smaller, then we consider that the set of tests performed so far did not result in a successful recognition; so, after taking as U1 the next representative-vector of the model signal, the comparison procedure starts again, beginning from the comparison of the vector V1 with the new U1.
The same comparison is performed for the subsequent representative-vectors, up to:
E(M-1)) If the (M-1)-th representative-vector of the unknown signal corresponding to the same shift coefficient f_i as V1 is called V(M-1), and the representative-vector of the model signal corresponding to the sample ((ℓ_1 + ℓ_2 + ... + ℓ_(M-2)) * f_i) is called U(M-1), then we calculate the number of the common elements between these two vectors. Next, we check whether the number of the common elements between the vectors V(M-1) and U(M-1) is greater than or equal to the "requisite similarity threshold". If it is greater or equal, we proceed to step EM below. If it is smaller, then we consider that the set of tests performed so far did not result in a successful recognition; so, after taking as U1 the next representative-vector of the model signal, the comparison procedure starts again, beginning from the comparison of the vector V1 with the new U1.
EM) If the M-th representative-vector of the unknown signal corresponding to the same shift coefficient f_i as V1 is called VM, and the representative-vector of the model signal corresponding to the sample ((ℓ_1 + ℓ_2 + ... + ℓ_(M-1)) * f_i) is called UM, then we calculate the number of the common elements between these two vectors VM and UM and we check whether it is greater than or equal to the "requisite similarity threshold". If it is greater or equal, we proceed to step E(M+1) below. If it is smaller, then we consider that the set of tests performed so far did not result in a successful recognition; so, after taking as U1 the next representative-vector of the model signal, the checking procedure starts again, beginning from the comparison of the vector V1 with the new U1.
E(M+1)) First we check how many of the pairs (V1, U1), (V2, U2), ..., (VM, UM) have, according to the previous comparisons, a number of common elements in the interval [0.51*L, 0.71*L]. If the number of these pairs is greater than 0.34*M, then we consider that the set of tests performed so far did not result in a successful recognition; so, after taking as U1 the next representative-vector of the model signal, the comparison procedure starts again, beginning from the comparison of the vector V1 with the new U1. If the number of these pairs is smaller than or equal to 0.34*M, then the following check is realised: for the pairs of vectors (V1, U1), (V2, U2), ..., (VM, UM), which have already been compared, we calculate the mean value of the number of the common elements. If this mean value is greater than or equal to 0.71*L, then we consider that the comparison between the group of the M representatives of the model signal and the group of the representatives of the unknown signal corresponding to the shift coefficient f_i that we checked is successful. If the mean value is smaller than 0.71*L, then we consider that the set of tests performed so far did not result in a successful recognition; so, after taking as U1 the next representative-vector of the model signal, the comparison procedure starts again, beginning from the comparison of the vector V1 with the new U1. If all possible vectors of the model signal are unsuccessfully compared with one group of representatives of the unknown signal corresponding to the specific shift coefficient f_i, then we repeat the comparison procedure using the group of representatives of the unknown signal corresponding to the next shift coefficient f_(i+1). If the comparison of a specific set of model vectors with all (S+1) groups of representatives of the unknown signal is unsuccessful, then we proceed to the comparison of the unknown signal with another set of model vectors.
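The common-element count used throughout steps E1 to EM is a plain set intersection over section indices. The snippet below first reproduces the worked example of step E1 and then sketches the final decision of step E(M+1); both functions are illustrative sketches under the thresholds stated above, not the authors' implementation.

```python
V1 = [60, 55, 52, 49, 47, 43, 39, 34, 33, 30, 29, 22, 20, 17, 14, 11, 9, 5, 2, 1]
U1 = [60, 58, 55, 49, 47, 41, 39, 37, 33, 30, 28, 25, 20, 17, 14, 11, 9, 6, 4, 2]

def common_elements(v, u):
    """Number of section indices shared by two representative-vectors."""
    return len(set(v) & set(u))

L = 20
n = common_elements(V1, U1)
print(n)              # 13, as in the example of step E1
print(n >= 0.51 * L)  # True: the requisite similarity threshold (10.2) is met

def first_criterion_decision(common_counts, L, M):
    """Step E(M+1): accept the group comparison only if few pairs are
    'weak' matches and the mean number of common elements is high."""
    assert len(common_counts) == M
    # Pairs whose count lies in the weak-match interval [0.51*L, 0.71*L].
    weak = sum(1 for c in common_counts if 0.51 * L <= c <= 0.71 * L)
    if weak > 0.34 * M:
        return False  # too many weak matches: recognition unsuccessful
    # Otherwise require a high mean number of common elements.
    return sum(common_counts) / M >= 0.71 * L
```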
If the result of the above comparison is successful for a group of the unknown signal corresponding to a specific shift coefficient, say f_ε, we proceed to the application of the irrevocable comparison criterion, which is described below. As already mentioned, the successful application of the first criterion results in the determination of a group of M representatives of the model signal, U1, U2, ..., UM, which "fit" the group of representatives of the unknown signal, V1, V2, ..., VM, corresponding to the specific shift coefficient f_ε. Since the positions of these vectors in their corresponding signals are now known, it is possible to realise a sequence of comparisons between vectors of the unknown signal, corresponding to the specific shift coefficient f_ε, and the vectors of the model signal formed at the specific positions where the first criterion was satisfied.
To this end, in the digitised unknown signal of duration from eight (8) to sixteen (16) seconds, a window of length W_e is obtained, beginning at the unknown signal starting point. In this window a fast Fourier transform is applied again and its absolute value is obtained. Subsequently, the peaks of the Fourier transform are spotted and their positions are multiplied by the shift coefficient f_ε, which has previously been verified to satisfy the first criterion. Then, in each section to which at least one peak has been ascribed, the greatest peak is kept as the section representative. Next, the L representatives with the greatest values are spotted, where the value of L is the same as the one used in the first criterion. The indices of the sections corresponding to these representatives, sorted in increasing order, form a vector that constitutes the "first irrevocable representative-vector of the unknown signal".
Then the window slides by k_1 samples, where the value of k_1 is equal to ℓ_1 * (M-1) / (D-1), and the value of D fluctuates between 30 and 50. For the new window position a new vector is calculated, in the same way as described before, called the "second irrevocable representative-vector of the unknown signal". The above procedure is repeated for D-2 further windows, each one starting at a distance of k_i samples from the start of its previous window, where k_i = ℓ_i * (M-1) / (D-1), i = 2, 3, ..., D-1.
In this way, finally, a group consisting of D representative-vectors is created.
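A one-line sketch of the window steps for the irrevocable vectors, as reconstructed above (k_i = ℓ_i * (M-1) / (D-1)); the example parameter values below are assumptions, not values taken from the patent.

```python
def irrevocable_steps(ells, M, D):
    """Step sizes k_i spreading the D irrevocable windows over the span
    covered by the M windows of the first criterion (reconstructed)."""
    return [ell * (M - 1) / (D - 1) for ell in ells]

# Assumed example: F_s = 11025 Hz, l_1 = 1.4*F_s, M = 9, D = 40.
print(irrevocable_steps([1.4 * 11025], M=9, D=40)[0])  # ~3166.15 samples
```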
We will refer to this group with the name "irrevocable group of representatives of the unknown signal".
In order to reach the final decision whether the unknown signal corresponds to the model signal in hand, the irrevocable group of representatives of the unknown signal is compared with elements of the set of the representatives of the model signal, by means of a method similar to the first criterion, consisting of the steps briefly described below:
T1) If the first irrevocable representative-vector of the unknown signal is called V1* and U1* is the representative-vector of the model signal corresponding to the position, say Λ1, where the first criterion has been satisfied, then, initially, we calculate the number of the common elements between these two vectors.
T2) If the second irrevocable representative-vector of the unknown signal is called V2*, then this vector is compared with the vector U2*, which is the representative-vector of the model signal corresponding to the position Λ1 + k_1 * f_ε, where f_ε is the shift coefficient that has been calculated from the first criterion, and the number of the common elements between these two vectors is calculated.
The same comparison is performed for the subsequent irrevocable representative-vectors, up to:
T(D-1)) If the (D-1)-th irrevocable representative-vector of the unknown signal is called V(D-1)* and the representative-vector of the model signal corresponding to the position Λ1 + (k_1 + k_2 + ... + k_(D-2)) * f_ε is called U(D-1)*, then we calculate the number of the common elements between these two vectors.
Finally, having calculated the number of the common elements for these D pairs of vectors, in order to decide on the identification we check whether the two conditions stated below are satisfied:
[Condition 1] At least 0.825*D of the pairs of vectors have a number of common elements greater than 0.71*L.
[Condition 2] The total number of the common elements of the vectors, namely the sum of the numbers of common elements of the pairs (V1*, U1*), (V2*, U2*), ..., (VD*, UD*), is greater than 0.6875*D*L.
If these two conditions are satisfied, then we have successfully recognised that the specific musical composition corresponds to the model signal in hand.
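Both final conditions operate on the D per-pair common-element counts; a minimal sketch, assuming those counts have already been computed as in the earlier snippet:

```python
def irrevocable_decision(common_counts, L, D):
    """Accept the identification only if both conditions hold."""
    assert len(common_counts) == D
    # Condition 1: at least 0.825*D pairs share more than 0.71*L elements.
    cond1 = sum(1 for c in common_counts if c > 0.71 * L) >= 0.825 * D
    # Condition 2: the total number of common elements exceeds 0.6875*D*L.
    cond2 = sum(common_counts) > 0.6875 * D * L
    return cond1 and cond2
```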
The whole procedure of the identification is depicted in Figures 3, 4 and 5.
Table 1 (division of the frequency band from 0 to 11025 Hz into 60 almost exponentially distributed sub-bands; the table itself is an image not reproduced in this text extraction).

Claims

The method for the automatic recognition of musical compositions and sound signals, which is used for the identification of musical compositions and sound signals played by radio or TV or performed in public places, is based on the existence of a procedure which is applied to the model signals and results in the extraction of a set of characteristics which finally represents each model signal. Besides, it is based on a similar procedure, which is applied to the unknown musical composition or sound signal for the extraction of similar characteristics, and, finally, it is based on a procedure of comparison performed between the representative sets of characteristics of the model and the unknown signal. This method is characterised by the model sets of characteristics corresponding to a division of the frequency domain into bands. It is also characterised by two original criteria for the decision of the identification, according to which two musical compositions or sound signals are identified only when: a) A group of M representative-vectors of the model signal, U1, U2, ..., UM, where two successive vectors are calculated at samples having distance ℓ_i, i = 1, 2, ..., M-1, "match" with a group of representatives of the unknown signal, V1, V2, ..., VM, which corresponds to a specific shift coefficient f_ε. Notice that the values of ℓ_i, i = 1, 2, ..., M-1, are not necessarily equal but, in any case, are kept fixed throughout the application. The matching between U1, U2, ..., UM and V1, V2, ..., VM is realised by means of the following criterion:
All comparisons between the vectors of the pairs (V1, U1), (V2, U2), ..., (VM, UM) are made and the number of pairs with common elements in the interval [0.51*L, 0.71*L] is computed. If it is greater than 0.34*M, then we consider that the set of comparisons performed so far did not result in a successful recognition. If this number is smaller than or equal to 0.34*M, then it is checked whether the mean value of the number of common elements of the vectors of the above pairs (V1, U1), (V2, U2), ..., (VM, UM) is greater than or equal to 0.71*L. If it is, then we consider that the comparison between the group of the M representatives, corresponding to the shift coefficient f_ε, of the model signal in hand and the group of representatives of the unknown signal is successful. b) A second group of D irrevocable representative-vectors of the model signal,
U1, U2, ..., UD, calculated at distances k_j = ℓ_j * (M-1) / (D-1), j = 1, 2, ..., D-1, the one from its previous, where the k_j are not necessarily equal but are, in any case, kept fixed throughout the application, "match" with a group of representatives of the unknown signal, V1, V2, ..., VD, which corresponds to a specific shift coefficient f_ε, according to the following criterion:
• At least 0.825*D of the pairs of vectors (V1, U1), (V2, U2), ..., (VD, UD) have a number of common elements greater than 0.71*L.
• The total number of the common elements of the vectors (namely the one that results from the summation of the numbers of common elements of the pairs (V1, U1), (V2, U2), ..., (VD, UD)) is greater than 0.6875*D*L.
If both these conditions (a) and (b) are satisfied, then we have successfully recognised the specific musical composition.
PCT/GR2000/000024 1999-07-08 2000-07-07 Method of automatic recognition of musical compositions and sound signals WO2001004870A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP00940675A EP1147511A1 (en) 1999-07-08 2000-07-07 Method of automatic recognition of musical compositions and sound signals

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GR99100235 1999-07-08
GR990100235 1999-07-08

Publications (1)

Publication Number Publication Date
WO2001004870A1 (en) 2001-01-18

Family

ID=10943871

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GR2000/000024 WO2001004870A1 (en) 1999-07-08 2000-07-07 Method of automatic recognition of musical compositions and sound signals

Country Status (3)

Country Link
EP (1) EP1147511A1 (en)
GR (1) GR1003625B (en)
WO (1) WO2001004870A1 (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002011123A2 (en) * 2000-07-31 2002-02-07 Shazam Entertainment Limited Method for search in an audio database
WO2002073593A1 (en) * 2001-03-14 2002-09-19 International Business Machines Corporation A method and system for the automatic detection of similar or identical segments in audio recordings
DE10117870A1 (en) * 2001-04-10 2002-10-31 Fraunhofer Ges Forschung Method and device for converting a music signal into a note-based description and method and device for referencing a music signal in a database
WO2003009273A1 (en) * 2001-07-16 2003-01-30 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e. V. Method and device for characterising a signal and for producing an indexed signal
WO2003054852A2 (en) * 2001-12-06 2003-07-03 Hewlett-Packard Company System and method for music identification
EP1387514A2 (en) * 2002-07-31 2004-02-04 British Broadcasting Corporation Signal comparison method and apparatus
EP1504445A1 (en) * 2002-04-25 2005-02-09 Shazam Entertainment Limited Robust and invariant audio pattern matching
DE102004023436A1 (en) * 2004-05-10 2005-12-08 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for analyzing an information signal
DE102004028694B3 (en) * 2004-06-14 2005-12-22 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for converting an information signal into a variable resolution spectral representation
US7214870B2 (en) 2001-11-23 2007-05-08 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Method and device for generating an identifier for an audio signal, method and device for building an instrument database and method and device for determining the type of an instrument
DE10232916B4 (en) * 2002-07-19 2008-08-07 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for characterizing an information signal
US7653534B2 (en) 2004-06-14 2010-01-26 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for determining a type of chord underlying a test signal
US7739062B2 (en) 2004-06-24 2010-06-15 Landmark Digital Services Llc Method of characterizing the overlap of two media segments
US7881931B2 (en) 2001-07-20 2011-02-01 Gracenote, Inc. Automatic identification of sound recordings
US7986913B2 (en) 2004-02-19 2011-07-26 Landmark Digital Services, Llc Method and apparatus for identification of broadcast source
US8090579B2 (en) 2005-02-08 2012-01-03 Landmark Digital Services Automatic identification of repeated material in audio signals
US8453170B2 (en) 2007-02-27 2013-05-28 Landmark Digital Services Llc System and method for monitoring and recognizing broadcast data
US8725829B2 (en) 2000-07-31 2014-05-13 Shazam Investments Limited Method and system for identifying sound signals
JP2016512610A (en) * 2013-02-04 2016-04-28 テンセント・テクノロジー・(シェンジェン)・カンパニー・リミテッド Method and device for audio recognition
US10354307B2 (en) 2014-05-29 2019-07-16 Tencent Technology (Shenzhen) Company Limited Method, device, and system for obtaining information based on audio input


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5210820A (en) * 1990-05-02 1993-05-11 Broadcast Data Systems Limited Partnership Signal recognition system and method
US5874686A (en) * 1995-10-31 1999-02-23 Ghias; Asif U. Apparatus and method for searching a melody
US5778335A (en) * 1996-02-26 1998-07-07 The Regents Of The University Of California Method and apparatus for efficient multiband celp wideband speech and music coding and decoding

Cited By (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8190435B2 (en) 2000-07-31 2012-05-29 Shazam Investments Limited System and methods for recognizing sound and music signals in high noise and distortion
US6990453B2 (en) 2000-07-31 2006-01-24 Landmark Digital Services Llc System and methods for recognizing sound and music signals in high noise and distortion
US8725829B2 (en) 2000-07-31 2014-05-13 Shazam Investments Limited Method and system for identifying sound signals
US7346512B2 (en) 2000-07-31 2008-03-18 Landmark Digital Services, Llc Methods for recognizing unknown media samples using characteristics of known media samples
US7865368B2 (en) 2000-07-31 2011-01-04 Landmark Digital Services, Llc System and methods for recognizing sound and music signals in high noise and distortion
US8700407B2 (en) 2000-07-31 2014-04-15 Shazam Investments Limited Systems and methods for recognizing sound and music signals in high noise and distortion
WO2002011123A3 (en) * 2000-07-31 2002-05-30 Shazam Entertainment Ltd Method for search in an audio database
US10497378B2 (en) 2000-07-31 2019-12-03 Apple Inc. Systems and methods for recognizing sound and music signals in high noise and distortion
US8386258B2 (en) 2000-07-31 2013-02-26 Shazam Investments Limited Systems and methods for recognizing sound and music signals in high noise and distortion
US9899030B2 (en) 2000-07-31 2018-02-20 Shazam Investments Limited Systems and methods for recognizing sound and music signals in high noise and distortion
JP2004505328A (en) * 2000-07-31 2004-02-19 シャザム エンターテインメント リミテッド System and method for recognizing sound / musical signal under high noise / distortion environment
WO2002011123A2 (en) * 2000-07-31 2002-02-07 Shazam Entertainment Limited Method for search in an audio database
US9401154B2 (en) 2000-07-31 2016-07-26 Shazam Investments Limited Systems and methods for recognizing sound and music signals in high noise and distortion
WO2002073593A1 (en) * 2001-03-14 2002-09-19 International Business Machines Corporation A method and system for the automatic detection of similar or identical segments in audio recordings
DE10117870B4 (en) * 2001-04-10 2005-06-09 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and apparatus for transferring a music signal into a score-based description and method and apparatus for referencing a music signal in a database
US7064262B2 (en) 2001-04-10 2006-06-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method for converting a music signal into a note-based description and for referencing a music signal in a data bank
DE10117870A1 (en) * 2001-04-10 2002-10-31 Fraunhofer Ges Forschung Method and device for converting a music signal into a note-based description and method and device for referencing a music signal in a database
US7478045B2 (en) 2001-07-16 2009-01-13 M2Any Gmbh Method and device for characterizing a signal and method and device for producing an indexed signal
WO2003009273A1 (en) * 2001-07-16 2003-01-30 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e. V. Method and device for characterising a signal and for producing an indexed signal
US7881931B2 (en) 2001-07-20 2011-02-01 Gracenote, Inc. Automatic identification of sound recordings
US7214870B2 (en) 2001-11-23 2007-05-08 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Method and device for generating an identifier for an audio signal, method and device for building an instrument database and method and device for determining the type of an instrument
US6995309B2 (en) 2001-12-06 2006-02-07 Hewlett-Packard Development Company, L.P. System and method for music identification
WO2003054852A3 (en) * 2001-12-06 2003-12-04 Hewlett Packard Co System and method for music identification
WO2003054852A2 (en) * 2001-12-06 2003-07-03 Hewlett-Packard Company System and method for music identification
EP1504445A4 (en) * 2002-04-25 2005-08-17 Shazam Entertainment Ltd Robust and invariant audio pattern matching
EP1504445A1 (en) * 2002-04-25 2005-02-09 Shazam Entertainment Limited Robust and invariant audio pattern matching
US7627477B2 (en) 2002-04-25 2009-12-01 Landmark Digital Services, Llc Robust and invariant audio pattern matching
DE10232916B4 (en) * 2002-07-19 2008-08-07 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for characterizing an information signal
EP1387514A2 (en) * 2002-07-31 2004-02-04 British Broadcasting Corporation Signal comparison method and apparatus
EP1387514A3 (en) * 2002-07-31 2008-12-10 British Broadcasting Corporation Signal comparison method and apparatus
US8811885B2 (en) 2004-02-19 2014-08-19 Shazam Investments Limited Method and apparatus for identification of broadcast source
US8290423B2 (en) 2004-02-19 2012-10-16 Shazam Investments Limited Method and apparatus for identification of broadcast source
US7986913B2 (en) 2004-02-19 2011-07-26 Landmark Digital Services, Llc Method and apparatus for identification of broadcast source
US9225444B2 (en) 2004-02-19 2015-12-29 Shazam Investments Limited Method and apparatus for identification of broadcast source
US9071371B2 (en) 2004-02-19 2015-06-30 Shazam Investments Limited Method and apparatus for identification of broadcast source
DE102004023436A1 (en) * 2004-05-10 2005-12-08 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for analyzing an information signal
US8065260B2 (en) 2004-05-10 2011-11-22 Juergen Herre Device and method for analyzing an information signal
DE102004023436B4 (en) * 2004-05-10 2006-06-14 M2Any Gmbh Apparatus and method for analyzing an information signal
US8017855B2 (en) 2004-06-14 2011-09-13 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for converting an information signal to a spectral representation with variable resolution
DE102004028694B3 (en) * 2004-06-14 2005-12-22 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for converting an information signal into a variable resolution spectral representation
US7653534B2 (en) 2004-06-14 2010-01-26 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for determining a type of chord underlying a test signal
US7739062B2 (en) 2004-06-24 2010-06-15 Landmark Digital Services Llc Method of characterizing the overlap of two media segments
US9092518B2 (en) 2005-02-08 2015-07-28 Shazam Investments Limited Automatic identification of repeated material in audio signals
US8090579B2 (en) 2005-02-08 2012-01-03 Landmark Digital Services Automatic identification of repeated material in audio signals
US8453170B2 (en) 2007-02-27 2013-05-28 Landmark Digital Services Llc System and method for monitoring and recognizing broadcast data
JP2016512610A (en) * 2013-02-04 2016-04-28 テンセント・テクノロジー・(シェンジェン)・カンパニー・リミテッド Method and device for audio recognition
US10354307B2 (en) 2014-05-29 2019-07-16 Tencent Technology (Shenzhen) Company Limited Method, device, and system for obtaining information based on audio input

Also Published As

Publication number Publication date
GR1003625B (en) 2001-08-31
GR990100235A (en) 2001-03-30
EP1147511A1 (en) 2001-10-24

Similar Documents

Publication Publication Date Title
EP1147511A1 (en) Method of automatic recognition of musical compositions and sound signals
Delforouzi et al. Adaptive digital audio steganography based on integer wavelet transform
US6453252B1 (en) Process for identifying audio content
JP4418748B2 (en) System and method for identifying and segmenting media objects repeatedly embedded in a stream
JP5728888B2 (en) Signal processing apparatus and method, and program
JP2006505821A (en) Multimedia content with fingerprint information
JP2000105146A (en) Method and apparatus for specifying sound in composite sound signal
EP1515310A1 (en) A system and method for providing high-quality stretching and compression of a digital audio signal
US10089994B1 (en) Acoustic fingerprint extraction and matching
US20060041753A1 (en) Fingerprint extraction
CA2537328A1 (en) Method of processing and storing mass spectrometry data
EP1451803A2 (en) System and method for music identification
CN110277087B (en) Pre-judging preprocessing method for broadcast signals
Gajic et al. Robust speech recognition using features based on zero crossings with peak amplitudes
CN106716529A (en) Discrimination and attenuation of pre-echoes in a digital audio signal
KR100527002B1 (en) Apparatus and method of that consider energy distribution characteristic of speech signal
Yamashita et al. Spectral subtraction iterated with weighting factors
Richly et al. Short-term sound stream characterization for reliable, real-time occurrence monitoring of given sound-prints
GB2294619A (en) Inaudible insertion of information into an audio signal
Wang et al. Audio fingerprint based on spectral flux for audio retrieval
Adjila et al. Silence Detection and Removal Method Based on the Continuous Average Energy of Speech Signal
Tang Evaluation of double sided periodic substitution (DSPS) method for recovering missing speech in packet voice communications
WO1998022935A9 (en) Formant extraction using peak-picking and smoothing techniques
Jančovič et al. A probabilistic union model with automatic order selection for noisy speech recognition
Jabloun et al. On the use of masking properties of the human ear in the signal subspace speech enhancement approach

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

WWE Wipo information: entry into national phase

Ref document number: 2000940675

Country of ref document: EP

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWP Wipo information: published in national office

Ref document number: 2000940675

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 2000940675

Country of ref document: EP