CN104102923A - Nipponia nippon individual recognition method based on MFCC algorithm - Google Patents

Publication number
CN104102923A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410338974.0A
Other languages
Chinese (zh)
Inventor
王民
张立材
王佳丽
王稚慧
张炜炜
卫名斐
曹清菁
要趁红
赵伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Architecture and Technology
Original Assignee
Xian University of Architecture and Technology
Application filed by Xian University of Architecture and Technology
Priority to CN201410338974.0A
Publication of CN104102923A

Abstract

The invention discloses a method for recognizing individual crested ibises (Nipponia nippon) based on the MFCC (Mel Frequency Cepstrum Coefficient) algorithm. The method comprises the following steps: (1) the collected crested ibis call signals are preprocessed by pre-emphasis, framing and windowing, and endpoint detection; (2) the power spectrum of the call signals is obtained by fast Fourier transform and smoothed, and feature parameters are then extracted with a combination of MFCC, Mid MFCC and IMFCC (Inverse Mel Frequency Cepstrum Coefficient); and (3) an improved wavelet neural network is trained on the feature parameters of the call signals and used for recognition. The method has good noise immunity, extracts feature parameters that better express the characteristics of crested ibis calls, and improves the recognition rate of the system.

Description

A crested ibis individual recognition method based on the MFCC algorithm
[Technical field]
The invention belongs to the field of signal processing technology, and in particular relates to an individual recognition method.
[Background art]
The crested ibis is a globally endangered bird (IUCN category: Endangered, 1996) and a first-class protected animal in China [1]. It was once widely distributed across China, Japan, the Korean Peninsula and the Russian Far East. Since the 1950s, ecological deterioration, hunting and other causes have sharply reduced its numbers; wild populations died out one after another in the Soviet Union, Japan and the Korean Peninsula, and China also had no record of the species for some 20 years. After 7 wild crested ibises were rediscovered in Yang County in 1981, the Chinese government took strong protective measures and established a national crested ibis nature reserve. Bird vocalization plays an important role in avian evolution and is an important means of communication between individual birds. Analyzing crested ibis call signals therefore helps to further understand the physiological and behavioral mechanisms of the species, to track the birds and keep abreast of their condition, and thus to protect the species better.
Much research on bird acoustic communication exists abroad, while domestic work in this area has concentrated mainly on Passeriformes. Among non-passerine birds, the call characteristics of several rare galliform species have been studied domestically to some extent. Since the rediscovery of the crested ibis in 1981, many researchers at home and abroad have studied its life habits, ecology, genetics, anatomy, captive breeding and protection fairly comprehensively, but reports on its calls remain few. Sun Yuehua, Bi Zhonglin et al. of the Chinese Academy of Sciences [2] performed a sonogram analysis of the song of the Gansu leaf warbler in 2002. Lei Fumin et al. [3] gave a preliminary summary of some features of the complexity and diversity of bird song in 2004 and studied the song structure of the white-rumped snowfinch. In 2005, Bai Ya of Shaanxi Normal University [4] studied the relationship between crested ibis call characteristics and behavior, collecting and analyzing 6 kinds of calls of the Yang County crested ibis (alarm call, mating call, pseudo-copulation call, preening call, flight call and daily contact call) and producing the time-domain waveform, power spectrum and sonogram of each. Guo Min et al. [5] performed a sonogram analysis in 2007 of the calls of captive crested ibises at Louguantai, Zhouzhi County, during the breeding period and at different stages of growth and development, and pointed out that the call characteristics differ significantly between growth stages and between behavioral states. In 2012, Chen Lixia of Beijing Forestry University [6] analyzed the call characteristics and mating calls of the crested ibis.
The prior art thus studies crested ibis call characteristics and the relationship between call characteristics and behavior, but none of it addresses the recognition of individual crested ibises.
The documents cited in the text are as follows:
[1] Liu Bin, Han Zhiming, Liu Yan, et al. Random amplified polymorphic DNA analysis and genetic research of the crested ibis. In: China Wildlife Conservation Association, China Ornithological Society, Shaanxi Wildlife Conservation Society (eds.). Rare Bird Crested Ibis: Proceedings of the '99 International Workshop on Crested Ibis Conservation [C]. Beijing: China Forestry Publishing House, 2000: 31-36.
[2] Sun Yuehua, Bi Zhonglin, Jia Chenxi, et al. Sonogram analysis and breeding notes of the Gansu leaf warbler at Lianhuashan [J]. Chinese Journal of Zoology, 2002, 37(5): 62-65.
[3] Lei Fumin, Wang Aizhen, Wang Gang, et al. Complexity of the song structure of the white-rumped snowfinch on the Qinghai-Tibet Plateau [J]. Acta Zoologica Sinica, 2004, 50(3): 348-356.
[4] Bai Ya. Study on the relationship between call characteristics and behavior of the rare crested ibis [D]. Xi'an: Shaanxi Normal University, 2005.
[5] Guo Min, Wu Xiaomin, Ren Jianshe, et al. Call characteristics of captive crested ibises during the breeding period [J]. Acta Zoologica Sinica, 2007, 53(5): 819-825.
[6] Chen Lixia. Call characteristics and mating acoustic analysis of the crested ibis [D]. Beijing: Beijing Forestry University, 2012.
[Summary of the invention]
The object of the present invention is to provide a crested ibis individual recognition method based on the MFCC algorithm that solves the above technical problem.
To achieve this object, the present invention adopts the following technical scheme:
A crested ibis individual recognition method based on the MFCC algorithm comprises the following steps:
A. Training and recognition of the call signals:
A1. Record the call signals of N crested ibises, two different call signals per bird; divide the recorded call signals into two groups, one used as training samples and the other as test samples;
A2. Apply pre-emphasis to both groups of data with a digital filter, perform framing and windowing followed by endpoint detection, then compute the MFCC coefficients of the call signal frame by frame and save them;
A3. Construct a three-layer wavelet neural network and set the initial network parameters: the network structure is 72-14-10, the learning rate is 0.8, and the training error precision is 0.001;
A4. Train the network on the feature coefficients of each crested ibis with the wavelet neural network algorithm, while optimizing the network weights with a genetic algorithm;
A5. Set the parameters: population size pop_size = 100, crossover probability p_c = 0.6, mutation probability p_m = 0.01;
A6. Randomly generate a population of real-valued strings, each individual consisting of the initial weights of the network;
A7. Decode the individuals in the real-valued strings to generate the corresponding networks, with structure 72-14-10;
A8. Run the network and compute the fitness of each individual in the population with the following formula to evaluate network performance:
$$f = \frac{1}{E + 1}$$
where $E$ is the output error of the wavelet neural network, $n$ is the total number of training samples, and $y_p$ is the desired output vector of the $p$-th training sample, which is compared with the network's actual output vector for that sample;
A9. According to the fitness of each individual, select the two individuals with the highest fitness from the population, apply crossover with probability p_c, and then apply mutation with mutation probability p_m, producing two new individuals; new individuals are generated successively in this way, and the iteration yields the next-generation population and thus the next-generation network;
A10. Stop the evolution once the network converges and reaches the expected training error precision of 0.001, save the optimal weights and output the result, each call signal corresponding to one set of network weights; otherwise repeat steps A7 to A9;
A11. After training, obtain the training sample output matrices of all crested ibis calls;
B. Preprocess the crested ibis call signal to be identified;
C. Extract feature parameters from the preprocessed call signal, using MFCC feature parameters;
D. Feed the feature coefficients extracted in step C into the input layer of the neural network trained in step A, repeat steps A5 to A10 to obtain the set of network weights corresponding to the call signal to be identified, compute the network output matrix with these weights, and compare it one by one with the training sample output matrices; the call signal with the smallest error is the recognition result.
Preferably, step B specifically comprises the following steps:
B1. Apply pre-emphasis to the collected call signal of a single crested ibis;
B2. Apply framing and windowing to the pre-emphasized call signal;
B3. Perform endpoint detection on the framed and windowed call signal to determine the starting point and end point of the signal.
Preferably, step B1 implements pre-emphasis with a first-order digital filter with a single zero, of the form:
$$H(z) = 1 - a z^{-1}, \quad 0.9 < a < 1$$
where $H(z)$ is the system function and $a$ is the pre-emphasis coefficient, usually taken as 0.95, 0.97 or 0.98;
Step B2 applies framing and windowing to the pre-emphasized call signal, with a frame length of 32 ms and a frame shift of 16 ms; the window function $w(n)$ is a Hamming window, where $N$ is the window length, taken as 256;
In step B3, endpoint detection of the crested ibis call signal uses an algorithm that combines short-time energy with the average zero-crossing rate to determine the starting point and end point of the signal; the short-time energy $E_n$ of the signal $\{x(n)\}$ is defined as:
$$E_n = \sum_{m=-\infty}^{\infty} \left[ x(m)\, w(n-m) \right]^2$$
where $E_n$ is the short-time energy when the window function is applied starting at the $n$-th point of the signal, $n$ is the sample position of the window, $m$ is the sample index, $x(m)$ is the signal under analysis, and $w(n-m)$ is the window function.
Preferably, step C specifically comprises the following steps:
C1. Compute the discrete power spectrum X(n) of the preprocessed crested ibis call signal: once the number of samples per frame is fixed, apply the discrete FFT to each frame sequence x(n) and square the magnitude to obtain the discrete power spectrum X(n) of the call signal;
C2. Smooth the discrete power spectrum X(n) with a moving average filter w(l) to obtain the smoothed spectrum Y(n), where w(l) is given by:
$$w(l) = \frac{N - |l|}{N^2}, \quad -(N-1) \le l \le (N-1)$$
where N is the width of the sliding window and l is the sample index within the window; the sampling frequency is 8 kHz, the FFT length is 512 points, and N = 13;
C3. At each frequency, take the maximum of the original spectrum X(n) and the smoothed spectrum Y(n) to obtain Z(n), i.e. Z(n) = max(X[n], Y[n]);
C4. Repeat steps C2 and C3 i times and take the final Z(n) as the estimate of the spectral envelope of X(n); i is taken as 5;
C5. Obtain the power values produced when the final Z(n) from step C4 passes through M band-pass filters that combine MFCC, IMFCC and MidMFCC; each power value is the sum over the frequency points of the product of Z(n) and $H_m(n)$, giving M power values $P_m$ (m = 0, 1, ..., M-1);
C6. Take the natural logarithm of $P_m$ and then apply the discrete cosine transform (DCT) to transform the parameters to the cepstral domain:
$$C_k = \sum_{j=1}^{24} \log(P_j) \cos\left[ k \left( j - \frac{1}{2} \right) \frac{\pi}{24} \right], \quad k = 1, 2, \ldots, P$$
where $\{C_k\}$ are the standard MFCC parameters, P is the order of the MFCC, and $P_j$ is the j-th power value, with j indexing the filters.
A crested ibis individual recognition method based on the MFCC algorithm comprises the following steps:
(1) Preprocessing of the call signal: apply pre-emphasis to the collected crested ibis call signal; then perform framing and windowing with a Hamming window; then perform endpoint detection that combines short-time energy with the short-time zero-crossing rate;
(2) Extraction of the call signal feature parameters, using MFCC feature parameters;
(3) Training and recognition of the call signal: the network is trained on the feature parameters, the saved network weights are called in turn to compute the network output matrix, and this matrix is compared one by one with the desired output matrices; the one with the smallest error is the recognition result.
The feature extraction steps for the crested ibis call signal are as follows:
(1) Compute the discrete power spectrum X(n): once the number of samples per frame is fixed, apply a Hamming window to each frame sequence x(n), take the discrete FFT, and square the magnitude to obtain the discrete power spectrum;
(2) Smooth X(n) with a moving average filter w(l) to obtain the smoothed spectrum Y(n), where w(l) is given by:
$$w(l) = \frac{N - |l|}{N^2}, \quad -(N-1) \le l \le (N-1)$$
With a sampling frequency of 8 kHz and a 512-point FFT, N = 13 is used, so the interval on which w(l) is nonzero is about 400 Hz wide;
(3) At each frequency, take the maximum of the original spectrum X(n) and the smoothed spectrum Y(n) to obtain Z(n), i.e. Z(n) = max(X[n], Y[n]);
(4) Repeat steps (2) and (3) i times and take the final Z(n) as the estimate of the spectral envelope of X(n). According to experiment, i = 5 is appropriate;
(5) Obtain the power values produced when Z(n) passes through M band-pass filters that combine MFCC, IMFCC and MidMFCC; each power value is the sum over the frequency points of the product of Z(n) and $H_m(n)$, giving M power values $P_m$ (m = 0, 1, ..., M-1);
(6) Take the natural logarithm of $P_m$ and then apply the discrete cosine transform (DCT) to transform the parameters to the cepstral domain:
$$C_k = \sum_{j=1}^{24} \log(P_j) \cos\left[ k \left( j - \frac{1}{2} \right) \frac{\pi}{24} \right], \quad k = 1, 2, \ldots, P$$
where $\{C_k\}$ are the standard MFCC parameters, P is the order of the MFCC, and $P_j$ is the j-th power value, with j indexing the filters.
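Steps (1) to (4) above can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the repetition in step (4) is assumed to re-smooth the current Z(n) on each pass, and `np.convolve` zero-pads at the spectrum edges, which the text does not specify. The last lines check the stated width of the filter: w(l) is nonzero on 2N - 1 = 25 bins of 8000 / 512 = 15.625 Hz each, about 390 Hz, consistent with "about 400 Hz".

```python
import numpy as np

def power_spectrum(frame, n_fft=512):
    """Step (1): squared magnitude of the FFT of one windowed frame."""
    return np.abs(np.fft.rfft(frame, n=n_fft)) ** 2

def smoothing_kernel(N=13):
    """Step (2): w(l) = (N - |l|) / N^2 for -(N-1) <= l <= N-1 (sums to 1)."""
    l = np.arange(-(N - 1), N)
    return (N - np.abs(l)) / N ** 2

def spectral_envelope(X, N=13, iterations=5):
    """Steps (2)-(4): repeated smoothing followed by a pointwise maximum."""
    w = smoothing_kernel(N)
    Z = X.copy()
    for _ in range(iterations):
        # Assumption: each pass smooths the current estimate Z(n).
        Y = np.convolve(Z, w, mode="same")
        Z = np.maximum(Z, Y)          # Z(n) = max(X[n], Y[n])
    return Z

# Illustrative input: one random 256-sample frame with a Hamming window.
rng = np.random.default_rng(0)
frame = rng.standard_normal(256) * np.hamming(256)
X = power_spectrum(frame)
Z = spectral_envelope(X)

bin_hz = 8000 / 512                   # 15.625 Hz per FFT bin
support_hz = (2 * 13 - 1) * bin_hz    # 390.625 Hz
```

Because each pass takes a pointwise maximum, the estimate never falls below the original spectrum, which is what makes it behave like an upper envelope.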
The training and recognition steps for the crested ibis call signal are as follows:
First, the network weights are optimized with a genetic algorithm:
(1) Encoding scheme. The encoding strongly affects the performance and efficiency of network evolution, so the encoding technique is the first problem and the key step to be solved in evolving the connection weights. Because the number of network parameters is large, binary coding in the genetic algorithm would lead to overly long chromosomes, a large search space and low search efficiency. Since the number of nodes and the structure of the wavelet neural network are fixed here, a real-number encoding scheme can be adopted: the weights of the network and the thresholds of each node are arranged in order to form a vector.
(2) Choice of fitness function. The main measure of network performance is the sum of squared errors between the actual output and the desired output of the network. The smaller the sum of squared errors, the better the network performs.
The fitness function is defined as:
$$f = \frac{1}{E + 1}$$
where $E$ is the output error of the wavelet neural network, $n$ is the total number of training samples, and $y_p$ is the desired output vector of the $p$-th sample, which is compared with the network's actual output vector for that sample;
(3) Genetic operations. Selection operator: superior individuals (those with high fitness) are selected from the current population and the worst individuals are eliminated. After each generation, the new population formed from the candidate solutions is selected to serve as the next generation. The new population may consist of offspring only, or may be chosen from offspring and parents together. The selection operator ensures that good individuals are passed on to the next generation. Crossover operator: crossover produces new individuals by replacing and recombining parts of the structure of two parent individuals; it is the most important genetic operation in a genetic algorithm. The crossover probability p_c is the ratio of the number of offspring produced by crossover in each generation to the population size, meaning that the chromosomes of p_c × pop_size individuals (pop_size being the population size) undergo crossover. The crossover probability ranges from 0.50 to 0.85. Mutation operator: mutation changes the gene values at certain gene positions of the individual strings in the population. The mutation probability is the ratio of the number of mutated genes in the population to the total number of genes; its value controls the rate at which new genes are introduced, and means that the chromosomes of p_m × pop_size individuals undergo mutation. The mutation rate is commonly of the order of 0.001 to 0.01.
(4) Termination condition. The genetic algorithm is terminated by a given number of iterations and by fitness: the algorithm ends when the iterations are exhausted or when the fitness of the best individual in some generation is greater than or equal to a preset value.
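The four elements above (real-number encoding, the fitness f = 1/(E + 1), the genetic operators and the termination condition) can be sketched as one loop. This is a hedged illustration only: the patent does not specify the operators in detail, so keeping the best individual (elitism), truncation selection of the better half, arithmetic crossover and Gaussian mutation are assumptions, and `error_fn` is a hypothetical stand-in for the network output error E. Only pop_size = 100, p_c = 0.6 and p_m = 0.01 come from the text.

```python
import numpy as np

def fitness(error):
    """Fitness from the text: f = 1 / (E + 1)."""
    return 1.0 / (error + 1.0)

def evolve(error_fn, n_genes, pop_size=100, p_c=0.6, p_m=0.01,
           max_gen=50, target_fitness=0.999, seed=0):
    rng = np.random.default_rng(seed)
    # Real-number encoding: each row is one individual (a weight vector).
    pop = rng.uniform(-1, 1, size=(pop_size, n_genes))
    for _ in range(max_gen):                      # termination by iterations
        fit = np.array([fitness(error_fn(ind)) for ind in pop])
        order = np.argsort(fit)[::-1]
        pop, fit = pop[order], fit[order]
        if fit[0] >= target_fitness:              # termination by fitness
            break
        parents = pop[: pop_size // 2]            # assumption: drop worst half
        children = [pop[0].copy()]                # assumption: keep the best
        while len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            if rng.random() < p_c:                # arithmetic crossover
                alpha = rng.random()
                a, b = alpha * a + (1 - alpha) * b, alpha * b + (1 - alpha) * a
            for child in (a, b):
                child = child.copy()
                mask = rng.random(n_genes) < p_m  # per-gene mutation
                child[mask] += rng.normal(0.0, 0.1, size=mask.sum())
                children.append(child)
        pop = np.array(children[:pop_size])
    fit = np.array([fitness(error_fn(ind)) for ind in pop])
    best = int(np.argmax(fit))
    return pop[best], float(fit[best])

# Toy error function (hypothetical): squared distance to a fixed vector.
toy_error = lambda w: float(np.sum((w - 0.5) ** 2))
best_weights, best_fitness = evolve(toy_error, n_genes=5)
```

Since E is non-negative, the fitness always lies in (0, 1], and E = 0 gives the maximum fitness of 1.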
The network is then trained:
(1) Construct a three-layer wavelet neural network and set the initial network parameters: the network structure is 72-14-10, the learning rate is 0.8, and the training error precision is 0.001.
(2) Train the network on the feature coefficients of each bird with the wavelet neural network algorithm, while optimizing the network weights with the genetic algorithm, until the network converges and reaches the expected training error precision of 0.001; then save the optimal weights. Each call signal corresponds to one set of network weights.
(3) Feed the feature coefficients into the input layer of the network, call the saved network weights in turn to compute the network output matrix, and compare it one by one with the desired output matrices; the call signal with the smallest error is the recognition result.
Compared with the prior art, the present invention has the following beneficial effects: when extracting feature parameters, the invention takes into account that crested ibis calls cover a wide overall frequency range and that their spectrum takes the form of an envelope; after the spectrum is smoothed, MFCC, MidMFCC and IMFCC are combined, which better extracts feature parameters that characterize crested ibis calls, provides good noise immunity, and improves the recognition rate of the system.
[Description of the drawings]
Fig. 1 is a block diagram of the system of the present invention;
Fig. 2 is a flow chart of the genetic algorithm.
[Detailed description of the embodiments]
The present invention is described in detail below with reference to the drawings and specific embodiments.
Fig. 1 shows a block diagram of the individual recognition method based on crested ibis calls according to the present invention. The specific steps of the method are as follows:
1. Preprocessing of the call signal
(1) Pre-emphasis of the collected call signal of a single crested ibis
The average power of the call signal is affected by glottal excitation, so the high-frequency part above about 800 Hz is markedly attenuated. When the spectrum is computed, the high-frequency components are weaker and their spectrum is harder to obtain than that of the low-frequency part; in addition, the low-frequency end is mixed with 50 Hz or 60 Hz interference. Pre-emphasis is therefore applied to boost the high-frequency part and filter out low-frequency interference, flattening the spectrum of the signal and facilitating spectral analysis and the analysis of vocal tract parameters.
The present invention uses the prior-art approach of a first-order digital filter with a single zero to implement pre-emphasis. Its form is:
$$H(z) = 1 - a z^{-1}, \quad 0.9 < a < 1$$
where $H(z)$ is the system function and $a$ is the pre-emphasis coefficient, whose value is close to 1 and is usually taken as 0.95, 0.97 or 0.98; the present invention uses 0.97.
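In the time domain, the filter H(z) = 1 - a z^{-1} corresponds to the difference equation y[n] = x[n] - a·x[n-1]. A minimal NumPy sketch with a = 0.97 follows; the function name is illustrative, and treating the sample before the start of the signal as zero is an assumption.

```python
import numpy as np

def pre_emphasis(x, a=0.97):
    """Apply H(z) = 1 - a z^-1, i.e. y[n] = x[n] - a * x[n-1]."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]                  # assumption: x[-1] is treated as 0
    y[1:] = x[1:] - a * x[:-1]
    return y

# A constant (0 Hz) input is reduced to 1 - a after the first sample,
# which illustrates the suppression of low-frequency content.
y = pre_emphasis(np.ones(5))
```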
(2) Framing and windowing of the pre-emphasized call signal
The crested ibis call signal is a non-stationary process whose characteristics change over time. Research on call signals shows that, although the signal is time-varying, its characteristics remain relatively stable and essentially unchanged over a short time range. The pre-emphasized call signal is therefore framed and windowed, with a frame length of 32 ms and a frame shift of 16 ms. The window function $w(n)$ is a Hamming window, where $N$ is the window length; the present invention uses N = 256.
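At the 8 kHz sampling frequency used later in the text, a 32 ms frame is 256 samples (matching N = 256) and a 16 ms shift is 128 samples. The sketch below frames a signal and applies a Hamming window; it assumes the conventional Hamming coefficients as implemented by `np.hamming`, that is w(n) = 0.54 - 0.46 cos(2πn / (N - 1)), and zero-pads signals shorter than one frame. The function name is illustrative.

```python
import numpy as np

def frame_signal(x, fs=8000, frame_ms=32, shift_ms=16):
    """Split x into overlapping frames and apply a Hamming window to each."""
    x = np.asarray(x, dtype=float)
    frame_len = int(round(fs * frame_ms / 1000))    # 256 samples at 8 kHz
    shift = int(round(fs * shift_ms / 1000))        # 128 samples at 8 kHz
    if len(x) < frame_len:
        # Assumption: zero-pad signals shorter than one frame.
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // shift
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

# One second of illustrative noise at 8 kHz.
frames = frame_signal(np.random.default_rng(0).standard_normal(8000))
```

Trailing samples that do not fill a complete frame are dropped in this sketch; padding the tail instead would be an equally reasonable choice.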
(3) Endpoint detection on the framed and windowed call signal
For endpoint detection of the crested ibis call signal, the present invention uses a prior-art algorithm that combines short-time energy with the average zero-crossing rate to determine the starting point and end point of the signal. The short-time energy $E_n$ of the signal $\{x(n)\}$ is defined as:
$$E_n = \sum_{m=-\infty}^{\infty} \left[ x(m)\, w(n-m) \right]^2$$
where $E_n$ is the short-time energy when the window function is applied starting at the $n$-th point of the signal, $n$ is the sample position of the window, $m$ is the sample index, $x(m)$ is the signal under analysis, and $w(n-m)$ is the window function. The short-time average zero-crossing rate is defined as:
$$Z_n = \frac{1}{2} \sum_{m=-\infty}^{+\infty} \left| \operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)] \right| \cdot w(n-m)$$
where sgn is the sign function:
$$\operatorname{sgn}[x(n)] = \begin{cases} 1, & x(n) \ge 0 \\ -1, & x(n) < 0 \end{cases}$$
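The two measures can be computed per frame and combined into a simple detector, as sketched below. The decision rule and both thresholds (`energy_ratio`, `zcr_thresh`) are illustrative assumptions, since the text gives no threshold values; the zero-crossing rate here uses a rectangular window normalized by the frame length, which is also an assumption.

```python
import numpy as np

def short_time_energy(frames):
    """E_n per frame: sum of squared (windowed) samples."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Z_n per frame, with sgn(x) = 1 for x >= 0 and -1 otherwise."""
    signs = np.where(frames >= 0, 1, -1)
    crossings = 0.5 * np.sum(np.abs(np.diff(signs, axis=1)), axis=1)
    return crossings / frames.shape[1]

def detect_endpoints(frames, energy_ratio=0.1, zcr_thresh=0.3):
    """Return (first, last) active frame indices, or None if silent."""
    energy = short_time_energy(frames)
    zcr = zero_crossing_rate(frames)
    active = np.flatnonzero(energy > energy_ratio * energy.max())
    if active.size == 0:
        return None
    start, end = active[0], active[-1]
    # Extend outward while the zero-crossing rate stays high, to keep
    # weak onsets and offsets that the energy threshold alone misses.
    while start > 0 and zcr[start - 1] > zcr_thresh:
        start -= 1
    while end < len(frames) - 1 and zcr[end + 1] > zcr_thresh:
        end += 1
    return int(start), int(end)

# Illustrative input: silence, a 500 Hz tone in frames 3 to 6, silence.
demo = np.zeros((10, 256))
demo[3:7] = np.sin(2 * np.pi * 500 * np.arange(256) / 8000)
endpoints = detect_endpoints(demo)
```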
2. Feature parameter extraction from the call signal
Feature parameter extraction obtains, from the call signal, the parameters that reflect the individuality of a crested ibis's call. The detailed process is as follows:
(1) Compute the discrete power spectrum X(n) of the preprocessed crested ibis call signal. Once the number of samples per frame is fixed, apply the discrete FFT to each frame sequence x(n) and square the magnitude to obtain the discrete power spectrum X(n) of the call signal;
(2) Smooth the discrete power spectrum X(n) with a moving average filter w(l) to obtain the smoothed spectrum Y(n), where w(l) is given by:
$$w(l) = \frac{N - |l|}{N^2}, \quad -(N-1) \le l \le (N-1)$$
where N is the width of the sliding window and l is the sample index within the window; with a sampling frequency of 8 kHz and a 512-point FFT, N = 13 is used.
(3) At each frequency, take the maximum of the original spectrum X(n) and the smoothed spectrum Y(n) to obtain Z(n), i.e. Z(n) = max(X[n], Y[n]);
(4) Repeat steps (2) and (3) i times and take the final Z(n) as the estimate of the spectral envelope of X(n). According to experiment, the present invention uses i = 5;
(5) Obtain the power values produced when the final Z(n) from step (4) passes through M band-pass filters that combine MFCC, IMFCC and MidMFCC; each power value is the sum over the frequency points of the product of Z(n) and $H_m(n)$, giving M power values $P_m$ (m = 0, 1, ..., M-1);
(6) Take the natural logarithm of $P_m$ and then apply the discrete cosine transform (DCT) to transform the parameters to the cepstral domain:
$$C_k = \sum_{j=1}^{24} \log(P_j) \cos\left[ k \left( j - \frac{1}{2} \right) \frac{\pi}{24} \right], \quad k = 1, 2, \ldots, P$$
where $\{C_k\}$ are the standard MFCC parameters, P is the order of the MFCC, and $P_j$ is the j-th power value, with j indexing the filters;
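Steps (5) and (6) reduce to a matrix product followed by a logarithm and a cosine transform. The sketch below implements the formula for C_k as written. The design of the combined MFCC, IMFCC and MidMFCC filter bank is not given in this text, so `triangular_bank` is a hypothetical stand-in with linearly spaced triangular filters; the default of 12 coefficients for the order P and the small floor inside the logarithm are likewise assumptions.

```python
import numpy as np

def triangular_bank(n_filters=24, n_bins=257):
    """Hypothetical placeholder bank: linearly spaced triangular filters.
    Not the MFCC/IMFCC/MidMFCC combination described in the text."""
    edges = np.linspace(0, n_bins - 1, n_filters + 2)
    bins = np.arange(n_bins)
    H = np.zeros((n_filters, n_bins))
    for m in range(n_filters):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (bins - lo) / (mid - lo)
        falling = (hi - bins) / (hi - mid)
        H[m] = np.clip(np.minimum(rising, falling), 0, None)
    return H

def cepstral_coefficients(Z, H, n_ceps=12):
    """Return C_1..C_P from the envelope Z (n_bins,) and bank H (M, n_bins)."""
    P_m = H @ Z                                  # step (5): band power values
    log_p = np.log(np.maximum(P_m, 1e-12))       # floor avoids log(0)
    M = len(P_m)
    j = np.arange(1, M + 1)
    k = np.arange(1, n_ceps + 1)[:, None]
    basis = np.cos(k * (j - 0.5) * np.pi / M)    # step (6): DCT basis
    return basis @ log_p

# Illustrative input: a flat envelope over 257 frequency bins.
ceps = cepstral_coefficients(np.ones(257), triangular_bank())
```

A useful sanity check on the formula: when all band powers are equal, every C_k with k ≥ 1 is zero, because the cosine terms sum to zero over j.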
3. Training and recognition of the call signal
(1) The data consist of crested ibis calls: the calls of 10 crested ibises are recorded (two different calls per bird) and divided into two groups, one used as training samples and the other as test samples.
(2) Apply pre-emphasis to both groups of data with a digital filter, perform framing and windowing, perform endpoint detection on the call signal, then compute the MFCC coefficients of the call signal frame by frame and save them.
(3) Construct a three-layer wavelet neural network and set the initial network parameters: the network structure is 72-14-10, the learning rate is 0.8, and the training error precision is 0.001.
(4) Train the network on the feature coefficients of each crested ibis (all birds in both the training and test samples) with the wavelet neural network algorithm, while optimizing the network weights with a genetic algorithm;
(5) Set the parameters: population size pop_size = 100, crossover probability p_c = 0.6, mutation probability p_m = 0.01;
(6) Randomly generate a population of real-valued strings, each individual consisting of the initial weights of the network;
(7) Decode the individuals in the real-valued strings to generate the corresponding networks, with structure 72-14-10;
(8) Run the network and compute the fitness of each individual in the population with the following formula to evaluate network performance:
$$f = \frac{1}{E + 1}$$
where $E$ is the output error of the wavelet neural network, $n$ is the total number of training samples, and $y_p$ is the desired output vector of the $p$-th training sample, which is compared with the network's actual output vector for that sample;
(9) According to the fitness of each individual, select the two individuals with the highest fitness from the population, apply crossover with probability p_c = 0.6, and then apply mutation with mutation probability p_m = 0.01, producing two new individuals. New individuals are generated successively in this way, and the iteration yields the next-generation population and thus the next-generation network;
(10) Stop the evolution once the network converges and reaches the expected training error precision of 0.001, save the optimal weights and output the result, each call signal corresponding to one set of network weights; otherwise repeat steps (7) to (9);
(11) After training, obtain the training sample output matrices of all crested ibis calls;
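Steps (7) and (8) can be illustrated by decoding one real-valued individual into a 72-14-10 network and evaluating f = 1/(E + 1). Only the layer sizes and the fitness formula come from the text. The Morlet wavelet as the hidden activation, the per-node translation and dilation parameters, the linear output layer, the layout of the vector, and E as the plain sum of squared errors are all assumptions made for this sketch.

```python
import numpy as np

N_IN, N_HID, N_OUT = 72, 14, 10
# Assumed layout: input weights, translations, dilations, output weights.
N_GENES = N_HID * N_IN + 2 * N_HID + N_OUT * N_HID

def morlet(t):
    """Assumed hidden-layer wavelet: cos(1.75 t) * exp(-t^2 / 2)."""
    return np.cos(1.75 * t) * np.exp(-0.5 * t ** 2)

def decode(chrom):
    """Step (7): split a real-valued individual into network parameters."""
    i = 0
    W1 = chrom[i:i + N_HID * N_IN].reshape(N_HID, N_IN)
    i += N_HID * N_IN
    shift = chrom[i:i + N_HID]
    i += N_HID
    scale = chrom[i:i + N_HID]
    i += N_HID
    W2 = chrom[i:i + N_OUT * N_HID].reshape(N_OUT, N_HID)
    return W1, shift, scale, W2

def forward(chrom, X):
    """Run the network on inputs X of shape (n_samples, 72)."""
    W1, shift, scale, W2 = decode(chrom)
    # Guard against division by a dilation that is close to zero.
    scale = np.where(np.abs(scale) < 1e-3, 1e-3, scale)
    hidden = morlet((X @ W1.T - shift) / scale)
    return hidden @ W2.T

def network_fitness(chrom, X, Y):
    """Step (8): f = 1 / (E + 1), with E the sum of squared errors."""
    E = np.sum((Y - forward(chrom, X)) ** 2)
    return 1.0 / (E + 1.0)

# Illustrative evaluation of one random individual on placeholder data.
rng = np.random.default_rng(0)
f_value = network_fitness(rng.uniform(-1, 1, N_GENES),
                          rng.standard_normal((10, N_IN)), np.eye(N_OUT))
```

A function like `network_fitness` is what a genetic algorithm would call for every individual in every generation, so keeping the forward pass vectorized over samples matters for run time.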
To identify a particular crested ibis, repeat steps 1 and 2 to obtain the feature coefficients $\{C_k\}$; then feed the coefficients $\{C_k\}$ extracted in step 2 into the input layer of the trained neural network, repeat steps (5) to (10) to obtain the set of network weights corresponding to the call signal to be identified, compute the network output matrix with these weights, and compare it one by one with the training sample output matrices; the call signal with the smallest error is the recognition result.
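The final comparison is a nearest-match search over the stored training output matrices. A minimal sketch follows, assuming the error is measured as the sum of squared differences and that all matrices share one shape; the labels and the function name are hypothetical.

```python
import numpy as np

def identify(output, reference_outputs):
    """Return the label whose stored output matrix has the smallest error."""
    errors = {label: float(np.sum((output - ref) ** 2))
              for label, ref in reference_outputs.items()}
    return min(errors, key=errors.get), errors

# Hypothetical stored training outputs for two birds.
references = {
    "ibis_01": np.zeros((3, 10)),
    "ibis_02": np.ones((3, 10)),
}
label, errors = identify(np.full((3, 10), 0.9), references)
```

Returning the full error dictionary alongside the label makes it possible to inspect how close the runner-up was, which is useful when two birds have similar calls.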

Claims (5)

1. A crested ibis individual recognition method based on the MFCC algorithm, characterized in that it comprises the following steps:
A. Training and recognition of the call signals:
A1. Record the call signals of N crested ibises, two different call signals per bird; divide the recorded call signals into two groups, one used as training samples and the other as test samples;
A2. Apply pre-emphasis to both groups of data with a digital filter, perform framing and windowing followed by endpoint detection, then compute the MFCC coefficients of the call signal frame by frame and save them;
A3. Construct a three-layer wavelet neural network and set the initial network parameters: the network structure is 72-14-10, the learning rate is 0.8, and the training error precision is 0.001;
A4. Train the network on the feature coefficients of each crested ibis with the wavelet neural network algorithm, while optimizing the network weights with a genetic algorithm;
A5. Set the parameters: population size pop_size = 100, crossover probability p_c = 0.6, mutation probability p_m = 0.01;
A6. Randomly generate a population of real-valued strings, each individual consisting of the initial weights of the network;
A7. Decode the individuals in the real-valued strings to generate the corresponding networks, with structure 72-14-10;
A8. Run the network and compute the fitness of each individual in the population with the following formula to evaluate network performance:
$$f = \frac{1}{E + 1}$$
where $E$ is the output error of the wavelet neural network, $n$ is the total number of training samples, and $y_p$ is the desired output vector of the $p$-th training sample, which is compared with the network's actual output vector for that sample;
A9. According to the fitness of each individual, select the two individuals with the highest fitness from the population and apply the crossover operation with probability $p_c$, then apply the mutation operation with mutation probability $p_m$, thereby producing two new individuals; new individuals are generated successively in this way, and the iteration produces the next-generation population, which forms the next-generation network;
A10. Stop evolving once the network converges and reaches the desired training error precision of 0.001, save the optimal weights and output the result, each song signal corresponding to one set of network weights; otherwise repeat steps A7 to A9;
A11. After training finishes, obtain the training-sample output matrix of all crested ibis songs;
B. Pre-process the crested ibis song signal to be identified;
C. Extract feature parameters from the pre-processed song signal, using MFCC feature parameters;
D. Feed the feature parameter coefficients extracted in step C into the input layer of the neural network trained in step A, repeat steps A5 to A10 to obtain the set of network weights corresponding to the crested ibis song signal to be identified, use these weights to compute the network output matrix, and compare it one by one with the training-sample output matrix; the song signal with the smallest error is the recognition result.
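The genetic-algorithm loop of steps A5 to A10 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: `network_error` is a hypothetical stand-in for the output error E of the wavelet neural network (here a toy quadratic), the weight-vector length is arbitrary, and single-point crossover, Gaussian mutation and keeping the best individual are assumed operators that the claim does not specify.

```python
import numpy as np

POP_SIZE = 100        # pop_size from step A5
P_CROSS = 0.6         # crossover probability p_c from step A5
P_MUT = 0.01          # mutation probability p_m from step A5
TARGET_ERROR = 0.001  # training error precision from step A3


def network_error(weights):
    """Hypothetical stand-in for the network output error E (toy quadratic)."""
    return float(np.sum(weights ** 2))


def fitness(weights):
    """Fitness f = 1 / (E + 1) from step A8."""
    return 1.0 / (network_error(weights) + 1.0)


def evolve(n_weights=8, max_generations=500, seed=0):
    """Return the fittest weight vector found by the sketch GA."""
    rng = np.random.default_rng(seed)
    # Step A6: random real-valued population, one weight vector per individual.
    population = rng.uniform(-1.0, 1.0, size=(POP_SIZE, n_weights))
    for _ in range(max_generations):
        scores = np.array([fitness(ind) for ind in population])
        best = population[np.argmax(scores)].copy()
        if network_error(best) <= TARGET_ERROR:  # step A10 stopping rule
            break
        # Step A9: the two fittest individuals act as parents.
        parents = population[np.argsort(scores)[-2:]]
        children = [best]  # assumed elitism: carry the best individual over
        while len(children) < POP_SIZE:
            a, b = parents[0].copy(), parents[1].copy()
            if rng.random() < P_CROSS:
                cut = rng.integers(1, n_weights)  # assumed single-point crossover
                a[cut:], b[cut:] = parents[1][cut:], parents[0][cut:]
            for child in (a, b):
                mask = rng.random(n_weights) < P_MUT  # genes selected for mutation
                child[mask] += rng.normal(0.0, 0.1, size=int(mask.sum()))
                children.append(child)
        population = np.array(children[:POP_SIZE])
    scores = np.array([fitness(ind) for ind in population])
    return population[np.argmax(scores)]
```

Calling `evolve()` returns one weight vector; in the claimed method the error would come from running the 72-14-10 network on the training samples rather than from the toy function used here.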
2. The crested ibis individual recognition method based on the MFCC algorithm according to claim 1, characterized in that step B specifically comprises the following steps:
B1. Apply pre-emphasis to the collected song signal of a single crested ibis;
B2. Apply windowing and framing to the pre-emphasized song signal;
B3. Perform end-point detection on the windowed and framed song signal to determine the start point and end point of the signal.
3. The crested ibis individual recognition method based on the MFCC algorithm according to claim 1, characterized in that
step B1 implements pre-emphasis with a first-order zero digital filter of the form:
$$H(z) = 1 - a z^{-1}, \quad 0.9 < a < 1$$
where H(z) is the system function and a is the pre-emphasis coefficient;
Step B2 applies windowing and framing to the pre-emphasized song signal, with a frame length of 32 ms and a frame shift of 16 ms; the window function w(n) is a Hamming window,
where N is the window length, taken as 256;
In step B3, end-point detection of the crested ibis song signal uses an algorithm that combines short-time energy with the average zero-crossing rate, thereby determining the start point and end point of the signal; the short-time energy $E_n$ of the signal {x(n)} is defined as:
$$E_n = \sum_{m=-\infty}^{\infty} \left[ x(m)\, w(n-m) \right]^2$$
where $E_n$ denotes the short-time energy when the window function is applied starting at the n-th sample of the signal, n is the sample superposition index, m is the sample index, x(m) is the signal under test, and w(n-m) is the window function.
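The pre-processing chain of steps B1 to B3 can be sketched as below. This is an illustrative sketch, not the patented procedure: the 8 kHz sampling rate is taken from claim 4, the window coefficients come from `numpy.hamming`, and the decision rule and thresholds in `detect_endpoints` (`high_ratio`, `low_ratio`, `zcr_min`) are hypothetical choices, since the claim only states that short-time energy and average zero-crossing rate are combined.

```python
import numpy as np

FS = 8000          # sampling frequency in Hz (stated in claim 4)
FRAME_LEN = 256    # 32 ms at 8 kHz, the window length N
FRAME_SHIFT = 128  # 16 ms at 8 kHz


def pre_emphasis(x, a=0.97):
    """Step B1: H(z) = 1 - a z^-1, i.e. y[n] = x[n] - a * x[n-1]."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - a * x[:-1])


def frame_signal(x, frame_len=FRAME_LEN, shift=FRAME_SHIFT):
    """Step B2: split x into overlapping frames and apply a Hamming window."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // shift
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)


def short_time_energy(frames):
    """Sum of squared windowed samples in each frame."""
    return np.sum(frames ** 2, axis=1)


def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ, per frame."""
    signs = np.sign(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


def detect_endpoints(frames, high_ratio=0.1, low_ratio=0.02, zcr_min=0.3):
    """Step B3 (assumed rule): a frame is active if its energy is high, or if
    its energy is moderate and its zero-crossing rate is high."""
    energy = short_time_energy(frames)
    zcr = zero_crossing_rate(frames)
    peak = energy.max()
    if peak == 0:
        return None  # completely silent input
    active = (energy > high_ratio * peak) | (
        (energy > low_ratio * peak) & (zcr > zcr_min))
    idx = np.flatnonzero(active)
    return int(idx[0]), int(idx[-1])  # first and last active frame index
```

A typical call would be `detect_endpoints(frame_signal(pre_emphasis(x)))`, which returns the indices of the first and last active frames; multiplying by `FRAME_SHIFT` converts them to sample positions.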
4. The crested ibis individual recognition method based on the MFCC algorithm according to claim 1, characterized in that step C specifically comprises the following steps:
C1. Compute the discrete power spectrum X(n) of the pre-processed crested ibis song signal: once the number of samples per frame has been determined, apply a discrete FFT to each frame sequence x(n), then square the result to obtain the discrete power spectrum X(n) of the crested ibis song signal;
C2. Smooth the discrete power spectrum X(n) with a moving-average filter w(l) to obtain the smoothed spectrum Y(n), where w(l) is given by:
$$w(l) = \frac{N - |l|}{N^2}, \quad -(N-1) \le l \le (N-1)$$
where N is the width of the moving window and l is the current sample index; the sampling frequency is 8 kHz, the FFT uses 512 points, and N = 13;
C3. Take the maximum of the discrete power spectrum X(n) and the smoothed spectrum Y(n) at each corresponding frequency point to obtain Z(n), i.e. Z(n) = max(X[n], Y[n]);
C4. Repeat steps C2 and C3 i times and use the finally obtained Z(n) as the estimate of the spectral envelope of X(n), with i = 5;
C5. Pass the final Z(n) obtained in step C4 through M band-pass filters that combine MFCC, IMFCC and MidMFCC to obtain the power value parameters, where each power value is computed as the sum over the frequency points of the products of Z(n) and $H_m(n)$, giving M power value parameters $P_m$ (m = 0, 1, ..., M-1);
C6. Take the natural logarithm of $P_m$ and then apply the discrete cosine transform to map the parameters to the cepstral domain:
$$C_k = \sum_{j=1}^{24} \log(P_j) \cos\left[ \frac{k \left( j - \frac{1}{2} \right) \pi}{24} \right], \quad k = 1, 2, \ldots, P$$
where $\{C_k\}$ are the standard MFCC parameters, P is the order of the MFCC, $P_j$ is the j-th power value parameter, and j is the current frame.
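Steps C1 to C6 can be sketched as follows. Several points are assumptions rather than details from the claim: the iteration in `spectral_envelope` smooths the running estimate Z and keeps the pointwise maximum (the claim does not say which spectrum is re-smoothed), `mel_filterbank` is a plain triangular mel filter bank used as a hypothetical stand-in for the combined MFCC, IMFCC and MidMFCC filters $H_m(n)$, and the default `order=12` is an arbitrary choice for the MFCC order P.

```python
import numpy as np

FS = 8000       # sampling frequency in Hz (step C2)
N_FFT = 512     # FFT size (step C2)
N_SMOOTH = 13   # moving-window width N (step C2)
N_FILTERS = 24  # number of terms in the DCT sum (step C6)


def power_spectrum(frame):
    """Step C1: squared magnitude of the FFT of one frame."""
    return np.abs(np.fft.rfft(frame, N_FFT)) ** 2


def spectral_envelope(X, n_iter=5, N=N_SMOOTH):
    """Steps C2 to C4: repeated smoothing followed by a pointwise maximum."""
    l = np.arange(-(N - 1), N)
    w = (N - np.abs(l)) / N ** 2            # triangular kernel, sums to 1
    Z = np.array(X, dtype=float)
    for _ in range(n_iter):
        Y = np.convolve(Z, w, mode="same")  # smoothed spectrum (step C2)
        Z = np.maximum(Z, Y)                # larger value per bin (step C3)
    return Z


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters=N_FILTERS, n_fft=N_FFT, fs=FS):
    """Triangular mel filters; an assumed stand-in for the combined filter bank."""
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    H = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        H[m - 1, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        H[m - 1, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    return H


def cepstral_coefficients(P, order=12):
    """Step C6: C_k = sum_j log(P_j) cos(k (j - 1/2) pi / M), k = 1..order."""
    P = np.asarray(P, dtype=float)
    M = len(P)
    j = np.arange(1, M + 1)
    k = np.arange(1, order + 1)[:, None]
    # The small constant guards against log(0) for empty filter outputs.
    return np.cos(k * (j - 0.5) * np.pi / M) @ np.log(P + 1e-12)


def extract_features(frame, order=12):
    """Run steps C1 to C6 on one windowed frame."""
    Z = spectral_envelope(power_spectrum(frame))
    P = mel_filterbank() @ Z  # step C5: sum of products of Z(n) and H_m(n)
    return cepstral_coefficients(P, order)
```

Because the kernel w(l) sums to one and the maximum is taken at every pass, the estimate Z never falls below the original power spectrum X, which is what lets it act as an upper envelope.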
5. The crested ibis individual recognition method based on the MFCC algorithm according to claim 3, characterized in that a is 0.95, 0.97 or 0.98.
CN201410338974.0A 2014-07-16 2014-07-16 Nipponia nippon individual recognition method based on MFCC algorithm Pending CN104102923A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410338974.0A CN104102923A (en) 2014-07-16 2014-07-16 Nipponia nippon individual recognition method based on MFCC algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410338974.0A CN104102923A (en) 2014-07-16 2014-07-16 Nipponia nippon individual recognition method based on MFCC algorithm

Publications (1)

Publication Number Publication Date
CN104102923A true CN104102923A (en) 2014-10-15

Family

ID=51671063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410338974.0A Pending CN104102923A (en) 2014-07-16 2014-07-16 Nipponia nippon individual recognition method based on MFCC algorithm

Country Status (1)

Country Link
CN (1) CN104102923A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104392722A (en) * 2014-11-28 2015-03-04 电子科技大学 Sound based biotic population identification method and system
CN104616656A (en) * 2014-12-25 2015-05-13 西安建筑科技大学 Improved ABC (Artificial Bee Colony) algorithm based crested ibis chirp codebook design method
CN107369451A (en) * 2017-07-18 2017-11-21 北京市计算中心 A kind of birds sound identification method of the phenology research of auxiliary avian reproduction phase
CN107665712A (en) * 2017-09-06 2018-02-06 中国科学院声学研究所北海研究站 A kind of marine organisms recognition methods based on dynamic time warping
CN107731235A (en) * 2017-09-30 2018-02-23 天津大学 Sperm whale and the cry pulse characteristicses extraction of long fin navigator whale and sorting technique and device
CN108519149A (en) * 2018-03-28 2018-09-11 长安大学 A kind of tunnel accident monitor and alarm system and method based on sound Time-Frequency Analysis
CN108630228A (en) * 2017-03-20 2018-10-09 比亚迪股份有限公司 Sound quality recognition methods, device, system and vehicle
CN110033777A (en) * 2018-01-11 2019-07-19 深圳市诚壹科技有限公司 Birds recognition methods, device, terminal device and computer readable storage medium
CN110824006A (en) * 2019-11-08 2020-02-21 南通大学 Postweld weld impact quality discrimination method based on intelligent acoustic information identification
CN112750442A (en) * 2020-12-25 2021-05-04 浙江弄潮儿智慧科技有限公司 Nipponia nippon population ecosystem monitoring system with wavelet transformation and wavelet transformation method thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5638487A (en) * 1994-12-30 1997-06-10 Purespeech, Inc. Automatic speech recognition
CN103325382A (en) * 2013-06-07 2013-09-25 大连民族学院 Method for automatically identifying Chinese national minority traditional instrument audio data

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
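斯芸芸: "Design and Implementation of an Embedded Speech Recognition System", 《China Excellent Master's Theses Full-text Database, Information Science and Technology Series》 *
斯芸芸 et al.: "Research on Speech Recognition Based on Genetic Algorithm and Wavelet Neural Network", 《Microcomputer & Its Applications》 *
熊伟 et al.: "Research on the MFCC Algorithm for Speech Recognition", 《Modern Business Trade Industry》 *
袁正午 et al.: "Research on an Improved Hybrid MFCC Speech Recognition Algorithm", 《Computer Engineering and Applications》 *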

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104392722A (en) * 2014-11-28 2015-03-04 电子科技大学 Sound based biotic population identification method and system
CN104616656A (en) * 2014-12-25 2015-05-13 西安建筑科技大学 Improved ABC (Artificial Bee Colony) algorithm based crested ibis chirp codebook design method
CN104616656B (en) * 2014-12-25 2018-06-12 西安建筑科技大学 It is a kind of based on improve ABC algorithm Zhu Ibis song Codebook Design methods
CN108630228A (en) * 2017-03-20 2018-10-09 比亚迪股份有限公司 Sound quality recognition methods, device, system and vehicle
CN107369451A (en) * 2017-07-18 2017-11-21 北京市计算中心 A kind of birds sound identification method of the phenology research of auxiliary avian reproduction phase
CN107665712A (en) * 2017-09-06 2018-02-06 中国科学院声学研究所北海研究站 A kind of marine organisms recognition methods based on dynamic time warping
CN107731235B (en) * 2017-09-30 2023-09-26 天津大学 Method and device for extracting and classifying sound pulse characteristics of sperm whales and long fin pilot whales
CN107731235A (en) * 2017-09-30 2018-02-23 天津大学 Sperm whale and the cry pulse characteristicses extraction of long fin navigator whale and sorting technique and device
CN110033777A (en) * 2018-01-11 2019-07-19 深圳市诚壹科技有限公司 Birds recognition methods, device, terminal device and computer readable storage medium
CN108519149A (en) * 2018-03-28 2018-09-11 长安大学 A kind of tunnel accident monitor and alarm system and method based on sound Time-Frequency Analysis
CN110824006B (en) * 2019-11-08 2021-12-28 南通大学 Postweld weld impact quality discrimination method based on intelligent acoustic information identification
CN110824006A (en) * 2019-11-08 2020-02-21 南通大学 Postweld weld impact quality discrimination method based on intelligent acoustic information identification
CN112750442A (en) * 2020-12-25 2021-05-04 浙江弄潮儿智慧科技有限公司 Nipponia nippon population ecosystem monitoring system with wavelet transformation and wavelet transformation method thereof
CN112750442B (en) * 2020-12-25 2023-08-08 浙江弄潮儿智慧科技有限公司 Crested mill population ecological system monitoring system with wavelet transformation and method thereof

Similar Documents

Publication Publication Date Title
CN104102923A (en) Nipponia nippon individual recognition method based on MFCC algorithm
CN112509564B (en) End-to-end voice recognition method based on connection time sequence classification and self-attention mechanism
CN103236260B (en) Speech recognition system
CN101599271B (en) Recognition method of digital music emotion
CN105488466B (en) A kind of deep-neural-network and Acoustic Object vocal print feature extracting method
CN104217722B (en) A kind of dolphin whistle signal time-frequency spectrum contour extraction method
CN103531205A (en) Asymmetrical voice conversion method based on deep neural network feature mapping
CN102486922B (en) Speaker recognition method, device and system
CN105023580A (en) Unsupervised noise estimation and speech enhancement method based on separable deep automatic encoding technology
CN107293306B (en) A kind of appraisal procedure of the Objective speech quality based on output
CN106531174A (en) Animal sound recognition method based on wavelet packet decomposition and spectrogram features
CN102779510A (en) Speech emotion recognition method based on feature space self-adaptive projection
CN104485103A (en) Vector Taylor series-based multi-environment model isolated word identifying method
Mallidi et al. Novel neural network based fusion for multistream ASR
CN102982351A (en) Porcelain insulator vibrational acoustics test data sorting technique based on back propagation (BP) neural network
Himawan et al. Deep Learning Techniques for Koala Activity Detection.
CN107731235B (en) Method and device for extracting and classifying sound pulse characteristics of sperm whales and long fin pilot whales
CN109308903A (en) Speech imitation method, terminal device and computer readable storage medium
CN109741759B (en) Acoustic automatic detection method for specific bird species
CN113111786A (en) Underwater target identification method based on small sample training image convolutional network
CN114565828A (en) Feature countermeasure enhancement underwater target recognition method based on acoustic embedded memory space encoder model
CN110211568A (en) A kind of audio recognition method and device
Zhang et al. A hybrid speech recognition training method for hmm based on genetic algorithm and baum welch algorithm
CN102610234A (en) Method for selectively mapping signal complexity and code rate
CN113948067B (en) Voice countercheck sample repairing method with hearing high fidelity characteristic

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20141015