    谢将剑, 李文彬, 张军国, 丁长青. 基于Chirplet语图特征和深度学习的鸟类物种识别方法[J]. 北京林业大学学报, 2018, 40(3): 122-127. DOI: 10.13332/j.1000-1522.20180008
    Xie Jiangjian, Li Wenbin, Zhang Junguo, Ding Changqing. Bird species recognition method based on Chirplet spectrogram feature and deep learning[J]. Journal of Beijing Forestry University, 2018, 40(3): 122-127. DOI: 10.13332/j.1000-1522.20180008

    基于Chirplet语图特征和深度学习的鸟类物种识别方法

    Bird species recognition method based on Chirplet spectrogram feature and deep learning

    • 摘要:
      目的:深度学习在鸟类物种识别的应用是目前的研究热点,为了进一步提高识别效果,提出一种基于鸟鸣声的Chirplet语图特征和深度卷积神经网络的鸟类物种识别方法。
      方法:引入线性调频小波变换(Chirplet transform,CT)计算鸟鸣声信号的语图,输入深度卷积神经网络VGG16模型中,通过对语图进行分类实现鸟类物种的识别。以北京市松山国家自然保护区实地采集的18种鸟类为研究对象,利用Chirplet变换、短时傅里叶变换(short-time Fourier transform,STFT)和梅尔频率倒谱变换(Mel frequency cepstrum transform,MFCT)计算得到3个不同的语图样本集,对比分别采用不同的语图样本集作为输入时鸟类物种识别模型的性能。
      结果:Chirplet语图作为输入时,测试集的平均识别准确率(mean average precision,MAP)达到0.9871,相对于其他两种输入,得到了更高的MAP值,而且在训练时达到最大MAP值的迭代次数最小。
      结论:采用不同的语图特征作为输入,直接影响深度学习模型的分类性能。本文计算的Chirplet语图的鸣声区域相比STFT语图和Mel语图更为集中,特征更明显。因此,Chirplet语图更适合于基于VGG16模型的鸟类物种识别,可以得到更高的MAP值和更快的识别效率。

       

      Abstract:
      Objective: Applying deep learning to bird species recognition is a current research hotspot. To further improve recognition performance, a bird species recognition method based on Chirplet spectrogram features of bird vocalizations and the VGG16 deep convolutional neural network is proposed.
      Method: Spectrograms of the bird vocalization signals were first calculated with the Chirplet transform (CT) and then fed into the VGG16 model, which recognized bird species by classifying the spectrograms. Taking 18 bird species recorded in the field in Beijing Songshan National Nature Reserve as the study objects, three spectrogram sample sets were calculated using the Chirplet transform, the short-time Fourier transform (STFT), and the Mel frequency cepstrum transform (MFCT). The performance of the recognition model was then compared across the three sample sets used as input.
      Result: With Chirplet spectrograms as input, the mean average precision (MAP) on the test set reached 0.9871, higher than with the other two inputs. The number of training epochs needed to reach the maximum MAP was also the smallest.
      Conclusion: The choice of spectrogram feature used as input directly affects the classification performance of the deep learning model. The vocalization regions in the Chirplet spectrograms calculated in this study are more concentrated, and their features more distinct, than those in the STFT and Mel spectrograms. Chirplet spectrograms are therefore better suited to bird species recognition based on the VGG16 model, yielding a higher MAP and more efficient recognition.
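
      To illustrate the Chirplet-transform step mentioned in the abstract, the following minimal Python sketch projects a signal onto a single Gaussian chirplet atom. It is not the authors' implementation: the function names, the Gaussian window, and every parameter value (sample rate, chirp rate, window width) are assumptions chosen for illustration. A full Chirplet spectrogram would repeat this projection over a grid of times and frequencies.

```python
import cmath
import math


def make_linear_chirp(sample_rate, duration, start_freq, chirp_rate):
    """Return samples of cos(2*pi*(f0*t + 0.5*k*t^2)), a linearly swept tone."""
    count = int(sample_rate * duration)
    samples = []
    for n in range(count):
        t = n / sample_rate
        samples.append(math.cos(2.0 * math.pi * (start_freq * t + 0.5 * chirp_rate * t * t)))
    return samples


def chirplet_coefficient(signal, sample_rate, center_time, center_freq, chirp_rate, sigma):
    """Magnitude of the inner product of `signal` with one Gaussian chirplet atom.

    The atom is a Gaussian window of width `sigma` (seconds) centered at
    `center_time`, whose frequency is `center_freq` (Hz) at the center and
    changes linearly at `chirp_rate` (Hz per second).
    """
    total = 0j
    atom_energy = 0.0
    for n, x in enumerate(signal):
        tau = n / sample_rate - center_time
        window = math.exp(-0.5 * (tau / sigma) ** 2)
        phase = 2.0 * math.pi * (center_freq * tau + 0.5 * chirp_rate * tau * tau)
        # Multiply by the complex conjugate of the atom.
        total += x * window * cmath.exp(-1j * phase)
        atom_energy += window * window
    # Normalize by the atom's norm so different window widths are comparable.
    return abs(total) / math.sqrt(atom_energy)


# Hypothetical test signal: a sweep from 1000 Hz rising at 20000 Hz per second.
SAMPLE_RATE = 16000
CHIRP_RATE = 20000.0
signal = make_linear_chirp(SAMPLE_RATE, 0.1, 1000.0, CHIRP_RATE)

# At t = 0.05 s the instantaneous frequency is 1000 + 20000 * 0.05 = 2000 Hz.
matched = chirplet_coefficient(signal, SAMPLE_RATE, 0.05, 2000.0, CHIRP_RATE, 0.01)
# Same time and frequency, but chirp rate 0 (an STFT-like atom with a fixed frequency).
unmatched = chirplet_coefficient(signal, SAMPLE_RATE, 0.05, 2000.0, 0.0, 0.01)
# Matched chirp rate, but the wrong center frequency.
off_freq = chirplet_coefficient(signal, SAMPLE_RATE, 0.05, 3000.0, CHIRP_RATE, 0.01)

print(matched, unmatched, off_freq)
```

      In this sketch the atom whose chirp rate matches the sweep gives a larger coefficient than the fixed-frequency atom at the same time and frequency. That is one intuition for why a Chirplet spectrogram can represent frequency-modulated vocalizations more compactly than an STFT spectrogram.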

       
