Bird species recognition method based on Chirplet spectrogram feature and deep learning

Xie Jiangjian, Li Wenbin, Zhang Junguo, Ding Changqing

Citation: Xie Jiangjian, Li Wenbin, Zhang Junguo, Ding Changqing. Bird species recognition method based on Chirplet spectrogram feature and deep learning[J]. Journal of Beijing Forestry University, 2018, 40(3): 122-127. doi: 10.13332/j.1000-1522.20180008

doi: 10.13332/j.1000-1522.20180008
Funding:

Fundamental Research Funds for the Central Universities (2017JC14)

National Key Research and Development Program of China (2017YFC1403503)

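Author information
    About the first author:

    Xie Jiangjian, PhD, lecturer. Main research interests: forestry information monitoring, signal processing, pattern recognition. Email: shyneforce@bjfu.edu.cn. Address: School of Technology, Beijing Forestry University, No. 35 Qinghua East Road, Haidian District, Beijing 100083, China.

    Corresponding authors:

    Li Wenbin, PhD, professor. Main research interests: forest environment and information monitoring. Email: leewb@bjfu.edu.cn. Address: same as above.

    Ding Changqing, PhD, professor. Main research interest: ornithology. Email: cqding@bjfu.edu.cn. Address: School of Nature Conservation, Beijing Forestry University, No. 35 Qinghua East Road, Haidian District, Beijing 100083, China.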

  • CLC number: TP181

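  • Abstract: Objective: Applying deep learning to bird species recognition is an active research topic. To further improve recognition performance, this paper proposes a bird species recognition method based on Chirplet spectrogram features of bird vocalizations and a deep convolutional neural network. Methods: The Chirplet transform (CT) was used to compute spectrograms of bird vocalization signals, which were fed into the VGG16 deep convolutional neural network; species were recognized by classifying the spectrograms. Using 18 bird species recorded in the field at Songshan National Nature Reserve, Beijing, three spectrogram sample sets were computed with the Chirplet transform, the short-time Fourier transform (STFT), and the Mel frequency cepstrum transform (MFCT), and the performance of the recognition model was compared across the three inputs. Results: With Chirplet spectrograms as input, the mean average precision (MAP) on the test set reached 0.9871, higher than with either of the other two inputs, and the model needed the fewest training epochs to reach its maximum MAP. Conclusions: The choice of spectrogram feature directly affects the classification performance of the deep learning model. The vocalization regions in the Chirplet spectrograms computed here are more concentrated than those in the STFT and Mel spectrograms, and their features are more distinct. Chirplet spectrograms are therefore better suited to VGG16-based bird species recognition, giving a higher MAP and reaching the maximum MAP in fewer training epochs.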
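The abstract does not give the authors' Chirplet parameterization, so the following is only a minimal sketch of the general idea, assuming a Gaussian-windowed linear chirp as the atom. The function names (`chirplet_atom`, `projection_energy`) and all signal parameters (sampling rate, sweep rate, window width) are hypothetical choices for illustration. It projects a synthetic frequency sweep onto an atom whose chirp rate matches the sweep and onto an atom with zero chirp rate, which is the fixed-frequency atom an STFT with a Gaussian window would use.

```python
import cmath
import math


def chirplet_atom(n_samples, fs, t_center, f_center, chirp_rate, sigma):
    """Return a unit-energy Gaussian chirplet sampled at rate fs (Hz)."""
    atom = []
    for n in range(n_samples):
        tau = n / fs - t_center  # time relative to the atom centre (s)
        envelope = math.exp(-tau * tau / (2.0 * sigma * sigma))
        # Linear chirp: instantaneous frequency is f_center + chirp_rate * tau.
        phase = 2.0 * math.pi * (f_center * tau + 0.5 * chirp_rate * tau * tau)
        atom.append(envelope * cmath.exp(1j * phase))
    norm = math.sqrt(sum(abs(a) ** 2 for a in atom))
    return [a / norm for a in atom]


def projection_energy(signal, atom):
    """Squared magnitude of the inner product between signal and atom."""
    inner = sum(x * a.conjugate() for x, a in zip(signal, atom))
    return abs(inner) ** 2


# Synthetic test signal (assumed values): a sweep from 1 kHz to 3 kHz in 0.1 s.
fs = 16000.0
duration = 0.1
f_start = 1000.0
sweep_rate = 20000.0  # Hz per second
n_samples = int(fs * duration)
signal = [
    math.cos(2.0 * math.pi * (f_start * (n / fs) + 0.5 * sweep_rate * (n / fs) ** 2))
    for n in range(n_samples)
]

t_center = 0.05
f_center = f_start + sweep_rate * t_center  # instantaneous frequency at t_center
sigma = 0.01

# Atom whose chirp rate follows the sweep versus a fixed-frequency atom.
matched = projection_energy(
    signal, chirplet_atom(n_samples, fs, t_center, f_center, sweep_rate, sigma)
)
flat = projection_energy(
    signal, chirplet_atom(n_samples, fs, t_center, f_center, 0.0, sigma)
)
print("matched / flat energy ratio:", matched / flat)
```

For this sweep the matched atom captures several times more energy than the fixed-frequency atom, which is one way to read the abstract's remark that vocalization regions appear more concentrated in Chirplet spectrograms. A full spectrogram would repeat the projection over a grid of centre times, centre frequencies, and chirp rates.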

     

  • Figure 1. Structure of the recognition model

    Figure 2. Typical vocalization spectrograms

    Figure 3. Training and validation of the recognition model

    Figure 4. Variation of the loss with the number of epochs

    Figure 5. Variation of MAP with the number of epochs

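    Table 1. Vocalization signal details for the 18 bird species

    | Order | Family | Species | Time/s | Number of spectrograms |
    |---|---|---|---|---|
    | Galliformes | Phasianidae | Common pheasant (Phasianus colchicus) | 12 | 19 |
    | Cuculiformes | Cuculidae | Indian cuckoo (Cuculus micropterus) | 13 | 20 |
    | Cuculiformes | Cuculidae | Himalayan cuckoo (Cuculus saturatus) | 52 | 87 |
    | Cuculiformes | Cuculidae | Large hawk-cuckoo (Cuculus sparverioides) | 34 | 90 |
    | Passeriformes | Corvidae | Large-billed crow (Corvus macrorhynchos) | 27 | 36 |
    | Passeriformes | Corvidae | Red-billed blue magpie (Urocissa erythroryncha) | 96 | 199 |
    | Passeriformes | Turdidae | Daurian redstart (Phoenicurus auroreus) | 37 | 89 |
    | Passeriformes | Muscicapidae | Yellow-rumped flycatcher (Ficedula zanthopygia) | 61 | 66 |
    | Passeriformes | Muscicapidae | Narcissus flycatcher (Ficedula narcissina) | 82 | 144 |
    | Passeriformes | Muscicapidae | Green-backed flycatcher (Ficedula elisae) | 49 | 99 |
    | Passeriformes | Paridae | Great tit (Parus major) | 54 | 100 |
    | Passeriformes | Paridae | Marsh tit (Parus palustris) | 33 | 47 |
    | Passeriformes | Paridae | Willow tit (Parus montanus) | 38 | 72 |
    | Passeriformes | Paridae | Yellow-bellied tit (Parus venustulus) | 26 | 63 |
    | Passeriformes | Sittidae | Chinese nuthatch (Sitta villosa) | 29 | 80 |
    | Passeriformes | Sittidae | Eurasian nuthatch (Sitta europaea) | 36 | 131 |
    | Passeriformes | Emberizidae | Godlewski's bunting (Emberiza godlewskii) | 23 | 57 |
    | Passeriformes | Emberizidae | Yellow-throated bunting (Emberiza elegans) | 71 | 134 |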
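The per-species figures in Table 1 vary widely, so a quick tally helps show the size and balance of the sample set. The sketch below only restates the table's numbers; the variable names are illustrative, and it makes no claim about how the authors split or balanced the data.

```python
# (recording time in seconds, number of spectrograms) per species, from Table 1.
table_1 = {
    "Phasianus colchicus": (12, 19),
    "Cuculus micropterus": (13, 20),
    "Cuculus saturatus": (52, 87),
    "Cuculus sparverioides": (34, 90),
    "Corvus macrorhynchos": (27, 36),
    "Urocissa erythroryncha": (96, 199),
    "Phoenicurus auroreus": (37, 89),
    "Ficedula zanthopygia": (61, 66),
    "Ficedula narcissina": (82, 144),
    "Ficedula elisae": (49, 99),
    "Parus major": (54, 100),
    "Parus palustris": (33, 47),
    "Parus montanus": (38, 72),
    "Parus venustulus": (26, 63),
    "Sitta villosa": (29, 80),
    "Sitta europaea": (36, 131),
    "Emberiza godlewskii": (23, 57),
    "Emberiza elegans": (71, 134),
}

total_seconds = sum(seconds for seconds, _ in table_1.values())
total_spectrograms = sum(count for _, count in table_1.values())

# Species with the most and the fewest spectrograms.
largest = max(table_1, key=lambda name: table_1[name][1])
smallest = min(table_1, key=lambda name: table_1[name][1])
imbalance_ratio = table_1[largest][1] / table_1[smallest][1]

print(f"{len(table_1)} species, {total_seconds} s, {total_spectrograms} spectrograms")
print(f"largest: {largest}, smallest: {smallest}, ratio {imbalance_ratio:.1f}")
```

The largest class has roughly ten times as many spectrograms as the smallest, which is worth keeping in mind when reading the averaged scores in Table 3.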

    Table 2. Calculation parameters

    | Parameter type | Value or method |
    |---|---|
    | Initialization | Random initialization from a normal distribution |
    | Optimizer | Adam |
    | Learning rate | 0.001 |
    | Loss function | Cross-entropy |
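To make the settings in Table 2 concrete, here is a minimal pure-Python sketch of a softmax cross-entropy loss over 18 classes and a single Adam update with learning rate 0.001. Only the learning rate, the optimizer, and the loss come from the table; the moment-decay values `beta1=0.9`, `beta2=0.999` and `eps=1e-8` are the commonly used Adam defaults and are an assumption here, as are the function names. A real run would rely on a deep learning framework applied to the VGG16 weights, not a scalar parameter.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Cross-entropy loss for one sample with a single true class."""
    return -math.log(softmax(logits)[target_index])


def adam_step(param, grad, m, v, step,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; step counts from 1."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** step)             # bias correction
    v_hat = v / (1.0 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


N_CLASSES = 18  # number of bird species in Table 1

# A model that scores every class equally has loss ln(18).
uniform_loss = cross_entropy([0.0] * N_CLASSES, target_index=3)

# First Adam step on a toy parameter: the move is close to lr in size.
new_param, m, v = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, step=1)

print(uniform_loss, new_param)
```

The value ln(18), about 2.89, is a useful sanity check: a freshly initialized 18-class classifier should start with a cross-entropy loss near that level before training pulls it down.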

    Table 3. MAP with different inputs

    | Spectrogram | Max MAP (validation) | MAP (test) |
    |---|---|---|
    | Chirplet spectrogram | 0.9995 | 0.9871 |
    | Mel spectrogram | 0.9733 | 0.9421 |
    | STFT spectrogram | 0.9501 | 0.8962 |
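This page does not spell out how MAP was computed. As a hedged illustration, the sketch below uses one common definition: for each class, rank all samples by the model's score for that class, average the precision at each true positive, then take the mean over classes. The function names and the toy scores are hypothetical, and the authors' exact procedure may differ.

```python
def average_precision(scores, is_positive):
    """Average precision for one class from per-sample scores and labels."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if is_positive[idx]:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


def mean_average_precision(score_matrix, labels, n_classes):
    """Mean of per-class average precision, skipping classes with no samples."""
    per_class = []
    for c in range(n_classes):
        class_scores = [row[c] for row in score_matrix]
        positives = [label == c for label in labels]
        if any(positives):
            per_class.append(average_precision(class_scores, positives))
    return sum(per_class) / len(per_class)


# Toy example: 4 samples, 2 classes, one row of class scores per sample.
toy_scores = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.3],
    [0.1, 0.9],
]
toy_labels = [0, 1, 0, 1]

toy_map = mean_average_precision(toy_scores, toy_labels, n_classes=2)
print(toy_map)
```

In the toy data each class has one positive ranked first and one ranked third, so both classes score (1 + 2/3) / 2 and the mean is 5/6.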
Publication history
  • Received: 2018-01-05
  • Revised: 2018-01-17
  • Published: 2018-03-01
