Application of ANN-BiLSTM model to long-term gap-filling of carbon flux data in temperate desert shrub
Abstract: Objective To improve the gap-filling accuracy of net ecosystem exchange (NEE) when data are missing over long periods, this study combined an artificial neural network (ANN) with a bi-directional long short-term memory network (Bi-LSTM), so that both the environmental drivers and the temporal characteristics of NEE are used, and proposed the ANN-BiLSTM model. Method Using NEE and micro-meteorological data from the Yanchi observation station in Ningxia, northwestern China, we randomly removed continuous segments of 7, 15, 30, 45 and 90 d to create five gap scenarios, and evaluated the gap-filling results of ANN-BiLSTM, random forest (RF), ANN, K-nearest neighbor (KNN), support vector regression (SVR) and marginal distribution sampling (MDS) under long-term absence of NEE. Result When gaps were ≤ 30 d, the gap-filling accuracy of every model was relatively reliable, and the ANN-BiLSTM model was the most accurate: its mean coefficient of determination (R²) was 0.48−0.56, and its root mean square error (RMSE) and mean absolute error (MAE) were 0.68−1.92 μmol/(m²·s) and 0.45−1.30 μmol/(m²·s), respectively. When gaps were ≥ 45 d, MDS could not fill the missing values, and the gap-filling accuracy of ANN-BiLSTM was markedly higher than that of the machine learning models: its mean R² was > 0.45, and its RMSE and MAE were 0.79−1.95 μmol/(m²·s) and 0.50−1.31 μmol/(m²·s), respectively. Conclusion When gaps in NEE data from temperate desert shrub ecosystems are longer than 30 d, we recommend using the ANN-BiLSTM model to fill them, which can improve the accuracy of long-term NEE gap-filling to a certain extent.
Key words:
- carbon flux
- ANN-BiLSTM
- machine learning
- long-term gap-filling
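The artificial-gap design described in the abstract and listed in Table 2 can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes half-hourly records (48 per day, which matches the 336 records per 7-d gap in Table 2), places gaps so that they never overlap, and uses the hypothetical helper names `place_gaps` and `mask_series`.

```python
import random

# Assumption: half-hourly records, i.e. 48 per day (7 d x 48 = 336, as in Table 2).
RECORDS_PER_DAY = 48

# Gap length in days -> number of repeated gaps, as listed in Table 2.
SCENARIOS = {7: 12, 15: 6, 30: 3, 45: 2, 90: 1}


def place_gaps(n_records, gap_days, n_gaps, rng):
    """Return sorted, non-overlapping (start, end) index ranges, end exclusive."""
    gap_len = gap_days * RECORDS_PER_DAY
    free = n_records - n_gaps * gap_len
    if free < 0:
        raise ValueError("series too short for the requested gaps")
    # Draw one offset per gap inside the free space, then shift gap i by the
    # total length of the i gaps placed before it, so gaps cannot overlap.
    offsets = sorted(rng.randint(0, free) for _ in range(n_gaps))
    return [(off + i * gap_len, off + (i + 1) * gap_len)
            for i, off in enumerate(offsets)]


def mask_series(values, gaps):
    """Copy the series and replace every record inside a gap with None."""
    masked = list(values)
    for start, end in gaps:
        for i in range(start, end):
            masked[i] = None
    return masked


if __name__ == "__main__":
    rng = random.Random(0)
    n_records = 365 * RECORDS_PER_DAY  # hypothetical one-year series
    for days, repeats in SCENARIOS.items():
        gaps = place_gaps(n_records, days, repeats, rng)
        removed = mask_series([0.0] * n_records, gaps).count(None)
        print(days, repeats, removed)
```

With these settings the number of removed records per scenario is the gap length times the number of repeats, which reproduces the totals in Table 2 (4 032 for the 7-d scenario and 4 320 for the others). Gaps produced this way may touch each other; a minimum spacing could be added if that matters.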
Figure 1. Diagram of ANN structure
The figure is from reference [36]. Xn, input factor; Ta, air temperature; Ts, soil temperature; Par, photosynthetically active radiation; VPD, vapor pressure deficit. f(·) denotes the activation function.
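As a rough illustration of the structure in Figure 1, the sketch below runs one forward pass through a single-hidden-layer network with four inputs (Ta, Ts, Par, VPD), 11 hidden nodes and a sigmoid activation, the values given for the ANN in Table 3. The linear output node, the random untrained weights and the input values are assumptions made for illustration only, and `init_network` and `forward` are hypothetical names.

```python
import math
import random


def sigmoid(z):
    """Logistic activation, f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_inputs, n_hidden, seed=0):
    """Create small random weights; a trained model would learn these."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_out = 0.0
    return w_hidden, b_hidden, w_out, b_out


def forward(x, params):
    """Map one input vector to a single output (here standing in for NEE)."""
    w_hidden, b_hidden, w_out, b_out = params
    # Hidden layer: weighted sum of the inputs followed by the sigmoid.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # Output layer: assumed linear, which is a common choice for regression.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


if __name__ == "__main__":
    params = init_network(n_inputs=4, n_hidden=11)
    # Made-up, already normalised values for Ta, Ts, Par and VPD.
    print(forward([0.2, 0.1, 0.5, 0.3], params))
```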
Figure 2. Schematic diagram of neuron model
The figure is from reference [36]. $ {\theta }_{j} $ is the threshold of the j-th neuron; $ {\omega }_{ij} $ is the connection weight between the i-th input signal and the j-th neuron; $ {\omega }_{nj} $ is the connection weight between the n-th input signal and the j-th neuron; $ {y}_{j} $ is the output.
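With the symbols of Figure 2, the output of the j-th neuron is usually written as the activation of the weighted input sum minus the threshold. The form below is the standard threshold-neuron expression rather than one quoted from the article; the input symbol $ {x}_{i} $ is an assumption, and f(·) is the activation function named in Figure 1.

```latex
y_j = f\left( \sum_{i=1}^{n} \omega_{ij}\, x_i - \theta_j \right)
```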
Figure 3. Structure of LSTM unit
The figure is from reference [37]. $ {x}_{t} $ is the NEE input at time t; $ {c}_{t} $ is the memory cell state at time t; $ {c}_{t-1} $ is the memory cell state at time t−1; σ is the sigmoid function; $ {h}_{t} $ is the output at time t; tanh is the hyperbolic tangent activation function.
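For orientation, the quantities in Figure 3 are related by the commonly used LSTM update equations, sketched below. These are the textbook formulation, not equations taken from the article: the gate symbols $ f_t $, $ i_t $, $ o_t $, the candidate state $ \tilde{c}_t $, the weights $ W $ and the biases $ b $ are assumptions, and $ \odot $ denotes element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

A bi-directional LSTM applies such a unit to the sequence in both forward and reverse time order and combines the two outputs.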
Figure 5. Comparison of gap-filling performance of six NEE gap-filling models under different gap lengths
NEE, net ecosystem carbon exchange (μmol/(m²·s), expressed as amount of CO2); RMSE, root mean square error between gap-filled NEE and measured NEE; MAE, mean absolute error; R², coefficient of determination. The horizontal line inside each box is the median of the gap-filling results, ● is their mean, and ◆ marks outliers. The upper edge of the box is the third quartile (Q3), the lower edge is the first quartile (Q1), and the difference between Q3 and Q1 is the interquartile range (IQR). The upper horizontal line outside the box represents the maximum of the gap-filling results (Q3 + 1.5IQR), and the lower horizontal line represents the minimum (Q1 − 1.5IQR). The same applies below.
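The box-plot quantities defined in the Figure 5 caption can be computed as in the sketch below. It is only an illustration: the quartile convention (linear interpolation, `method="inclusive"`) is an assumption because the article does not state one, the sample values are made up, and `box_summary` is a hypothetical name.

```python
import statistics


def box_summary(values):
    """Summarise one box: mean, quartiles, IQR, 1.5*IQR limits and outliers."""
    data = sorted(values)
    # Assumption: quartiles by linear interpolation over the observed range.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr  # lower horizontal line in the caption
    upper = q3 + 1.5 * iqr  # upper horizontal line in the caption
    return {
        "mean": statistics.fmean(data),
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_limit": lower,
        "upper_limit": upper,
        # Points beyond the limits are the ones drawn as outliers.
        "outliers": [v for v in data if v < lower or v > upper],
    }


if __name__ == "__main__":
    # Made-up values standing in for repeated gap-filling scores of one model.
    print(box_summary([1, 2, 3, 4, 5, 20]))
```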
Table 1. Missing-data statistics for net ecosystem exchange and environmental factors
| Season | Total records | Valid records | Missing records | Missing rate/% |
| --- | --- | --- | --- | --- |
| Spring | 7171 | 5129 | 2042 | 28.5 |
| Summer | 8832 | 6316 | 2516 | 28.5 |
| Autumn | 8736 | 5268 | 3468 | 39.7 |
| Winter | 5808 | 4166 | 1642 | 28.3 |
Table 2. Description of five gap scenarios
| Gap length/d | Records per gap | Number of repeated gaps | Total missing records |
| --- | --- | --- | --- |
| 7 | 336 | 12 | 4032 |
| 15 | 720 | 6 | 4320 |
| 30 | 1440 | 3 | 4320 |
| 45 | 2160 | 2 | 4320 |
| 90 | 4320 | 1 | 4320 |
Table 3. Main hyperparameter settings of five models
| Model | Hyperparameter | Meaning | Setting value |
| --- | --- | --- | --- |
| ANN | hidden_layer_sizes | Number of hidden nodes | 11 |
| ANN | activation | Activation function | sigmoid |
| ANN | solver | Weight optimizer | Adam |
| ANN | learning_rate | Learning rate | 0.095 |
| ANN | batch_size | Batch size | 64 |
| ANN | epoch | Number of iterations | 1000 |
| Bi-LSTM | hidden_layer_sizes | Number of hidden nodes | 20, 6 |
| Bi-LSTM | activation | Activation function | linear, relu |
| Bi-LSTM | solver | Weight optimizer | Adam |
| Bi-LSTM | learning_rate | Learning rate | 0.0885 |
| Bi-LSTM | batch_size | Batch size | 64 |
| Bi-LSTM | epoch | Number of iterations | 500 |
| Random forest (RF) | n_estimators | Number of decision trees | 193 |
| Random forest (RF) | max_depth | Maximum depth of the tree | 35 |
| Random forest (RF) | max_features | Maximum number of features | auto |
| K-nearest neighbor (KNN) | n_neighbors | K value | 4 |
| K-nearest neighbor (KNN) | weights | Sample weight | distance |
| K-nearest neighbor (KNN) | p | Distance measure | 1 |
| Support vector regression (SVR) | kernel | Kernel function | rbf |
| Support vector regression (SVR) | C | Penalty coefficient | 1 |
Table 4. Gap-filling results of ANN-BiLSTM model in different seasons
| Season | RMSE/(μmol·m⁻²·s⁻¹) | MAE/(μmol·m⁻²·s⁻¹) | R² |
| --- | --- | --- | --- |
| Spring | 1.25 | 0.74 | 0.29 |
| Summer | 1.85 | 1.31 | 0.58 |
| Autumn | 1.18 | 0.72 | 0.46 |
| Winter | 0.64 | 0.39 | 0.00 |
Notes: RMSE, MAE and R² are mean values.
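The three scores reported in the tables and figures can be computed from paired measured and gap-filled values as sketched below. The RMSE and MAE definitions are standard; for R² the sketch assumes the form 1 − SS_res/SS_tot, which may differ from the definition used in the article (for example a squared correlation coefficient). The function names and sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between measured and gap-filled values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error between measured and gap-filled values."""
    n = len(observed)
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / n


def r2(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up NEE values in umol/(m2*s), for illustration only.
    measured = [1.0, 2.0, 3.0, 4.0]
    filled = [1.5, 2.0, 2.5, 4.0]
    print(rmse(measured, filled), mae(measured, filled), r2(measured, filled))
```

Under this definition, an R² close to 0 means the filled values explain little more of the variance than the mean of the measurements would, even when RMSE and MAE are small in absolute terms.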
[1] Baldocchi D. How eddy covariance flux measurements have contributed to our understanding of global change biology[J]. Global Change Biology, 2020, 26: 242−260. doi: 10.1111/gcb.14807
[2] Chen S P, You C H, Hu Z M, et al. Eddy covariance technique and its applications in flux observations of terrestrial ecosystems[J]. Chinese Journal of Plant Ecology, 2020, 44(4): 291−304. doi: 10.17521/cjpe.2019.0351 (in Chinese)
[3] Chu H S, Luo X Z, Ouyang Z T, et al. Representativeness of eddy-covariance flux footprints for areas surrounding AmeriFlux sites[J/OL]. Agricultural and Forest Meteorology, 2021, 301/302: 108350[2022−12−10]. https://doi.org/10.1016/j.agrformet.2021.108350.
[4] Richardson A D, Braswell B H, Hollinger D Y, et al. Comparing simple respiration models for eddy flux and dynamic chamber data[J]. Agricultural and Forest Meteorology, 2006, 141(2): 219−234.
[5] Su R R, Liu K W, Geng Y F, et al. CO2 flux variation over canopy rice rape succession system in Jianghan Plain[J]. Chinese Journal of Agrometeorology, 2012, 33(3): 362−367. (in Chinese)
[6] Xu X J, Zhou G M, Du H Q, et al. Effects of interpolation and window sizes in Phyllostachys edulis forest for parameter estimation on calculation of CO2 flux[J]. Scientia Silvae Sinicae, 2015, 51(9): 141−149. (in Chinese)
[7] Foltynová L, Fischer M, McGloin R P. Recommendations for gap-filling eddy covariance latent heat flux measurements using marginal distribution sampling[J]. Theoretical and Applied Climatology, 2020, 139(1): 677−688.
[8] Lucas-Moffat A M, Schrader F, Herbst M, et al. Multiple gap-filling for eddy covariance datasets[J/OL]. Agricultural and Forest Meteorology, 2022, 325[2022−12−10]. https://doi.org/10.1016/j.agrformet.2022.109114.
[9] Safa B, Arkebauer T J, Zhu Q, et al. Net ecosystem exchange (NEE) simulation in maize using artificial neural networks[J/OL]. IFAC Journal of Systems & Control, 2019, 7: 100036[2022−12−10]. https://doi.org/10.1016/j.ifacsc.2019.100036.
[10] Teng D X, He X M, Wang J Z, et al. Uncertainty in gap filling and estimating the annual sum of carbon dioxide exchange for the desert Tugai forest, Ebinur Lake Basin, Northwest China[J/OL]. PeerJ, 2020, 8[2022−12−10]. https://doi.org/10.7717/peerj.8530.
[11] Soloway A D, Amiro B D, Dunn A L, et al. Carbon neutral or a sink? Uncertainty caused by gap-filling long-term flux measurements for an old-growth boreal black spruce forest[J]. Agricultural and Forest Meteorology, 2017, 233: 110−121. doi: 10.1016/j.agrformet.2016.11.005
[12] Du Q, Liu H Z, Feng J W, et al. Effects of different gap filling methods and land surface energy balance closure on annual net ecosystem exchange in a semiarid area of China[J]. Science China Earth Sciences, 2014, 57(6): 1340−1351. doi: 10.1007/s11430-013-4756-5
[13] Zhu S Y, McCalmont J, Cardenas L M, et al. Gap-filling carbon dioxide, water, energy, and methane fluxes in challenging ecosystems: comparing between methods, drivers, and gap-lengths[J/OL]. Agricultural and Forest Meteorology, 2023, 332[2022−12−10]. https://doi.org/10.1016/j.agrformet.2023.109365.
[14] Falge E, Baldocchi D, Olson R, et al. Gap filling strategies for defensible annual sums of net ecosystem exchange[J]. Agricultural and Forest Meteorology, 2001, 107(1): 43−69. doi: 10.1016/S0168-1923(00)00225-2
[15] Falge E, Baldocchi D, Olson R, et al. Gap filling strategies for long term energy flux data sets[J]. Agricultural and Forest Meteorology, 2001, 107(1): 71−77. doi: 10.1016/S0168-1923(00)00235-5
[16] Zhao X, Huang Y. A comparison of three gap filling techniques for eddy covariance net carbon fluxes in short vegetation ecosystems[J]. Advances in Meteorology, 2015: 1−12.
[17] Hui D, Wan S Q, Su B, et al. Gap-filling missing data in eddy covariance measurements using multiple imputation (MI) for annual estimations[J]. Agricultural and Forest Meteorology, 2004, 121(1−2): 93−111. doi: 10.1016/S0168-1923(03)00158-8
[18] Reichstein M, Falge E, Baldocchi D, et al. On the separation of net ecosystem exchange into assimilation and ecosystem respiration: review and improved algorithm[J]. Global Change Biology, 2005, 11(9): 1424−1439. doi: 10.1111/j.1365-2486.2005.001002.x
[19] Kang M, Ichii K, Kim J, et al. New gap-filling strategies for long-period flux data gaps using a data-driven approach[J/OL]. Atmosphere, 2019, 10(10): 568[2022−12−10]. https://doi.org/10.3390/atmos10100568.
[20] Zhou Y, Huang H, Zhang J S, et al. Comparison of gap-filling methods for long-term continuous missing data in carbon flux observation by eddy covariance method of forest ecosystem[J]. Chinese Journal of Agrometeorology, 2021, 42(4): 330−343. (in Chinese)
[21] Zhu S Y, Clement R, McCalmont J, et al. Stable gap-filling for longer eddy covariance data gaps: a globally validated machine-learning approach for carbon dioxide, water, and energy fluxes[J/OL]. Agricultural and Forest Meteorology, 2022, 314: 108777[2022−12−10]. https://doi.org/10.1016/j.agrformet.2021.108777.
[22] Kim Y, Johnson M S, Knox S H, et al. Gap-filling approaches for eddy covariance methane fluxes: a comparison of three machine learning algorithms and a traditional method with principal component analysis[J]. Global Change Biology, 2020, 26(3): 1499−1518. doi: 10.1111/gcb.14845
[23] Dou Z Y, Liu J J. Application of artificial neural networks to interpolation and extrapolation of flux data[J]. Journal of Northwest Forestry University, 2009, 24(3): 58−62. (in Chinese)
[24] Biederman J A, Scott R L, Bell T W, et al. CO2 exchange and evapotranspiration across dryland ecosystems of southwestern North America[J]. Global Change Biology, 2017, 23(10): 4204−4221. doi: 10.1111/gcb.13686
[25] Jia X, Zha T S, Gong J N, et al. Multi-scale dynamics and environmental controls on net ecosystem CO2 exchange over a temperate semiarid shrubland[J]. Agricultural and Forest Meteorology, 2018, 259: 250−259. doi: 10.1016/j.agrformet.2018.05.009
[26] Jia X, Mu Y, Zha T S, et al. Seasonal and interannual variations in ecosystem respiration in relation to temperature, moisture, and productivity in a temperate semi-arid shrubland[J/OL]. Science of the Total Environment, 2020, 709: 136210[2022−12−10]. https://doi.org/10.1016/j.scitotenv.2019.136210.
[27] Wutzler T, Lucas-Moffat A, Migliavacca M, et al. Basic and extensible post-processing of eddy covariance flux data with REddyProc[J]. Biogeosciences, 2018, 15(16): 5015−5030.
[28] Shen C, Wang C L, Zhao X S, et al. Variations and controlling factors of carbon fluxes from a restored mangrove wetland[J]. Journal of Nanjing University of Information Science and Technology (Natural Science Edition), 2022, 14(1): 11−20. (in Chinese)
[29] Peng L, Zhao Z H, Xiang W H, et al. Effects of radiation changes on net ecosystem exchange of carbon dioxide in a middle subtropical Chinese fir plantation[J]. Chinese Journal of Applied Ecology, 2022, 33(1): 17−24. (in Chinese)
[30] Gong T T. Variations of water and carbon fluxes on the dryland of north China[D]. Beijing: Tsinghua University, 2017. (in Chinese)
[31] Zhang Y, Feng H L, Wang W F, et al. Diurnal and seasonal changes of fluxes over a poplar plantation in Hongze Lake Basin[J]. Journal of Nanjing Forestry University (Natural Science Edition), 2019, 43(5): 113−120. (in Chinese)
[32] Xu L, Baldocchi D D. Seasonal variation in carbon dioxide exchange over a Mediterranean annual grassland in California[J]. Agricultural and Forest Meteorology, 2004, 123(1−2): 79−96. doi: 10.1016/j.agrformet.2003.10.004
[33] Wu D X, Li G D, Kang Q Q, et al. Characteristics of CO2 flux and its influence factors over winter wheat agroecosystem in the North China Plain[J]. Chinese Journal of Applied Ecology, 2018, 29(3): 827−838. (in Chinese)
[34] Qi J D, Huang J Z, Jia X. Simulation of NEE and characterization of urban green-land ecosystem responses to climatic controls based on XGBoost-ANN[J]. Transactions of the Chinese Society for Agricultural Machinery, 2019, 50(5): 269−278. (in Chinese)
[35] Dannenberg M P, Barnes M L, Smith W K, et al. Upscaling dryland carbon and water fluxes with artificial neural networks of optical, thermal, and microwave satellite remote sensing[J]. Biogeosciences, 2023, 20(2): 383−404. doi: 10.5194/bg-20-383-2023
[36] Zhang K, Zhu G F, Bai Y, et al. Gap filling for evapotranspiration based on BP artificial neural networks[J]. Journal of Lanzhou University (Natural Sciences), 2014, 50(3): 348−355. doi: 10.13885/j.issn.0455-2059.2014.03.009 (in Chinese)
[37] Qi J D, Huang J Y. Simulation of NEE in grassland ecosystems based on deep learning[J]. Transactions of the Chinese Society for Agricultural Machinery, 2020, 51(6): 152−161. doi: 10.6041/j.issn.1000-1298.2020.06.016 (in Chinese)
[38] Huang J, Zhang F, Du Z H, et al. Hourly concentration of PM2.5 based on RNN-CNN ensemble deep learning model[J]. Journal of Zhejiang University (Science Edition), 2019, 46(3): 370−379. (in Chinese)
[39] Chang X G, Xing Y Q, Gong W S, et al. Evaluating gross primary productivity over 9 ChinaFlux sites based on random forest regression models, remote sensing, and eddy covariance data[J]. Science of the Total Environment, 2023, 875: 162601. doi: 10.1016/j.scitotenv.2023.162601
[40] Ellsäßer F, Röll A, Ahongshangbam J, et al. Predicting tree sap flux and stomatal conductance from drone-recorded surface temperatures in a mixed agroforestry system: a machine learning approach[J/OL]. Remote Sensing, 2020, 12(24): 4070[2022−12−10]. https://doi.org/10.3390/rs12244070.
[41] Adjuik T A, Davis S C. Machine learning approach to simulate soil CO2 fluxes under cropping systems[J/OL]. Agronomy, 2022, 12(1): 197[2022−12−10]. https://doi.org/10.3390/agronomy12010197.
[42] La Puma I P, Philippi T E, Oberbauer S F. Relating NDVI to ecosystem CO2 exchange patterns in response to season length and soil warming manipulations in arctic Alaska[J]. Remote Sensing of Environment, 2007, 109(2): 225−236. doi: 10.1016/j.rse.2007.01.001
[43] Wylie B K, Johnson D A, Laca E, et al. Calibration of remotely sensed, coarse resolution NDVI to CO2 fluxes in a sagebrush-steppe ecosystem[J]. Remote Sensing of Environment, 2003, 85(2): 243−255. doi: 10.1016/S0034-4257(03)00004-X
[44] Wang S Y, Zhang Y, Meng X H, et al. Fill the gaps of eddy covariance fluxes using machine learning algorithms[J]. Plateau Meteorology, 2020, 39(6): 1348−1360. doi: 10.7522/j.issn.1000-0534.2019.00142 (in Chinese)
[45] Moffat A M, Papale D, Reichstein M, et al. Comprehensive comparison of gap-filling techniques for eddy covariance net carbon fluxes[J]. Agricultural and Forest Meteorology, 2007, 147(3−4): 209−232. doi: 10.1016/j.agrformet.2007.08.011
[46] Suykens J A K, Vandewalle J. Least squares support vector machine classifiers[J]. Neural Processing Letters, 1999, 9: 293−300. doi: 10.1023/A:1018628609742
[47] Huang J Z. Application of artificial neural networks in modelling carbon flux[D]. Beijing: Beijing Forestry University, 2019. (in Chinese)