
    基于3种时间序列模型的北京市每日花粉浓度预测

    Prediction of daily pollen concentration in Beijing based on three time series models

    • 摘要:
      目的 分析花粉高峰期持续时间和浓度峰值,构建北京市每日花粉浓度的最优预测模型,为科学预测未来每日花粉浓度提供数据支持。
      方法 采用多重插补法处理2015—2020年北京市每日花粉浓度时间序列中的缺失数据,2015—2019年数据用于建立SARIMA、LSTM和Prophet 3种时间序列模型,预测未来一年(2020年,共计182 d)的花粉浓度变化。
      结果 (1)随机森林法、贝叶斯线性回归法、观测值中随机取样法和加权预测均值匹配法4种多重插补法中,随机森林法的第3个插补数据集P值最小(P = 0.002),为最优插补数据集。(2)2015—2020年每日平均花粉浓度数据显示,春季高峰期集中在3—6月,4月初达到峰值(792粒/(10³ mm²));秋季高峰期集中在8月至9月末,在9月初达到峰值(449粒/(10³ mm²))。2015—2019年花粉浓度总体呈逐年下降趋势,2020年呈现阶跃式上升;其中,2015年高峰期持续时间最长(春季107 d,秋季65 d),2018年最短(春季60 d,秋季46 d);2020年花粉浓度峰值达到最高水平,而2019年花粉浓度峰值最低。(3)3种时间序列模型中,LSTM模型对北京市每日花粉浓度时间序列的描述和预测效果最佳。当LSTM模型的时间步长(look_back)为60时,模型预测效果最佳,RMSE、MAE均为最小,R² = 0.78。相比之下,Prophet模型效果较差,无法灵敏捕捉浓度峰值,预测值存在负数情况,预测效果不佳。SARIMA模型拟合效果尚可,但预测效果不理想,预测值存在为负的情况。
      结论 与SARIMA和Prophet模型相比,LSTM模型更适用于北京市每日花粉浓度时间序列模型的建立与长期预测。未来研究应完善3月份的花粉浓度数据,优化模型性能,以更准确地预测花粉高峰期的起止时间、持续时间及高峰浓度,为过敏性疾病的防控提供更可靠的依据。

       

      Abstract:
      Objective To analyze the duration of pollen peak periods and the peak concentrations, and to build an optimal prediction model of daily pollen concentration in Beijing, providing data support for the scientific prediction of future daily pollen concentration.
      Method Multiple imputation was used to handle missing data in the time series of daily pollen concentration in Beijing from 2015 to 2020. Data from 2015 to 2019 were used to build three time series models (SARIMA, LSTM, and Prophet), which were then used to predict pollen concentration changes in the following year (2020, 182 d in total).
      Result (1) Among the four multiple imputation methods (random forest, Bayesian linear regression, random sampling from observed values, and weighted predictive mean matching), the third imputed dataset from the random forest method had the smallest P value (P = 0.002) and was the optimal imputed dataset. (2) Average daily pollen concentration data from 2015 to 2020 showed that the spring peak period fell between March and June, reaching its peak in early April (792 grains/(10³ mm²)); the autumn peak period fell between August and the end of September, reaching its peak in early September (449 grains/(10³ mm²)). From 2015 to 2019, pollen concentration showed an overall year-by-year decline, while 2020 showed a step-like increase. The peak period was longest in 2015 (107 d in spring, 65 d in autumn) and shortest in 2018 (60 d in spring, 46 d in autumn). The peak pollen concentration was highest in 2020 and lowest in 2019. (3) Among the three models, the LSTM model was the best at describing and predicting the time series of daily pollen concentration in Beijing. The LSTM model predicted best when its time step (look_back) was 60, with the smallest RMSE and MAE and R² = 0.78. In contrast, the Prophet model performed poorly: it could not capture concentration peaks sensitively, and some of its predicted values were negative. The SARIMA model fitted the data reasonably well but predicted poorly, also producing some negative predicted values.
      Conclusion Compared with the SARIMA and Prophet models, the LSTM model is better suited to modeling and long-term prediction of the daily pollen concentration time series in Beijing. Future research should improve the pollen concentration data for March and optimize model performance in order to predict the start and end times, duration, and peak concentration of pollen peak periods more accurately, providing a more reliable basis for the prevention and control of allergic diseases.
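
The abstract reports a look_back of 60 and compares models by RMSE, MAE, and R², but it does not give the authors' implementation. The following is only a minimal sketch of how a look_back window and these three metrics are commonly computed. The function names, the toy series, and the naive "repeat the last value" forecast are illustrative assumptions, and R² is assumed here to be the coefficient of determination, which may differ from the definition used in the paper.

```python
import math


def make_windows(series, look_back):
    """Turn a 1-D series into (window, next value) pairs for one-step-ahead forecasting."""
    inputs, targets = [], []
    for i in range(len(series) - look_back):
        inputs.append(series[i:i + look_back])   # the previous look_back observations
        targets.append(series[i + look_back])    # the value to be predicted
    return inputs, targets


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination (assumed definition; it can be negative)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy series, not data from the study.
toy_series = [10, 20, 40, 80, 60, 30, 15]
X, y = make_windows(toy_series, look_back=3)

# Placeholder forecast: repeat the last value of each window.
# A trained model (for example an LSTM) would replace this step.
naive_pred = [window[-1] for window in X]

print(len(X), mae(y, naive_pred), rmse(y, naive_pred), r2(y, naive_pred))
```

With look_back=60, each training sample would consist of the previous 60 daily concentrations. Because this R² falls below zero whenever a forecast is worse than predicting the mean, it is usually read alongside RMSE and MAE rather than on its own.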

       

