Abstract:
Objective To analyze the duration and peak concentration of pollen periods in Beijing and to build optimal prediction models of daily pollen concentration, providing data support for the scientific prediction of future daily pollen concentration.
Method Based on time series data of daily pollen concentration in Beijing from 2015 to 2020, multiple imputation methods were used to handle missing data, and three time series models (SARIMA, LSTM and Prophet) were used to predict pollen concentration changes in the following year (2020; 182 d in total).
Result (1) Among the four multiple imputation methods (random forest imputation, Bayesian linear regression imputation, simple random sampling imputation, and predictive mean matching with distance-aided donor selection), the third imputed dataset from the random forest method had the smallest value (P = 0.002) and was the best imputed dataset. (2) Average daily pollen concentration data from 2015 to 2020 showed that the spring peak period occurred between March and June, reaching its maximum in early April (792 grains/(10³ mm²)); the autumn peak period occurred between August and the end of September, reaching its maximum in early September (449 grains/(10³ mm²)). From 2015 to 2019, overall pollen concentration showed a decreasing trend, while 2020 saw a steep upward trend. The peak duration was longest in 2015 (107 d in spring, 65 d in autumn) and shortest in 2018 (60 d in spring, 46 d in autumn). The peak pollen concentration was highest in 2020 and lowest in 2019. (3) Among the three models, the LSTM model was the best at describing and predicting the time series of daily pollen concentration in Beijing. With a time step (look_back) of 60, the LSTM model provided the best prediction, with the smallest RMSE and MAE and an R² of 0.78. In contrast, the Prophet model performed poorly: it could not capture the peak concentration sensitively, and some of its predicted values were negative. The SARIMA model had a reasonable fit but did not perform well in prediction, also producing some negative predicted values.
Conclusion Compared with the SARIMA and Prophet models, the LSTM model is more suitable for modeling the time series of daily pollen concentration in Beijing and for predicting it over the long term. Future research should focus on improving the pollen concentration data for March and on optimizing model performance, so that the start and end times, duration, and peak concentration of pollen periods can be predicted more accurately.
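
As an illustration of the look_back setting and the evaluation metrics named above, the following is a minimal sketch and not the authors' code. It assumes a one-dimensional array of daily concentrations; the function names `make_windows`, `rmse`, `mae` and `r2` are hypothetical, and the LSTM itself is omitted. Each training sample consists of the previous `look_back` days, and the target is the following day.

```python
import numpy as np


def make_windows(series, look_back=60):
    """Split a 1-D daily series into (X, y) pairs for supervised learning.

    X[i] holds look_back consecutive days; y[i] is the day right after them.
    (Hypothetical helper; the windowing used in the study may differ.)
    """
    series = np.asarray(series, dtype=float)
    if series.ndim != 1 or len(series) <= look_back:
        raise ValueError("series must be 1-D and longer than look_back")
    n_samples = len(series) - look_back
    X = np.stack([series[i:i + look_back] for i in range(n_samples)])
    y = series[look_back:]
    return X, y


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

A recurrent layer would typically expect `X` reshaped to `(samples, look_back, 1)`, for example with `X[..., np.newaxis]`. Because pollen concentration cannot be negative, predictions from any of the three models could also be clipped at zero before the metrics are computed; that step is an assumption here and is not described in the abstract.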