Objective Using multi-source remote sensing data, this study evaluated the accuracy of forest stock volume estimation models built by combining different feature selection methods with machine learning algorithms, and explored their synergistic and complementary potential for improving the accuracy of forest stock volume estimation.
Method Based on data from the 9th National Forest Resources Continuous Inventory in Hebei Province of northern China, this study combined four types of remote sensing data (GF-1, Sentinel-2, Sentinel-1 and ASTER GDEM) with three feature selection methods, namely variable selection using random forests (VSURF), recursive feature elimination (RFE) and Boruta, and five machine learning algorithms, namely support vector regression (SVR), K-nearest neighbor (KNN), random forest (RF), categorical boosting (CatBoost) and extreme gradient boosting (XGBoost), to construct forest stock volume models and identify the optimal one. In addition, the effects of three factors (dataset, feature selection method and machine learning algorithm) on forest stock volume estimation were quantified by analysis of variance (ANOVA).
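The model grid described above can be illustrated with a minimal sketch. It only enumerates the three feature selection methods and five algorithms named in this abstract and picks the pairing with the lowest error. The helper names `candidate_models` and `pick_best` are hypothetical, the selection criterion (lowest RMSE) is an assumption, and no model is actually fitted here.

```python
from itertools import product

# Method names taken from the abstract; they are labels only, not fitted models.
FEATURE_SELECTORS = ["VSURF", "RFE", "Boruta"]
ALGORITHMS = ["SVR", "KNN", "RF", "CatBoost", "XGBoost"]


def candidate_models(selectors, algorithms):
    """Return every (feature selector, algorithm) pairing to be evaluated."""
    return list(product(selectors, algorithms))


def pick_best(scores):
    """Return the pairing with the lowest RMSE from a {pairing: rmse} mapping.

    Choosing by RMSE alone is an assumption made for this sketch.
    """
    return min(scores, key=scores.get)


combos = candidate_models(FEATURE_SELECTORS, ALGORITHMS)
print(len(combos))  # 3 selectors x 5 algorithms
```

Each pairing would be repeated once per input dataset, so the total number of fitted models depends on how many dataset combinations are compared.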
Result (1) The ANOVA results showed that the dataset, the feature selection method and the machine learning algorithm all had a significant impact on the performance of forest stock volume estimation. (2) Combining multi-source remote sensing data effectively improved the performance of forest stock volume estimation. Compared with the other datasets, the model constructed by combining the GF-1, Sentinel-2, Sentinel-1 and ASTER GDEM data showed higher estimation accuracy. Overall, the Boruta feature selection method was superior to VSURF and RFE, and CatBoost outperformed the other algorithms (SVR, KNN, RF and XGBoost) in modeling. (3) Based on the combination of GF-1, Sentinel-2, Sentinel-1 and ASTER GDEM, the estimation model built using Boruta for feature selection and the CatBoost algorithm achieved the highest accuracy (R² = 0.6385, RMSE = 13.3053 m³/ha).
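For readers unfamiliar with the two accuracy measures reported above, the following sketch computes them under their conventional definitions (R² as one minus the ratio of residual to total sum of squares, RMSE as the square root of the mean squared residual). The abstract does not state the exact formulas used, so these definitions are an assumption, and the numbers are toy values rather than data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy stock volumes (m³/ha) for illustration only, not data from the study.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(round(rmse(observed, predicted), 4))
print(round(r_squared(observed, predicted), 4))
```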
Conclusion In estimating forest stock volume in Baoding City of Hebei Province from multi-source remote sensing data, combining feature selection with machine learning algorithms can effectively improve model performance and yield better forest stock volume estimates. The results of this study not only improve the current method of estimating forest stock volume from multi-source remote sensing data, but also provide a new approach and a reference for large-scale forest stock volume monitoring.