

    Parallel hybrid feature selection with J-M distance, ReliefF, and RFE using multi-source GEE data: enhancing high-altitude forest tree species classification


       

      Abstract:
      Objective Dynamic monitoring of high-altitude forest resources faces multiple constraints, including cloud interference, scarce training samples, and high spectral similarity among tree species. These factors severely restrict accurate mapping of the spatial distribution of dominant species. Focusing on typical pure forests in Shangri-La, this study aimed to improve species identification accuracy and model generalization by combining multi-source data with multi-strategy feature selection.
      Method Sentinel-2 optical time-series, Sentinel-1 radar, and SRTM topographic data were retrieved via Google Earth Engine (GEE). Spectral, texture, vegetation index, radar polarization, topographic, and temporal features were extracted to construct a baseline feature set. A Random Forest (RF) classifier was first used to identify the best scheme before feature selection. J-M distance, ReliefF, and RFE were then run in parallel, each producing an individual feature subset, and the union of the three subsets formed the parallel hybrid feature set. The individual and hybrid feature sets were each fed back into RF for reclassification, and the optimal scheme was determined by comparing results before and after feature selection. Accuracy was evaluated using Producer's Accuracy (PA), User's Accuracy (UA), F1-score (the harmonic mean of PA and UA), Overall Accuracy (OA), and the Kappa coefficient.
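The abstract does not give the exact J-M formulation or selection threshold, so the following is only a minimal sketch under stated assumptions: each class is treated as univariate Gaussian on each feature, the squared J-M form 2(1 - e^(-B)) with range [0, 2] is used (B is the Bhattacharyya distance), and features are ranked by their mean J-M distance over all class pairs and cut at a hypothetical `top_k`. The helper names and toy data are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, pvariance


def jm_distance(a, b, eps=1e-12):
    """J-M distance between two classes on one feature, assuming each class is
    univariate Gaussian. Squared form 2 * (1 - exp(-B)), range [0, 2]."""
    m1, m2 = mean(a), mean(b)
    v1, v2 = max(pvariance(a), eps), max(pvariance(b), eps)  # guard zero variance
    v = (v1 + v2) / 2  # average of the two class variances
    # Bhattacharyya distance between two univariate normal distributions
    bd = (m1 - m2) ** 2 / (8 * v) + 0.5 * math.log(v / math.sqrt(v1 * v2))
    return 2 * (1 - math.exp(-bd))


def rank_features_by_jm(samples, top_k):
    """samples maps class label -> list of feature vectors (hypothetical layout).
    Scores each feature by its mean J-M distance over all class pairs."""
    labels = sorted(samples)
    n_features = len(samples[labels[0]][0])
    scores = []
    for j in range(n_features):
        pair_scores = [
            jm_distance([x[j] for x in samples[c1]], [x[j] for x in samples[c2]])
            for c1, c2 in combinations(labels, 2)
        ]
        scores.append(mean(pair_scores))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:top_k], scores


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
samples = {
    "A": [[1.0, 5.0], [1.2, 5.5], [0.9, 4.8], [1.1, 5.2]],
    "B": [[3.0, 5.1], [3.2, 5.4], [2.9, 4.9], [3.1, 5.3]],
}
selected, scores = rank_features_by_jm(samples, top_k=1)
print(selected, [round(s, 3) for s in scores])
```

With real species samples, a minimum J-M threshold (values near 2 are usually read as good separability) could replace `top_k`; which rule the study used is not stated in the abstract.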
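ReliefF scores a feature by how much it differs between a sample and its nearest neighbours from other classes (misses) versus its nearest neighbours from the same class (hits). The sketch below follows the usual Kononenko-style update with class-prior weighting, but, as an assumption for reproducibility, uses every sample once as the reference instead of random sampling. The neighbour count `k`, the Manhattan distance on range-normalised features, and the toy data are hypothetical choices, not parameters reported by the study.

```python
def relieff(X, y, k=3):
    """Simplified ReliefF feature weights (Kononenko-style).
    Deterministic variant: every instance is used once as the reference (m = n)."""
    n, d = len(X), len(X[0])
    # Range of each feature, used to normalise per-feature differences to [0, 1]
    span = [(max(r[a] for r in X) - min(r[a] for r in X)) or 1.0 for a in range(d)]

    def diff(a, r1, r2):
        return abs(r1[a] - r2[a]) / span[a]

    def dist(r1, r2):
        return sum(diff(a, r1, r2) for a in range(d))  # Manhattan on normalised features

    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    w = [0.0] * d
    for i in range(n):
        r, cr = X[i], y[i]
        for c in classes:
            # k nearest neighbours of r in class c (hits if c == cr, misses otherwise)
            cand = [j for j in range(n) if j != i and y[j] == c]
            near = sorted(cand, key=lambda j: dist(r, X[j]))[:k]
            if not near:
                continue
            for a in range(d):
                s = sum(diff(a, r, X[j]) for j in near) / (n * len(near))
                if c == cr:
                    w[a] -= s  # differing from near hits is penalised
                else:
                    w[a] += prior[c] / (1 - prior[cr]) * s  # differing from misses is rewarded
    return w


X = [[1.0, 5.0], [1.2, 5.5], [0.9, 4.8], [1.1, 5.2],
     [3.0, 5.1], [3.2, 5.4], [2.9, 4.9], [3.1, 5.3]]
y = ["A"] * 4 + ["B"] * 4
weights = relieff(X, y, k=3)
print([round(v, 3) for v in weights])
```

Unlike the per-feature J-M score, ReliefF's neighbour search looks at all features at once, which is one reason to run it alongside J-M rather than instead of it.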
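RFE repeatedly re-scores the surviving features and drops the least important ones, and the hybrid set is the union of the three individual subsets. The abstract does not say which estimator drove RFE; a fitted RF's importances would be a natural choice, but to keep this sketch dependency-free it plugs in a hypothetical toy scorer, `separation_importance`, in place of a model. With a real model the importances can change between rounds, which is the point of re-scoring. For production use, scikit-learn's `sklearn.feature_selection.RFE` wraps the same loop around an estimator. `rfe`, `union_fusion`, and the example subsets are hypothetical names and data.

```python
from statistics import mean, pstdev


def separation_importance(X, y, features):
    """Hypothetical stand-in for a fitted model's feature importances:
    spread of class means divided by the feature's overall standard deviation."""
    classes = sorted(set(y))
    scores = {}
    for f in features:
        col = [row[f] for row in X]
        class_means = [mean(row[f] for row, lab in zip(X, y) if lab == c) for c in classes]
        scores[f] = (max(class_means) - min(class_means)) / (pstdev(col) or 1.0)
    return scores


def rfe(X, y, n_select, importance_fn, step=1):
    """Recursive feature elimination: re-score the surviving features each round
    and drop the `step` least important ones until `n_select` remain."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_select:
        scores = importance_fn(X, y, remaining)
        remaining.sort(key=lambda f: scores[f])  # least important first
        remaining = remaining[min(step, len(remaining) - n_select):]
    return sorted(remaining)


def union_fusion(*subsets):
    """Union of several feature subsets, keeping first-seen order and no duplicates."""
    merged = []
    for subset in subsets:
        for f in subset:
            if f not in merged:
                merged.append(f)
    return merged


X = [[1.0, 2.0, 5.0], [1.2, 2.4, 5.5], [0.9, 2.1, 4.8], [1.1, 2.3, 5.2],
     [3.0, 2.9, 5.1], [3.2, 3.3, 5.4], [2.9, 3.0, 4.9], [3.1, 3.2, 5.3]]
y = ["A"] * 4 + ["B"] * 4
rfe_subset = rfe(X, y, n_select=2, importance_fn=separation_importance)
print(rfe_subset)  # feature 2 (noise) is eliminated first
print(union_fusion([0], rfe_subset, [1, 2]))  # hypothetical J-M, RFE, ReliefF subsets
```

A union keeps any feature that at least one criterion favours, so the hybrid set is never smaller than the largest individual subset; the RF reclassification then decides whether that extra breadth pays off.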
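All five accuracy measures can be read off one confusion matrix. The sketch below assumes rows are reference (ground-truth) classes and columns are predicted classes; PA is then the per-row recall (omission side), UA the per-column precision (commission side), F1 their harmonic mean, OA the diagonal share, and Kappa corrects OA for chance agreement computed from the row and column totals. The two-class matrix is a hypothetical example.

```python
def accuracy_report(cm):
    """cm[i][j] = number of reference samples of class i predicted as class j
    (rows = reference, columns = prediction)."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    pa = [cm[i][i] / row_tot[i] if row_tot[i] else 0.0 for i in range(k)]  # producer's accuracy
    ua = [cm[i][i] / col_tot[i] if col_tot[i] else 0.0 for i in range(k)]  # user's accuracy
    f1 = [2 * p * u / (p + u) if p + u else 0.0 for p, u in zip(pa, ua)]  # harmonic mean
    oa = sum(cm[i][i] for i in range(k)) / n
    pe = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return {"PA": pa, "UA": ua, "F1": f1, "OA": oa, "Kappa": kappa}


report = accuracy_report([[50, 5], [10, 35]])  # hypothetical two-class matrix
print(round(report["OA"], 4), round(report["Kappa"], 4))  # 0.85 0.6939
```

Here OA is 85/100 = 0.85 while chance agreement is (55 x 60 + 45 x 40) / 100^2 = 0.51, so Kappa = (0.85 - 0.51) / (1 - 0.51), about 0.69, noticeably lower than OA.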
      Result (1) Scheme 9, built on the parallel hybrid of the J-M distance, ReliefF, and RFE feature subsets, achieved the highest accuracy (OA = 94.82%, Kappa = 0.94), outperforming Scheme 5, the best scheme before feature selection. (2) Multi-source data integration outperformed any single data source. Using Sentinel-2 data alone yielded an OA of 83.35% (Kappa = 0.79). Successively adding Sentinel-1 radar features, Sentinel-1 texture features, topographic features, and Sentinel-2 temporal features raised OA above this baseline by 0.87, 6.28, 8.08, and 10.18 percentage points, respectively (Kappa = 0.81, 0.86, 0.90, 0.92). The Sentinel-2 temporal features, added last, contributed 2.10 percentage points of this gain. (3) Temporal vegetation index curves showed that the dominant species differ markedly in autumn and winter, when they are most separable.
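Because the four feature groups were added one after another, the reported OA gains are cumulative relative to the Sentinel-2-only baseline and are expressed in percentage points, not percent. The quick check below only re-uses the reported numbers: it recovers the OA after each step and the marginal gain of each group, and the last marginal gain (10.18 - 8.08 = 2.10) matches the figure given for the temporal features. The variable names are illustrative.

```python
baseline_oa = 83.35  # OA (%) with Sentinel-2 features only, as reported
# Reported cumulative OA gains (percentage points) as feature groups are added in order
steps = [
    ("+ Sentinel-1 radar", 0.87),
    ("+ Sentinel-1 texture", 6.28),
    ("+ topography", 8.08),
    ("+ Sentinel-2 temporal", 10.18),
]

cumulative_oa, increments = [], []
prev_gain = 0.0
for name, gain in steps:
    cumulative_oa.append(baseline_oa + gain)
    # Marginal gain of this group, given the groups added before it
    increments.append(gain - prev_gain)
    prev_gain = gain
    print(f"{name:<24} OA = {cumulative_oa[-1]:.2f}%  (+{increments[-1]:.2f} pp)")

# Contrast: the same 10.18 pp gain expressed as a relative (percent) increase
relative = 100 * steps[-1][1] / baseline_oa
print(f"10.18 pp over {baseline_oa}% is a {relative:.2f}% relative increase")
```

The relative figure (about 12.2%) shows why writing the gains with a "%" sign would overstate nothing but still misdescribe them: percentage points and relative percent are different quantities.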
      Conclusion Combining multi-source GEE data with parallel hybrid J-M distance, ReliefF, and RFE feature selection effectively improved the identification accuracy of dominant forest species in Shangri-La and systematically revealed their spatial distribution patterns, providing technical support for the precise monitoring of forest resources in high-altitude regions.

       
