Abstract:
Objective Net ecosystem exchange (NEE) is an important indicator of the role of terrestrial ecosystems in the global carbon cycle. The accuracy with which missing values in raw NEE observations are imputed directly affects the reliability and precision of key ecosystem parameters. To improve NEE imputation accuracy for long continuous data gaps across different vegetation types, a deep learning-based TSIT-PatchTST model was proposed.
Methods Using carbon flux factor data from sites in the global long-term flux observation network, three random continuous data gap scenarios were constructed: short missing (1 d), medium missing (7 d), and long missing (30 d). The imputation results of the marginal distribution sampling (MDS) method and the PatchTST, TS2Vec-PatchTST, and TSIT-PatchTST models were evaluated across eight vegetation types.
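The gap construction described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `make_continuous_gap`, the half-hourly sampling rate (48 steps per day), the fixed seed, and the synthetic NEE series are all assumptions introduced here.

```python
import numpy as np

def make_continuous_gap(n_steps, gap_days, steps_per_day=48, seed=None):
    """Return a boolean mask marking one randomly placed contiguous gap.

    steps_per_day=48 assumes half-hourly flux records (an assumption,
    not stated in the abstract).
    """
    gap_len = gap_days * steps_per_day
    if gap_len > n_steps:
        raise ValueError("gap is longer than the series")
    rng = np.random.default_rng(seed)
    # Draw the gap start uniformly so the whole gap fits inside the series.
    start = int(rng.integers(0, n_steps - gap_len + 1))
    mask = np.zeros(n_steps, dtype=bool)
    mask[start:start + gap_len] = True
    return mask

# Gap lengths in days, taken from the three scenarios in the abstract.
SCENARIOS = {"short": 1, "medium": 7, "long": 30}

n_steps = 365 * 48  # one year of hypothetical half-hourly records
# Synthetic stand-in for an NEE series with a daily cycle.
nee = np.sin(np.linspace(0.0, 2.0 * np.pi * 365, n_steps))

masks = {name: make_continuous_gap(n_steps, days, seed=0)
         for name, days in SCENARIOS.items()}
# Masked positions become NaN; the originals are kept as ground truth.
nee_gapped = {name: np.where(mask, np.nan, nee)
              for name, mask in masks.items()}
```

Keeping the mask alongside the untouched series lets each imputation method be scored only on the values that were artificially removed.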
Results In the short missing scenario, all imputation methods achieved their best performance. As the number of consecutive missing days increased, the imputation accuracy of the MDS method gradually declined, and it was no longer effective for imputing NEE in the long missing scenario. In contrast, the three deep learning models were able to impute missing NEE data effectively. Across all three missing scenarios, the TSIT-PatchTST model exhibited the best imputation performance, with particularly high accuracy in the long missing scenario. In that scenario, the TSIT-PatchTST model achieved an average mean squared error (MSE) of 0.942 μmol/(m²·s), an average mean absolute error (MAE) of 0.628 μmol/(m²·s), and an average R² of 0.457 across 31 sites. Compared with the PatchTST model, the TSIT-PatchTST model reduced the average MSE by 53.3% and the average MAE by 39.7%, while the average R² remained unchanged.
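The reported metrics and percentage reductions can be illustrated with a short sketch. It assumes that errors are scored only on the artificially removed values and that R² is the coefficient of determination; the abstract does not state either detail, and the names `gap_metrics` and `relative_reduction` are hypothetical.

```python
import numpy as np

def gap_metrics(y_true, y_pred, mask):
    """Compute MSE, MAE, and R^2 on the masked (gap) positions only."""
    t = np.asarray(y_true, dtype=float)[mask]
    p = np.asarray(y_pred, dtype=float)[mask]
    err = p - t
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((t - t.mean()) ** 2))
    return {
        "mse": float(np.mean(err ** 2)),
        "mae": float(np.mean(np.abs(err))),
        # R^2 is undefined when the true values in the gap are constant.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }

def relative_reduction(baseline, candidate):
    """Percentage by which `candidate` lowers an error relative to `baseline`."""
    return 100.0 * (baseline - candidate) / baseline

# Toy example: the last three values form the gap.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 6.0])
mask = np.array([False, True, True, True])
metrics = gap_metrics(y_true, y_pred, mask)
```

With this definition, a statement such as "reduced the average MSE by 53.3%" corresponds to `relative_reduction(mse_baseline, mse_candidate)` evaluated on the site-averaged errors of the two models.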
Conclusion Across the eight vegetation types and three missing scenarios, the TSIT-PatchTST model showed the best imputation performance and adaptability. It can be applied to missing-data problems in time series to improve imputation accuracy.