Novel Algorithm Proposed for Selecting Variables Efficiently
The results were published in Infrared Physics & Technology.
Spectral technology, which relies on spectrometers and spectral analysis, is used extensively in various fields. Extracting feature information from complex high-dimensional spectral data plays a crucial role in qualitative and quantitative analysis, enhancing predictive capabilities, and facilitating the development of cost-effective, multi-channel spectral detection instruments. Nonetheless, selecting an optimal wavelength combination from the high-dimensional variable space for building spectral prediction models remains challenging because the problem is NP-hard.
To further improve the effectiveness of variable selection, the research team proposed the MWO-BOSS algorithm, which builds on the bootstrapping soft shrinkage (BOSS) framework. The algorithm combines six weight vectors: selectivity ratio, variable importance in projection, frequency, reciprocal of residual variance, regression coefficient, and significance multivariate correlation. It uses a threshold search strategy to find the optimal weight vector and extract useful information from the spectrum.
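The article does not describe the algorithm's internals, so the following is only a minimal sketch of the general weighted-bootstrap idea behind a BOSS-style search, not the published MWO-BOSS method. It assumes a single frequency-style weight vector, and it uses a toy scoring function as a stand-in for the cross-validated error of a real spectral model. All function names, parameters, and the "informative" variable indices are hypothetical.

```python
import random
from statistics import mean


def weighted_bootstrap_sample(weights, n_draws, rng):
    """Draw variable indices with replacement, with probability
    proportional to their weights, and keep the unique indices."""
    drawn = rng.choices(range(len(weights)), weights=weights, k=n_draws)
    return sorted(set(drawn))


def frequency_weights(subsets, scores, n_vars, keep_fraction=0.1):
    """Weight each variable by how often it appears in the
    best-scoring (lowest-score) subsets. Assumed weighting scheme."""
    n_keep = max(1, int(len(subsets) * keep_fraction))
    ranked = sorted(zip(scores, subsets), key=lambda pair: pair[0])
    counts = [0] * n_vars
    for _, subset in ranked[:n_keep]:
        for index in subset:
            counts[index] += 1
    total = sum(counts)
    return [count / total for count in counts]


def soft_shrinkage_search(n_vars, score_fn, n_subsets=200, max_iter=20, seed=0):
    """Iteratively resample variable subsets, shrinking the variable
    space by re-weighting rather than by hard elimination."""
    rng = random.Random(seed)
    weights = [1.0 / n_vars] * n_vars  # start with equal weights
    n_draws = n_vars
    best_subset, best_score = None, float("inf")
    for _ in range(max_iter):
        subsets = [
            weighted_bootstrap_sample(weights, n_draws, rng)
            for _ in range(n_subsets)
        ]
        scores = [score_fn(subset) for subset in subsets]
        # Remember the best subset seen in any iteration.
        for subset, score in zip(subsets, scores):
            if score < best_score:
                best_subset, best_score = subset, score
        weights = frequency_weights(subsets, scores, n_vars)
        # The next round draws as many times as the average subset size.
        n_draws = round(mean(len(subset) for subset in subsets))
        if n_draws <= 1:
            break
    return best_subset, best_score, weights


# Toy stand-in for a model's cross-validated error (lower is better):
# penalize missing "informative" variables and, slightly, subset size.
INFORMATIVE = {3, 7, 12}  # hypothetical useful wavelengths


def toy_score(subset):
    missing = len(INFORMATIVE - set(subset))
    return missing + 0.01 * len(subset)


best_subset, best_score, weights = soft_shrinkage_search(
    n_vars=20, score_fn=toy_score
)
print(best_subset, round(best_score, 2))
```

In a real setting, `toy_score` would be replaced by the cross-validated prediction error of a calibration model fitted on the selected wavelengths, and the weight update could use any of the six weight vectors named above instead of the frequency count.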
The algorithm's performance was tested on publicly available datasets, including corn, soil, and beer, and compared with multiple high-performance variable selection algorithms. The results showed that it can select variables efficiently and significantly improve the model's predictive ability.
This work was supported by the National Natural Science Foundation of China, the Major Project of Anhui Province, and other projects.
Novel comprehensive variable selection algorithm based on multi-weight vector optimal selection and bootstrapping soft shrinkage (Image by ZHANG Pengfei)