Data-driven prediction of CIMS signal intensity for pesticide detection via dibromomethane reagent using machine learning techniques
Abstract
This study presents a machine learning model for predicting Chemical Ionization Mass Spectrometry (CIMS) signal intensity in pesticide detection with dibromomethane (DBrMe) as the reagent. Accurate detection of pesticides is crucial for agricultural safety and regulatory compliance. The model relates signal intensity to ten input features (molar mass, COO, N-O, N-N, N-S, C-C, S, Cl, P, and pesticide concentration in DBrMe, in ppm) using four algorithms: Decision Tree, AdaBoost, Random Forest, and Ensemble Learning. A dataset of 2460 samples was used for training and validation. Among the features, pesticide concentration had the strongest influence, followed by N-O, COO, and molar mass. SHAP (SHapley Additive exPlanations) analysis confirmed this ranking, and a leverage-based method was used to identify and remove outliers, improving model reliability. Random Forest outperformed the other models, achieving the highest R² (0.401) and the lowest error, whereas Decision Tree and AdaBoost showed signs of overfitting. Sensitivity analysis showed that all ten variables contribute to the prediction, which supports the robustness of the model. The approach offers a cost-effective alternative to traditional experimental methods for estimating CIMS signal intensity across a range of pesticides and conditions, supporting faster and more efficient chemical analysis in agricultural monitoring.
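The leverage-based outlier screening mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact procedure or cutoff, so everything here is an assumption: the function names `leverage_values` and `leverage_threshold` are hypothetical, the warning threshold h* = 3(p + 1)/n is one commonly used convention rather than the one confirmed for this study, and the data are synthetic.

```python
import numpy as np


def leverage_values(X):
    """Diagonal of the hat matrix H = X1 (X1^T X1)^-1 X1^T,
    where X1 is X with an intercept column prepended."""
    X = np.asarray(X, dtype=float)
    X1 = np.column_stack([np.ones(len(X)), X])
    # Thin QR: H = Q Q^T, so h_ii is the squared norm of row i of Q.
    Q, _ = np.linalg.qr(X1)
    return np.sum(Q**2, axis=1)


def leverage_threshold(n_samples, n_features):
    """Assumed warning leverage h* = 3 (p + 1) / n (a common convention)."""
    return 3.0 * (n_features + 1) / n_samples


# Synthetic illustration only; these are not the study's measurements.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
X = np.vstack([X, [50.0, 50.0]])  # one deliberately extreme point (row 50)

h = leverage_values(X)
h_star = leverage_threshold(*X.shape)
flagged = np.flatnonzero(h > h_star)  # rows whose leverage exceeds h*

print("warning leverage:", round(h_star, 4))
print("flagged rows:", flagged)
```

Under the same assumed convention, a dataset of 2460 samples with ten features would give h* = 3 × 11 / 2460 ≈ 0.0134. Leverage screening of this kind is usually combined with a limit on standardized residuals, which this sketch omits.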