Stochastic optimization of Gradient Boosting Decision Trees for interpretable prediction of heavy metal adsorption onto biochar
Abstract
Predicting the heavy metal adsorption capacity of biochar is a significant challenge because the underlying physicochemical mechanisms are complex and traditional experimental approaches are limited. This study aimed to develop and validate a robust, interpretable machine learning framework by optimizing Gradient Boosting Decision Trees (GBDT) for this predictive task. Using a dataset of 359 experimental data points, we compared four hyperparameter optimization heuristics and found that Gaussian Process Optimization (GPO) yielded the model with the best generalization performance. The final GBDT-GPO model achieved a coefficient of determination (R²) of 0.9784 and a mean squared error (MSE) of 0.0035 on an unseen test set, whereas other methods, such as Evolution Strategies, showed significant overfitting. Furthermore, Shapley Additive Explanations (SHAP) analysis identified initial metal concentration and solution pH as the dominant factors governing adsorption, outweighing physical properties such as surface area. This research establishes a highly accurate and interpretable computational strategy that can guide the rational design of biochar and optimize its application in water treatment.
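The workflow summarized above (tune a GBDT regressor, score it on a held-out test set, then rank feature influence) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the feature names and the target formula are hypothetical, scikit-learn's random search stands in for Gaussian Process Optimization, and permutation importance stands in for SHAP. It does not reproduce the study's dataset, search space, or results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

rng = np.random.default_rng(0)
n = 359  # matches the dataset size in the abstract; the values are synthetic

# Hypothetical features and ranges (assumptions, not the study's data).
feature_names = ["initial_concentration", "solution_pH", "surface_area"]
X = np.column_stack([
    rng.uniform(5, 500, n),   # initial metal concentration
    rng.uniform(2, 10, n),    # solution pH
    rng.uniform(1, 400, n),   # biochar surface area
])
# Made-up target in which concentration and pH dominate by construction.
y = (0.6 * np.log1p(X[:, 0]) + 0.3 * X[:, 1] + 0.001 * X[:, 2]
     + rng.normal(0, 0.05, n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Random search over an assumed hyperparameter space (stand-in for GPO).
param_distributions = {
    "n_estimators": [100, 200, 300],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "max_depth": [2, 3, 4],
    "subsample": [0.6, 0.8, 1.0],
}
search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions,
    n_iter=10,
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X_train, y_train)
model = search.best_estimator_

# A large gap between training and test R² would indicate overfitting.
train_r2 = r2_score(y_train, model.predict(X_train))
y_pred = model.predict(X_test)
test_r2 = r2_score(y_test, y_pred)
test_mse = mean_squared_error(y_test, y_pred)

# Permutation importance on the test set (stand-in for SHAP attributions).
importance = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)
order = np.argsort(importance.importances_mean)[::-1]
ranking = [feature_names[i] for i in order]

print("best params:", search.best_params_)
print(f"train R2 = {train_r2:.4f}, test R2 = {test_r2:.4f}, test MSE = {test_mse:.4f}")
print("features by importance:", ranking)
```

Reporting the training and test scores side by side is what exposes the kind of overfitting the abstract attributes to some optimizers: a model can fit the training data almost perfectly and still generalize poorly to unseen points.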