Article

Machine learning prediction of hydrochar adsorption capacity for methylene blue with limited data: Inspired by generative adversarial network-based augmentation

Chong Liu, Department of Chemical & Materials Engineering, University of Auckland, 1010, New Zealand
Jingxian An, Department of Food Science, Purdue University, 745 Agriculture Mall Drive, West Lafayette, IN, 47907, United States
Xuan Cuong Nguyen, Center for Advanced Chemistry, Institute of Research and Development, Duy Tan University, Da Nang, 550000, Viet Nam
P. Balasubramanian, Department of Biotechnology & Medical Engineering, National Institute of Technology Rourkela, 769008, India

2025, en

ABI

Abstract

In this digital and green era, machine learning (ML) offers powerful tools for environmental remediation, yet its efficacy can be undermined by small or skewed datasets. To overcome this challenge, this study introduces a novel framework for predicting the equilibrium adsorption capacity of hydrochar for methylene blue from limited data. This approach uses a generative adversarial network (GAN)-based strategy to augment a sparse experimental dataset, thereby significantly enhancing data volume and diversity. For robust modeling, potential multicollinearity among the 12 input features was mitigated using a combined variance inflation factor and Pearson correlation coefficient (VIF–PCC) filter. Next, twelve ML regression algorithms were systematically benchmarked, with tree-based models consistently outperforming kernel-based and neural network models. The top performer, a Histogram Gradient Boosting regressor, achieved excellent test accuracy (R² = 0.9561; RMSE = 0.0906 mmol/g), and stability tests along with residual diagnostics confirmed its reliability. To ensure interpretability, Shapley additive explanations (SHAP) were coupled with generalized additive models (GAMs), which unveiled critical adsorption thresholds (initial dye concentration C₀ ≈ 7.5 mmol/g; near-neutral pH) and quantified the relative importance of key factors (adsorption environment ~63.4% > synthesis conditions ~25.2% > material properties ~11.4%). Finally, the optimized model was deployed as a user-friendly Streamlit web application, enabling instant capacity predictions from twelve routine inputs and export of results for laboratory documentation. Overall, this integrated workflow, combining data augmentation, multicollinearity-aware feature selection, robust ensemble learning, and explainable ML, represents a novel and transferable blueprint for adsorption studies constrained by sparse data. The outcomes advance data-driven optimization of water treatment processes and exemplify how digital innovation can amplify the impact of machine learning on sustainable environmental remediation in the green era.

• A GAN-inspired strategy generated synthetic adsorption data from limited experimental records.
• Multicollinearity among features was mitigated using a VIF–PCC hybrid filter.
• Ensemble ML models outperformed kernel and neural methods; HGB achieved R² = 0.9561.
• SHAP + GAMs enabled model interpretability and revealed key adsorption thresholds.
• A GUI using Streamlit allows real-time MB adsorption prediction and CSV export.
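The abstract names a combined VIF–PCC filter but does not give its thresholds or the order of its steps. The Python sketch below is one plausible reading, not the authors' implementation: it first drops the feature with the highest variance inflation factor until all values fall under a cutoff, then removes one feature from any remaining pair whose absolute Pearson correlation exceeds a second cutoff. The function names, the cutoffs (`vif_max=10.0`, `pcc_max=0.9`), the feature names, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def vif_scores(X):
    """Variance inflation factor for each column of X (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    scores = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on an intercept plus all other columns.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())  # assumes a non-constant column
        r2 = 1.0 - ss_res / ss_tot
        scores[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return scores


def vif_pcc_filter(X, names, vif_max=10.0, pcc_max=0.9):
    """Return the feature names kept after a VIF pass and a Pearson pass.

    The cutoffs are assumed defaults, not values taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    keep = list(range(X.shape[1]))

    # Pass 1: repeatedly drop the single worst feature while its VIF is too high.
    while len(keep) > 1:
        scores = vif_scores(X[:, keep])
        worst = int(np.argmax(scores))
        if scores[worst] <= vif_max:
            break
        keep.pop(worst)

    # Pass 2: for each remaining highly correlated pair, drop the later feature.
    corr = np.abs(np.corrcoef(X[:, keep], rowvar=False))
    dropped = set()
    for a in range(len(keep)):
        for b in range(a + 1, len(keep)):
            if a not in dropped and b not in dropped and corr[a, b] > pcc_max:
                dropped.add(b)

    return [names[k] for i, k in enumerate(keep) if i not in dropped]


# Synthetic demonstration data (hypothetical feature names, not the paper's dataset).
rng = np.random.default_rng(0)
temperature = rng.normal(size=200)
time_min = rng.normal(size=200)
ph = rng.normal(size=200)
temperature_copy = temperature + 0.01 * rng.normal(size=200)  # near-duplicate column

X_demo = np.column_stack([temperature, time_min, ph, temperature_copy])
names_demo = ["temperature_C", "time_min", "pH", "temperature_copy"]

selected = vif_pcc_filter(X_demo, names_demo)
print(selected)
```

In this toy case the near-duplicate temperature column inflates the VIF of both temperature features, so one of the two is removed while the independent features survive. Applying the filter before augmentation or after it is a design choice the abstract does not specify.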


Identifiers

DOI: 10.1016/j.eesus.2025.100043

Citations and references

3 citations, 0 references
