A robust machine learning framework for predicting CO2 solubility in polyethylene glycol using advanced optimization strategies
Abstract
Accurate estimation of carbon dioxide (CO2) solubility in polyethylene glycol (PEG) is essential for optimizing carbon capture and storage processes, yet the complex thermodynamic behavior of the system challenges conventional modeling approaches. This study develops a robust machine learning framework by systematically evaluating Gradient Boosting Decision Trees (GBDT) coupled with four hyperparameter optimization algorithms: Bayesian Batch Optimizer (BBO), Bayesian Probability Improvement (BPI), Gaussian Process Optimizer (GPO), and Evolutionary Strategies (ES). A dataset of 161 experimental data points covering a range of temperature, pressure, and PEG molar mass conditions was curated and screened for outliers using the Hat matrix method. Generalization was assessed with 5-fold cross-validation, using metrics such as the coefficient of determination (R²) and mean squared error (MSE). The comparison showed that the evolutionary and batch-based strategies achieved near-perfect training accuracy but overfitted significantly. In contrast, the GBDT-GPO hybrid model was the most robust, achieving the highest testing R² (0.849) and the lowest average absolute relative error (15.74%), and thus the best balance between bias and variance. SHAP analysis of the model’s predictions identified pressure as the dominant governing factor, followed by molar mass and temperature, in agreement with Henry’s law and thermodynamic principles. The proposed GPO-optimized framework offers a reliable computational tool for predicting gas solubility in polymer solvents.
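As an illustration of the evaluation metrics named in the abstract, the following minimal sketch computes MSE, R², and the average absolute relative error using their standard textbook definitions. The abstract does not give the exact formulas used in the study, so these definitions are an assumption. The function names and sample values are hypothetical and are not taken from the dataset.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def aare_percent(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute relative error in percent (assumes no zero targets)."""
    rel_errors = (abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * sum(rel_errors) / len(y_true)


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    measured = [1.0, 2.0, 4.0]
    predicted = [1.1, 1.8, 4.4]
    print(mse(measured, predicted))
    print(r2(measured, predicted))
    print(aare_percent(measured, predicted))
```

Because the relative error divides by the measured value, this metric penalizes misses on low-solubility points more heavily than MSE does, which is one reason the two metrics can rank models differently.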