Article

Development of robust machine learning models to estimate hydrochar higher heating value and yield based upon biomass proximate analysis

Guoliang Hou, School of Mathematics, Changchun Normal University, Changchun, 130032, Jilin, China
Ahmed Alkhayyat, Department of Computers Techniques Engineering, College of Technical Engineering, The Islamic University, Najaf, Iraq
Ahmad Almalkawi, Center for ESL & Academic Preparation, Modern College of Business and Science, Muscat, Oman
Anupam Yadav, Department of Computer Engineering and Application, GLA University, Mathura, 281406, India
H S Shreenidhi, Department of Computer Science and Engineering, School of Engineering and Technology, JAIN (Deemed to Be University), Bangalore, Karnataka, India
Vishnu Saini, Sharda School of Engineering and Sciences, Sharda University, Knowledge Park III, Greater Noida, 201310, Uttar Pradesh, India
Shirin Shomurotova, Department of Chemistry Teaching Methods, National Pedagogical University of Uzbekistan, Bunyodkor Street 27, Tashkent, Uzbekistan
Devendra Singh, Department of Computer Science & Engineering, Uttaranchal Institute of Technology, Uttaranchal University, Dehradun, Uttarakhand, 248007, India
Vatsal Jain, Centre for Research Impact & Outcome, Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, 140401, India
Aseel Smerat, Department of Biosciences, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Chennai, 602105, India
Ahmad Khalid, Faculty of Engineering, Sana'a University, Sanaa, Yemen

Bioresources and Bioprocessing (journal), 2025, English

Abstract

This study introduces a robust machine learning framework for predicting hydrochar yield and higher heating value (HHV) from biomass proximate analysis. A curated dataset of 481 samples was assembled, with fixed carbon, volatile matter, ash content, reaction time, temperature, and water content as input variables and hydrochar yield and HHV as the target outputs. To improve data quality, Monte Carlo Outlier Detection (MCOD) was used to remove anomalous entries. Thirteen machine learning algorithms, including convolutional neural networks (CNN), linear regression, decision trees, and advanced ensemble methods (CatBoost, LightGBM, XGBoost), were systematically compared. CatBoost performed best, achieving an R² of 0.98 and a mean squared error (MSE) of 0.05 for HHV prediction, and an R² of 0.94 with an MSE of 0.03 for yield estimation. SHAP analysis identified ash content as the most influential feature for HHV prediction, while temperature, water content, and fixed carbon were the key drivers of yield. These results support the effectiveness of gradient boosting models, particularly CatBoost, for accurately modeling hydrothermal carbonization outcomes and informing data-driven biomass valorization strategies.
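The two metrics reported in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `mse` and `r2` and the four HHV values are hypothetical and chosen only to show how the numbers are computed, and the abstract does not state the units or scaling behind the reported MSE values.

```python
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical HHV values in MJ/kg, for illustration only (not from the dataset).
y_true = [20.0, 22.0, 24.0, 26.0]
y_pred = [20.5, 21.5, 24.5, 25.5]

print(f"MSE = {mse(y_true, y_pred):.2f}")  # MSE = 0.25
print(f"R2  = {r2(y_true, y_pred):.2f}")   # R2  = 0.95
```

MSE is expressed in the squared units of the target, whereas R² is unitless, so MSE values for HHV and for yield are not directly comparable unless both targets were scaled the same way.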

Topics

Thermochemical Biomass Conversion Processes
Biofuel production and bioconversion
Supercapacitor Materials and Fabrication

Identifiers

DOI: 10.1186/s40643-025-00979-1

Citations and references

Cited by: 0
References: 74
