Article

Lightweight Multimodal Fusion for Urban Tree Health and Ecosystem Services

Abror Shavkatovich Buriboev, Department of AI-Software, Gachon University, Sujeong-Gu, Seongnam-si 13120, Republic of Korea
Djamshid Sultanov, Department of Infocommunication Engineering, Tashkent University of Information Technologies Named After Muhammad Al-Khwarizmi, Tashkent 100200, Uzbekistan
I.R. Rahmatullaev, Department of Exact Sciences, Kimyo International University in Tashkent, Tashkent 100121, Uzbekistan
O.R. Yusupov, Department of Software Engineering, Samarkand State University, Samarkand 140104, Uzbekistan
Erali Eshonqulov, Department of Software Engineering, Samarkand State University Named After Sharof Rashidov, Samarkand 140104, Uzbekistan
Dilshod Bekmuradov, Department of Digital Technologies and AI, Tashkent Institute of Irrigation and Agricultural Mechanization Engineers, National Research University, Tashkent 100000, Uzbekistan
Nodir Egamberdiev, Department of Convergence of Digital Technologies, Tashkent University of Information Technologies Named After Muhammad Al-Khwarizmi, Tashkent 100200, Uzbekistan
Andrew Jaeyong Choi, Department of AI-Software, Gachon University, Sujeong-Gu, Seongnam-si 13120, Republic of Korea

Sensors · journal · 2025 · en


Abstract

Rapid urban expansion has heightened the demand for accurate, scalable, real-time methods to assess tree health and the ecosystem services that trees provide. Urban trees are major contributors to air-quality improvement and climate change mitigation; however, their monitoring still relies largely on manual inspections, which are inherently subjective and inefficient. To overcome this limitation, we propose a lightweight multimodal deep-learning framework that fuses RGB imagery with environmental and biometric sensor data to jointly evaluate tree-health condition and estimate daily oxygen (O2) production and CO2 absorption. The proposed architecture pairs an EfficientNet-B0 vision encoder, enhanced with Mobile Inverted Bottleneck Convolutions (MBConv) and a squeeze-and-excitation attention mechanism, with a small multilayer perceptron for sensor processing. A shared multimodal representation supports a three-task learning setup, allowing simultaneous classification and regression within a single model. Experiments on a carefully curated dataset of segmented tree images with synchronized sensor measurements show that our method attains a health-classification accuracy of 92.03% while lowering the regression error for O2 (MAE = 1.28) and CO2 (MAE = 1.70) relative to unimodal and multimodal baselines. With 5.4 million parameters and an inference latency of 38 ms, the proposed architecture is well suited to deployment on edge devices and real-time monitoring platforms.
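
The data flow described in the abstract (image features and sensor readings merged into one representation that feeds one classification head and two regression heads) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not specify the fusion operator, the number of health classes, the layer sizes, or the training loss, so the concatenation-based fusion, `num_classes=3`, `hidden_dim=8`, and the cross-entropy plus MAE objective below are all assumptions made for illustration. The `image_features` vector stands in for the output of the EfficientNet-B0 encoder, which is not reproduced here, and every name (`FusionSketch`, `multitask_loss`, and so on) is hypothetical.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for an (n_in -> n_out) layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class FusionSketch:
    """Hypothetical three-task model: health class, O2 output, CO2 uptake."""

    def __init__(self, image_dim, sensor_dim, hidden_dim=8, num_classes=3, seed=0):
        rng = random.Random(seed)
        # Single-layer stand-in for the small sensor MLP (assumed size).
        self.sensor_mlp = init_layer(sensor_dim, hidden_dim, rng)
        fused_dim = image_dim + hidden_dim
        self.health_head = init_layer(fused_dim, num_classes, rng)
        self.o2_head = init_layer(fused_dim, 1, rng)
        self.co2_head = init_layer(fused_dim, 1, rng)

    def forward(self, image_features, sensor_values):
        sensor_embedding = relu(linear(sensor_values, *self.sensor_mlp))
        # Assumed fusion: plain concatenation of the two embeddings.
        fused = list(image_features) + sensor_embedding
        health_probs = softmax(linear(fused, *self.health_head))
        o2 = linear(fused, *self.o2_head)[0]
        co2 = linear(fused, *self.co2_head)[0]
        return health_probs, o2, co2


def multitask_loss(health_probs, o2, co2, target_class, target_o2, target_co2,
                   w_cls=1.0, w_reg=1.0):
    """Assumed objective: cross-entropy for health plus MAE for both regressions."""
    cross_entropy = -math.log(health_probs[target_class] + 1e-12)
    mae = abs(o2 - target_o2) + abs(co2 - target_co2)
    return w_cls * cross_entropy + w_reg * mae


# Toy inputs: a 4-value image embedding and 3 sensor readings (made-up numbers).
model = FusionSketch(image_dim=4, sensor_dim=3)
probs, o2, co2 = model.forward([0.2, -0.1, 0.5, 0.3], [0.7, 0.1, 0.4])
loss = multitask_loss(probs, o2, co2, target_class=1, target_o2=1.0, target_co2=2.0)
```

Because all three heads read the same fused vector, a single forward pass yields the health class probabilities and both regression estimates, which is the property the abstract attributes to the single-model design.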

Topics

Remote Sensing in Agriculture · Remote Sensing and LiDAR Applications · Urban Heat Island Mitigation

Identifiers

DOI: 10.3390/s26010007

Citations and references

Cited by 0 · 25 references
