Advancing glaucoma diagnosis: Multi-modal deep learning with vision transformer architectures
Abstract
Glaucoma is a major cause of irreversible blindness. It develops without symptoms and is often detected only once the disease is severe. Existing diagnostic models that rely on single-modality imaging and convolutional neural networks (CNNs) are limited by their dependence on local features, low interpretability, and insufficient accuracy. This paper proposes a multi-modal deep learning model that combines retinal fundus images and optical coherence tomography (OCT) scans using Vision Transformer (ViT) networks to improve the detection and progression analysis of glaucoma. In this work, "multi-modal" refers to architectural and representational fusion through a hybrid Vision Transformer design, not to concurrent multi-sensor data acquisition. By integrating structural and contextual information across modalities, the framework captures subtle pathological changes better than CNN baselines. In benchmark experiments, the proposed model achieves an accuracy of 94.5% and an AUC-ROC of 91.7%, outperforming VGG16, ResNet-50, and InceptionV3. These findings highlight the promise of transformer-based multi-modal approaches for improving early detection of glaucoma and supporting more practical and interpretable clinical decision-making.
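For readers less familiar with the two metrics reported above, the following minimal sketch shows how accuracy and AUC-ROC are conventionally computed for a binary glaucoma classifier. It is an illustration only, not the paper's evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded score matches the true label.

    The 0.5 threshold is an assumption for illustration.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc_roc(labels, scores):
    """AUC-ROC via the pairwise (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case; ties count 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = glaucoma, 0 = healthy; scores are model outputs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

print(f"accuracy = {accuracy(labels, scores):.3f}")
print(f"AUC-ROC  = {auc_roc(labels, scores):.3f}")
```

Accuracy depends on the chosen decision threshold, whereas AUC-ROC summarizes ranking quality across all thresholds, so the two figures need not move together.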