Lesion-Aware Ordinal Transformer for Diabetic Retinopathy Classification from Fundus Images
Abstract
Diabetic retinopathy (DR) is a major cause of preventable blindness, and early detection is essential for preserving sight. Manual grading of retinal fundus images is time-consuming and depends on specialist expertise. This study presents a method for automatic five-level DR grading that explicitly emphasizes lesion regions and respects the ordered nature of disease severity. The approach first enhances fundus images and then locates small lesion areas, such as microaneurysms and exudates, using vessel suppression and morphological filtering. These lesion areas are encoded as compact features and combined with standard image patches to steer the network towards clinically relevant regions. A ConvNeXt-V2 stem preserves local texture detail, while a Swin-V2 transformer head captures broader retinal structure. The model is trained with an ordinal loss that reflects the progressive nature of DR, with focal weighting to reduce the effect of class imbalance. After training, temperature scaling is applied to calibrate the predicted probabilities. Experiments on the APTOS-2019 dataset yield an accuracy of 84.6%, a macro-F1 score of 0.707, and a quadratic weighted kappa of 0.812. The method performs especially well on advanced disease stages and produces clear visual explanations that match known lesion patterns. These results indicate that combining lesion-focused features with ordinal learning leads to more reliable and clinically meaningful DR grading.
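The abstract does not specify which ordinal loss is used. One common way to combine ordinal learning with focal weighting is sketched below. It assumes a cumulative-threshold encoding in which a grade y in {0, ..., 4} becomes K-1 = 4 binary targets t_k = [y > k], and each threshold logit is trained with a focal-weighted binary cross-entropy. The function names and the default `gamma=2.0` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

NUM_CLASSES = 5  # DR grades 0 (no DR) .. 4 (proliferative DR)


def ordinal_targets(labels, num_classes=NUM_CLASSES):
    """Hypothetical helper: encode grade y as K-1 cumulative targets t_k = 1 if y > k."""
    labels = np.asarray(labels)
    thresholds = np.arange(num_classes - 1)
    return (labels[:, None] > thresholds[None, :]).astype(np.float64)


def ordinal_focal_loss(logits, labels, gamma=2.0, eps=1e-7):
    """Hypothetical loss: mean focal-weighted BCE over the K-1 threshold logits.

    logits: array of shape (batch, K-1), one logit per threshold "y > k".
    gamma:  focal exponent (assumed value); gamma=0 reduces to plain BCE.
    """
    logits = np.asarray(logits, dtype=np.float64)
    targets = ordinal_targets(labels, logits.shape[1] + 1)
    probs = np.clip(1.0 / (1.0 + np.exp(-logits)), eps, 1.0 - eps)
    # p_t is the probability assigned to the correct side of each threshold
    p_t = np.where(targets == 1.0, probs, 1.0 - probs)
    # (1 - p_t)^gamma down-weights thresholds the model already gets right,
    # so rare, hard cases contribute relatively more to the gradient
    loss = -((1.0 - p_t) ** gamma) * np.log(p_t)
    return loss.mean()


def decode_grades(logits):
    """Predicted grade = number of thresholds with P(y > k) > 0.5, i.e. logit > 0."""
    return (np.asarray(logits) > 0.0).sum(axis=1)
```

Because every threshold shares the same ordering, mistaking grade 4 for grade 0 violates four binary targets, whereas mistaking it for grade 3 violates only one. This is how such an encoding penalises distant errors more heavily than a flat softmax loss does.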
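The quadratic weighted kappa (QWK) reported above measures agreement between predicted and reference grades, penalising disagreements by the squared distance between grades. The sketch below follows the standard definition: observed confusion matrix, expected matrix from the row and column marginals, and weights (i - j)^2 / (K - 1)^2. It is an illustration of the metric, not the authors' evaluation code. It should agree with scikit-learn's `cohen_kappa_score(..., weights="quadratic")`.

```python
import numpy as np


def quadratic_weighted_kappa(y_true, y_pred, num_classes=5):
    """Quadratic weighted Cohen's kappa for integer grades 0..num_classes-1."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    # Observed confusion matrix: rows = reference grade, columns = predicted grade
    observed = np.zeros((num_classes, num_classes))
    np.add.at(observed, (y_true, y_pred), 1)
    # Quadratic disagreement weights: 0 on the diagonal, 1 for the extreme grades
    idx = np.arange(num_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (num_classes - 1) ** 2
    # Expected matrix under independence, scaled to the same total count
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()
```

A value of 1.0 means perfect agreement and 0.0 means chance-level agreement. Because the weights grow quadratically, a model that confuses only neighbouring grades can score a high QWK even when its exact-match accuracy is moderate.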