Real-Time Polyp Detection and Classification in Colonoscopy Videos Using Lightweight Transformer Networks
Abstract
Colorectal cancer remains one of the leading causes of cancer-related death worldwide, and early detection through colonoscopy is critical for prevention. This paper presents a novel lightweight transformer-based architecture for real-time polyp detection and classification in colonoscopy videos. The proposed method, termed LightPolyp-Former, combines the efficiency of depthwise separable convolutions with the global attention mechanism of transformers, achieving high accuracy at low computational cost. We introduce a multi-scale feature aggregation module to handle the wide variation in polyp size, and a temporal consistency constraint to exploit the continuity between consecutive video frames. Extensive experiments on five benchmark datasets (Kvasir-SEG, CVC-ClinicDB, ETIS-LaribPolypDB, CVC-ColonDB, and EndoScene) show that our method achieves state-of-the-art detection accuracy (mAP of 94.7%) and classification performance (F1-score of 93.2%) while running at 67 FPS on a single GPU, making it suitable for real-time clinical deployment. Compared with existing transformer-based methods, the model is 73% smaller while maintaining comparable accuracy.
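To illustrate why depthwise separable convolutions are cheaper than standard convolutions, here is a minimal sketch that compares the weight counts of a standard k×k convolution and its depthwise separable factorization (a depthwise k×k convolution followed by a 1×1 pointwise convolution), with bias terms ignored. This is generic arithmetic, not the LightPolyp-Former configuration. The function names and the 3×3, 128→256 channel layer are hypothetical choices made for illustration.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k convolution (one filter per input
    channel) followed by a 1 x 1 pointwise convolution mapping
    c_in -> c_out channels (bias ignored)."""
    depthwise = k * k * c_in   # spatial filtering, per channel
    pointwise = c_in * c_out   # channel mixing
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 3x3 kernel, 128 input and 256 output channels.
    k, c_in, c_out = 3, 128, 256
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    # Prints: standard: 294912, separable: 33920, ratio: 0.115
    print(f"standard: {std}, separable: {sep}, ratio: {sep / std:.3f}")
```

The ratio simplifies to 1/c_out + 1/k², so for 3×3 kernels the per-layer weight count approaches roughly one ninth of a standard convolution as the output width grows. This per-layer saving should not be read as the source of the 73% whole-model reduction reported above, which depends on the full architecture.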