A Hybrid CNN-Transformer Architecture for Vision-Based EEG State Classification
Abstract
This paper presents a comparative analysis of deep learning architectures for electroencephalogram (EEG) signal classification, focusing on the detection of eyes-open and eyes-closed states. We introduce an approach in which raw EEG signals are converted into composite image representations, enabling the application of state-of-the-art computer vision models. Our evaluation benchmarks modern Convolutional Neural Networks (CNNs), namely EfficientNetV2, ResNet50V2, and ConvNeXt, against a Vision Transformer (ViT) and a proposed hybrid CNN-ViT architecture. Experiments were conducted on two public datasets: OpenNeuro ds005420 and the PhysioNet EEG Motor Movement/Imagery dataset. The proposed hybrid CNN-ViT model achieves the best performance, with an accuracy of 73.75% and an AUC of 0.787 on the primary dataset. Notably, it does so with only 1.5 million parameters, making it more parameter-efficient than the larger models evaluated. Our findings indicate that a hybrid approach, combining the local feature extraction of CNNs with the global context modeling of transformers, offers a robust and computationally efficient solution for EEG analysis. This work supports the viability of treating EEG signals as images and underscores the potential of transformer-based architectures for brain-computer interface applications.
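To make the idea of treating EEG signals as images concrete, the following minimal sketch maps a multi-channel EEG segment to a grayscale image by scaling each channel to 8-bit pixel values and stacking the channels as rows. It is an illustration only: the function name `eeg_to_image`, the `height_per_channel` parameter, the 160 Hz sampling rate, and the per-channel min-max scaling are assumptions made for this example and are not necessarily the composite representation used in this paper.

```python
import math


def eeg_to_image(segment, height_per_channel=1):
    """Map a (channels x samples) EEG segment to an 8-bit grayscale image.

    Hypothetical helper for illustration. Each channel is min-max scaled
    independently to the range 0..255 and repeated as `height_per_channel`
    identical pixel rows; the rows of all channels are stacked vertically.
    """
    image = []
    for channel in segment:
        lo, hi = min(channel), max(channel)
        span = hi - lo
        if span == 0:
            # A flat channel carries no contrast; render it as black.
            row = [0] * len(channel)
        else:
            row = [round(255 * (v - lo) / span) for v in channel]
        for _ in range(height_per_channel):
            image.append(list(row))
    return image


# Synthetic two-channel segment (assumed data, one second long):
# a 10 Hz sine, roughly in the alpha band, and a flat line.
fs = 160  # sampling rate in Hz, assumed for this example
segment = [
    [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)],
    [0.0] * fs,
]

image = eeg_to_image(segment, height_per_channel=2)
print(len(image), "rows x", len(image[0]), "columns")
```

The resulting two-dimensional array can be fed to any image classifier after resizing to the network's input resolution. A composite representation would typically combine several such views, for example time-frequency maps alongside the raw traces, into a single image.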