Article

GAN-Based Novel Approach for Generating Synthetic Medical Tabular Data

Rashid Nasimov (Artificial Intelligence, Tashkent State University of Economics, Tashkent 100066, Uzbekistan)
Nigorakhon Nasimova (Department of Software Information Technologies, Tashkent University of Information Technologies Named After Muhammad Al-Khwarizmi, Tashkent 100200, Uzbekistan)
Sanjar Mirzakhalilov (Department of Software Information Technologies, Tashkent University of Information Technologies Named After Muhammad Al-Khwarizmi, Tashkent 100200, Uzbekistan)
Gül Tokdemir (Department of Computer Engineering, Faculty of Engineering, Cankaya University, 06790 Ankara, Turkey)
M. Rizwan (Centre of Excellence for Electric Vehicle and Related Technologies, Department of Electrical Engineering, Delhi Technological University, Delhi 110042, India)
Akmalbek Abdusalomov (Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 461-701, Gyeonggi-do, Republic of Korea)
Young Im Cho (Department of Computer Engineering, Gachon University, Sujeong-gu, Seongnam-si 461-701, Gyeonggi-do, Republic of Korea)

Bioengineering (journal), 2024. Language: English


Abstract

The generation of synthetic medical data has become a focal point for researchers, driven by the increasing demand for privacy-preserving solutions. Existing generative methods rely heavily on real datasets for training, yet access to such data is often restricted. Statistical information about these datasets is more readily available, but current methods struggle to generate tabular data from statistical inputs alone. This study addresses this gap by introducing a novel approach that converts statistical data into tabular datasets using a modified Generative Adversarial Network (GAN) architecture. A custom loss function was incorporated into the training process to improve the quality of the generated data. The proposed method is evaluated using fidelity and utility metrics, achieving "Good" similarity and "Excellent" utility scores. Although the generated data may not fully replace real databases, it performs satisfactorily as training data for machine-learning algorithms. This work offers a promising solution for synthetic data generation when real datasets are inaccessible, with potential applications in medical data privacy and beyond.
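The abstract does not specify the custom loss function, so the following is only a minimal sketch of one common way to steer a generator toward published statistics: a moment-matching penalty added to the adversarial term. It assumes the statistical input is a per-column (mean, standard deviation) pair; the names `stat_matching_loss`, `generator_loss`, `lam`, and `weight_std` are hypothetical and are not taken from the paper. Plain Python is used to show the arithmetic; an actual GAN would compute the same quantities on differentiable tensors.

```python
from statistics import fmean, pstdev


def column_stats(rows, n_cols):
    """Per-column (mean, population std) of a batch of generated rows."""
    cols = [[row[j] for row in rows] for j in range(n_cols)]
    return [(fmean(c), pstdev(c)) for c in cols]


def stat_matching_loss(rows, target_stats, weight_std=1.0):
    """Hypothetical moment-matching penalty: squared error between the batch's
    per-column mean/std and the target mean/std, averaged over columns."""
    batch_stats = column_stats(rows, len(target_stats))
    total = 0.0
    for (m, s), (tm, ts) in zip(batch_stats, target_stats):
        total += (m - tm) ** 2 + weight_std * (s - ts) ** 2
    return total / len(target_stats)


def generator_loss(adv_loss, rows, target_stats, lam=1.0):
    """Hypothetical combined objective: adversarial term plus lam * statistics penalty."""
    return adv_loss + lam * stat_matching_loss(rows, target_stats)


if __name__ == "__main__":
    # Hypothetical targets, e.g. age and systolic blood pressure.
    targets = [(50.0, 10.0), (120.0, 15.0)]
    matching_batch = [[40.0, 105.0], [60.0, 135.0]]
    collapsed_batch = [[50.0, 120.0], [50.0, 120.0]]
    print(stat_matching_loss(matching_batch, targets))   # 0.0
    print(stat_matching_loss(collapsed_batch, targets))  # 162.5
```

The collapsed batch reproduces the target means but has zero spread, so the standard-deviation term penalizes it. This illustrates why matching only means would be insufficient for realistic tabular data.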

Topics

Privacy-Preserving Technologies in Data; Machine Learning in Healthcare; Imbalanced Data Classification Techniques

Identifiers

DOI: 10.3390/bioengineering11121288

Citations and references

Citations: 2 · References: 28
