Article

Complete Data Using Exploratory Data Analysis and ML Algorithms

Sudhanshu Tripathi, Amity University, Department of IT and Engineering, Tashkent, Uzbekistan
Leena Singh, Shyam Lal College, University of Delhi, Department of Computer Science, India
Jaisurya Sermanraja, Amity University, Department of CSE, Noida, India

2024 (English)

Abstract

Real-world data often contains missing values, that is, information that is not easily obtainable. Values can be missing for many reasons. In a survey, for example, some respondents may not be comfortable answering a particular question and leave that cell empty. In a medical emergency, a form may not have been filled out completely, and the missing details may never have been added afterwards because of unforeseen circumstances. These gaps create problems for anyone working with the data. For a data scientist or analyst in particular, incomplete data is difficult to work with because many models, estimators, and algorithms cannot operate on it. In such situations the analyst has to deal with the missing values manually. Missing values are labelled NaN or Null in most cases. One way to handle them is to remove the entire record, but then much information is lost and the resulting model may be inaccurate. Another method is imputation, i.e., replacing the missing label with a value. This is the main focus of this paper: to find a way to automatically impute missing values in any dataset using an amalgamation of algorithms that follow a generalized approach to find accurate values for the missing parts. This will help many analysts, researchers, and data scientists, since manual labour is reduced and more time and focus can be allotted to their main tasks and goals related to the data. This report presents different algorithms that are combined to form a generalized model that imputes with the maximum accuracy a machine can provide based on the information available.
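The contrast the abstract draws between removing incomplete records and imputing them can be illustrated with a minimal sketch. This is not the paper's combined model, which the abstract does not specify. It assumes the simplest possible stand-in: fill numeric columns with the column mean and other columns with the most frequent value. The sample records and the function names `drop_incomplete` and `impute_simple` are hypothetical.

```python
from statistics import mean, mode

# Hypothetical survey records; None marks a missing value (NaN/Null).
records = [
    {"age": 25, "city": "Delhi"},
    {"age": None, "city": "Noida"},
    {"age": 35, "city": None},
    {"age": 30, "city": "Delhi"},
]

def drop_incomplete(rows):
    """Record removal: keep only rows that have no missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def impute_simple(rows):
    """Imputation stand-in: column mean for numeric data, mode otherwise.

    Assumes every column has at least one observed value.
    """
    fills = {}
    for col in rows[0]:
        observed = [r[col] for r in rows if r[col] is not None]
        numeric = all(isinstance(v, (int, float)) for v in observed)
        fills[col] = mean(observed) if numeric else mode(observed)
    # Replace only the missing cells; observed values are left untouched.
    return [{c: (fills[c] if v is None else v) for c, v in r.items()}
            for r in rows]

dropped = drop_incomplete(records)
imputed = impute_simple(records)
print(len(dropped), "of", len(records), "records survive removal")
print(imputed)
```

In this toy data, removal discards half of the records, while imputation keeps all four at the cost of introducing estimated values.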

Topics

Neural Networks and Applications; Data Mining Algorithms and Applications

Identifiers

DOI: 10.1109/ictacs62700.2024.10840870

Citations and references

Citations: 0
References used: 10