Complete Data Using Exploratory Data Analysis and ML Algorithms
Abstract
Real-world data often contain missing values, that is, information that is not easily obtainable. Values can be missing for many reasons. A survey respondent might not be comfortable answering a particular question, for example, leaving that cell empty. In a medical emergency, a form may not have been filled out completely, and the missing details may never have been added afterwards because of unforeseen circumstances. These gaps create problems for anyone working with the data. For a data scientist or analyst in particular, incomplete data is difficult to work with, since many models, estimators and algorithms cannot handle it. In such situations the analyst has to deal with the missing values manually. Such values are labelled NaN or Null in most cases. One way of dealing with them is to remove the entire record, but then much information is lost and the resulting model may be inaccurate. Another method is imputation, i.e., replacing the missing label with a value. This is the main focus of this paper: to find a way to automatically impute missing values in any dataset using an amalgamation of algorithms that follow a generalized approach to finding accurate values for the missing parts. This will help many analysts, researchers and data scientists, since the manual labour is reduced and more time and focus can be given to their main tasks and goals related to the data. This report presents different algorithms that are combined into a generalized model to impute values with the maximum accuracy a machine can provide based on the information available.
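The contrast between removing incomplete records and imputing them can be illustrated with a minimal sketch. It is not the combined model the paper proposes: simple column-mean imputation is used as a stand-in, and the `records` data and the function names `drop_incomplete` and `impute_mean` are hypothetical, chosen only for illustration.

```python
from statistics import mean

# Hypothetical survey records; None marks a missing answer (NaN/Null).
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 29, "income": None},
    {"age": 41, "income": 58000},
]


def drop_incomplete(rows):
    """Record removal: keep only rows that have no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def impute_mean(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    columns = list(rows[0].keys())
    col_means = {
        c: mean(r[c] for r in rows if r[c] is not None) for c in columns
    }
    return [
        {c: (r[c] if r[c] is not None else col_means[c]) for c in columns}
        for r in rows
    ]


dropped = drop_incomplete(records)   # loses every row with a gap
imputed = impute_mean(records)       # keeps all rows, fills the gaps
print(len(records), len(dropped), len(imputed))
```

Removal discards half of this toy dataset, while imputation keeps every record at the cost of filling gaps with an estimate. Mean imputation is only the simplest possible choice here; the quality of the estimate is exactly what a more general imputation model aims to improve.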