Data preprocessing on input
Abstract
Many factors affect the success of machine learning on a given task. The first is data quality. Data preprocessing directly affects the quality of intelligent data analysis, because solving a problem on the raw, unprocessed sample does not give the expected result and can lead to erroneous conclusions [1]. The raw sample may contain a number of errors, such as duplicate records, impossible attribute values, and missing values. Such data can arise for a variety of reasons, including data entry mistakes, the use of different formats or units of measurement, and the incorrect removal of duplicate records [2]. Detecting errors in the initial sample (logically impossible feature values in the descriptions of objects) and preprocessing the data by removing such objects from the sample makes the results of the algorithms applied afterwards more reliable. This article proposes defining a range of possible values for each pair of quantitative features. An incorrect object can then be identified at input time by the values of a pair of features that do not fall into the corresponding range.
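The pairwise check described above could be sketched roughly as follows. The abstract does not say how the range for a pair of features is defined, so this sketch assumes, purely for illustration, that the range is an interval on the ratio of the two feature values. The feature names, the numeric bounds, and the helper names `violated_pairs` and `accept_on_input` are hypothetical and are not taken from the article.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]
Range = Tuple[float, float]

# Hypothetical admissible ranges for the ratio first_feature / second_feature.
# The bounds are illustrative placeholders, not values from the article.
PAIR_RANGES: Dict[Pair, Range] = {
    ("weight_kg", "height_cm"): (0.2, 1.0),
}


def violated_pairs(record: Dict[str, float],
                   pair_ranges: Dict[Pair, Range] = PAIR_RANGES) -> List[Pair]:
    """Return the feature pairs whose ratio falls outside its admissible range."""
    violations: List[Pair] = []
    for (first, second), (low, high) in pair_ranges.items():
        denominator = record[second]
        if denominator == 0:
            # Assumption: a zero denominator makes the ratio undefined,
            # so the pair is treated as a violation.
            violations.append((first, second))
            continue
        ratio = record[first] / denominator
        if not (low <= ratio <= high):
            violations.append((first, second))
    return violations


def accept_on_input(record: Dict[str, float],
                    pair_ranges: Dict[Pair, Range] = PAIR_RANGES) -> bool:
    """Accept a record at entry time only if no pair violates its range."""
    return not violated_pairs(record, pair_ranges)
```

Under these assumptions, `accept_on_input({"weight_kg": 70, "height_cm": 175})` accepts the record, while a record with `weight_kg` mistyped as 700 is rejected because its ratio leaves the assumed interval. A different definition of the pairwise range (for example, a region in the plane of the two features) would only change the test inside the loop.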