Article

FSRF: An Improved Random Forest for Classification

Wenxian Feng, Department of Software Engineering, Jilin University, Changchun, China
Chenkai Ma, Department of Software Engineering, Jilin University, Changchun, China
Guozhang Zhao, Department of Software Engineering, Jilin University, Changchun, China
Rui Zhang, Department of Computer Science, Jilin University, Changchun, China

2020 · en

ABI

Abstract

Random forest is a flexible, easy-to-use machine learning algorithm that is widely used for classification. However, the traditional random forest has limitations. The randomness it adds to its decision trees comes almost entirely from feature selection during tree generation, so the fixed tree-generation rules can lead to relatively serious overfitting. In addition, on data whose feature dimensions are high and unbalanced, the algorithm's performance is seriously weakened, because high-dimensional data usually contain many irrelevant and redundant features. To address these problems, we propose FSRF, an improved random forest algorithm. Building on the traditional random forest, we use feature selection methods to preprocess the data and obtain the feature subset with the best classification performance, which is then used to construct the random forest. We also introduce sparse matrix projection to improve the generation of the random forest. Experiments show that our method reduces the influence of redundant features on classification and improves accuracy.
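The two ingredients named in the abstract, feature selection as preprocessing and a sparse projection of the selected features, can be illustrated with a minimal sketch. The abstract does not say which feature selection methods or which projection scheme FSRF uses, so everything below is an assumption made for illustration: the between-class/within-class variance score, the {-1, 0, +1} projection entries, the `density` parameter, and all function names are hypothetical, and the toy data is synthetic.

```python
import random


def filter_scores(X, y):
    """Score each feature by between-class over within-class variance.

    Assumption: a simple ANOVA-like filter score; the abstract does not
    name the feature selection methods actually used by FSRF.
    """
    n_features = len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between, within = 0.0, 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            class_mean = sum(values) / len(values)
            between += len(values) * (class_mean - overall_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in values)
        scores.append(between / (within + 1e-12))  # small constant avoids division by zero
    return scores


def select_top_k(X, y, k):
    """Return the sorted indices of the k highest-scoring features."""
    scores = filter_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def sparse_projection_matrix(n_in, n_out, density, rng):
    """Build an n_in x n_out matrix whose entries are mostly zero.

    Assumption: each entry is nonzero with probability `density`, and a
    nonzero entry is -1 or +1 with equal probability.
    """
    return [
        [rng.choice((-1.0, 1.0)) if rng.random() < density else 0.0
         for _ in range(n_out)]
        for _ in range(n_in)
    ]


def project(X, R):
    """Multiply the data matrix X (rows are samples) by the projection R."""
    n_out = len(R[0])
    return [
        [sum(x_i * R[i][j] for i, x_i in enumerate(row)) for j in range(n_out)]
        for row in X
    ]


# Synthetic toy data: 60 samples, 8 features, of which only 1, 4 and 6
# carry class information (their mean shifts by 4 for class 1).
rng = random.Random(0)
informative = {1, 4, 6}
y = [i % 2 for i in range(60)]
X = [
    [rng.gauss(0.0, 1.0) + (4.0 * label if j in informative else 0.0)
     for j in range(8)]
    for label in y
]

keep = select_top_k(X, y, k=3)                 # indices of the selected features
X_sel = [[row[j] for j in keep] for row in X]  # data restricted to that subset
R = sparse_projection_matrix(len(keep), 2, density=0.5, rng=rng)
Z = project(X_sel, R)                          # projected features a tree could split on
print("selected features:", keep)
```

In a forest built along these lines, each tree (or each node) would presumably draw its own projection matrix, so that the randomness no longer comes from feature subsampling alone; how FSRF actually does this is not stated in the abstract.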


Identifiers

DOI: 10.1109/aeeca49918.2020.9213456

Citations and references

Citations: 4
References used: 0

Metrics: AkademScholar