Review article

Machine Learning and Its Applications to Biology

Adi L. Tarca, Vincent J. Carey, Xue-wen Chen, Roberto Romero, Sorin Drăghici

Harvard Medical School, Channing Laboratory, Boston, Massachusetts, United States of America. Xue-wen Chen is with the Bioinformatics and Computational Life Sciences Laboratory, Department of Electrical Engineering and Computer Science

Editor: Fran Lewitter, Whitehead Institute, United States of America

2007

ABI

Abstract

The term machine learning refers to a set of topics dealing with the creation and evaluation of algorithms that facilitate pattern recognition, classification, and prediction, based on models derived from existing data. Two facets of mechanization should be acknowledged when considering machine learning in broad terms. First, the classification and prediction tasks are meant to be carried out by a suitably programmed computing machine; that is, the product of machine learning is a classifier that can feasibly be run on available hardware. Second, the creation of the classifier should itself be highly mechanized and should not require much human input. This second facet is inevitably vague, but the basic objective is that automatic algorithm construction methods minimize the chance that human biases affect the selection and performance of the algorithm. Both the creation of the algorithm and its use to classify objects or predict events are to be based on concrete, observable data.

The history of relations between biology and the field of machine learning is long and complex. An early machine learning technique, the perceptron [1], was an attempt to model actual neuronal behavior, and the field of artificial neural network (ANN) design emerged from this attempt. Early work on the analysis of translation initiation sequences [2] employed the perceptron to define criteria for start sites in Escherichia coli. Subsequent artificial neural network architectures, such as adaptive resonance theory (ART) [3] and the neocognitron [4], were inspired by the organization of the visual nervous system.
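As a minimal illustration of the perceptron mentioned above, the following Python sketch implements the classic Rosenblatt update rule and trains it on a hypothetical toy dataset: the logical AND of two binary inputs, with class labels coded as +1 and -1. The function names `train_perceptron` and `predict` and the parameters `epochs` and `learning_rate` are illustrative assumptions, not taken from the tutorial (whose worked examples use R), and the toy data stands in for, rather than reproduces, the E. coli start-site analysis.

```python
from typing import List, Sequence, Tuple


def train_perceptron(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    epochs: int = 20,
    learning_rate: float = 1.0,
) -> Tuple[List[float], float]:
    """Rosenblatt perceptron rule (illustrative sketch); labels must be +1 or -1."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            # Update only when the point is misclassified or lies on the boundary.
            if y * activation <= 0:
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
                mistakes += 1
        if mistakes == 0:  # every training point is now classified correctly
            break
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return +1 if x falls on the positive side of the learned hyperplane, else -1."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else -1


if __name__ == "__main__":
    # Hypothetical toy data: logical AND, which is linearly separable.
    X = [(0, 0), (0, 1), (1, 0), (1, 1)]
    y = [-1, -1, -1, 1]
    w, b = train_perceptron(X, y)
    print("weights:", w, "bias:", b)
    print("predictions:", [predict(w, b, x) for x in X])
```

The perceptron rule is guaranteed to converge only when the classes are linearly separable, so the loop stops early once a full pass over the data produces no mistakes; on non-separable data such as XOR it would simply run for all `epochs` without reaching zero errors.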
In the intervening years, the flexibility of machine learning techniques has grown along with mathematical frameworks for measuring their reliability, and it is natural to hope that machine learning methods will make discovery and understanding more efficient as biological data grow in volume and complexity. This tutorial has four main components. First, a brief section reviews definitions and mathematical prerequisites. Second, the field of supervised learning is described. Third, methods of unsupervised learning are reviewed. Finally, a section reviews methods and examples as implemented in the open source data analysis and visualization language R (http://www.r-project.org).


Identifiers

DOI: 10.1371/journal.pcbi.0030116

Citations and references

Citations: 2
References used: 0
