Article

Gradient-Based Optimization of Hyperparameters

Yoshua Bengio, Département d'informatique et recherche opérationnelle, Université de Montréal, Montréal, Québec, Canada, H3C 3J7

2000 (in English)

Abstract

Many machine learning algorithms can be formulated as the minimization of a training criterion that involves a hyperparameter. This hyperparameter is usually chosen by trial and error with a model selection criterion. In this article we present a methodology to optimize several hyperparameters, based on the computation of the gradient of a model selection criterion with respect to the hyperparameters. In the case of a quadratic training criterion, the gradient of the selection criterion with respect to the hyperparameters is efficiently computed by backpropagating through a Cholesky decomposition. In the more general case, we show that the implicit function theorem can be used to derive a formula for the hyperparameter gradient involving second derivatives of the training criterion.
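
The implicit-function-theorem route described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, ridge regression as the quadratic training criterion C(θ, λ) = ||Xθ − y||² + λ||θ||² and squared validation error E(λ) = ||X_v θ*(λ) − y_v||² as the model selection criterion. Since θ*(λ) satisfies ∂C/∂θ = 0, the implicit function theorem gives dθ*/dλ = −(∂²C/∂θ²)⁻¹ ∂²C/∂θ∂λ; here the Hessian is 2(XᵀX + λI) and the mixed derivative is 2θ*, so dθ*/dλ = −(XᵀX + λI)⁻¹ θ* and dE/dλ = (∂E/∂θ)ᵀ dθ*/dλ. The sketch reuses a single Cholesky factor of XᵀX + λI for both the fit and the adjoint solve; this is a simpler alternative to, not a reproduction of, the paper's backpropagation through the Cholesky decomposition. All function names and the example data are hypothetical.

```python
import math

# Hypothetical helpers: a hypergradient sketch for ridge regression (pure Python).

def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def cho_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):  # L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # L^T x = z
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def matvec(X, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in X]

def tmatvec(X, r):
    """Compute X^T r."""
    return [sum(X[i][j] * r[i] for i in range(len(X))) for j in range(len(X[0]))]

def ridge_fit(X, y, lam):
    """Minimize ||X theta - y||^2 + lam ||theta||^2; return theta and the Cholesky factor."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    L = cholesky(A)
    return cho_solve(L, tmatvec(X, y)), L

def val_error(Xv, yv, theta):
    """Squared validation error and the residual vector."""
    r = [p - t for p, t in zip(matvec(Xv, theta), yv)]
    return sum(e * e for e in r), r

def hypergradient(X, y, Xv, yv, lam):
    """Return E(lam) and dE/dlam via the implicit function theorem."""
    theta, L = ridge_fit(X, y, lam)
    E, r = val_error(Xv, yv, theta)
    g = [2.0 * gi for gi in tmatvec(Xv, r)]  # dE/dtheta
    v = cho_solve(L, g)                      # (X^T X + lam I)^{-1} g, reusing L
    dE = -sum(vi * ti for vi, ti in zip(v, theta))  # g^T dtheta/dlam
    return E, dE

# Example data (made up for illustration only).
X = [[1.0, 2.0], [2.0, 0.5], [3.0, 1.0], [0.5, 1.5]]
y = [1.0, 2.0, 2.5, 1.0]
Xv = [[1.5, 1.0], [2.5, 2.0]]
yv = [1.6, 2.4]
lam = 0.3
E, dE = hypergradient(X, y, Xv, yv, lam)
# Central finite difference of E(lambda) as a check on the analytic gradient.
h = 1e-6
E_plus = val_error(Xv, yv, ridge_fit(X, y, lam + h)[0])[0]
E_minus = val_error(Xv, yv, ridge_fit(X, y, lam - h)[0])[0]
print(dE, (E_plus - E_minus) / (2 * h))
```

The finite-difference comparison is only a sanity check; in an actual hyperparameter search one would pass dE/dλ (or λ·dE/dλ, to optimize over log λ and keep λ positive) to a gradient-based optimizer instead of trying values of λ by trial and error.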

Identifiers

DOI: 10.1162/089976600300015187

Citations and references

Citations: 2; References used: 0