Article

A Design-of-Experiments-Based Approach for Efficient Estimation of Bimodal Gaussian Mixture Weights

Gustavo Leal, Institute of Industrial Engineering, Federal University of Itajubá, Itajubá, Brazil
Lupércio França Bessegato, Institute of Exact Sciences, Universidade Federal de Juiz de Fora, Juiz de Fora, Brazil
Yasmin Silva Martins, School of Engineering and Sciences, São Paulo State University, Guaratinguetá, Brazil
Farid Melgani, Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
Pedro Paulo Balestrassi, Institute of Industrial Engineering, Federal University of Itajubá, Itajubá, Brazil
IEEE Access (journal), 2025, English

Abstract

Normal mixture models are widely used to represent data arising from latent subpopulations. We propose a Design-of-Experiments (DOE) and Response Surface Methodology (RSM) framework to estimate the weights of a bimodal Gaussian mixture when the component families are known. The procedure is non-iterative: rather than alternating Expectation-Maximization (EM) steps, it uses a two-stage procedure (fit a quadratic response surface to the sample log-likelihood over the weight simplex, then solve a single constrained optimization), followed by a final Maximum Likelihood re-estimation of means and variances. This yields predictable runtime (driven by design size) and reduced sensitivity to initialization. The pipeline (i) uses k-medians to obtain preliminary component parameters and 99% confidence intervals (CIs) for component proportions; (ii) builds a simplex-lattice mixture design within those CI bounds; (iii) fits a quadratic response surface to the log-likelihood; and (iv) optimizes this surface under sum-to-one constraints. We validate the method in 27 Monte Carlo scenarios (n = 100, 500, 1000; low/medium/high differentiation and three weight settings). Under medium and high separation, it attains likelihoods comparable to EM while achieving more favorable BIC in multiple scenarios and indistinguishable AIC in many, whereas EM is preferable under low separation. Two real data sets, Old Faithful (Waiting variable) and Photovoltaic Energy (Production variable), further confirm applicability, with lower AIC/BIC in Old Faithful and lower BIC in PV; clustering agreement is high (κ ≈ 0.99 to 1.00). Overall, DOE-RSM offers a simple, interpretable, and often more parsimonious method, and constitutes a non-iterative alternative for mixture-weight estimation.
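The four pipeline steps named in the abstract can be illustrated for the two-component case with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the component means and standard deviations are taken as given, a nearest-center assignment stands in for the k-medians step, the 99% CI is a Wald (normal-approximation) interval, the lattice has five equally spaced points, and the surface is a Scheffé quadratic in (w1, w2) with w2 = 1 - w1. All function names are hypothetical. The final Maximum Likelihood re-estimation of means and variances is omitted.

```python
import math
import random


def log_likelihood(x, w1, params):
    """Sample log-likelihood of a two-component Gaussian mixture with weight w1."""
    (m1, s1), (m2, s2) = params
    norm = math.sqrt(2 * math.pi)
    total = 0.0
    for xi in x:
        p1 = math.exp(-0.5 * ((xi - m1) / s1) ** 2) / (s1 * norm)
        p2 = math.exp(-0.5 * ((xi - m2) / s2) ** 2) / (s2 * norm)
        total += math.log(w1 * p1 + (1 - w1) * p2)
    return total


def solve_linear(a, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (m[i][n] - tail) / m[i][i]
    return sol


def fit_scheffe_quadratic(design, response):
    """Least-squares fit of y = b1*w1 + b2*w2 + b12*w1*w2, with w2 = 1 - w1."""
    rows = [(w, 1 - w, w * (1 - w)) for w in design]
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(3)]
    return solve_linear(xtx, xty)


def maximize_on_interval(coef, lo, hi):
    """Maximize the fitted surface over w1 in [lo, hi]; w2 = 1 - w1 is implied."""
    b1, b2, b12 = coef

    def surface(w):
        return b1 * w + b2 * (1 - w) + b12 * w * (1 - w)

    candidates = [lo, hi]
    if b12 > 0:  # surface is concave in w1, so the stationary point is a maximum
        w_star = (b1 - b2 + b12) / (2 * b12)
        if lo <= w_star <= hi:
            candidates.append(w_star)
    return max(candidates, key=surface)


def estimate_weight(x, params, m=4, z=2.576):
    """Return (w1_hat, lo, hi); assumes both preliminary clusters are non-empty."""
    (m1, _), (m2, _) = params
    n = len(x)
    # Stand-in for k-medians (assumption): assign each point to the nearest center.
    p0 = sum(abs(xi - m1) < abs(xi - m2) for xi in x) / n
    half = z * math.sqrt(p0 * (1 - p0) / n)  # Wald 99% half-width (assumed CI type)
    lo, hi = max(p0 - half, 1e-3), min(p0 + half, 1 - 1e-3)
    design = [lo + (hi - lo) * i / m for i in range(m + 1)]  # lattice on [lo, hi]
    response = [log_likelihood(x, w, params) for w in design]
    coef = fit_scheffe_quadratic(design, response)
    return maximize_on_interval(coef, lo, hi), lo, hi


# Synthetic example: 300 points from N(0, 1) and 700 points from N(5, 1).
rng = random.Random(0)
demo_params = ((0.0, 1.0), (5.0, 1.0))
demo_x = [rng.gauss(0.0, 1.0) for _ in range(300)]
demo_x += [rng.gauss(5.0, 1.0) for _ in range(700)]
w_hat, ci_lo, ci_hi = estimate_weight(demo_x, demo_params)
print(f"estimated weights: ({w_hat:.3f}, {1 - w_hat:.3f}), CI bounds [{ci_lo:.3f}, {ci_hi:.3f}]")
```

The cost of this sketch is fixed by the number of design points (five log-likelihood evaluations and one 3x3 solve), which mirrors the abstract's point that runtime is driven by design size rather than by an iteration count.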

