Learning Adaptive Routing Policies Under Stochastic Service Times
Аннотация
Field service routing is strongly affected by uncertainty in on-site service durations, where deviations from nominal times propagate through the schedule and induce tardiness. We study the Multi-Vehicle Routing Problem with Stochastic Service Times (MVRP-SST), in which customer service times are revealed only upon arrival and follow customer-specific Gamma distributions. We formulate the problem as a cooperative Dec-POMDP and propose a Multi-Agent Proximal Policy Optimization (MAPPO) approach with centralized training and decentralized execution to learn adaptive routing policies that react to realized service times. The objective minimizes a weighted cost comprising travel distance and tardiness (with an additional penalty for unserved customers). Experiments on synthetic instances with <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$K=3$</tex> vehicles and problem sizes <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$N \in\{30,50,100\}$</tex> show that MAPPO consistently improves the overall objective compared with classical constructive heuristics and simulated annealing, with particularly strong gains at larger scales. Statistical testing confirms significant differences across methods via the Friedman test, and post-hoc Wilcoxon signedrank tests with Holm correction indicate that MAPPO significantly outperforms all baselines for <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$N=50$</tex> and <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$N=100$</tex>, and all except simulated annealing for <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$N=30$</tex>.
Перевод пока недоступен