Article

Confidence-Aware Reward Shaping for Crypto Trading: A Comparative Study of Lightweight Uncertainty Estimation Methods

Farkhod Akhmedov, Department of Computer Engineering, Gachon University, Seongnam 13120, Republic of Korea
Young Im Cho, Department of Computer Engineering, Gachon University, Seongnam 13120, Republic of Korea
Otabek Sattarov, Department of Data Communication Networks and Systems, Tashkent University of Information Technologies, Tashkent 100084, Uzbekistan
Yusupov Sarvarbek Sodikovich, Department of Mechanical Engineering, Kimyo International University in Tashkent, Tashkent 100121, Uzbekistan
Mallayev Oybek
Halimjon Khujamatov, Department of Data Communication Networks and Systems, Tashkent University of Information Technologies, Tashkent 100084, Uzbekistan
Răzvan Crăciunescu, Telecommunications Department, Faculty of Electronics, Telecommunications and Information Technology, National University of Science and Technology POLITEHNICA, 060042 Bucharest, Romania

Mathematics · journal · 2026 · en

ABI

Abstract

Reinforcement learning agents for financial trading typically optimize reward functions that directly map profit and loss to learning signals, without accounting for the agent’s own decision certainty. This paper investigates whether modulating reward signals by a confidence estimate, without modifying network architecture, training procedures, or data pipelines, can meaningfully improve trading performance. We formalize five lightweight confidence estimation methods, each targeting a distinct uncertainty dimension: critic agreement (value estimation), temporal direction consistency (behavioral stability), state novelty (distributional familiarity), action magnitude stability (position sizing), and state-transition surprise (environmental predictability). Using a Twin Delayed Deep Deterministic Policy Gradient agent trained on hourly OHLCV data for Bitcoin, Litecoin, and Ethereum over five years encompassing diverse market regimes, we conduct a controlled experiment in which the confidence method is the sole variable across 18 experimental conditions. State novelty achieves the strongest improvement, raising mean test-period ROI from 5.7% to 24.9%, increasing Sharpe ratio (SR) from 0.34 to 1.57, and reducing maximum drawdown from 28.0% to 15.0% across the three cryptocurrencies. Four of the five methods reach statistical significance at p<0.05 on all assets; only state-transition surprise, the sole method requiring an auxiliary network, fails to distinguish itself from the baseline due to signal saturation. The proposed confidence-aware reward-shaping framework is plug-and-play, algorithm-agnostic, and directly applicable to other RL-based trading systems.
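The abstract does not state the formulas behind the confidence methods, so the following is only a minimal sketch of the general idea: scale the profit-and-loss reward by a confidence value in [0, 1] without touching the agent itself. Every name here (`critic_agreement_confidence`, `StateNoveltyConfidence`, `shape_reward`, the `floor` parameter) and every formula is an assumption made for illustration, not the authors' definition. The sketch covers two of the five dimensions: critic agreement, using the two Q-value estimates that a twin-critic agent already produces, and state novelty, approximated with running per-feature statistics.

```python
import math


def critic_agreement_confidence(q1: float, q2: float, eps: float = 1e-8) -> float:
    """Assumed formula: 1.0 when the two critics agree, near 0.0 when they
    disagree as much as their magnitudes allow."""
    gap = abs(q1 - q2)
    scale = abs(q1) + abs(q2) + eps  # gap <= scale, so the result stays in [0, 1]
    return 1.0 - gap / scale


class StateNoveltyConfidence:
    """Assumed formula: confidence decays with the mean squared z-score of a
    state relative to running statistics of previously seen states."""

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations (Welford)

    def update(self, state: list[float]) -> None:
        """Fold one observed state into the running mean and variance."""
        self.n += 1
        for i, x in enumerate(state):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def confidence(self, state: list[float], eps: float = 1e-8) -> float:
        """Return a value in [0, 1]; familiar states score close to 1."""
        if self.n < 2:
            return 0.0  # too little history: treat every state as novel
        z2 = 0.0
        for i, x in enumerate(state):
            var = self.m2[i] / (self.n - 1)
            z2 += (x - self.mean[i]) ** 2 / (var + eps)
        return math.exp(-0.5 * z2 / len(state))


def shape_reward(pnl: float, confidence: float, floor: float = 0.5) -> float:
    """Scale the raw reward by a factor in [floor, 1] (hypothetical scheme).

    The floor keeps low-confidence steps from losing their learning signal
    entirely; it is an illustrative choice, not a value from the paper.
    """
    c = min(max(confidence, 0.0), 1.0)
    return pnl * (floor + (1.0 - floor) * c)
```

Because the shaping happens purely on the scalar reward, a wrapper like this could in principle sit between the environment and any replay buffer, which is consistent with the plug-and-play, algorithm-agnostic claim in the abstract.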


Topics

Stock Market Forecasting Methods · Blockchain Technology Applications and Security · Complex Systems and Time Series Analysis

Identifiers

DOI: 10.3390/math14122075

Citations and references

0 citations · 36 references
