Confidence-Aware Reward Shaping for Crypto Trading: A Comparative Study of Lightweight Uncertainty Estimation Methods
Abstract
Reinforcement learning (RL) agents for financial trading typically optimize reward functions that map profit and loss directly to learning signals, without accounting for the agent’s own decision certainty. This paper investigates whether modulating reward signals by a confidence estimate, without modifying the network architecture, training procedure, or data pipeline, can meaningfully improve trading performance. We formalize five lightweight confidence estimation methods, each targeting a distinct uncertainty dimension: critic agreement (value estimation), temporal direction consistency (behavioral stability), state novelty (distributional familiarity), action magnitude stability (position sizing), and state-transition surprise (environmental predictability). Using a Twin Delayed Deep Deterministic Policy Gradient (TD3) agent trained on hourly OHLCV data for Bitcoin, Litecoin, and Ethereum spanning five years of diverse market regimes, we conduct a controlled experiment in which the confidence method is the sole variable across 18 experimental conditions. State novelty yields the largest improvement, raising mean test-period ROI from 5.7% to 24.9%, increasing the Sharpe ratio from 0.34 to 1.57, and reducing maximum drawdown from 28.0% to 15.0% across the three cryptocurrencies. Four of the five methods reach statistical significance (p < 0.05) on all assets; only state-transition surprise, the sole method requiring an auxiliary network, fails to distinguish itself from the baseline, owing to signal saturation. The proposed confidence-aware reward-shaping framework is plug-and-play, algorithm-agnostic, and directly applicable to other RL-based trading systems.
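To make the idea of confidence-modulated rewards concrete, the following is a minimal sketch of two of the five estimators (critic agreement and state novelty) feeding a multiplicative reward-shaping step. The abstract does not give the exact formulas, so everything here is an illustrative assumption rather than the authors' method: the function names are hypothetical, confidence is assumed to lie in [0, 1], critic agreement is mapped through an exponential of the gap between TD3's twin Q-estimates, novelty is approximated by the mean distance to the k nearest previously seen states, and the shaped reward keeps a hypothetical `floor` fraction of the raw profit-and-loss signal.

```python
import math
from typing import Sequence

# Illustrative sketch only: formulas and names are assumptions, not the paper's.


def critic_agreement_confidence(q1: float, q2: float, scale: float = 1.0) -> float:
    """Confidence in (0, 1]: 1 when the twin critics agree, decaying with their gap."""
    return math.exp(-abs(q1 - q2) / scale)


def state_novelty_confidence(
    state: Sequence[float],
    memory: Sequence[Sequence[float]],
    k: int = 5,
    bandwidth: float = 1.0,
) -> float:
    """Confidence in [0, 1]: high when `state` lies close to previously seen states.

    Novelty is the mean Euclidean distance to the k nearest stored states
    (a simple k-NN proxy for distributional familiarity). An empty memory
    is treated as fully novel, i.e. zero confidence.
    """
    if not memory:
        return 0.0
    dists = sorted(math.dist(state, m) for m in memory)
    nearest = dists[: min(k, len(dists))]
    novelty = sum(nearest) / len(nearest)
    return math.exp(-novelty / bandwidth)


def shape_reward(reward: float, confidence: float, floor: float = 0.1) -> float:
    """Scale the raw PnL reward by confidence, always keeping at least `floor` of it."""
    c = min(max(confidence, 0.0), 1.0)  # clip confidence into [0, 1]
    return reward * (floor + (1.0 - floor) * c)


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors standing in for market states.
    memory = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
    familiar = state_novelty_confidence([0.05, 0.05], memory, k=3)
    novel = state_novelty_confidence([3.0, 3.0], memory, k=3)
    # The same raw reward is passed through almost unchanged in a familiar
    # state and strongly attenuated in an unfamiliar one.
    print(shape_reward(1.0, familiar), shape_reward(1.0, novel))
```

Because the shaping step only rescales the scalar reward, it leaves the network architecture, training loop, and data pipeline untouched, which is the plug-and-play property the abstract emphasizes. The floor term is one possible design choice for keeping some learning signal in unfamiliar states; other mappings from confidence to reward are equally plausible.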