A Large-Scale Benchmark on Geological Fault Delineation Models: Domain Shift, Training Dynamics, Generalizability, Evaluation, and Inferential Behavior
Abstract
Machine learning has taken on a critical role in seismic interpretation workflows, especially in fault delineation tasks. However, despite the recent proliferation of pretrained models and synthetic datasets, the field still lacks a systematic understanding of the generalizability limits of these models across seismic data representing a variety of geologic, acquisition, and processing settings. Distributional shifts between data sources, limitations in fine-tuning strategies and labeled data accessibility, and inconsistent evaluation protocols all represent major roadblocks to deploying reliable and robust models in real-world exploration settings. In this paper, we present the first large-scale benchmarking study explicitly designed to provide answers and guidelines on strategies for handling domain shift in seismic interpretation. Our benchmark encompasses over 200 combinations of model architectures, datasets, and training strategies across three diverse datasets spanning synthetic and real data: FaultSeg3D, CRACKS, and Thebe. We systematically assess pretraining, fine-tuning, and joint training strategies under varying degrees of domain shift. Our analysis highlights that commonly used fine-tuning practices can lead to catastrophic forgetting, especially when source and target datasets are distributionally disjoint, and that larger models such as SegFormer tend to be more robust to adaptation than smaller architectures. Interestingly, we also find that common domain adaptation methods outperform fine-tuning when the distributional shift is large, yet underperform when source and target domains are similar. Finally, we complement conventional segmentation metrics with a novel analysis based on fault characteristic descriptors, revealing how models absorb structural biases from different training datasets.
Overall, we establish a robust experimental baseline that provides insight into the tradeoffs inherent in current fault delineation workflows and sheds light on directions for developing more generalizable, interpretable, and effective machine learning models for seismic interpretation. The insights and analyses reported here provide a set of guidelines for deploying fault delineation models within seismic interpretation workflows.