Comparison with Baseline Methods
Optimization Process
Applied to MoGe-2
CAPA improves both accuracy and (temporal) consistency beyond the base model.
Quantitative Results
Citation
TBD
TL;DR: CAPA is a framework for depth completion that adapts pre-trained depth models at test time. Given sparse geometric cues, it freezes the model backbone and uses parameter-efficient fine-tuning (such as LoRA and VPT) to adapt to a specific sample (an image or a video). It works with any ViT-based model and achieves state-of-the-art depth accuracy and temporal consistency.
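To make the idea of test-time adaptation with a frozen backbone more concrete, here is a minimal NumPy sketch of a LoRA-style update fitted to sparse depth points for a single sample. It is a toy illustration and not the CAPA implementation: the linear "depth model", the function names (`predict`, `adapt_at_test_time`), the squared-error loss on the sparse points, and all hyperparameters are assumptions made for this example. VPT is not shown.

```python
import numpy as np

def predict(W, A, B, head, feats):
    """Toy 'depth model' (assumption): depth_i = head . ((W + B A) f_i)."""
    return feats @ (W + B @ A).T @ head

def adapt_at_test_time(W, head, feats, sparse_depth, mask,
                       rank=2, lr=0.02, steps=500, seed=0):
    """Fit low-rank factors A, B on the sparse depth points of one sample.

    W and head stay frozen; only A (rank x d_in) and B (d_out x rank) change.
    """
    rng = np.random.default_rng(seed)
    d_out, d_in = W.shape
    A = 0.1 * rng.standard_normal((rank, d_in))  # small random init
    B = np.zeros((d_out, rank))                  # zero init: B @ A starts at 0
    f, d = feats[mask], sparse_depth[mask]       # supervise only where cues exist
    losses = []
    for _ in range(steps):
        err = predict(W, A, B, head, f) - d
        losses.append(float(np.mean(err ** 2)))
        # Gradient of the mean squared error w.r.t. the effective weight W + B A.
        g = np.outer(head, (2.0 / len(d)) * (f.T @ err))
        # Chain rule through the low-rank product; both use the pre-update A, B.
        grad_A, grad_B = B.T @ g, g @ A.T
        A -= lr * grad_A
        B -= lr * grad_B
    return A, B, losses

# Synthetic sample: the "scene" needs a weight slightly different from the
# pre-trained one, and depth is observed at only 16 of 64 locations.
rng = np.random.default_rng(1)
n, d_in, d_out = 64, 8, 4
W = rng.standard_normal((d_out, d_in))           # frozen pre-trained weight
head = np.ones(d_out) / np.sqrt(d_out)           # frozen output head
feats = rng.standard_normal((n, d_in))           # per-location features
W_scene = W + 0.5 * rng.standard_normal((d_out, d_in))
true_depth = feats @ W_scene.T @ head
mask = np.zeros(n, dtype=bool)
mask[rng.choice(n, 16, replace=False)] = True    # sparse geometric cues

W_before = W.copy()
A, B, losses = adapt_at_test_time(W, head, feats, true_depth, mask)
print("sparse-point loss, first vs. last recorded step:", losses[0], losses[-1])
```

Because `B` starts at zero, the adapted model initially reproduces the frozen model exactly, and only the small factors `A` and `B` are stored per sample. In a real ViT-based model the same kind of low-rank update would typically be attached to selected linear layers while the backbone weights remain untouched.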
TBD