We introduce Semantic Resonance Maps (SRMs), a novel approach to explainable AI that leverages the natural oscillatory dynamics between vision and language modalities in foundation models. Unlike traditional attribution methods, which produce static heatmaps, SRMs capture the iterative refinement process that occurs when visual and textual representations interact, revealing how models progressively align cross-modal understanding. Our method exploits the phenomenon of ``semantic resonance'': the amplification of relevant features through recursive cross-modal attention cycles. We demonstrate that these resonance patterns not only provide more faithful explanations than gradient-based methods but also uncover latent conceptual hierarchies that emerge during model inference. Preliminary experiments on vision-language models show that SRMs can identify compositional reasoning pathways and detect whether a model relies on spurious correlations or on genuine semantic understanding.
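To make the ``recursive cross-modal attention cycles'' intuition concrete, the following is a minimal illustrative sketch, not the SRM implementation described in this paper: it assumes generic precomputed patch and token embeddings and a hypothetical resonance_map helper, and simply accumulates patch-token attention mass over a few alternating cross-attention passes.

    # Illustrative sketch only (assumed names and shapes, not the authors' code):
    # accumulate patch-token alignment over repeated cross-modal attention cycles.
    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def resonance_map(patches, tokens, cycles=4, tau=0.07):
        """patches: (P, d) image-patch embeddings; tokens: (T, d) text-token embeddings.
        Returns a (P, T) map of accumulated cross-modal attention mass."""
        acc = np.zeros((patches.shape[0], tokens.shape[0]))
        for _ in range(cycles):
            # image -> text attention: which tokens each patch attends to
            a_it = softmax(patches @ tokens.T / tau, axis=1)
            # text -> image attention: which patches each token attends to
            a_ti = softmax(tokens @ patches.T / tau, axis=1)
            # pairs that keep attending to each other across cycles are amplified
            acc += a_it * a_ti.T
            # fold each modality's view of the other back into its embeddings
            new_patches = patches + a_it @ tokens
            new_tokens = tokens + a_ti @ patches
            patches = new_patches / np.linalg.norm(new_patches, axis=1, keepdims=True)
            tokens = new_tokens / np.linalg.norm(new_tokens, axis=1, keepdims=True)
        return acc / cycles

In this toy version, the accumulator rewards patch-token pairs whose mutual attention persists across cycles, which is one way the resonance intuition could be operationalized; the actual method, evaluation protocol, and comparisons to gradient-based baselines are detailed in the remainder of the paper.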