The Myth of Robust Classes: How Shielding Skews Perceived Stability

Abstract

We introduce an information-geometric framework for understanding the class-level robustness of deep vision classifiers, leveraging the manifold structure of a classifier’s predictive distribution. Our approach quantifies class robustness using Fisher-Rao (FR) margins, which measure the distance from an input to the decision boundaries reached by minimizing a given loss function. This framework also reveals class shielding effects, in which unintended intermediate classes impede transitions between source and target classes, offering insights into model vulnerabilities. We hypothesize that classes deemed “robust” often exhibit a high shielding frequency, suggesting that apparent robustness can stem from a class being mapped to a large volume of input space rather than from a faithful understanding of the decision boundary. We perform experiments on CIFAR-10 test images to study the geometry of a pre-trained classifier, including class transitions, shielding phenomena, and differential class stability.
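
For concreteness, the sketch below shows the standard closed-form Fisher-Rao distance between two categorical predictive distributions (the quantity underlying an FR margin); the function name and example inputs are illustrative assumptions, and the abstract's loss-minimizing boundary search itself is not reproduced here.

```python
import numpy as np

def fisher_rao_distance(p, q, eps=1e-12):
    """Fisher-Rao distance between two categorical distributions on the simplex.

    Uses the closed form d_FR(p, q) = 2 * arccos(sum_i sqrt(p_i * q_i)),
    i.e. twice the angle between the square-root embeddings of p and q.
    """
    bc = np.sum(np.sqrt(np.clip(p, eps, 1.0) * np.clip(q, eps, 1.0)))
    return 2.0 * np.arccos(np.clip(bc, -1.0, 1.0))

# Example: a confident prediction is far (in FR distance) from an uncertain one.
p = np.array([0.90, 0.05, 0.05])
q = np.array([0.40, 0.30, 0.30])
print(fisher_rao_distance(p, q))
```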

Publication
Non-Proceedings Track