TY - JOUR
T1 - GPify
T2 - leveraging the combined strength of normalizing flow and softmax for an out-of-distribution aware confidence score
AU - Kristoffersson Lind, Simon
AU - Triebel, Rudolph
AU - Krueger, Volker
PY - 2026
Y1 - 2026
N2 - In order for any learning-based model to be considered reliable, it needs a well-behaved uncertainty or confidence estimate. Most modern neural networks do produce a confidence estimate in the form of their softmax output probability. However, the softmax probability is invalid for out-of-distribution data. Gaussian processes are known to produce a well-behaved confidence estimate that is aware of out-of-distribution samples. Inspired by Gaussian processes, we propose GPify, which combines the softmax probability with a Normalizing Flow in order to add out-of-distribution awareness to the confidence estimate from a neural network. The resulting confidence from GPify is an uncertainty measure that is interpretable and intuitive, while also being probabilistically sound. We evaluate GPify in a selective classification framework, and conclude that it achieves comparable performance to state-of-the-art methods. In addition, we show that GPify has capabilities for detecting adversarial examples, which is a direct improvement over softmax confidence.
M3 - Article
SN - 1573-1405
VL - 134
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
M1 - 185
ER -