MC4 Metrics of Success in Signal Processing and Machine Learning


Consider algorithms that improve signals by removing noise, or machine learning applications that detect moving objects in videos. How can we evaluate the success of such methods? Can we reliably compare the quality of two algorithms that were developed to perform the same task? What does it mean if an algorithm for object recognition correctly classifies 98 % of the images in a test set but consistently mistakes guns for tomatoes?
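The last question can be made concrete with a small sketch. The test set below is hypothetical (98 tomato images and 2 gun images, invented for illustration), and the "classifier" simply labels every image as a tomato. Overall accuracy then looks excellent, while the per-class recall exposes the systematic failure.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def per_class_recall(y_true, y_pred):
    """For each true class, the fraction of its samples that were recognized."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical, heavily imbalanced test set (assumption for illustration).
y_true = ["tomato"] * 98 + ["gun"] * 2
# A degenerate classifier that predicts "tomato" for every image.
y_pred = ["tomato"] * 100

overall = accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)

print(f"overall accuracy: {overall:.2f}")
for label, value in recall.items():
    print(f"recall for {label}: {value:.2f}")
```

A single aggregate number hides which errors are made; reporting a per-class breakdown next to it is one simple way to reveal them.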

Measuring the success of an algorithm is crucial for both the development and the application of (learning-based) tools for signal processing. In the first part of this course, we will review fundamental mathematical and statistical concepts that will allow us to a) measure how closely the output of an algorithm matches the desired outcome, b) evaluate the overall performance of a method with respect to a given set of test inputs, and c) determine whether an algorithm significantly outperforms a competing method. We will furthermore consider several pathological cases to illustrate and discuss the pitfalls and paradoxes that are often associated with statistical measures of success.
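As one possible illustration of point c), the following sketch compares two methods on the same test inputs with an exact paired sign-flip permutation test. This is only one of many significance tests and is not necessarily the one treated in the course. The error values and the function name `paired_sign_flip_test` are assumptions made for this example.

```python
from itertools import product

def paired_sign_flip_test(errors_a, errors_b):
    """Exact two-sided sign-flip permutation test on paired differences.

    Under the null hypothesis that both methods perform equally well, the
    sign of each paired difference is arbitrary, so all 2**n sign patterns
    are equally likely. The p-value is the fraction of patterns whose
    absolute summed difference is at least as large as the observed one.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        statistic = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point rounding.
        if statistic >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical per-input errors of two methods on the same 8 test inputs.
errors_a = [12, 15, 11, 14, 13, 16, 12, 15]
errors_b = [10, 12, 10, 11, 12, 13, 11, 12]

p_value = paired_sign_flip_test(errors_a, errors_b)
print(f"p-value: {p_value}")
```

Enumerating all sign patterns is only feasible for small test sets; for larger ones, a random subset of sign patterns is commonly drawn instead.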

The second part of the course will contain a detailed discussion of the properties of a wide range of measures and methods that are often used to determine and quantify success in signal processing. We will consider simple but popular metrics such as the mean squared error (MSE) and the peak signal-to-noise ratio (PSNR), measures of correlation such as the Pearson correlation coefficient and Spearman's rank-order correlation coefficient (SROCC), as well as significance tests. We will furthermore discuss measures inspired by the properties of human perception, such as the structural similarity index (SSIM).
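For orientation, the simpler metrics named above can be sketched in a few lines of plain Python. The sample values, the default peak value of 255 (typical for 8-bit images), and the function names are assumptions for illustration. SSIM is omitted because it requires local image statistics.

```python
import math

def mse(x, y):
    """Mean squared error between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    err = mse(x, y)
    if err == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / err)

def pearson(x, y):
    """Pearson correlation coefficient (measures linear dependence)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank sequences."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical pixel values of a reference signal and a distorted version.
reference = [100, 120, 130, 150]
distorted = [102, 118, 131, 149]
print("MSE:", mse(reference, distorted))
print("PSNR (dB):", round(psnr(reference, distorted), 2))

# A monotone but nonlinear relation: perfect rank agreement, imperfect linearity.
predicted = [1, 2, 3, 4, 5]
subjective = [1, 8, 27, 64, 125]
print("Pearson:", round(pearson(predicted, subjective), 4))
print("Spearman:", round(spearman(predicted, subjective), 4))
```

The second example shows why both correlation coefficients are reported in practice: the Spearman coefficient only checks whether the ordering is preserved, while the Pearson coefficient also penalizes deviations from a linear relationship.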

In the case of machine learning, metrics of success are not only important to evaluate the final performance of a method but also play a crucial role in the learning process itself. Typically, a so-called objective function is used to evaluate how closely the outcome of an algorithm fits the considered training data. In the learning stage, the free parameters of a machine learning architecture are then optimized to yield the best possible fit in terms of the objective function. In the final part of this course, we will discuss the special mathematical properties of different types of objective functions and how choosing a certain objective function can influence the final outcome.
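How the choice of objective function shapes the outcome can be seen in a deliberately tiny sketch: a "model" with a single free parameter c is fitted to data containing one outlier. Minimizing the squared error yields the mean, while minimizing the absolute error yields the median. The data, the function names, and the brute-force grid search are assumptions chosen for clarity; practical training typically relies on gradient-based optimization instead.

```python
def squared_loss(c, data):
    """Sum of squared deviations of the data from the constant c."""
    return sum((x - c) ** 2 for x in data)

def absolute_loss(c, data):
    """Sum of absolute deviations of the data from the constant c."""
    return sum(abs(x - c) for x in data)

def minimize_on_grid(loss, data, candidates):
    """Return the candidate parameter with the smallest loss (brute force)."""
    return min(candidates, key=lambda c: loss(c, data))

# Hypothetical training data with a single outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
# Candidate parameters 0.0, 0.1, ..., 100.0.
candidates = [i / 10 for i in range(0, 1001)]

best_squared = minimize_on_grid(squared_loss, data, candidates)
best_absolute = minimize_on_grid(absolute_loss, data, candidates)

print("best fit under squared error:", best_squared)    # the mean of the data
print("best fit under absolute error:", best_absolute)  # the median of the data
```

The same data and the same model thus lead to very different fitted parameters: the squared error is pulled strongly toward the outlier, whereas the absolute error is robust to it.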


Participants will obtain an understanding of the specific properties of different measures of success and of how these measures can best be applied, both in the (learning-based) development of signal processing algorithms and when deciding whether a given algorithm can be reliably used to perform a certain task.


Significance tests and statistical measures of similarity and correlation
- Cowan, G. (1998). Statistical data analysis. Oxford University Press.
- Blogpost on correlation vs. causation.
- Blogpost "If Correlation Doesn’t Imply Causation, Then What Does?"
- Website on spurious correlations.
Perceptual measures of image similarity
- Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600-612.
- Larson, E. C., & Chandler, D. M. (2010). Most apparent distortion: full-reference image quality assessment and the role of strategy. Journal of Electronic Imaging, 19(1), 011006.
- Berardino, A., Laparra, V., Ballé, J., & Simoncelli, E. (2017). Eigen-distortions of hierarchical representations. In Advances in Neural Information Processing Systems (pp. 3530-3539).
- Reisenhofer, R., Bosse, S., Kutyniok, G., & Wiegand, T. (2018). A Haar wavelet-based perceptual similarity index for image quality assessment. Signal Processing: Image Communication, 61, 33-43.
ITU guidelines for evaluating quality prediction models
- International Telecommunication Union (2012). ITU-T P.1401, methods, metrics and procedures for statistical evaluation, qualification and comparison of objective quality prediction models.
Objective functions in machine learning
- Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. Springer Series in Statistics. New York, NY, USA: Springer.
- Sra, S., Nowozin, S., & Wright, S. J. (Eds.). (2012). Optimization for machine learning. MIT Press.
- Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge University Press.
- Blogpost on metrics to evaluate machine learning algorithms in python.

Course location


Course requirements


Instructor information

Rafael Reisenhofer

University of Bremen


Rafael Reisenhofer obtained a Ph.D. in Mathematics from the University of Bremen. He is currently working as an FWF-funded postdoctoral researcher at the University of Vienna. There, he mathematically investigates the relationship between depth and the discriminability properties of deep learning architectures. His main research goal is to utilize a profound understanding of mathematical tools and concepts to gain a deeper insight into the mysteries of human cognition.