This doctoral thesis focuses on sparse regression, a statistical modeling tool for selecting valuable predictors in underdetermined linear models. By imposing different constraints on the structure of the variable vector in the regression problem, one obtains estimates which have sparse supports, i.e., where only a few of the elements in the response variable have non-zero values. The thesis collects six papers which, to varying extents, deal with the applications, implementations, modifications, translations, and other analyses of such problems. Sparse regression is often used to approximate additive models with intricate, non-linear, non-smooth, or otherwise problematic functions, by creating an underdetermined model consisting of candidate values for these functions, and linear response variables which select among the candidates. Sparse regression is therefore a widely used tool in applications such as image processing, audio processing, and seismological and biomedical modeling, but is also frequently used for data mining applications such as social network analytics, recommender systems, and other behavioral applications. Sparse regression is a subgroup of regularized regression problems, where a fitting term, often the sum of squared model residuals, is accompanied by a regularization term, which grows as the fit term shrinks, thereby trading off model fit for a sought sparsity pattern. Typically, the regression problems are formulated as convex optimization programs, a discipline in optimization where first-order conditions are sufficient for optimality, a local optimum is also a global optimum, and where numerical methods are abundant, approachable, and often very efficient. The main focus of this thesis is structured sparsity, where the linear predictors are clustered into groups, and sparsity is assumed to be correspondingly group-wise in the response variable.
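As an illustration of the group-sparse formulation described above, the following is a minimal sketch, not the implementation used in the papers. It assumes a squared-residual fitting term and a sum of Euclidean group norms as regularizer (the standard group-lasso form), solved with a basic proximal gradient iteration. The names `group_soft_threshold`, `group_lasso`, and `lam`, as well as the simulated data, are hypothetical choices made for this example.

```python
import numpy as np

def group_soft_threshold(x, groups, tau):
    """Shrink each group of x toward zero by tau in Euclidean norm."""
    out = np.zeros_like(x)
    for idx in groups:
        norm = np.linalg.norm(x[idx])
        if norm > tau:
            out[idx] = (1.0 - tau / norm) * x[idx]
    return out

def group_lasso(A, y, groups, lam, n_iter=1000):
    """Minimize 0.5*||y - A x||_2^2 + lam * sum_g ||x_g||_2 (assumed form)."""
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = group_soft_threshold(x - step * grad, groups, step * lam)
    return x

# Simulated underdetermined problem: 40 observations, 80 candidate predictors
# clustered into 20 groups of 4, of which only two groups are truly active.
rng = np.random.default_rng(0)
n_obs, n_groups, group_size = 40, 20, 4
A = rng.standard_normal((n_obs, n_groups * group_size))
groups = [np.arange(g * group_size, (g + 1) * group_size) for g in range(n_groups)]
x_true = np.zeros(n_groups * group_size)
x_true[groups[3]] = [2.0, -1.5, 1.0, 2.5]
x_true[groups[11]] = [-2.0, 1.0, 3.0, -1.0]
y = A @ x_true + 0.05 * rng.standard_normal(n_obs)

x_hat = group_lasso(A, y, groups, lam=2.0)
active_groups = [g for g, idx in enumerate(groups) if np.linalg.norm(x_hat[idx]) > 1e-8]
print("active groups:", active_groups)
```

The value of `lam` sets the trade-off between model fit and group sparsity: larger values zero out more groups, which is why its selection acts as an implicit model order choice.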
The first three papers in the thesis, A-C, concern group-sparse regression for temporal identification and spatial localization of different features in audio signal processing. In Paper A, we derive a model for audio signals recorded on an array of microphones, arbitrarily placed in a three-dimensional space. In a two-step group-sparse modeling procedure, we first identify and separate the recorded audio sources, and then localize their origins in space. In Paper B, we examine the multi-pitch model for tonal audio signals, such as musical tones, tonal speech, or mechanical sounds from combustion engines. It typically models the signal-of-interest using a group of spectral lines, located at integer multiples of a fundamental frequency. In this paper, we replace the regularizers used in previous works by a group-wise total variation function, promoting a smooth spectral envelope. The proposed combination of regularizers thereby avoids the common suboctave error, where the fundamental frequency is incorrectly estimated as half of the true fundamental frequency. In Paper C, we analyze the performance of group-sparse regression for classification by chroma, also known as pitch class, e.g., the musical note C, independent of the octave.
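To make the multi-pitch model of Paper B more concrete, the sketch below builds a small dictionary in which each candidate fundamental frequency owns a group of complex sinusoids at its integer multiples. This is an illustrative construction under assumed settings, not the dictionary used in the paper; the function name `harmonic_dictionary`, the sampling rate, and the candidate frequencies are hypothetical. It also shows one way to see where the suboctave ambiguity comes from: the even harmonics of a candidate at half the fundamental coincide exactly with the spectral lines of the true pitch.

```python
import numpy as np

def harmonic_dictionary(candidates_hz, n_harmonics, n_samples, fs):
    """Stack one group of harmonic complex sinusoids per candidate pitch."""
    t = np.arange(n_samples) / fs
    blocks, groups = [], []
    for k, f0 in enumerate(candidates_hz):
        harmonics = f0 * np.arange(1, n_harmonics + 1)  # integer multiples of f0
        blocks.append(np.exp(2j * np.pi * np.outer(t, harmonics)))
        groups.append(np.arange(k * n_harmonics, (k + 1) * n_harmonics))
    return np.hstack(blocks), groups

fs = 8000.0                                   # assumed sampling rate in Hz
candidates_hz = np.array([110.0, 220.0, 330.0])
A, groups = harmonic_dictionary(candidates_hz, n_harmonics=4, n_samples=256, fs=fs)

# The second harmonic of the 110 Hz candidate is the same spectral line as the
# first harmonic of the 220 Hz candidate, so the two groups overlap.
same_line = np.allclose(A[:, groups[0][1]], A[:, groups[1][0]])
print("shared spectral line between 110 Hz and 220 Hz groups:", same_line)
```

Because the groups share columns in this way, a group-norm penalty alone cannot always tell a pitch from its suboctave, which is the motivation for adding a regularizer that also looks at the shape of the spectral envelope within each group.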
The last three papers, D-F, are less application-specific than the first three, attempting to develop the methodology of sparse regression more independently of the application. Specifically, these papers look at model order selection in group-sparse regression, which is implicitly controlled by choosing a hyperparameter, prioritizing between the regularizer and the fitting term in the optimization problem. In Papers D and E, we examine a metric from array processing, termed the covariance fitting criterion, which is seemingly hyperparameter-free, and has been shown to yield sparse estimates for underdetermined linear systems. In these papers, we propose a generalization of the covariance fitting criterion for group-sparsity, and show how it relates to the group-sparse regression problem. In Paper F, we derive a novel method for hyperparameter selection in sparse and group-sparse regression problems. By analyzing how the noise propagates into the parameter estimates, and the corresponding decision rules for sparsity, we propose selecting the hyperparameter as a quantile from the distribution of the maximum noise component, which we sample from using the Monte Carlo method.
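The idea behind the hyperparameter selection in Paper F can be sketched as follows. This is one possible reading under stated assumptions, not the exact procedure of the paper: the noise is assumed to be white Gaussian with a known standard deviation `sigma`, and the "maximum noise component" is taken to be the largest group norm of the noise correlated with the dictionary, which is the quantity a group-lasso style decision rule compares against the hyperparameter. The function name `monte_carlo_lambda` and all parameter values are hypothetical.

```python
import numpy as np

def monte_carlo_lambda(A, groups, sigma, quantile=0.95, n_draws=2000, seed=0):
    """Pick the hyperparameter as a quantile of the simulated maximum noise component."""
    rng = np.random.default_rng(seed)
    # Draw noise-only data sets, one per column.
    noise = sigma * rng.standard_normal((A.shape[0], n_draws))
    # Propagate the noise through the dictionary.
    corr = A.T @ noise
    # Euclidean norm of each group, for every noise realization.
    group_norms = np.stack([np.linalg.norm(corr[idx], axis=0) for idx in groups])
    # Largest group norm per realization, then the requested quantile.
    max_component = group_norms.max(axis=0)
    return np.quantile(max_component, quantile)

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 80))
groups = [np.arange(4 * g, 4 * (g + 1)) for g in range(20)]

lam = monte_carlo_lambda(A, groups, sigma=0.05)
print("selected hyperparameter:", lam)
```

With this choice, a noise-only group exceeds the threshold in roughly a fraction `1 - quantile` of the simulated realizations, so the quantile acts as an interpretable false-detection level. Passing singleton groups reduces the sketch to the ordinary (non-grouped) sparse regression case.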