- Optimization algorithms on Riemannian manifolds
This library solves problems of the form $\min_{x\in \mathcal{M}} f(x)$, where $\mathcal{M}$ is a Riemannian manifold, using state-of-the-art algorithms. The implemented algorithms include Riemannian steepest descent methods, Riemannian Newton methods, Riemannian quasi-Newton methods, and Riemannian nonlinear conjugate gradient methods, with line-search and trust-region strategies.
References related to the implemented algorithms:
- P.-A. Absil, C. G. Baker, and K. A. Gallivan. Trust-region methods on Riemannian manifolds. Foundations of Computational Mathematics, 7(3):303-330, 2007.
- P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, Princeton, NJ, 2008.
- W. Huang, P.-A. Absil, and K. A. Gallivan. A Riemannian symmetric rank-one trust region method. Mathematical Programming, February 2014. doi:10.1007/s10107-014-0765-1.
- W. Huang, K. A. Gallivan, and P.-A. Absil. A Broyden Class of Quasi-Newton Methods for Riemannian Optimization. SIAM Journal on Optimization, 25(3):1660-1685, 2015.
- J. Nocedal and S. J. Wright. Numerical optimization. Springer, second edition, 2006.
- W. Ring and B. Wirth. Optimization methods on Riemannian manifolds and their application to shape space. SIAM Journal on Optimization, 22(2):596-627, January 2012.
- Eigenvalue problem
We consider finding the $p$ smallest eigenvalues and the corresponding eigenvectors of a given symmetric matrix $B \in \mathbb{R}^{n \times n}$ using the Brockett cost function
\begin{equation*}
f: \mathrm{St}(p, n) \rightarrow \mathbb{R}: X \mapsto f(X) = \mathrm{tr}(X^T B X D),
\end{equation*}
where $\mathrm{St}(p, n) := \{X \in \mathbb{R}^{n \times p} \mid X^T X = I_p\}$, $D = \mathrm{diag}(\mu_1, \mu_2, \ldots, \mu_p)$, and $\mu_1 > \mu_2 > \ldots > \mu_p$. It is known that a global minimizer of $f$ consists of eigenvectors corresponding to the $p$ smallest eigenvalues. If there is a gap between the $p$-th and ($p$+1)-st smallest eigenvalues, then a global minimizer is isolated, and therefore superlinear convergence of Riemannian quasi-Newton/Newton methods can be obtained.
The properties of the minimizers of this cost function can be found in, e.g., [AMS2008, Section 4.8]. Experimental results for this problem can be found in [Hua2013,HGA2015].
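As a minimal numerical sketch (independent of the ROPTLIB C++ interface; all names below are illustrative), the Brockett cost function and its Euclidean gradient can be evaluated with NumPy as follows. A Riemannian solver would additionally project the Euclidean gradient onto the tangent space of the Stiefel manifold.
```python
import numpy as np

def brockett_cost(X, B, mu):
    """f(X) = tr(X^T B X D) with D = diag(mu_1, ..., mu_p), X on St(p, n)."""
    return np.trace(X.T @ B @ X @ np.diag(mu))

def brockett_egrad(X, B, mu):
    """Euclidean gradient of f: 2 B X D (B symmetric, D diagonal)."""
    return 2.0 * B @ X @ np.diag(mu)

# Tiny example: evaluate the cost at a random point on St(p, n).
rng = np.random.default_rng(0)
n, p = 10, 2
B = rng.standard_normal((n, n)); B = (B + B.T) / 2
X0, _ = np.linalg.qr(rng.standard_normal((n, p)))   # orthonormal columns
print(brockett_cost(X0, B, np.array([2.0, 1.0])))
```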
- [AMS2008] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, Princeton, NJ, 2008.
- [HGA2015] W. Huang, K. A. Gallivan, and P.-A. Absil. A Broyden Class of Quasi-Newton Methods for Riemannian Optimization. SIAM Journal on Optimization, 25(3):1660-1685, 2015.
- [Hua2013] W. Huang. Optimization algorithms on Riemannian manifolds with applications. PhD thesis, Florida State University, Department of Mathematics, 2013.
- Joint diagonalization problem
The cost function is
\begin{equation*}
f: \mathrm{St}(p, n) \rightarrow \mathbb{R}: X \mapsto f(X) = - \sum_{i = 1}^N \|\mathrm{diag}(X^T C_i X)\|_2^2,
\end{equation*}
where $\mathrm{St}(p, n) := \{X \in \mathbb{R}^{n \times p} \mid X^T X = I_p\}$, $C_i$, $i = 1, \ldots, N$, are symmetric matrices, $\mathrm{diag}(M)$ denotes the vector formed by the diagonal entries of the matrix $M$, and $\|\cdot\|_2$ denotes the 2-norm. If $C_i$, $i = 1, \ldots, N$, are covariance matrices, then minimizing this cost function attempts to find a few independent components from a large number of covariance matrices.
Joint diagonalization has been used in blind signal processing. Many cost functions have been proposed; see, e.g., [AG2006,TCA2009]. The cost function above is given in [TCA2009]. The performance of some Riemannian algorithms on this cost function can be found in [HAG2014].
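The cost function translates directly into a short NumPy sketch (illustrative names only, not the ROPTLIB interface):
```python
import numpy as np

def joint_diag_cost(X, Cs):
    """f(X) = -sum_i ||diag(X^T C_i X)||_2^2 for X on St(p, n);
    Cs is a sequence of symmetric n-by-n matrices C_1, ..., C_N."""
    total = 0.0
    for C in Cs:
        d = np.diag(X.T @ C @ X)   # p-vector of diagonal entries
        total += np.dot(d, d)      # squared 2-norm
    return -total
```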
- [AG2006] P.-A. Absil and K. A. Gallivan. Joint diagonalization on the oblique manifold for independent component analysis. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Volume 5, 2006.
- [HAG2014] W. Huang, P.-A. Absil, and K. A. Gallivan. A Riemannian symmetric rank-one trust region method. Mathematical Programming, February 2014. doi:10.1007/s10107-014-0765-1.
- [TCA2009] F. J. Theis, T. P. Cason, and P.-A. Absil. Soft dimension reduction for ICA by joint diagonalization on the Stiefel manifold. In Proceedings of the 8th International Conference on Independent Component Analysis and Signal Separation, 5441:354-361, 2009.
- Sparse principal component analysis
In order to obtain weakly correlated sparse components with nearly orthonormal loadings, the cost function is defined as follows:
\begin{equation*}
f:\mathrm{OB}(r, p) \rightarrow \mathbb{R}: A \mapsto f(A) = \|A\|_1 + \mu \|A^T R A - D^2\|_F^2,
\end{equation*}
where $\mathrm{OB}(r, p) = \{X \in \mathbb{R}^{p \times r} \mid \mathrm{diag}(X^T X) = I_r \}$, $R = X^T X$, $X \in \mathbb{R}^{n \times p}$ is the given data matrix, $D^2$ is a diagonal matrix whose diagonal entries are the dominant eigenvalues of $R$, $\|A\|_1 = \sum_{i = 1}^p \sum_{j=1}^r |A_{i j}|$, and $A_{i j}$ is the $(i, j)$-th entry of the loading matrix $A$.
It can be shown that if $\mu$ is large enough that the term $\|A\|_1$ is negligible, the loading matrix $A$ is essentially the result of standard PCA.
If $\mu$ is small, then the cost function tends to produce a sparse matrix $A$. Therefore, the parameter $\mu$ is used to tune the trade-off between sparseness and preservation of covariance. It is shown in [GHT2015] that when similar sparseness and covariance are obtained, this cost function prefers an orthonormal loading matrix. The details of this cost function and its properties can be found in [GHT2015].
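A minimal NumPy sketch of this cost function (illustrative names only; it assumes the diagonal matrix $D^2$ is precomputed and passed in as `D2`):
```python
import numpy as np

def spca_cost(A, X, D2, mu):
    """f(A) = ||A||_1 + mu * ||A^T R A - D^2||_F^2 with R = X^T X;
    A is p-by-r with unit-norm columns (a point on OB(r, p))."""
    R = X.T @ X
    l1 = np.sum(np.abs(A))                               # entrywise 1-norm
    fit = np.linalg.norm(A.T @ R @ A - D2, 'fro')**2     # covariance term
    return l1 + mu * fit
```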
- [GHT2015] M. Genicot, W. Huang, and N. T. Trendafilov. Weakly correlated sparse components with nearly orthonormal loadings. In Proceedings of Geometric Science of Information (GSI2015).
- Karcher mean of symmetric positive definite matrices with affine invariant metric
The cost function is
\begin{equation*}
f:\mathrm{S}_+(n) \rightarrow \mathbb{R}: X \mapsto f(X) = \frac{1}{2K} \sum_{i = 1}^K \mathrm{dist}^2(X, A_i),
\end{equation*}
where $\mathrm{S}_+(n)$ is the set of $n$-by-$n$ symmetric positive definite (SPD) matrices, $A_i$, $i = 1, \ldots, K$, are given SPD matrices, and $\mathrm{dist}(X, Y) = \|\log(X^{-1/2} Y X^{-1/2})\|_F$ is the distance induced by the affine invariant metric $g(\eta_X, \xi_X) = \mathrm{tr}(\eta_X X^{-1} \xi_X X^{-1})$.
The geometric mean of symmetric positive definite matrices is important and has applications in, e.g., medical imaging, elasticity, and machine learning. However, it is not straightforward to define the geometric mean of SPD matrices. The ALM list [ALM2004] gives the desired properties of a geometric mean. One important definition of the geometric mean is the Karcher mean of SPD matrices under the affine invariant metric, since this definition satisfies the ALM list.
It is known that the cost function is convex and its minimizer is unique. Among Riemannian methods, the previous results in [JVV2012,RA2011] suggest Riemannian steepest descent. Bini and Iannazzo [BI2013] proposed another approach to efficiently find the minimizer. Our group uses the recently proposed limited-memory Riemannian BFGS method of [HGA2015], which yields the most efficient and effective method among the existing state-of-the-art methods; a paper is in preparation.
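As a hedged numerical sketch of this cost function (not the ROPTLIB implementation; names are illustrative), the affine invariant distance and the Karcher mean objective can be evaluated with NumPy/SciPy:
```python
import numpy as np
from scipy.linalg import logm, fractional_matrix_power

def affine_invariant_dist(X, Y):
    """dist(X, Y) = ||log(X^{-1/2} Y X^{-1/2})||_F for SPD matrices X, Y."""
    Xih = fractional_matrix_power(X, -0.5)   # X^{-1/2}
    M = logm(Xih @ Y @ Xih)                  # real for SPD arguments
    return np.linalg.norm(M, 'fro')

def karcher_mean_cost(X, As):
    """f(X) = 1/(2K) * sum_i dist^2(X, A_i) over SPD matrices A_1, ..., A_K."""
    K = len(As)
    return sum(affine_invariant_dist(X, A)**2 for A in As) / (2.0 * K)
```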
- [ALM2004] T. Ando, C.-K. Li, and R. Mathias, Geometric means, Linear Algebra Appl., 385 (2004), pp. 305-334.
- [BI2013] D. A. Bini and B. Iannazzo. Computing the Karcher mean of symmetric positive definite matrices. Linear Algebra and its Applications, 438(4):1700-1710, 2013.
- [HGA2015] W. Huang, K. A. Gallivan, and P.-A. Absil. A Broyden Class of Quasi-Newton Methods for Riemannian Optimization. SIAM Journal on Optimization, 25(3):1660-1685, 2015.
- [JVV2012] B. Jeuris, R. Vandebril, and B. Vandereycken. A survey and comparison of contemporary algorithms for computing the matrix geometric mean. Electronic Transactions on Numerical Analysis, 39(EPFL-ARTICLE-197637):379-402, 2012.
- [RA2011] Q. Rentmeesters and P.-A. Absil. Algorithm comparison for Karcher mean computation of rotation matrices and diffusion tensors. 19th European Signal Processing Conference (EUSIPCO2011).
- Weighted low-rank approximation problem
The cost function considered is
\begin{equation*}
f:\mathbb{R}_r^{m \times n} \rightarrow \mathbb{R}: X \mapsto f(X) = \|A - X\|_W^2 := \mathrm{vec}(A - X)^T W \mathrm{vec}(A - X),
\end{equation*}
where $\mathrm{vec}(M)$ denotes the vector obtained by stacking the columns of $M$, $\mathbb{R}_r^{m \times n}$ is the manifold of all $m$-by-$n$ matrices of rank $r$, $W \in \mathbb{R}^{nm \times nm}$ is a symmetric positive definite matrix, and $A \in \mathbb{R}^{m \times n}$ is given.
Applications of the weighted low-rank approximation problem can be found in, e.g., [LPW1997]. The Riemannian approach has been used in [ZHGVA2015], and it is shown therein that the Riemannian approach is faster than the previous approaches [LPW1997,BM2006]. Note that [ZHGVA2015] uses an adaptive-rank method, whereas the problem implemented in ROPTLIB is only for a fixed-rank problem. The adaptive-rank solver is under development.
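A minimal NumPy sketch of the cost evaluation (illustrative; it takes the full matrix $X$ rather than a rank-$r$ factorization, which a practical implementation would use):
```python
import numpy as np

def weighted_lowrank_cost(X, A, W):
    """f(X) = vec(A - X)^T W vec(A - X), with W symmetric positive definite
    of size nm-by-nm; in practice X would be kept in factored rank-r form."""
    e = (A - X).flatten(order='F')   # vec() stacks the columns
    return e @ (W @ e)
```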
- [LPW1997] W.-S. Lu, S.-C. Pei, and P.-H. Wang. Weighted low-rank approximation of general complex matrices and its application in the design of 2-d digital filters. IEEE Transactions on Circuits and Systems I, 44:650-655, 1997.
- [BM2006] I. Brace and J. H. Manton. An improved BFGS-on-manifold algorithm for computing low-rank weighted approximations. In Proceedings of 17th International Symposium on Mathematical Theory of Networks and Systems, pages 1735-1738, 2006.
- [ZHGVA2015] G. Zhou, W. Huang, K. A. Gallivan, P. Van Dooren, and P.-A. Absil. "Rank-Constrained Optimization: A Riemannian Manifold Approach", In Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2015).
- Registration of curves in elastic shape analysis
The cost function is
\begin{equation*}
L: \mathcal{O}_n \times \mathbb{R} \times \mathbb{L}^2([0, 1], \mathbb{R}) \rightarrow \mathbb{R}: (O, m, l) \mapsto L(O, m, l) = \int_0^{1} \left\|O q_1(t) - q_2\left(\left(\int_0^t l^2(s)\, d s + m\right) \bmod 1\right) l(t)\right\|_2^2 d t + \omega \int_0^{1} \left(l^2(t)+\frac{1}{l^2(t)}\right) \sqrt{1+l^4(t)}\, dt,
\end{equation*}
where $\omega$ is a positive constant, and $q_1$ and $q_2$ are in $\mathbb{L}^2([0, 1], \mathbb{R}^n)$. In the ROPTLIB implementation, a continuous function is represented by $N$ discrete points, and the integrals are computed by the composite trapezoidal rule.
In elastic shape analysis, a representation of a shape is invariant to translation, scaling, rotation, and reparameterization, and important problems such as computing the distance and geodesic between two curves, the mean of a set of curves, and other statistical analyses require finding the best rotation and reparameterization between two curves. A global minimizer of the above cost function defines the best rotation and reparameterization between two closed curves under the square root velocity function framework given in [SKJJ2011]. A detailed description of the motivation for the above cost function and the performance of the Riemannian approach is given in [HGSA2015].
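As a rough, hedged sketch of the discretization described above (not necessarily the exact discretization used in ROPTLIB; all names are illustrative), the cost can be evaluated with NumPy using the composite trapezoidal rule and periodic linear interpolation of $q_2$:
```python
import numpy as np

def trap(y, dt):
    """Composite trapezoidal rule on a uniform grid with spacing dt."""
    return dt * (np.sum(y) - 0.5 * (y[0] + y[-1]))

def registration_cost(O, m, l, q1, q2, omega):
    """Discretized L(O, m, l); q1, q2 are (N, n) samples of the SRVFs on a
    uniform grid of [0, 1], l is an (N,) array of positive samples of l(t)."""
    N = q1.shape[0]
    t = np.linspace(0.0, 1.0, N)
    dt = t[1] - t[0]
    # gamma(t) = (int_0^t l(s)^2 ds + m) mod 1, via cumulative trapezoid rule
    gamma = np.concatenate(([0.0], np.cumsum(dt * (l[:-1]**2 + l[1:]**2) / 2)))
    gamma = (gamma + m) % 1.0
    # evaluate q2 at gamma by periodic linear interpolation, component-wise
    q2g = np.column_stack([np.interp(gamma, t, q2[:, j], period=1.0)
                           for j in range(q2.shape[1])])
    data = np.sum((q1 @ O.T - q2g * l[:, None])**2, axis=1)
    penalty = (l**2 + 1.0 / l**2) * np.sqrt(1.0 + l**4)
    return trap(data, dt) + omega * trap(penalty, dt)
```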
- [HGSA2015] W. Huang, K. A. Gallivan, A. Srivastava, P.-A. Absil. "Riemannian Optimization for Registration of Curves in Elastic Shape Analysis", Journal of Mathematical Imaging and Vision, DOI: 10.1007/s10851-015-0606-8. 2015
- [SKJJ2011] A. Srivastava, E. Klassen, S. H. Joshi, and I. H. Jermyn. Shape analysis of elastic curves in Euclidean spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(7):1415–1428, September 2011.
- Phase retrieval problem
The cost function is
\begin{equation*}
f_p:\mathbb{C}_*^{n \times p} / \mathcal{O}_p \rightarrow \mathbb{R}: [Y] \mapsto f_p([Y]) = \frac{\|b - \mathrm{diag}(Z Y Y^* Z^*)\|_2^2}{\|b\|_2^2} + \kappa \mathrm{tr}(Y Y^*),
\end{equation*}
where $\mathbb{C}_*^{n \times p}$ is the set of $n$-by-$p$ complex matrices with full column rank, $\mathcal{O}_p$ is the set of $p$-by-$p$ unitary matrices, $\mathbb{C}_*^{n \times p} / \mathcal{O}_p$ is the quotient manifold, i.e., $\mathbb{C}_*^{n \times p} / \mathcal{O}_p = \{[Y] \mid Y \in \mathbb{C}_*^{n \times p}\}$ with $[Y] = \{Y O \mid O \in \mathcal{O}_p\}$, $b$ is the given measurement vector, $Z$ is the given measurement matrix, and $\kappa > 0$ is a weighting parameter.
Recovering a signal from the modulus of its Fourier or wavelet transform is an important task in the phase retrieval problem. It is a key problem for many important applications, e.g., X-ray crystallography, diffraction imaging, optics, and microscopy. The recently proposed framework PhaseLift [CESV2013] uses multiple structured illuminations combined with convex programming to recover the phase exactly. The above cost function is motivated by the Riemannian approach and focuses on the Fourier transform. It is shown in [HGZ] that combining the Riemannian approach with a rank-reduction strategy yields a significantly faster algorithm than the existing approach in [CESV2013].
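A short NumPy sketch of the cost evaluation (illustrative names; `Z` and `b` follow the generic description above). Note that the value is invariant under $Y \mapsto YO$, so it is well defined on the quotient:
```python
import numpy as np

def phaselift_cost(Y, Z, b, kappa):
    """f_p([Y]) = ||b - diag(Z Y Y^* Z^*)||_2^2 / ||b||_2^2 + kappa * tr(Y Y^*);
    Y is n-by-p complex, Z is m-by-n (measurement matrix), b has length m."""
    ZY = Z @ Y                                  # m-by-p
    model = np.sum(np.abs(ZY)**2, axis=1)       # diag(Z Y Y^* Z^*), real m-vector
    data = np.linalg.norm(b - model)**2 / np.linalg.norm(b)**2
    return data + kappa * np.linalg.norm(Y, 'fro')**2   # tr(Y Y^*) = ||Y||_F^2
```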
- [HGZ] W. Huang, K. A. Gallivan, and X. Zhang. "Solving PhaseLift by low-rank Riemannian optimization methods for complex semidefinite constraints", in preparation.
- [CESV2013] E. J. Candes, Y. C. Eldar, T. Strohmer, and V. Voroninski. Phase retrieval via matrix completion. SIAM Journal on Imaging Sciences, 6(1):199-225, 2013.