# 3.4.5 Iteration strategy and convergence

Author(s): Alex Bombrun, David Hobbs

AGIS is a hybrid iterative solver with a ‘simple iteration’ (SI) scheme that was the starting point for a long development towards a fully functional scheme with much improved convergence properties. The main stages in this development were the ‘accelerated simple iteration’ (ASI), the conjugate gradients (CG), and finally the fully flexible ‘hybrid scheme’ (SI–CG) to be used in the final implementation of AGIS. As much of this development has at most historical interest, only a brief outline is given here.

Already in the very early implementation of the simple iteration scheme it was observed that convergence was slower than (naively) expected, and that after some iterations, the updates always seemed to go in the same direction, forming a geometrically (exponentially) decreasing series. This behaviour was very easily understood: the persistent pattern of updates is roughly proportional to the eigenvector of the largest eigenvalue of the iteration matrix, and the (nearly constant) ratio of the sizes of successive updates is the corresponding eigenvalue. From this realization it was natural to test an acceleration method based on a Richardson-type extrapolation of the updates. The idea is simply that if the updates in two successive iterations are roughly proportional to each other, $\boldsymbol{d}^{(k+1)}\simeq\lambda\boldsymbol{d}^{(k)}$, with $|\lambda|<1$, then we can infer that the next update is again a factor $\lambda$ smaller than $\boldsymbol{d}^{(k+1)}$, and so on. The sum of all the updates after iteration $k$ can therefore be estimated as $\boldsymbol{d}^{(k+1)}+\lambda\boldsymbol{d}^{(k+1)}+\lambda^{2}\boldsymbol{d}% ^{(k+1)}+\dots=(1-\lambda)^{-1}\boldsymbol{d}^{(k+1)}$. Thus, in iteration $k+1$ we apply an acceleration factor $1/(1-\lambda)$ based on the current estimate of the ratio $\lambda$. This accelerated simple iteration (ASI) scheme is seen to be a variant of the well-known successive over-relaxation method (Axelsson 1996). The factor $\lambda$ is estimated by statistical analysis of the parallax updates for a small fraction of the sources; the parallax updates are used for this analysis, since they are unaffected by a possible change in the frame orientation between successive iterations. With this simple device, the number of iterations for full convergence was reduced roughly by a factor 2.

Both the simple iteration and the accelerated simple iteration belongs to a much more general class of solution methods known as Krylov subspace approximations. The sequence of updates $\boldsymbol{d}^{(k)}$, $k=0\dots K-1$ generated by the first $K$ simple iterations constitute the basis for the $K$-dimensional subspace of the solution space, known as the Krylov subspace for the given matrix and right-hand side (e.g., Greenbaum (1997); van der Vorst (2003)). Krylov methods compute approximations that, in the $k$th iteration, belongs to the $k$-dimensional Krylov subspace. But whereas the simple and accelerated iteration schemes, in the $k$th iteration, use updates that are just proportional to the $k$th basis vector, more efficient algorithms generate approximations that are (in some sense) optimal linear combinations of all $k$ basis vectors. Conjugate gradients (CG) is one of the best-known such methods, and possibly the most efficient one for general symmetric positive-definite matrices (e.g., Axelsson (1996); Björck (1996); van der Vorst (2003)). Its implementation within the AGIS framework is more complicated, but has been considered in detail by Bombrun et al. (2012). As it provides significant advantages over the SI and ASI schemes in terms of convergence speed, this algorithm has been chosen as the baseline method for the astrometric core solution of Gaia (see below however). From practical experience, we have found that CG is roughly a factor 2 faster than ASI, or a factor 4 faster than the SI scheme. Like SI, the CG algorithm uses a preconditioner and can be formulated in terms of the S, A, C and G blocks, so the subsequent description of these blocks remains valid. In the terminology of Bombrun et al. (2012) the process of solving the preconditioner system $\boldsymbol{K}\boldsymbol{d}=\boldsymbol{b}$ is the kernel operation common to all these solution methods, which only differ in how the updates are applied according to the various iteration schemes. The main difference compared with the simple iteration scheme is that the updates suggested by the preconditioner are modified in view of the previous updates to optimize the convergence in a certain sense (for details, see Bombrun et al. (2012)).

The CG algorithm assumes that the normal matrix is constant in the course of the iterations. This is not strictly true if the observation weights are allowed to change as functions of the residuals, as will be required for efficient outlier elimination. Using the CG algorithm together with the weight-adjustment scheme described below could therefore lead to instabilities, i.e., a reduced convergence rate or even non-convergence. On the other hand, the SI scheme is extremely stable with respect to all such modifications in the course of the iterations, as can be expected from the interpretation of the SI scheme as the successive and independent application of the different solution blocks. The finally adopted algorithm is therefore a hybrid scheme combining SI (or ASI) and CG, where SI is used initially, until the weights have settled, after which CG is turned on. A temporary switch back to SI, with an optional re-adjustment of the weights, may be employed after a certain number of CG iterations; this could avoid some problems due to the accumulation of numerical rounding errors in CG.

The convergence can be controlled using a web based monitor looking at the distribution of the residuals, at the distribution of the excess noise and at the distribution of the updates.