#
3.4.5 Iteration strategy and convergence

Author(s): Alex Bombrun, David Hobbs

AGIS is a hybrid iterative solver with a ‘simple iteration’ (SI) scheme that was the
starting point for a long development towards a fully functional scheme with much
improved convergence properties. The main stages in this development were the
‘accelerated simple iteration’ (ASI), the conjugate gradients (CG), and finally
the fully flexible ‘hybrid scheme’ (SI–CG) to be used in the final implementation
of AGIS. As much of this development has at most historical interest, only a brief
outline is given here.

Already in the very early implementation of the simple iteration scheme it was observed
that convergence was slower than (naively) expected, and that after some iterations,
the updates always seemed to go in the same direction, forming a geometrically
(exponentially) decreasing series. This behaviour was very easily understood:
the persistent pattern of updates is roughly proportional to the eigenvector of the
largest eigenvalue of the iteration matrix, and the (nearly constant) ratio of the sizes
of successive updates is the corresponding eigenvalue. From this realization it was natural
to test an acceleration method based on a Richardson-type extrapolation of the
updates. The idea is simply that if the updates in two successive iterations are
roughly proportional to each other, ${\bm{d}}^{(k+1)}\simeq \lambda {\bm{d}}^{(k)}$,
with $$, then we can infer that the next update is again a factor
$\lambda $ smaller than ${\bm{d}}^{(k+1)}$, and so on. The sum of all the
updates after iteration $k$ can therefore be estimated as ${\bm{d}}^{(k+1)}+\lambda {\bm{d}}^{(k+1)}+{\lambda}^{2}{\bm{d}}^{(k+1)}+\mathrm{\dots}={(1-\lambda )}^{-1}{\bm{d}}^{(k+1)}$. Thus, in iteration $k+1$ we apply an
acceleration factor $1/(1-\lambda )$ based on the current estimate of the ratio
$\lambda $. This accelerated simple iteration (ASI) scheme is seen
to be a variant of the well-known successive over-relaxation method (Axelsson 1996).
The factor $\lambda $ is estimated by statistical analysis of the parallax updates
for a small fraction of the sources; the parallax updates are used for this analysis,
since they are unaffected by a possible change in the frame orientation between
successive iterations. With this simple device, the number of iterations for full
convergence was reduced roughly by a factor 2.

Both the simple iteration and the accelerated simple iteration belongs to a much
more general class of solution methods known as Krylov subspace approximations.
The sequence of updates ${\bm{d}}^{(k)}$, $k=0\mathrm{\dots}K-1$ generated by the first
$K$ simple iterations constitute the basis for the $K$-dimensional subspace of the
solution space, known as the Krylov subspace for the given matrix and
right-hand side (e.g., Greenbaum (1997); van der Vorst (2003)). Krylov methods
compute approximations that, in the $k$th iteration, belongs to the $k$-dimensional
Krylov subspace. But whereas the simple and accelerated iteration schemes, in the
$k$th iteration, use updates that are just proportional to the $k$th basis vector,
more efficient algorithms generate approximations that are (in some sense) optimal
linear combinations of all $k$ basis vectors. Conjugate gradients (CG) is one of the
best-known such methods, and possibly the most efficient one for general
symmetric positive-definite matrices
(e.g., Axelsson (1996); Björck (1996); van der Vorst (2003)). Its
implementation within the AGIS framework is more complicated, but
has been considered in detail by Bombrun et al. (2012). As it
provides significant advantages over the SI and ASI schemes in terms of
convergence speed, this algorithm has been
chosen as the baseline method for the astrometric core solution of Gaia (see below however). From practical experience, we have found that CG
is roughly a factor 2 faster than ASI, or a factor
4 faster than the SI scheme. Like SI, the CG algorithm uses a preconditioner
and can be formulated in terms of the S, A, C and G blocks, so the subsequent
description of these blocks remains valid. In the terminology of Bombrun et al. (2012)
the process of solving the preconditioner system $\bm{K}\bm{d}=\bm{b}$
is the *kernel* operation common to all these solution methods, which
only differ in how the updates are applied according to the various
*iteration schemes*.
The main difference compared with the simple iteration
scheme is that the updates suggested by the preconditioner are modified
in view of the previous updates to optimize the convergence in a certain
sense (for details, see Bombrun et al. (2012)).

The CG algorithm assumes that the normal matrix
is constant in the course of the iterations. This is not strictly true if
the observation weights are allowed to change as functions of the
residuals, as will be required for efficient outlier elimination. Using
the CG algorithm together with the weight-adjustment scheme described below
could therefore lead to instabilities, i.e., a reduced convergence rate or
even non-convergence. On the other hand, the SI scheme is extremely stable with
respect to all such modifications in the course of the iterations, as can be
expected from the interpretation of the SI scheme as the successive and independent
application of the different solution blocks. The finally adopted algorithm
is therefore a *hybrid scheme* combining SI (or ASI) and CG, where
SI is used initially, until the weights have settled, after which CG is turned on.
A temporary switch back to SI, with an optional re-adjustment of the
weights, may be employed after a certain number of CG iterations; this
could avoid some problems due to the accumulation of numerical rounding
errors in CG.

The convergence can be controlled using a web based monitor looking at the distribution of the
residuals, at the distribution of the excess noise and at the distribution of the updates.