The astrometric principles for Gaia were already outlined in the Hipparcos Catalogue (ESA 1997, Vol. 3, Ch. 23).
There, based on the accumulated experience of the Hipparcos mission, the general principle of
a global astrometric data analysis was succinctly formulated as the minimization problem (see Lindegren et al. 2012):

$$\underset{\bm{s},\bm{n}}{\mathrm{min}}{\parallel {\bm{f}}^{\text{obs}}-{\bm{f}}^{\text{calc}}(\bm{s},\bm{n})\parallel}_{\mathcal{M}}. \tag{3.89}$$

Here $\bm{s}$ is the vector of unknowns (parameters) describing the
barycentric motions of the ensemble of sources used in the astrometric solution,
and $\bm{n}$ is a vector of ‘nuisance parameters’ describing the instrument
and other incidental factors which are not of
direct interest for the astronomical problem but are nevertheless required
for realistic modelling of the data.
The observations are represented by the vector ${\bm{f}}^{\text{obs}}$ which
could for example contain the measured detector coordinates
of all the stellar images at specific times. ${\bm{f}}^{\text{calc}}(\bm{s},\bm{n})$
is the observation model, e.g., the expected detector coordinates calculated as functions of
the astrometric and nuisance parameters. The norm is calculated in a metric
$\mathcal{M}$ defined by the statistics of the data; in practice the minimization
will correspond to a weighted least-squares solution with due consideration
of robustness issues. The statistical weight ${W}_{l}={w}_{l}/({\sigma}_{l}^{2}+{\epsilon}_{l}^{2})$ of
an individual observation $l$ combines the formal standard
uncertainty of the observation, ${\sigma}_{l}$, with the excess noise ${\epsilon}_{l}$, which represents
modelling errors and should ideally be zero. However, it is unavoidable that some
sources do not behave exactly according to the adopted astrometric model
(Section 3.3.3), or that the attitude (Section 3.3.5)
sometimes cannot be represented by the model used to sufficient accuracy.
The excess noise term ${\epsilon}_{l}$ is introduced to allow these cases to be handled in a
reasonable way, i.e., by effectively reducing the statistical weight of the
observations affected. It should be noted that these modelling errors are
assumed to affect all the observations of a particular star, or
all the observations in a given time interval. (By contrast, the
downweighting factor ${w}_{l}$ is intended to take care of isolated outliers,
for example due to a cosmic-ray hit in one of the CCD samples.) This
is reflected in the way the excess noise is modelled as the sum of two
components,

$${\epsilon}_{l}^{2}={\epsilon}_{i}^{2}+{\epsilon}_{a}^{2}({t}_{l}), \tag{3.90}$$

where ${\epsilon}_{i}$ is the excess noise associated with source $i$ (if
$l\in i$, that is, $l$ is an observation of source $i$), and ${\epsilon}_{a}(t)$
is the excess attitude noise, being a function of time. For a ‘good’ primary
source, we should have ${\epsilon}_{i}=0$, and for a ‘good’ stretch of attitude data
we may have ${\epsilon}_{a}(t)=0$. Calibration modelling errors are not
explicitly introduced in Equation 3.90, but could be regarded as
a more or less constant part of the excess attitude noise. The estimation of
${\epsilon}_{i}$ is described in Section 3.3.3, and the estimation of
${\epsilon}_{a}(t)$ in Section 3.3.5.
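
The weighting scheme described above can be illustrated with a minimal Python sketch. It is not the actual Gaia processing code: the function names and the numerical values are hypothetical, and the objective function assumes that the norm in Equation 3.89 reduces to a plain weighted sum of squared residuals, without any of the robustness handling mentioned in the text.

```python
def excess_noise_sq(eps_source, eps_attitude):
    """Total excess noise squared for one observation (Equation 3.90).

    eps_source   -- excess noise of the observed source i
    eps_attitude -- excess attitude noise evaluated at the observation time t_l
    """
    return eps_source ** 2 + eps_attitude ** 2


def statistical_weight(w, sigma, eps_source=0.0, eps_attitude=0.0):
    """Statistical weight W_l = w_l / (sigma_l^2 + eps_l^2).

    w     -- downweighting factor for isolated outliers
    sigma -- formal standard uncertainty of the observation
    """
    return w / (sigma ** 2 + excess_noise_sq(eps_source, eps_attitude))


def weighted_sum_of_squares(f_obs, f_calc, weights):
    """Assumed least-squares objective: sum over l of W_l * (f_obs_l - f_calc_l)^2."""
    return sum(W * (o - c) ** 2 for o, c, W in zip(f_obs, f_calc, weights))


# Hypothetical observations, all in the same arbitrary unit.
weights = [
    statistical_weight(w=1.0, sigma=0.1),                  # well-behaved source and attitude
    statistical_weight(w=1.0, sigma=0.1, eps_source=0.3),  # source deviating from the model
    statistical_weight(w=0.5, sigma=0.2),                  # partly downweighted outlier
]
f_obs = [1.0, 2.0, 3.0]
f_calc = [1.1, 2.0, 2.8]
objective = weighted_sum_of_squares(f_obs, f_calc, weights)
```

In this sketch the second observation has the same formal uncertainty as the first, but its non-zero source excess noise reduces its weight by about a factor of ten, which is the intended effect of the excess noise term.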