Author(s): Craig Stephenson, David Hobbs

The Gaia mission aims to achieve astrometric accuracies at the microarcsecond ( $\mu $as) level in stellar positions. For this, we require an accurate determination of the spacecraft’s attitude, which includes the effects of near-instantaneous disturbances. The attitude specifies the orientation of the spacecraft with respect to the celestial reference frame, that is, the rotation between the CoMRS, defined by $\bm{C}=[\bm{X}\bm{Y}\bm{Z}]$, and the SRS, defined by $\bm{S}=[\bm{x}\bm{y}\bm{z}]$, as a function of time (see Section 3.1.3). The spacecraft is controlled to follow a scanning law, which for most of the mission will be the Nominal Scanning Law (NSL), designed to provide good coverage of the entire celestial sphere whilst satisfying a number of other requirements (de Bruijne et al. 2010). The model used to accurately determine the attitude of Gaia is briefly described in Lindegren et al. (2012, Section 3.3).

The Gaia attitude model makes use of attitude quaternions (see Lindegren et al. 2012, Appendix A). Quaternions were discovered by Hamilton (1843) and form a four-dimensional real division algebra. Unlike the other two finite-dimensional associative real division algebras, namely the real and complex number systems, quaternion multiplication is non-commutative. This property turns out to be useful as it enables quaternions (those of unit norm) to be used to represent other non-commutative things, such as rotations in three (and four) dimensions. For this reason, their use is now ubiquitous in spacecraft attitude determination and control and in computer graphics.

A quaternion may be written as $\mathbf{q}={q}_{x}\mathbf{i}+{q}_{y}\mathbf{j}+{q}_{z}\mathbf{k}+{q}_{w}\mathbf{\hspace{0.17em}1}$, where the basis vectors satisfy ${\mathrm{\U0001d7cf}}^{2}=1$ and ${\mathbf{i}}^{2}={\mathbf{j}}^{2}={\mathbf{k}}^{2}=\mathbf{i}\mathbf{j}\mathbf{k}=-1$, the latter being the equations famously inscribed by Hamilton on Brougham (Broom) Bridge, in Dublin. The restriction to quaternions of unit magnitude (also known as versors) is achieved by imposing the constraint ${q}_{x}^{2}+{q}_{y}^{2}+{q}_{z}^{2}+{q}_{w}^{2}=1$. The coordinates of any arbitrary vector, $\bm{v}$, expressed with respect to the CoMRS, ${\U0001d5a2}^{\prime}\bm{v}$, may be transformed to those of the SRS, ${\U0001d5b2}^{\prime}\bm{v}$, simply by representing the former as a purely imaginary quaternion and conjugating it by the unit quaternion $\mathbf{q}$. That is, in the notation of (Lindegren et al. 2012, Appendix A.3),

$$\{{\U0001d5b2}^{\prime}\bm{v},0\}={\mathbf{q}}^{-1}\{{\U0001d5a2}^{\prime}\bm{v},0\}\mathbf{q},$$ | (3.105) |

where ${\mathbf{q}}^{-1}$ is the inverse (or conjugate) of $\mathbf{q}$. Similarly, the inverse operation is $\{{\U0001d5a2}^{\prime}\bm{v},0\}=\mathbf{q}\{{\U0001d5b2}^{\prime}\bm{v},0\}{\mathbf{q}}^{-1}$.

To model the variation of the attitude quaternion with time, use is made of cubic B-splines. That is, the attitude quaternion, expressed as the 4-tuple, $({q}_{x}(t),{q}_{y}(t),{q}_{z}(t),{q}_{w}(t))$, is written as:

$$({q}_{x}(t),{q}_{y}(t),{q}_{z}(t),{q}_{w}(t))=\u27e8{\sum}_{n=\mathrm{\ell}-3}^{\mathrm{\ell}}{\mathbf{a}}_{n}{B}_{n}(t)\u27e9,$$ | (3.106) |

where ${\mathbf{a}}_{n}\equiv ({a}_{n}^{x},{a}_{n}^{y},{a}_{n}^{z},{a}_{n}^{w})$, $n=0,\mathrm{\dots},N-1$, are the coefficients of cubic
B-splines, ${B}_{n}(t)$, defined on the knot sequence
${\left\{{\tau}_{k}\right\}}_{k=0}^{N+3}$, $\u27e8\u27e9$ is the normalization operator and $\mathrm{\ell}$ is the *left index* of $t$, i.e. $$. The attitude parameter ‘vector’ $\bm{a}$ is the $4N$-tuple $({a}_{0}^{x},{a}_{0}^{y},{a}_{0}^{z},{a}_{0}^{w},\mathrm{\dots},{a}_{N-1}^{x},{a}_{N-1}^{y},{a}_{N-1}^{z},{a}_{N-1}^{w})$ formed from the elements of the 4-tuples ${\mathbf{a}}_{n}$.

The use of cubic splines means that each component of the quaternion on the right-hand side of Equation 3.106 is (prior to normalization) a ${C}^{2}$ function, i.e. its first three time derivatives exist for all values of the time but only the first two are guaranteed to be continuous at the knots. The software also admits the use of higher-order splines. The latter would provide improved smoothness, but at the expense of making the fitting less local and thus more prone to undesirable oscillatory behaviour. The flexibility of a spline is, in principle, governed by its number of degrees of freedom (which, in practice, amounts to the knot frequency). AGIS nominally uses a 30 s knot spacing for the primary attitude (see Section 3.3.5).

The excess attitude noise ${\u03f5}_{\text{a}}(t)$ is a function of the time $t$ and is used to account for errors in the modelling of the spacecraft attitude (Lindegren et al. 2012, Sections 3.6, 5.2.5 and Appendix D.4). This includes all the high-frequency ($>0.01$ Hz) perturbations of the attitude, which cannot adequately be represented by the cubic B-splines with the nominal knot spacing. Such high-frequency perturbations arise from a number of different sources, for example thruster noise, the stochastic nature of the control system, fuel sloshing, micro-meteoroid impacts and (unmodelled) micro-clanks.

Micro-clanks are discontinuities in the attitude corresponding to small, almost-instantaneous, mechanical adjustments of the spacecraft structure. Unlike hits from micro-meteoroids (micro-hits), a micro-clank is not accompanied by a net change in the angular momentum of the spacecraft and so the two types of events may easily be distinguished from measurements of the spacecraft angular rates. As micro-clanks had been seen on Hipparcos, their presence was expected and they were explicitly included in the pre-launch Gaia dynamical attitude model (Risquez et al. 2012, Section 4.5). In principle, they could be handled in the manner that has been proposed for micro-hits, that is, by the introduction of additional knots into the B-spline representation of the attitude (Lindegren et al. 2012, Appendix B.3). However, soon after launch it became apparent that the number of micro-clanks on Gaia was an order of magnitude larger than had been expected. This, together with the fact that where there are high-frequency changes in the attitude (such as in the vicinity of micro-clanks) the difference between the effective attitudes associated with the various gates becomes important (leading to problems with the optimal positioning of the additional knots), motivated a radically different approach.

The On-Ground Attitude-3 (OGA3), as determined using AGIS, is dependent on the gates used to observe the astrometric sources. Each observation of a source reflects not the true, instantaneous attitude of the spacecraft, but rather its ‘mean attitude’ over a short interval whose duration depends upon the gate in question. (The idea of a mean attitude makes sense provided the interval under consideration is sufficiently short; a rigorous definition can be given.) This so-called effective attitude is therefore gate-dependent (see Bastian and Biermann 2005) and the OGA3 attitude in Gaia DR1 was correspondingly a hybrid-estimate of the various effective attitudes. However, as the vast majority of source detections are ungated, this effect is small and the OGA3 attitude in Gaia DR1 can to a good approximation be taken to correspond to an estimate of the astrometric attitude (the effective attitude for ungated observations).

When high-frequency components of the attitude, such as micro-clanks, are modelled, the difference between the various effective attitudes becomes more important and must be modelled. In Gaia DR2, the dependency of the OGA3 attitude estimate(s) on the gate has been made explicit. For each gate, $g$, there is now an OGA3-estimate of the effective attitude, ${\mathbf{q}}_{\text{e}}(t,g)$, at time $t$. This is split into two parts: the *primary attitude*, ${\mathbf{q}}_{\text{p}}(t)$, and the *corrective attitude* ${\mathbf{q}}_{\text{c}}(t,g)$:

$${\mathbf{q}}_{\text{e}}(t,g)={\mathbf{q}}_{\text{p}}(t){\mathbf{q}}_{\text{c}}(t,g).$$ | (3.107) |

The primary attitude ${\mathbf{q}}_{\text{p}}(t)$ is, as the name suggests, the primary contribution to the OGA3 attitude(s). It is gate-independent and is represented by means of cubic B-splines as described above. The corrective attitude, ${\mathbf{q}}_{\text{c}}(t,g)$, is a small-angle (no more than a few arcseconds) rotation applied to the primary attitude to convert it into the appropriate effective attitude. It is designed to correct for all known gate-dependent, high-frequency effects, such as those arising from micro-clanks. The split of the effective attitude into the primary and the corrective attitude is clearly not unique. However, once a corrective attitude has been constructed, the iterative procedure described in (Lindegren et al. 2012, Section 5.2) may be used to estimate (the coefficients of the B-spline representation of) the primary attitude.

The corrective attitude corresponds to the rotation of the effective attitude with respect to the primary attitude. Since the angular rotation is kept small, the rotation can be treated as though it were a vector and expressed in component form with respect to the three orthogonal SRS-frame axes. Let these components be ${\theta}_{g}^{x}$, ${\theta}_{g}^{y}$ and ${\theta}_{g}^{z}$. Then, the quaternion for the corrective attitude may be written as

$${\mathbf{q}}_{\text{c}}(t,g)=\frac{{\theta}_{g}^{x}(t)}{2}\mathbf{i}+\frac{{\theta}_{g}^{y}(t)}{2}\mathbf{j}+\frac{{\theta}_{g}^{z}(t)}{2}\mathbf{k}+\mathrm{\U0001d7cf}.$$ | (3.108) |

In Gaia DR2 only a single component, the $z$-axis component, of the corrective attitude is computed. That is, in Equation 3.108, we assume ${\theta}_{g}^{x}$ and ${\theta}_{g}^{y}$ are identically zero. The $z$-axis component is both the most important component and also that which is most easily computed, along-scan measurements being far more numerous than across-scan measurements.

The apparent definition given above of the corrective attitude is really a definition of the primary attitude. The corrective attitude is simply constructed by summing the response of the effective attitude to each of the detected micro-clanks and adding to this the medium-frequency attitude fluctuations. The latter are assumed to be gate-independent and are computed by fitting a cubic spline to the integrated residual along-scan rate measurements. The construction of the effective attitude is done only once in the AGIS-preprocessor. The coefficients of the cubic B-splines which define the primary attitude are then estimated iteratively within AGIS proper.

A micro-clank is modelled as an instantaneous rotation of the spacecraft. Since the angle through which the spacecraft rotates is extremely small (at most several $\mu $as), the rotation can, like that of the effective attitude, be expressed in terms of the components, ${c}_{j}^{x}$, ${c}_{j}^{y}$ and ${c}_{j}^{z}$, with respect to the SRS-frame. Suppose we detect ${n}_{\text{c}}$ clanks and let ${t}_{j}$ and ${\widehat{c}}_{j}^{z}$, $j=1,\mathrm{\dots},{n}_{\text{c}}$, be their estimated times and $z$-components. Then the $z$-component of the effective attitude is

$${\theta}_{g}^{z}(t)=\sum _{j=1}^{{n}_{\text{c}}}{\widehat{c}}_{j}^{z}{J}_{g}(t-{t}_{j})+{K}^{z}(t),$$ | (3.109) |

where

$${J}_{g}(t)=\frac{1}{{\tau}_{g}}\left[R\left(t+\frac{{\tau}_{g}}{2}\right)-R\left(t-\frac{{\tau}_{g}}{2}\right)\right],$$ | (3.110) |

is the response of the effective attitude about a given axis to a unit clank at $t=0$ about the same axis, $R(t)=\mathrm{max}\{t,0\}$ is the ramp function and ${\tau}_{g}$ is the effective width for gate $g$. (The value of the effective width for ungated observations is ${\tau}_{0}\approx 4.42$ s.) ${K}^{z}(t)$ is the $z$-component of the medium-frequency fluctuation of the attitude from its nominal value and it is evaluated from its representation as a cubic B-spline (see below). An additional spline with a large knot interval could be added to the right-hand side of Equation 3.109 to keep the magnitude of ${\theta}_{g}^{z}(t)$ small, but so far this has not been found to be necessary.

The detection and estimation of micro-clanks is achieved by locally performing least-squares fits of angular rate measurements created from CCD observations and seeing where the introduction of micro-clanks leads to a significant improvement of the fits. An alternative approach would have been to examine the residuals of a converged astrometric solution. However, in addition to the need to first generate the astrometric solution, this method would have restricted the measurements to those associated with the primary sources and would have been more sensitive to LSF/PSF calibration errors. In the following paragraphs the adopted procedure is described in a little more detail.

For each field-of-view transit by a source, consecutive CCD observations are first used to create measurements of the mean field angle rates (both along-scan and across-scan). The software currently only makes use of ungated observations, so the measurements are all associated with the astrometric attitude, and since the measurements are created from observations separated by $T\approx 4.86$ s, they correspond to mean values over intervals of length $T$. Next, the along-scan measurements are converted to measurements of the mean astrometric rate about the SRS $z$-axis, ${\overline{\omega}}_{z}$, and then reduced by binning (nominal bin size is 0.2 s) and selecting the median value from each bin. (To perform this conversion, estimates of the the mean astrometric rates about the $x$- and $y$-axes are required and to create these estimates the measurements of the across-scan field angle rates are used.)

For each bin $j$, the reduced measurements of ${\overline{\omega}}_{z}$ are fitted locally over a time interval centred at its midpoint, ${t}_{j}$, and of length $2{w}_{0}\approx 13.0$ s, the value of ${w}_{0}$ having been chosen in an attempt to optimize the detection of micro-clanks. Two models are used, corresponding to the null and the alternative hypotheses. The null hypothesis is that there is no clank present in the interval and the $z$-component of the spacecraft rate varies linearly with the time. The alternative hypothesis is that, in addition to the linear variation with the time, there is a clank present at time ${t}_{j}$. The model for the alternative model is therefore:

$$ | (3.111) |

where $a$, $b$ and ${c}_{j}^{z}$ are constants and

${C}_{0}(t)$ | $=$ | $\frac{1}{T{\tau}_{0}}}[R(t+{\displaystyle \frac{T+{\tau}_{0}}{2}})-R(t+{\displaystyle \frac{T-{\tau}_{0}}{2}})$ | (3.112) | ||

$-R(t-{\displaystyle \frac{T-{\tau}_{0}}{2}})+R(t-{\displaystyle \frac{T+{\tau}_{0}}{2}})]$ |

is the response in the mean astrometric rate about a given axis (here the $z$-axis) to a unit clank at time $t=0.$ The model for the null hypothesis is simply Equation 3.111 with the final term on the right-hand side omitted.

To assess the probability of there being a clank at time ${t}_{j}$, the $p$-values associated with the minimized cost functions for the least-squares fit with and without the clank are computed and compared. Suppose the maximum value of the ratio, ${\lambda}_{j}$, of the former to the latter occurs for a clank at time ${t}_{k}$. Then, provided ${\lambda}_{k}$ exceeds a certain threshold ${\lambda}_{\text{min}}$ (currently set equal to ${10}^{100}$), we conclude there to have been a clank with $z$-component ${\widehat{c}}_{k}^{z}$ at ${t}_{k}$, where ${\widehat{c}}_{k}^{z}$ is the value estimated for ${c}_{k}^{z}$. The rate profile for this clank, ${\widehat{c}}_{k}^{z}{C}_{0}(t-{t}_{k})$, is removed from the reduced measurements of ${\overline{\omega}}_{z}(t)$ and the process is repeated until no further clanks are detected (i.e. until $$).

Reverting to the notation used in Section 3.3.5, suppose that the above procedure has resulted in the detection of ${n}_{\text{c}}$ clanks and let ${t}_{j}$ and ${\widehat{c}}_{j}^{z}$ be the estimated time and $z$-component of the $j$th clank. Then, letting ${\overline{\omega}}_{z,i}$ denote the reduced measurement of ${\overline{\omega}}_{z}$ at time ${t}_{i}$, the corresponding residual measurement (having removed the contributions from all the clanks) is:

$${r}_{z,i}={\overline{\omega}}_{z,i}-\sum _{j=1}^{{n}_{\text{c}}}{\widehat{c}}_{j}^{z}{C}_{0}({t}_{i}-{t}_{j}),i=0,1,\mathrm{\dots}$$ | (3.113) |

Let $\mathrm{\Omega}=59960.5$ mas/s be the nominal $z$-axis rate of the spacecraft. Then the $z$-axis component of the medium-frequency fluctuations in the attitude, ${\theta}_{\text{mf}}^{z}$, is found by numerically integrating the difference between the above residuals and this nominal rate, i.e.

$${\theta}_{\text{mf}}^{z}({t}_{i})=\sum _{j=0}^{i}({r}_{z,j}-\mathrm{\Omega})({t}_{j}-{t}_{j-1}),i=0,1,\mathrm{\dots}$$ | (3.114) |

(Before performing this integration, the software removes any obvious outliers from the residual measurements and uses linear interpolation to add values where measurements are missing.) If the micro-clanks have all been correctly detected and removed, then any high-frequency components which remain can be considered to be noise. To remove these, ${\theta}_{\text{mf}}^{z}$ is fitted with cubic B-splines having a uniform 5 s knot separation. The coefficients of these cardinal B-splines are then used whenever we need to evaluate the medium-frequency fluctuations ${K}^{z}(t)$ (see Equation 3.109).

A hit by a micro-meteoroid (micro-hit) produces discontinuities in the spacecraft rates (angular velocity vector) rather than in the spacecraft attitude. The corresponding signature seen in the mean astrometric rate measurements is consequently quite different from that which results from a micro-clank. The initial response, which may be calculated by simply integrating Equation 3.112, is normally followed by a period during which the rates may oscillate as the Attitude Control System works to bring them back to their nominal values. However, it has been found that the response is often far from predictable. There are frequently periods containing what appear to be multiple hits or a hit that is followed by either a data gap or a ‘cloud’ of measurements. All this makes the detection and estimation of micro-hits by the method used for micro-clanks very difficult and in many cases impossible, so that an alternative approach was needed.

The approach adopted was that of machine learning. Based on a set of manually-selected examples, the software was first taught how to recognize features common to a micro-hit. After a validation period, during which various parameters were tuned, the algorithm was then used to identify the presence of large (greater than 2 mas/s) micro-hits. No attempt was made to estimate the exact (along-scan) magnitudes of the hits; instead the software concentrated on determining the time intervals during which the rate measurements are affected. Finally, these intervals are designated as being ‘attitude dead times’ (see Section 3.3.5). (Measurements shortly before or after known micro-hits are also discarded during the search for micro-clanks. However, these micro-hits were detected using an older, heuristic method.)

Attitude dead time handling is implemented in AGIS to avoid periods where large disturbances occur that would render the attitude processing useless. Additionally, if the attitude is completely incorrect in some period this disturbance can cause adjacent regions to also be affected during the iterative updating of the attitude which can in extreme cases cause the entire solution to diverge. The cause of such regions can be many, a few obvious examples are, telescope refocusing, mirror heating to remove contamination, orbital manoeuvres, micro-meteoroid hits, missing or poor observational data, etc. The time span for the dead time can vary significantly from some days, e.g. mirror heating, to a few minutes, e.g. micro-meteoroid hits. As part of the processing, such unstable intervals have been identified and the time and duration of the event has been entered into a dead time file which is read when constructing the attitude knot sequence. Dead times are also introduced when no observations are available or when the observations are unmatched in IDT. Gaps are then introduced in the attitude and multiple knots are added at the edge of the gaps with multiplicity equal to the spline order. This allows the spline to be gracefully terminated at the beginning of the gap and restarted after the end of the gap.