# 2.2 Properties of the input data

Author(s): Uli Bastian

This section describes the input data from which the astrometric and photometric pre-processing — and thus the DPAC data processing as a whole — starts. The input data largely fall into two categories: Telemetry data from the Gaia satellite, and auxiliary data prepared by the DPAC in advance of the mission.

## 2.2.1 Overview

Author(s): Uli Bastian

The largest and most important input is, of course, the telemetry data from the Gaia satellite. Originating from the spacecraft, these data enter the data processing after several transmission and transformation steps: through the spacecraft-to-Earth telemetry link, via the three ESA ground station antennas at Cebreros (Spain), New Norcia (Australia) and Malargüe (Argentina), to the Mission Operations Centre (MOC) at ESOC, Darmstadt (Germany), then into the Telemetry Archive at the Science Operations Centre (SOC) at ESAC, Villafranca (Spain), and finally into the DPAC’s live pre-processing database via the DPAC’s MOC–SOC Interface Task (MIT) software.

The telemetry consists of housekeeping and science data. The former contain a huge variety of on-board status information, subsystem working logs, etc., including the autonomous on-board attitude determination results. They are not described further in the present chapter, although they enter the processing in many critical ways.

The science telemetry data are described in Section 2.2.2. The subsequent subsections, from Section 2.2.3 to Section 2.2.5, explain three auxiliary star catalogues prepared before the mission and used in the pre-processing: the Initial Gaia Source List (IGSL) for the preliminary assignment of Gaia observations to celestial objects, the Ecliptic-Poles Catalogue (EPC) for the initial in-orbit performance verification and calibration after launch, and the Attitude Star Catalogue (ASC) for the continual on-ground attitude reconstruction.

## 2.2.2 The raw science telemetry data

Author(s): Jordi Portell

The Gaia spacecraft, and specifically its focal plane through the Video Processing Units (VPUs), generate a variety of raw data packets which are down-linked to the ground and must be processed by DPAC. These packets include the astrometric, photometric and spectroscopic measurements, but they are not self-contained — in the sense that their measurement features are provided through separate packets. This is done for down-link optimisation reasons. Probably the most important task in the astrometric and photometric pre-processing is the reconstruction of self-contained individual measurements.

Raw data is organized in Star Packets (SP) and Ancillary Science Data packets (ASD). The former contain the science data in itself, such as the pixels acquired from the CCDs, whereas the latter contain shared data needed for the reconstruction of raw measurements, such as information on the measurement coordinates through the focal plane or the integration time of each image.

There are 9 types of Star Packets, identified as SP1 to SP9, plus 7 types of Ancillary Science Data packets, identified as ASD1 to ASD7. There is yet another type of data packet, called Service Interface Packet (SIF), but that is only used for payload diagnostics and extended, on-demand data acquisition.

### Generation

Typically, one SP of one or more types is generated for every astronomical source transit across the Gaia focal plane, that is, every time a VPU detects, confirms and measures the transit of a source with sufficient brightness and sharpness. Some of these packets are only generated during special calibration or non-nominal activities. We can classify Star Packets as follows:

• Nominal astronomical packets:

  • SP1, the most numerous ones, with one packet generated for each astronomical source transit across the SM, AF and BP/RP CCDs. These form the main data input to all Gaia data processing systems.

  • SP2, same as for SP1 but only for those sources detected in the focal plane rows that have RVS CCDs, and only for sources which are bright enough for being measured there.

  • SP3, generated only for SP1 packets for which a significant across-scan motion has been autonomously detected on board. These are called Suspected Moving Objects (SMO).

• Nominal instrumental packets:

  • SP4, with regular (periodic) measurements from the Basic Angle Monitoring (BAM) device. That is, although these are labelled as ‘Star Packets’, these do not contain any astronomical information, but instrumental information instead.

• Non-nominal astronomical packets:

  • SP6 and SP7, with SM and AF1 measurements of bright stars, which are only generated when the on-board Attitude and Orbit Control System (AOCS) is being initialised and when it loses convergence momentarily. These are mainly down-linked for further analysis and checks, but they do not enter the main data processing pipelines.

  • SP8 and SP9, with AF1 or SM measurements of bright stars, also only generated in special on-board conditions and not entering the main data processing pipelines.

• Non-nominal instrumental packets:

  • SP5, with measurements from the Wave Front Sensor (WFS) monitoring device. As with SP4, these do not actually contain any astronomical information, but instrumental only.

Regarding the ASD packets, they are generated as follows:

• ASD1, with the across-scan window position information for most CCDs, generated every second for each VPU.

• ASD2, with electronic bias data (pre-scan pixels), generated periodically (about once per minute per VPU).

• ASD3, with information on the RVS resolution changes, generated every time that a bright-enough star is observed in an RVS CCD. Note that very early in the mission it was decided to always use the high-resolution acquisition mode in those CCDs, and thus these packets are not available for most of the mission.

• ASD4, with statistical information and counters on some on-board events, generated periodically (once every number of seconds).

• ASD5, with the times when the artificial Charge Injections (CI) have been applied on the CCDs, generated at a quite regular pace (once every number of seconds).

• ASD6, with the information on the gates activation in the CCDs (to reduce the integration time). These are generated every time that a bright-enough star (about $G<$12) is observed in a CCD.

• Finally, ASD7, with information (time and position) when any SP1, SP2 or SP3 measurement was acquired by any VPU. Therefore, these ASD7 packets are generated for each source transit (although an ASD7 packet actually contains information on a set of transits).

### Contents

The following are the specific contents of each of the raw telemetry packets down-linked by Gaia:

• SP1: These are the most important data packets. One SP1 packet contains the SM, AF, BP and RP samples from one source transit over the focal plane. Only small ‘windows’ of samples are acquired and transmitted, centred by the VPU algorithms on the astronomical source detected.

  • For SM, windows of 40$\times$6 samples (each 2$\times$2 pixels) are sent, thus covering an area of about 4.7$\times$2.1 arcsec${}^{2}$.

  • For AF CCDs, the exact shape of the windows depends on the CCD (AF1 to 9) and on the source brightness, but typically, windows of 12$\times$12 pixels are sent (with the AC pixels binned into one single sample, thus providing only 12 samples in the AL or scanning direction). Bright stars ($G<$16) are acquired with slightly larger windows (18$\times$12 pixels), with the brightest sources ($G<$13) being acquired in full 2D resolution. Thus, AF windows typically cover about 700$\times$2100 mas${}^{2}$ (rising to about 1$\times$2.1 arcsec${}^{2}$ for bright sources).

  • Finally, BP and RP windows cover 60$\times$12 pixels, with 2D resolution only for the brightest sources ($G<$11).

  • Besides the raw samples, these packets also include the time, FOV, CCD row and AC pixel where the source was detected (all based on the AF1 CCD), an on-board estimated magnitude, and information about the exact sampling scheme used, including information on possible window overlaps with (or by) another source.

• SP2: These packets include the same basic detection and measurement information as SP1 packets, but the only sample data included are from the RVS CCDs. Three windows are included (one for each of the 3 along-scan RVS CCDs used for a spectroscopic transit), each covering a large area of 1296$\times$10 pixels (about 76.4$\times$1.8 arcsec${}^{2}$). Full 2D resolution is only used for the brightest stars.

• SP3: These packets are complementary to SP1 packets for Suspected Moving Objects (objects for which the VPU has detected an AC motion from SM to AF1). Here, only basic detection information is included (as in SP2), and the only sample data is that from additional BP and RP windows placed on top (or bottom) of the nominal BP/RP windows already included in the associated SP1 packet.

• SP5: WFS data packets include windows of about 682$\times$120 pixels plus timing and position information.

• SP6, SP7, SP8 and SP9 packets are non-nominal. They include, besides timing, position and measurement information, windows of SM and AF samples (SP6 and SP7), AF1 samples (SP8) or SM samples (SP9).

• ASD1: Each of these packets includes, besides a reference time and the CCD row number, the across-scan (AC) shift, in pixels, with respect to the AF1 reference coordinate for each of the CCDs (except AF1) and for each field of view. Thus, by combining the appropriate ASD1 packet with an SP1 (or SP2 or SP3) packet, one can determine the absolute AC position where each window was acquired.

• ASD2: Each ASD2 packet contains a ‘burst’ of pre-scan samples acquired on a given CCD. Thus, these contain a CCD identifier, a reference time, and a set of typically 1024 samples.

• ASD3: These contain a reference time and CCD identifier, plus the resolution switch type (low-to-high or high-to-low).

• ASD4: There are two variants of these packets, both containing a large set of on-board counters (per VPU), such as the number of detected, confirmed and allocated objects (that is, transits of astronomical sources) for the different types of windows and per field of view, or the number of packets generated of each SP/ASD type.

• ASD5: Each of these packets holds a number of times when a charge injection was generated (w.r.t. a packet reference time) for a given CCD. One of these packets can cover a few seconds (up to some 1–2 minutes, depending on the configuration).

• ASD6: Each ASD6 packet indicates the gate configuration that was active for a given CCD at a given time. Thus, a new ASD6 packet is generated every time that a gated window (a window acquired with a shorter integration time, for a bright source) is started or finished. Note that this means that a given bright star, causing the activation of some gate, will also affect other sources being observed on the same CCD at and immediately around that time.

• ASD7: Finally, each ASD7 (or object log) packet contains a reference time plus a set of up to 3071 entries, each corresponding to one SP1, SP2 or SP3 packet allocated on board for the measurement of a source transit. Each of such entries indicates the detection features (time, coordinate, FOV, acquisition mode, etc.) and the brightness of the source.
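As a rough cross-check of the window sizes quoted in the list above, the following sketch converts window dimensions in pixels to angular sizes. It assumes a nominal Gaia pixel scale of about 58.9 mas along scan (AL) and 176.8 mas across scan (AC). These values are not stated in this section and are an assumption here, as is the helper name `window_size_arcsec`.

```python
# Assumed nominal pixel scale (not given in this section).
AL_MAS_PER_PIXEL = 58.9    # along-scan direction
AC_MAS_PER_PIXEL = 176.8   # across-scan direction


def window_size_arcsec(al_pixels, ac_pixels):
    """Return the (AL, AC) angular size of a window in arcsec."""
    return (al_pixels * AL_MAS_PER_PIXEL / 1000.0,
            ac_pixels * AC_MAS_PER_PIXEL / 1000.0)


# SM: 40 x 6 samples, each sample binning 2 x 2 pixels.
sm = window_size_arcsec(40 * 2, 6 * 2)
# AF: 12 x 12 pixels (typical) and 18 x 12 pixels (bright stars).
af = window_size_arcsec(12, 12)
af_bright = window_size_arcsec(18, 12)
# RVS: 1296 x 10 pixels per window.
rvs = window_size_arcsec(1296, 10)

for name, (al, ac) in [("SM", sm), ("AF", af),
                       ("AF bright", af_bright), ("RVS", rvs)]:
    print(f"{name}: {al:.2f} x {ac:.2f} arcsec")
```

With the assumed pixel scale the results agree with the quoted sizes to within the rounding used in the text.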

It is worth noting that, besides the raw data from the spacecraft in itself, other ancillary data is also needed when performing the raw measurements reconstruction. Such data is stored in the so-called Calibration DataBase (CDB), which contains, for example, the Along-scan Phasing Table (ALPT) from which, when combined with an AF1 detection time, we can determine the absolute measurement times for each of the windows in an SP1, SP2 or SP3 packet. Several other tables are also needed from the CDB, such as those that indicate the exact configuration for the Gates, Charge Injections, etc. Needless to say, the CDB must be perfectly synchronised with the actual configuration active on board Gaia.
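The combination of an SP1 detection with the matching ASD1 packet, as described for ASD1 above, can be sketched as follows. This is only an illustration under stated assumptions: the record layout (`Asd1Record`), the function name, and the rule of picking the latest ASD1 packet whose reference time does not exceed the detection time are all hypothetical and do not reflect the actual IDT implementation or packet format.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Asd1Record:
    """Hypothetical, simplified view of one ASD1 packet."""
    ref_time: float   # packet reference time (arbitrary units)
    ccd_row: int      # CCD row the packet refers to
    shifts: dict      # (fov, ccd_name) -> AC shift in pixels w.r.t. AF1


def absolute_ac_position(af1_ac_pixel, det_time, ccd_row, fov, ccd_name,
                         asd1_records):
    """Return the AC pixel at which the window on `ccd_name` was acquired."""
    if ccd_name == "AF1":
        return af1_ac_pixel  # AF1 is the reference, so no shift applies
    # Keep the ASD1 packets of the right CCD row, sorted by reference time.
    rows = sorted((r for r in asd1_records if r.ccd_row == ccd_row),
                  key=lambda r: r.ref_time)
    times = [r.ref_time for r in rows]
    # Assumption: use the latest packet not later than the detection time.
    idx = bisect_right(times, det_time) - 1
    if idx < 0:
        raise LookupError("no ASD1 packet covers this detection time")
    return af1_ac_pixel + rows[idx].shifts[(fov, ccd_name)]
```

A real reconstruction would additionally need the CDB tables mentioned above, for instance to derive the acquisition time of each window from the AF1 detection time.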

### Usage in Gaia processing

All the raw science telemetry data previously described is only used in the very first pre-processing stages of the Gaia DPAC — mainly in the Initial Data Treatment system (IDT) (see Section 2.4.2). Such a system takes care of combining all these data packets to generate self-consistent measurement records, which become the basic input data to further downstream systems — be these for astrometric, photometric or spectroscopic processing. For more details see Section 2.4.3.

## 2.2.3 The Initial Gaia Source List (IGSL)

Author(s): Ricky Smart

### Construction

The Initial Gaia Source List (IGSL) was commissioned by the Gaia Data Processing and Analysis Consortium (DPAC) in 2006 to be a combination of the best optical astrometry and photometry information on celestial objects available at the Gaia launch: A snapshot of the sky as we know it before Gaia. The method adopted was to crossmatch large-area star catalogues into one database, then select the best parameters based on the typical precisions for each contributing catalogue. The 3rd delivery of the IGSL was made in late 2012, and it was at that point frozen to be fully integrated into the MDB before launch.

The formal DPAC mandate for the IGSL was to fulfil the following broad requirements: provide all-sky positions, proper motions, and magnitudes for objects to a limit of Gaia magnitude $G$=21 where possible, e.g., where there are large ($>$10 000 square degrees) catalogues that reach that limit. The proper motions and magnitudes are to be provided on a best-effort basis, nominally with precisions of 10 mas yr${}^{-1}$ and 0.3 magnitudes, respectively, but obviously limited by the currently available large catalogues. The DPAC Core Processing Coordination Unit (CU3) catalogues of quasi-stellar objects (QSOs) and the Ecliptic Poles catalogue should be included (with no selection on magnitudes) to directly support the CU3 processes that require those resources. The Hipparcos objects were included with no selection on magnitudes to aid in the production of the Hundred Thousand Proper Motions Catalogue (de Bruijne and Eilers 2012; Michalik et al. 2014).

The format and contents of the IGSL are described in Smart and Nicastro (2014). After extensive use within the mission a number of problems were discovered. Such known problems are collected and made available in the documentation on the IGSL webpage.

### Contents

The contents of the IGSL are a compilation of the following catalogues:

• GSC2.3 — The Second Guide Star catalogue version 2.3 (Lasker et al. 2008);

• Tycho-2 — (Høg et al. 2000);

• UCAC4 — USNO CCD Astrograph catalogue version 4 (Zacharias et al. 2013);

• 2MASS — Two Micron All-Sky Survey Point Source catalogue (Skrutskie et al. 2006);

• PPMXL — Positions and Proper Motions ‘Extra Large’ catalogue (Roeser et al. 2010);

• LQRF — The CU3 early version of Large Quasar Reference Frame (Andrei et al. 2009);

• OGLE — Optical Gravitational Lensing Experiment version III (Udalski et al. 2008a);

• Hipparcos Perryman et al. (1997); van Leeuwen and Fantino (2005); van Leeuwen (2007);

• Sky2000 the SKYMAP Master catalogue of bright stars, Version 4 (Myers et al. 2001);

• SPSS the Gaia spectrophotometric standard star catalogue (Pancino et al. 2012).

For details see Smart and Nicastro (2014).

### Usage in Gaia processing

The IGSL was and is being used for the initial (partly preliminary) assignment of individual Gaia observations to known astronomical objects in the sky (not including solar-system objects). The process doing this assignment is called crossmatch (or sometimes crossmatching) in the Gaia jargon.

Although Gaia in the end will create a completely independent and self-contained all-sky inventory of astronomical objects — not relying on any pre-launch knowledge — it was deemed useful to have such an initial list, for two main reasons:

• An assignment to known celestial objects is needed internally for the initial calibration and verification of the spacecraft and instruments.

• Giving pre-defined Gaia source identifiers to known celestial sources (and publishing these in the form of the IGSL) will allow the members of the external scientific community to prepare specific object lists and auxiliary data for specific research topics — and then easily identify these objects in the Gaia catalogue by just using those pre-known ’names’ (source identifiers).

### Known issues with the IGSL

The IGSL was first delivered in 2007, and two other versions were delivered before the frozen version in 9/2013. Subsequent use has revealed a number of problems. Most would be relatively simple to fix, but no provision was made for updating the IGSL, so all downstream Gaia processing had to deal with these problems. We collect the known issues here to have them in a central location.

#### Duplicate entries

Duplicate Hipparcos/Tycho-2 entries: The Hipparcos catalogue was not one of the defining catalogues for the IGSL, but after the catalogue was made there was a request to make sure that all Hipparcos stars were nevertheless included. Those not already included as part of the other catalogues were added in a patch as ‘fake’ Tycho-2 stars with the Tycho-2 ID = 9999999000000+HIP_Number. Subsequent work has shown that, because of a bug in the matching procedures, approximately 12000 Hipparcos stars were entered twice. These can be identified as the objects with auxHIP = 1 and idTYCHO $>$ 9999999000000. They should not be used for any purpose.
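The selection rule for the duplicated entries can be written as a simple filter. The sketch below assumes that each IGSL entry is available as a dictionary keyed by the field names quoted above (`auxHIP`, `idTYCHO`). The dictionary representation and the function names are illustrative assumptions, not part of any IGSL software.

```python
FAKE_TYCHO_OFFSET = 9999999000000


def fake_tycho_id(hip_number):
    """Tycho-2 ID assigned to a Hipparcos star that was added in the patch."""
    return FAKE_TYCHO_OFFSET + hip_number


def is_duplicate_hipparcos(entry):
    """True for the entries that should not be used for any purpose."""
    return entry["auxHIP"] == 1 and entry["idTYCHO"] > FAKE_TYCHO_OFFSET


def drop_duplicates(entries):
    """Return the entries with the duplicated Hipparcos stars removed."""
    return [e for e in entries if not is_duplicate_hipparcos(e)]
```

For example, `drop_duplicates(catalogue)` applied to a list of such dictionaries keeps every entry except those flagged by the rule above.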

#### RA or Dec values are out of range

For the GSC23 and SDSS objects, when a proper motion was available from the PPMXL it was applied to bring the positions from the epoch of observation to 2000. These new positions were not normalised to the nominal RA/Dec ranges, and 34 have remained outside them. They should be normalised.
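A minimal sketch of such a normalisation is given below. It assumes coordinates in degrees and treats a declination beyond a pole as a pole crossing (reflecting the declination and shifting the right ascension by 180 degrees). Whether the 34 affected IGSL entries are of this kind or simply have RA outside 0 to 360 degrees is not stated here, so the function is illustrative only.

```python
def normalise_radec(ra_deg, dec_deg):
    """Map (ra, dec) in degrees onto 0 <= ra < 360 and -90 <= dec <= 90."""
    # Fold the declination into [-180, 180) first.
    dec = (dec_deg + 180.0) % 360.0 - 180.0
    ra = ra_deg
    if dec > 90.0:         # crossed the north pole
        dec = 180.0 - dec
        ra += 180.0
    elif dec < -90.0:      # crossed the south pole
        dec = -180.0 - dec
        ra += 180.0
    # Wrap the right ascension into [0, 360).
    return ra % 360.0, dec
```

For instance, `normalise_radec(361.0, 10.0)` wraps the right ascension back by one full turn, while `normalise_radec(10.0, 91.0)` reflects the position across the north pole.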

#### Classification problems

The 197921 non-stars in the GEPC were classified as 3 or 27 rather than 1 as in other catalogues. The classification of a star is correctly listed as 0.

The 167055567 SDSS objects have their classification inverted, that is stars are classed as 1 and non-stars as 0 or -1.

When the sourceClassification is 0, it is because none of the catalogues containing that object provided a classification; the value should then be set to null.

#### Ecliptic coordinate errors (thanks to S. Roser)

The ecliptic coordinates were calculated with a B1950 rather than a J2000 transformation.
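Users who need ecliptic coordinates can recompute them from the equatorial positions. The sketch below applies the standard rotation about the equinox direction by the mean obliquity at J2000.0. It assumes that the input right ascension and declination are already referred to J2000, and it uses the IAU 1976 obliquity value of 23.4392911 degrees. The function name is hypothetical, and this is not the transformation code used for the IGSL.

```python
import math

OBLIQUITY_J2000_DEG = 23.4392911  # IAU 1976 mean obliquity at J2000.0


def equatorial_to_ecliptic(ra_deg, dec_deg, obliquity_deg=OBLIQUITY_J2000_DEG):
    """Convert equatorial (ra, dec) to ecliptic (lon, lat), all in degrees."""
    ra, dec, eps = (math.radians(v) for v in (ra_deg, dec_deg, obliquity_deg))
    # sin(lat) = sin(dec) cos(eps) - cos(dec) sin(eps) sin(ra)
    sin_lat = (math.sin(dec) * math.cos(eps)
               - math.cos(dec) * math.sin(eps) * math.sin(ra))
    lat = math.asin(max(-1.0, min(1.0, sin_lat)))
    # lon from the y and x components of the rotated unit vector
    y = math.cos(dec) * math.sin(ra) * math.cos(eps) + math.sin(dec) * math.sin(eps)
    x = math.cos(dec) * math.cos(ra)
    lon = math.atan2(y, x)
    return math.degrees(lon) % 360.0, math.degrees(lat)
```

As a sanity check, the position RA = 270 degrees, Dec = 90 degrees minus the obliquity should map to an ecliptic latitude of 90 degrees, i.e. the north ecliptic pole.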

#### Proper motion null values

The null values for proper motions are not consistent, as they are derived from the null values in the original catalogues. Objects with proper motions and errors of 0 or ‘null’ can be considered as not provided, with two exceptions: QSOs in the LQRF, for which this is a defaulted value, and UCAC objects, where the null errors are set to zero but the proper motions may be real.

#### HEALPix in name

For $\sim$30000 entries from the GSC23, the name (sourceID), which is made up of the 12th level HEALPix of the first instance and the running number in the 6th level HEALPix, sometimes has an incorrect 12th level HEALPix. As names they are still valid, but in general the user is encouraged not to use the HEALPix part of the sourceID as an indication of the position in the sky.

#### General magnitude transformations

The calculation of magnitudes from the transformations sometimes gives unreasonable numbers because of problems with the input catalogue or working outside the range of the transformations.

Some examples:

• IGSL sourceid = 2641188455049732224 has a bright $G$ that is not real: noise in the SDSS photometry (SDSS ID 1237656906352361542) produces a large g$-$r, which in turn makes the transformed $G$ bright.

• IGSL sourceid = 1339762992985816960 has magBJ = 27.5297 and magRF = 13.35, giving $G$ = 17.713; all of these magnitudes are unrealistic. The B magnitude is a transform from the SDSS, so the problem is probably in the original SDSS data.

• IGSL sourceid = 5283973366647424768 has $G$ = 7.6, but no bright source is present. The magnitudes come from the PPMXL, with ${\rm m_{B}}=18.8$ and ${\rm m_{R}}=13.0$; when these are transformed to $G$ the error is very large.

Approximately 80% of the $G$ estimates come from transforms of R and B; the distribution in B$-$R for the IGSL is given in Table 2.1.

Most objects outside of B$-$R = -1 to 4 are probably unreal and the transformation is only good from -1 to 2.5 so around 20% probably have large errors.

Very wrong $G$/$G_{\rm RVS}$: looking at 88000 HIP stars, $\sim$5% of them have a very wrong $G_{\rm RVS}$. Extreme cases are HIP-112306, which has $G_{\rm RVS}$ = -0.885 and $G$ = 5.57 in the IGSL but V = 10.89 and I = 8.44 in the literature, and HIP-114598, which has $G_{\rm RVS}$ = 20.11 and $G$ = 16.299 in the IGSL but V = 8.1 and I = 8.12 in the literature. The errors quoted in the IGSL (of $\sim$0.5) are not indicative of these discrepancies. Again, this is due to transformation or input errors.

Finally, all objects from the HIP, SPSS, SKY2000, LQRF and GEPC catalogues were included even if the magnitude information was incomplete. For example, if these objects had OGLE/Tycho-2 magnitudes, the relations 10, 13, 16 and 17 are only valid up to a B$-$V or BT$-$VT of 2.5, so objects outside that range were not assigned $R_{F}$ and $B_{J}$ magnitudes but were included nevertheless.

#### Magnitude errors

No check was made on the magnitude errors in the IGSL. They are usually just a simple function of the input catalogue errors, so if the input errors were large the IGSL ones will be large too. For example, the IGSL object 9698311034780416 has an unrealistic error in $R_{F}$:

| RAJ2000 | DEJ2000 | ${\rm magB_{J}}$ | emag | ${\rm magR_{F}}$ | emag |
|---|---|---|---|---|---|
| 051.794675 | +06.859480 | 19.720 | 0.450 | 24.160 | 27.506 |

This error comes directly from the SDSS, the source of the magnitudes:

| ra | dec | g | r | err_g | err_r |
|---|---|---|---|---|---|
| 51.794674912 | 6.859479824 | 19.90372 | 24.14491 | 0.4029701 | 27.50531 |

for SDSS id = 1237673328683844256.

The Tycho-2 catalogue sometimes did not include the blue or red magnitudes, and they were taken from Hipparcos instead. Unfortunately, in these cases the errors were not updated and have remained zero, as published in Tycho-2.

#### Position errors

No check was made on the errors of the positions. If the catalogue provides 0, or a value below 0.5 mas that rounds to 0 when stored in mas, then its error is listed as 0. An example from the Gaia EPC catalogue, described in Section 2.2.4, is the object GEPCJ055642.10-665127.5:

| Cat_name | RA (HH MM SS.SSSS) | error | DEC (sDD MM SS.SSS) | error |
|---|---|---|---|---|
| GEPCJ055642.10-665127.5 | 5 56 42.0998 | 0.0176 | -66 51 27.451 | 0.0 |

This also happened frequently in the SDSS, where a number of objects had errors of less than 0.5 mas that were rounded to 0.

#### GRVS magnitudes

For the objects with GRVS magnitudes coming from Tycho-2, e.g. sourceGrvs = 29, there is an error in the equation for GRVS. We used:

 ${\rm GRVS}={\rm VT}-0.1313-1.3422({\rm BT}-{\rm VT})-0.09316({\rm BT}-{\rm VT})^{2}-0.0663({\rm BT}-{\rm VT})^{3}$ (2.1)

while it should have been:

 ${\rm GRVS}={\rm VT}-0.1313-1.3422({\rm BT}-{\rm VT})-0.07918({\rm BT}-{\rm VT})^{2}-0.04790({\rm BT}-{\rm VT})^{3}$ (2.2)

For large colours this becomes a problem at the 1–3 magnitude level.
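The size of the effect can be evaluated directly from the two sets of coefficients in Equations 2.1 and 2.2. The sketch below implements both polynomials and prints their difference as a function of the colour BT$-$VT. The function names are chosen here for illustration only.

```python
def grvs_from_tycho(vt, bt, c2, c3):
    """Cubic colour transformation shared by Equations 2.1 and 2.2."""
    colour = bt - vt
    return vt - 0.1313 - 1.3422 * colour + c2 * colour**2 + c3 * colour**3


def grvs_as_used(vt, bt):
    """Equation 2.1: the coefficients actually used in the IGSL."""
    return grvs_from_tycho(vt, bt, -0.09316, -0.0663)


def grvs_corrected(vt, bt):
    """Equation 2.2: the coefficients that should have been used."""
    return grvs_from_tycho(vt, bt, -0.07918, -0.04790)


# The difference depends only on the colour, not on VT itself.
vt = 10.0
for colour in (1.0, 2.0, 3.0, 4.0, 5.0):
    delta = grvs_as_used(vt, vt + colour) - grvs_corrected(vt, vt + colour)
    print(f"BT-VT = {colour:.1f}: GRVS(used) - GRVS(corrected) = {delta:+.3f} mag")
```

The difference is always negative, meaning that the IGSL value is too bright, and its absolute value grows with the cube of the colour, so it only reaches the magnitude level for very red objects.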

#### OGLE entries

OGLE was considered on average to have better photometry than other catalogues, so estimates of the $G$ magnitude were taken from those values. We were provided with the deep catalogues of the OGLE surveys, which did not have stars brighter than around 13. This means that when a faint OGLE star was matched to a bright input, because the OGLE magnitudes override others, these objects were sometimes dropped or assigned magnitudes that are too faint. For example, the 13th magnitude star at 91.9967388, -70.9623672 in the UCAC catalogue was matched to a faint nearby star in the OGLE deep catalogues and assigned a $G$ fainter than 21, so it was dropped. This occurred in the 5 OGLE regions, which are small parts of the sky (a few square degrees in total). How many stars in these regions were mistakenly removed is not easy to estimate.

#### Attitude Star Catalogue II

As this catalogue was made in a significantly different way, the object sourceIds were not consistent. We matched the ASC II to the IGSL source database to try to obtain consistent sourceIds, but there will be cases of mismatches, as they are fundamentally two different catalogues. In particular, we did not use the SDSS in the production of the ASC, so the magnitudes are often based on different source catalogues.

#### High proper motion objects

Any objects with proper motions higher than 3276.7 mas yr${}^{-1}$ that were taken from the UCAC catalogue had their proper motions set to (3276.7, 3276.7) due to a bug in the read program. For the IGSL this occurred for 17 sources, whose proper motions should have been those listed in Table 2.2.
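Affected entries can be recognised by both proper motion components being equal to the saturation value. The sketch below flags such entries; the function name, argument names and tolerance are assumptions made for illustration. Note that 3276.7 equals 32767 times 0.1, so one plausible but unconfirmed reading is that a 16-bit signed integer in units of 0.1 mas yr${}^{-1}$ was involved in the read program.

```python
PM_SATURATION_MAS_YR = 3276.7  # value quoted in the text


def has_saturated_proper_motion(pm_ra, pm_dec, tol=1e-3):
    """True if both proper motion components sit at the saturation value."""
    return (abs(pm_ra - PM_SATURATION_MAS_YR) < tol
            and abs(pm_dec - PM_SATURATION_MAS_YR) < tol)


# Unconfirmed conjecture: the largest 16-bit signed integer in 0.1 mas/yr units.
INT16_MAX = 2**15 - 1
print(INT16_MAX / 10.0)
```

Entries flagged in this way should take their proper motions from the original catalogue or from the corrected values rather than from the IGSL.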

## 2.2.4 The Gaia Ecliptic Poles Catalogue (GEPC)

Author(s): Martin Altmann, Uli Bastian

The Gaia Ecliptic Poles Catalogue (GEPC, formerly known as EPC, Ecliptic Poles Catalogue) was assembled primarily to utilise the two Ecliptic Poles fields (SEP: 06:00:00 $-$66:33:41, NEP: 18:00:00 +66:33:41, see Figure 2.1), which are scanned by Gaia twice every rotation (once with each field of view) when the satellite operates in EPSL (Ecliptic Poles Scan Law) mode, which mainly happened during the commissioning time. These frequent observations yield data with a density that would only be reached for other parts of the sky after significantly more time, thereby allowing the Gaia performance to be evaluated in a much more realistic way than with other methods. While the fields are located at similar Galactic latitudes, the makeup of the two fields is very different: the southern field lies in the outskirts of the LMC and is dominated by LMC field stars at fainter magnitudes, whereas the northern field is a normal low-density star field at high Galactic latitude. This difference allows the properties of Gaia to be analysed in two very distinct stellar environments.

### Construction

The GEPC consists of two fields of $\simeq$1 square degree each, centred on the ecliptic poles themselves.

The southern field, or SEP-field, was observed with the MPIA 2.2 m telescope at La Silla in Chile and its WFI detector, which covers $\simeq 0.5^{\circ}\times 0.5^{\circ}$. To fully cover the 1 square degree field as required, observations were done using 5 pointings, one centred on the pole and the other four tiled so that together they fill $\simeq 60^{\prime}\times 60^{\prime}$ with some degree of overlap between them. Observations were done in Bessel $BVRI$, calibrated to Landolt standard fields in the Vega magnitude system, and then transformed to Gaia magnitudes ($G,G_{\rm BP},G_{\rm RP},G_{\rm RVS}$).

The limiting magnitude in $V$ and $R$, and thus $G$, is roughly 22.5 mag. There is a peak in the magnitude distribution centred on $G\simeq$ 18.5, see Figure 2.2. This peak is real: it is caused by the LMC’s Red Clump giant stars, which are a very prominent population in this field.

The northern field was observed with the 3.6 m CFHT located on Mauna Kea (Hawaii, USA) and its MEGACAM detector. As the field of view of this device is already one square degree, observations were carried out without a pointing pattern, using only a five-point dithering pattern. The filters used were SDSS $ugri$ in this case, and the $z$ band was incorporated from Hwang et al. (2007). Our own data were calibrated into the system of Hwang et al. (2007). In contrast to the SEP-field, the photometric zero points are in the $AB$ system, as is usual for SDSS-type photometric fields. Again, the photometry is transformed into Gaia magnitudes. Due to the larger telescope, the faint limiting magnitude for the NEP-field is about 26 in $g^{\prime}$ and $r^{\prime}$, and thus $G$, with the limit of completeness being about 24 mag. For some stars in the NEP-field proper motions are available, which were derived using the first-epoch material from the POSS, taken from the Minnesota Automated Plate Scanner (MAPS), see Pennington et al. (1993); Cabanela et al. (2003). The plate in question (P72) was taken on August 18, 1952, allowing for an epoch baseline of roughly 56 years.

Both fields have some gaps, as can be seen in Figure 2.1. In the case of the northern field, these gaps are caused by the 5-point dither pattern, which is not sufficient to close all gaps in this $4\times 8$ detector array. Other gaps in the north, and also those present in the southern field, are due to the matching criterion used in the assembly. These gaps appear where the gaps between detectors are least covered by the dithering, so that objects are partly present on only one image of a set of five. In order to prevent too many false positives, which would have been detrimental to the commissioning process of Gaia, objects present on only one image were discarded.

#### Data reduction

This part deals with the data reduction steps from data treatment to photometric calibration.

• Image reduction and source extraction: The northern field was delivered with the basic de-trending (de-biassing, flat-fielding, etc.) done by the Elixir pipeline (see e.g. Magnier and Cuillandre 2004). Further steps, including the source extraction, were conducted with the Theli program (Schirmer 2013), based on the AstrOmatic suite (Bertin et al. 2012), which includes well-known programs such as SExtractor (Bertin and Arnouts 1996). The final assembly and matching of the extracted catalogues, including the calibration to Hwang et al. (2007), was done using TOPCAT, a VO-compatible table calculation and plotting tool, or the underlying STILTS routines (see Taylor 2005). Since Theli delivers flux-conserving images, the source extraction was done using the sky-projected images, with the centre being the nominal coordinates of the NEP-field. This means that, in contrast to the southern field, the source coordinates were already in one common plane/projection and did not need to be transformed further.

The WFI data were delivered as raw data including calibration data, and had to be reduced from scratch. The calibration data used are the usual sets of bias and twilight flat data, as well as sky flats derived from the longer-exposed science data. Additionally, so-called ‘beta’ images were used to save some of the unfortunately rather frequent ‘bad columns’. These are images taken with different exposures to $\beta$-radiation, which allow the correction of some of the bad columns, namely those which do show a signal response (as opposed to those which do not, i.e. dark or hot dead columns). Nonetheless, this did not work completely in every case, so some residual bad columns remain, which leads to the detection of spurious objects along these columns. As a consequence, we decided to use harsher rejection methods in the matching process, eliminating the vast majority of such objects at the cost of losing some genuine ones. For the Gaia commissioning, the catalogue is optimised for as few false positives as possible. The reduction of the SEP data was done using MPIAphot (Meisenheimer, Roeser, priv. comm.), a Midas-based routine suite developed at the MPIA mainly for the reduction of data from MPIA instruments, such as those on Calar Alto and the 2.2 m MPI telescope at ESO’s La Silla observatory, including the WFI detector used here. The photometry was derived from the non-sky-projected images (the sky-projected images made with MPIAphot are not flux conserving); sources were again extracted with SExtractor (Bertin and Arnouts 1996). The extracted sources were then brought into one gnomonic plane, centred on the centre of the first image of the central pointing, using Midas routines.

• Stacking and matching: The stacking and matching of individual images was done in a similar fashion for both fields; therefore this step is described only once. This process was not done using the actual images, but the extracted sources. After matching and before combining the data, photometric offsets were determined, and an r.m.s. error was derived. One image (usually the first in the sequence) was chosen to be the reference image, and the others were corrected for the offset to match the reference. The stacking was then done in the order listed below, and the standard deviations of the magnitudes and of the gnomonic coordinates $\xi,\eta$ were derived for error determination (see Equation 2.10). The optimum matching radius was determined to be $0.6^{\prime\prime}$ for both fields. This is not surprising, since the average seeing was $1^{\prime\prime}$ in both cases. For the steps after the first match (where applicable), the errors were calculated by error propagation:

1. all images of one exposure time and one pass band (and one pointing in the case of the south);

2. all results from step 1 for all pointings (only for the south, since the north only has one pointing);

3. all results from step 1 (north) or step 2 (south) from one pass band;

4. all pass bands were matched (not stacked, of course).
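The matching of extracted source lists and the determination of a photometric offset relative to the reference image can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the nearest-neighbour search is brute force, and the median is used as the offset estimator because the text does not say which estimator was applied.

```python
import math
import statistics

def match_to_reference(ref_xy, img_xy, radius=0.6):
    """Pair each reference source with its nearest source in another image.

    Coordinates are gnomonic (tangent-plane) positions in arcsec.
    Returns a list of (ref_index, img_index) for pairs closer than `radius`.
    """
    pairs = []
    for i, (rx, ry) in enumerate(ref_xy):
        # Brute-force nearest neighbour; adequate for a small illustration.
        j = min(range(len(img_xy)),
                key=lambda k: math.hypot(img_xy[k][0] - rx, img_xy[k][1] - ry))
        if math.hypot(img_xy[j][0] - rx, img_xy[j][1] - ry) <= radius:
            pairs.append((i, j))
    return pairs

def photometric_offset(ref_mag, img_mag, pairs):
    """Offset (image minus reference) and r.m.s. scatter over the matched pairs."""
    diffs = [img_mag[j] - ref_mag[i] for i, j in pairs]
    # Assumption: median offset; the source does not name the estimator.
    offset = statistics.median(diffs)
    rms = math.sqrt(sum((d - offset) ** 2 for d in diffs) / len(diffs))
    return offset, rms
```

Subtracting the returned offset from the magnitudes of the non-reference image brings it onto the zero level of the reference before the lists are combined.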

• Photometric calibration: Please note that the two regions use different filter systems: Johnson–Cousins–Bessel (JCB) filters in the south and Sloan filters in the north. These are similar but have distinct differences. Sloan does not have a $B$ band and JCB does not have $z$. The Sloan $g$ band is, roughly speaking, a combined $B+V$ JCB filter. The current version GEPC3.0 has $BVRI$ in the SEP field and $u^{*}g^{\prime}r^{\prime}i^{\prime}z^{\prime}$ in the north; the $z^{\prime}$ photometry is not our own data but taken from Hwang et al. (2007). Please also note that the northern field is calibrated in the AB system, and the southern field in the Vega system. The reason is that these are the customary photometric zero-point systems for the respective filter systems and, moreover, the relations are calculated this way.

For the northern field, we calibrated our photometry to that of Hwang et al. (2007), since we had this data set at our disposal and it happened to be observed with the same instrument as our data. The conversion to $G$ magnitudes was done using the $g,r$ bands and the latest version of the conversion functions. The data for the southern field contain four-colour $BVRI$ photometry calibrated to the Landolt secondary standard system. The filters used are the Johnson–Cousins and Bessel filters available for the WFI instrument. Since all filters deviate a little from the originals, and filter throughputs change with time through oxidation and other degrading effects, there will always be small residual systematic effects; most of these can be dealt with during calibration, but some will remain.

For this release, individual photometric errors have been derived. These reflect the internal errors only; further uncertainties are introduced by the calibration and by various effects such as filter degradation. The true photometric error will thus be larger than the errors listed.

Because suitable calibration data were lacking in the data set on which GEPC1.x was based, that version has only a rough photometric calibration based on the position of the LMC Red Clump. This has been changed in the present version: the photometry is now calibrated to the Landolt secondary standards. The standard fields used for the photometric calibration were T–PHE, PG0231+051, SA95–42, and RU–149. The magnitudes are of Vega type (rather than AB). For the northern part, the SDSS-type magnitudes are AB by definition.

The calibration coefficients for the southern field are given in Table 2.3.

The calibration was conducted using the following calibration equations:

 $\displaystyle(B-V)_{\rm cal}=\frac{(B-V)_{\rm inst}-(B1-V1)-B2\cdot AM_{B}+V2\cdot AM_{V}}{1+(B3-V3)}$ (2.3)

 $\displaystyle B_{\rm cal}=B_{\rm inst}-B1-B2\cdot AM_{B}-B3\cdot(B-V)_{\rm cal}$ (2.4)

 $\displaystyle V_{\rm cal}=V_{\rm inst}-V1-V2\cdot AM_{V}-V3\cdot(B-V)_{\rm cal}$ (2.5)

 $\displaystyle(V-R)_{\rm cal}=\frac{(V-R)_{\rm inst}-(V1-R1)-V2\cdot AM_{V}+R2\cdot AM_{R}}{1-R3}$ (2.6)

 $\displaystyle R_{\rm cal}=R_{\rm inst}-R1-R2\cdot AM_{R}-R3\cdot(V-R)_{\rm cal}$ (2.7)

 $\displaystyle(V-I)_{\rm cal}=\frac{(V-I)_{\rm inst}-(V1-I1)-V2\cdot AM_{V}+I2\cdot AM_{I}}{1-I3}$ (2.8)

 $\displaystyle I_{\rm cal}=I_{\rm inst}-I1-I2\cdot AM_{I}-I3\cdot(V-I)_{\rm cal}$ (2.9)
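As an illustration, Equations 2.3 to 2.5 can be evaluated as in the following sketch. It assumes that $X1$, $X2$ and $X3$ denote the zero point, the extinction coefficient and the colour term of band $X$, and that $AM_{X}$ is the airmass of the observation in that band; the text does not define these symbols explicitly. The function name and argument layout are hypothetical, and real coefficients would have to be taken from Table 2.3.

```python
def calibrate_bv(b_inst, v_inst, am_b, am_v, coeff_b, coeff_v):
    """Apply Equations 2.3-2.5 to one star.

    coeff_b = (B1, B2, B3) and coeff_v = (V1, V2, V3); the reading of these
    as zero point, extinction coefficient and colour term is an assumption.
    Returns ((B-V)_cal, B_cal, V_cal).
    """
    b1, b2, b3 = coeff_b
    v1, v2, v3 = coeff_v
    # Equation 2.3: the calibrated colour is solved for first, because the
    # colour terms of both bands depend on it.
    bv_cal = ((b_inst - v_inst) - (b1 - v1) - b2 * am_b + v2 * am_v) / (1.0 + (b3 - v3))
    # Equations 2.4 and 2.5: calibrated magnitudes using that colour.
    b_cal = b_inst - b1 - b2 * am_b - b3 * bv_cal
    v_cal = v_inst - v1 - v2 * am_v - v3 * bv_cal
    return bv_cal, b_cal, v_cal
```

A useful consistency check is that the difference of the two calibrated magnitudes reproduces the calibrated colour, which follows algebraically from subtracting Equation 2.5 from Equation 2.4.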

The following numbers show the zero-point shifts between the data and the calibration images, and between the measurements of the latter with SExtractor and PHOT (from the daophot package). The corresponding errors are the errors of the shifts and are almost negligible.

Shift within calibration images between SExtractor and PHOT (instr, $mag_{\rm sex}-mag_{\rm phot}$):

• $B$: $-0.674$ ($\sigma$=0.029, $\Delta$=0.0017), 275 stars

• $V$: $-0.029$ ($\sigma$=0.018, $\Delta$=0.0013), 179 stars

• $R$: $-0.016$ ($\sigma$=0.015, $\Delta$=0.0008), 331 stars

• $I$: $+0.226$ ($\sigma$=0.013, $\Delta$=0.0008), 276 stars

Shift between calibration images (sextr) and data zero level (instr, $mag_{\rm corr}-mag_{\rm sex}$):

• $B$: $-0.050$ ($\sigma$=0.016, $\Delta$=0.0009), 299 stars

• $V$: $-0.038$ ($\sigma$=0.017, $\Delta$=0.0008), 476 stars

• $R$: $-0.148$ ($\sigma$=0.032, $\Delta$=0.0014), 481 stars

• $I$: $-0.136$ ($\sigma$=0.019, $\Delta$=0.0011), 279 stars

Total shift between data zero level and aperture photometry:

• $B$: $-0.724$ ($\Delta$=0.0019)

• $V$: $-0.067$ ($\Delta$=0.0015)

• $R$: $-0.164$ ($\Delta$=0.0016)

• $I$: $+0.090$ ($\Delta$=0.0014)
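The total shifts can be reproduced from the two partial shifts. The sketch below adds the shifts and combines the $\Delta$ values in quadrature; the quadrature rule is an assumption, but it is consistent with the quoted numbers to the precision given. The dictionary and function names are illustrative.

```python
import math

# (shift, Delta) per band, copied from the two lists of partial shifts.
SEX_MINUS_PHOT = {"B": (-0.674, 0.0017), "V": (-0.029, 0.0013),
                  "R": (-0.016, 0.0008), "I": (0.226, 0.0008)}
CORR_MINUS_SEX = {"B": (-0.050, 0.0009), "V": (-0.038, 0.0008),
                  "R": (-0.148, 0.0014), "I": (-0.136, 0.0011)}

def total_shift(band):
    """Sum of the two shifts and their errors added in quadrature (assumed)."""
    shift_1, err_1 = SEX_MINUS_PHOT[band]
    shift_2, err_2 = CORR_MINUS_SEX[band]
    return shift_1 + shift_2, math.hypot(err_1, err_2)

for band in "BVRI":
    shift, err = total_shift(band)
    print(f"{band}: {shift:+.3f} (Delta={err:.4f})")
```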

Since the northern part could be calibrated by differential photometry using the data from Hwang et al. (2007), meaning that only a magnitude shift was applied, we do not give the details here.

The magnitude errors are computed for each star from the scatter of its individual measurements. The corresponding standard equation is:

 $dMag=\sqrt{\frac{1}{n}}\cdot\sigma_{Mag}=\sqrt{\frac{1}{n(n-1)}}\sqrt{\sum_{i=1}^{n}(\overline{Mag}-Mag_{i})^{2}}$ (2.10)

with $n$ being the number of detections and $\sigma_{Mag}$ the standard deviation. When combining data of different exposure times, Equation 2.10 was evaluated for every set separately and the error of the combined data was derived by error propagation.

A note of caution: Stars with only one or two detections will have an error of zero, or a quite unrealistic one. A few objects have an RMS error much larger than others of comparable magnitude. In most cases this hints at variability, bearing in mind that most of the data were not observed on the same day, and in some cases a year lies between different parts of a dither series.
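A minimal sketch of Equation 2.10 and of the subsequent error propagation is given below. Two points are assumptions rather than statements of the source: a single detection is assigned an error of zero (mirroring the caution above), and the sets of different exposure time are combined as an unweighted mean, since the text does not specify the weighting. The function names are hypothetical.

```python
import math

def mean_and_error(mags):
    """Mean magnitude and its error according to Equation 2.10."""
    n = len(mags)
    mean = sum(mags) / n
    if n < 2:
        # Assumption: with one detection there is no scatter, so the
        # error is reported as zero (hence the note of caution).
        return mean, 0.0
    sum_sq = sum((mean - m) ** 2 for m in mags)
    return mean, math.sqrt(sum_sq / (n * (n - 1)))

def combine_sets(results):
    """Unweighted mean of per-set (mean, error) pairs with propagated error.

    Assumption: equal weights; for a mean of k values the propagated error
    is sqrt(sum of squared errors) / k.
    """
    k = len(results)
    mean = sum(m for m, _ in results) / k
    err = math.sqrt(sum(e ** 2 for _, e in results)) / k
    return mean, err
```

With two detections the formula still returns a number, but it is based on a single degree of freedom and is therefore a poor estimate of the true uncertainty.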

The photometric errors given in the catalogue are internal RMS errors only. They do not include systematic sources of error, such as calibration errors, photometric errors of non-point sources, brightness- or colour-related errors, etc. At least in the southern field, zonal errors, which may be caused by imperfect flat fielding, are partly taken into account owing to the five-point pointing pattern. As a conservative assumption, a systematic accuracy of 0.1 mag should be adopted.

• Astrometry: The astrometry was improved, so that the systematic astrometric inaccuracies present in GEPC1.x have been corrected. Analysis shows no detectable mid-frequency systematics at our level of precision. The accuracy is now mainly limited by the underlying reference catalogue, which for GEPC2/3 is the PPMXL (Roeser et al. 2010), while the earlier versions were based on the UCAC2 catalogue (Zacharias et al. 2004). Although the PPMXL is newer, the choice of reference catalogue was not expected to have a large influence on the astrometry; our experience shows, however, that it does. First of all, all available reference catalogues have systematic differences, as is explained later in this text. Apparently not only the accuracy (i.e. systematic effects) but also the precision of the reference catalogue plays a role and can lead to systematic effects in the reduced data. The reason for this is at present only partially understood. However, most of the stars in the EPC fields that are also in the reference catalogues lie at the faint end of the latter and consequently have large errors, which leads to ‘sloppy’ fits.

The registration and astrometric solution were done for each chip and each frame separately, using the PPMXL as reference and third-order polynomials. An iterative method was used, clipping 3-sigma outliers after the first round. The final positions were obtained using all of the good positional data from all filters. In this way we could ensure that every star has a valid position. We could not detect any sign of differential chromatic refraction (DCR). However, since the U band in particular is prone to DCR, an alternative assembly of the final values might be considered in a future minor release.
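The kind of iterative fit described above can be sketched as follows: a two-dimensional third-order polynomial is fitted by least squares, residuals larger than 3 sigma are clipped, and the fit is repeated. This is a generic illustration and not the reduction code actually used; the function names, the use of normal equations and the convergence rule are assumptions.

```python
import math

def design_row(x, y, order=3):
    """Monomials x^i * y^j with i + j <= order (10 terms for order 3)."""
    return [x ** i * y ** j for i in range(order + 1) for j in range(order + 1 - i)]

def solve_least_squares(rows, values):
    """Solve the normal equations (A^T A) c = A^T b by Gauss-Jordan elimination."""
    m = len(rows[0])
    # Augmented normal-equation matrix [A^T A | A^T b].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
           + [sum(r[i] * v for r, v in zip(rows, values))] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(m):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]

def fit_with_clipping(xs, ys, targets, order=3, nsigma=3.0, max_iter=10):
    """Fit targets = P(x, y), rejecting residuals beyond nsigma after each round."""
    rows = [design_row(x, y, order) for x, y in zip(xs, ys)]
    keep = [True] * len(rows)
    for _ in range(max_iter):
        used = [i for i, flag in enumerate(keep) if flag]
        coeff = solve_least_squares([rows[i] for i in used], [targets[i] for i in used])
        resid = [t - sum(c * a for c, a in zip(coeff, r)) for r, t in zip(rows, targets)]
        sigma = math.sqrt(sum(resid[i] ** 2 for i in used) / len(used))
        new_keep = [abs(r) <= nsigma * sigma for r in resid]
        if new_keep == keep:  # no change in the rejected set: converged
            break
        keep = new_keep
    return coeff, keep
```

In an astrometric solution one such fit would be made per coordinate, with the pixel positions of the reference stars as `xs`, `ys` and their catalogue tangent-plane coordinates as `targets`.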

For the NEP, we also excluded the long $i$-band images, since these caused large problems in the astrometry. For proper motions, we also added scans of the relevant POSS I plate (P72, taken 18 August 1952) from the Minnesota Automated Plate Scanner (MAPS) Catalogue of POSS I (Pennington et al. 1993; Cabanela et al. 2003; the MAPS database is supported by the University of Minnesota), in order to get a longer baseline than the two years covered by our own data. In order to include high-proper-motion stars as well, we chose a large matching radius of 5 ${}^{\prime\prime}$. In the current version we do not give errors for the proper motions; the corresponding columns are thus completely filled. The scatter of the proper motions of the NEP field shows a sigma of about 10 mas yr${}^{-1}$. This may well serve as an upper limit for the overall precision of the proper motions, since this value includes the proper motion and the positional error.

Concerning the astrometric precision, the errors given in the relevant columns for right ascension and declination reflect the RMS error only, i.e. the scatter among all individual positions used to compile the final position. The derivation of these errors is similar to that of the photometric errors, see Equation 2.10. As in most small-field astrometry, we used a reference catalogue which itself contains systematic errors to some degree. These are not reflected in the errors given in the EPC. One can presume zonal medium-scale errors of about 50–100 mas. As an example, the PPMXL and 2MASS catalogues (which are partially built on the same data!) show a residual slope against each other of up to 50 mas. Therefore the absolute astrometric positional accuracy cannot be better than this value. For proper motions, using the same reference catalogue for all epochs largely cancels out the systematic error introduced by the reference catalogue. For the NEP field, which currently has proper motions, these show a sigma of about 10 mas yr${}^{-1}$. This may well serve as an upper limit for the overall precision of the proper motions, since this value includes the proper motion and the positional error.

It should also be noted that neither the instruments used (i.e. mosaic detectors) nor the available software are optimised for high-precision astrometry, since they have largely been conceived and developed for extragalactic work, where the demands are much lower. Therefore some areas with additional systematics will exist, especially near chip edges, dither gaps, etc.

• Stellarity: Another quantity added to the GEPC is the stellarity index, also known as the CLASS parameter (Bertin and Arnouts 1996). It is created during the source extraction from the 2D images using SExtractor. It is a measure of the ‘stellarity’ of an object, i.e. how star-like it is. The stellarity index relies on a combined analysis of the measured morphological parameters, employing neural networks. Values near 1 mean that the object is very likely a point source like a star (it could of course also be the stellar nucleus of an AGN, etc.; the stellarity index says nothing about the physical nature of an object). In practice one can consider all objects with values below about 0.3 to be galaxies, i.e. non-point-like objects, while $S>$ 0.85 is a good lower limit for stars. At bright magnitudes, i.e. significantly above the detection limit, this classification works quite well and both object types are well separated. About 2 mag above the detection limit, however, it starts to break down, and fainter objects will soon no longer be classified correctly. This magnitude regime is also where most of the values between 0.3 and 0.85 occur. For saturated objects, CLASS should also be used with caution.

The detector used for the northern field has a smaller angle per pixel than most other detectors, so that stellar images span more pixels. The neural networks on which the determination of the SExtractor CLASS parameter is based are optimised for a FWHM of about 3 pixels. The more the data deviate from this value, the less reliable the resulting CLASS value will be. This degradation is not linear but sets in more or less suddenly; that it already appears for the NEP data is somewhat surprising. The networks can be trained for other FWHM values; however, since this parameter was a secondary quantity for the GEPC, we did not embark on this tedious and difficult process.
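The two thresholds quoted above translate into a simple three-way classification, sketched here. The function name and the class labels are illustrative only, and the caveats for faint and saturated objects apply unchanged.

```python
def classify_stellarity(s, star_min=0.85, galaxy_max=0.3):
    """Rough object class from the stellarity index S (0 to 1)."""
    if s > star_min:
        return "star-like"      # very likely a point source
    if s < galaxy_max:
        return "extended"       # very likely a galaxy or other extended object
    return "ambiguous"          # typical of objects near the detection limit
```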

### Contents

The GEPC contains positional astrometry (and proper motions for a smaller subset in the NEP field, see Section 2.2.4) and multi-pass-band photometry of 612,946 objects, of which 448,478 are located in the southern field and 164,468 in the north. This discrepancy is caused by the presence of the LMC in the south, which outweighs the significantly fainter magnitude limit in the north. Note that the photometry has different characteristics in the two fields (filter pass band system, magnitude system), as described in Section 2.2.4. The Gaia magnitudes, however, are comparable. The current version is GEPC3.0, which has been incorporated into the IGSL and, with the IGSL, into the Main Database. For details of the IGSL, see Section 2.2.3 and Smart and Nicastro (2014).

The format of GEPC3.0 is given in Table 2.4.

### Usage in Gaia processing

The EPC was most prominently used in the commissioning phase, via the Ecliptic-Poles Scanning Law and the First-Look software system,

• to verify and quantify the efficiency of the on-board star image detection algorithms and Sky Mapper CCDs,

• to perform initial measurements of the photometric throughput of the telescopes, CCDs and pre-amplifier electronics,

• to adjust the lower threshold of the on-board star image detection algorithms.

Since the start of the nominal mission, the first and third of these items are re-checked whenever the scanning law touches the ecliptic poles — which is quite regularly about once per month for 1–2 scans per pole and field of view each — while the second item is now being covered by a daily comparison of bright-star measurements with independent Tycho-2 magnitudes. All this is part of the daily First-Look data processing.

## 2.2.5 The Attitude Star Catalogue (ASC)

Author(s): Ricky Smart

The Attitude Star Catalogue (ASC) was commissioned by the Gaia Data Processing and Analysis Consortium (DPAC) in 2006 to allow a first reconstruction of the attitude of Gaia. Eventually it will be replaced by a catalogue constructed from the Gaia observations, but for at least the first two years a precompiled ground-based catalogue was needed. The ASC entries were required to be of high astrometric precision, isolated from other bright ($G<13.7$) objects, and brighter than the 2D window threshold of the Gaia instrument.

The first version, delivered to the DPAC in September 2013, was simply a subset of the Initial Gaia Source List (IGSL) described in Smart and Nicastro (2014), identified by the parameter toggleASC=1. Early commissioning usage and an examination of the ASC subset revealed a number of repeated entries for the same object and entries that did not meet the isolation requirements. Since the reliability of the ASC was fundamental to the Gaia mission, a re-compilation was requested in January 2014. The new, separate ASC was delivered to DPAC in April 2014 and is available from the IGSL website.

### Construction

The source catalogues used and their order of inclusion were:

• Hipparcos (Perryman et al. 1997): The photometry from the original Hipparcos Catalogue and the astrometric parameters from the update by van Leeuwen (2007) when published. Initially all entries were included regardless of known errors, i.e. including entries that are considered erroneous. Since inclusion in the ASC requires an estimate of the $G$ magnitude, the unreal entries were excluded as part of the cleaning phase.

• Tycho-2 (Høg et al. 2000): This catalogue forms the backbone of all the major ground based catalogues currently available. It was made from a combination of the Tycho star mapper observations on the Hipparcos satellite (Høg et al. 1997), the Astrographic Catalogue and 143 other ground-based catalogues.

• Sky2000 (Myers et al. 2001): The SKYMAP Star Catalogue System is a list of all stars with either measured Johnson blue or visual magnitudes brighter than 9.0. The version used here had 299167 entries of which 212 were not in the combined Hipparcos + Tycho-2 catalogues. Sky2000 provides positions at 2000, proper motions and a blue and visual magnitude. We assumed the positions to have an error of 100 mas, the proper motions an error of 10 mas yr${}^{-1}$, and an error of 0.6 in the ASC magnitudes derived from the Sky2000 values.

• UCAC4 (Zacharias et al. 2013): The USNO CCD Astrograph Catalogue version 4 is the most precise all-sky astrometric catalogue in the range V=10–16 currently available. There are no original standard magnitudes in this catalogue.

• GSC2.3 (Lasker et al. 2008): The Second Guide Star Catalogue version 2.3 provides the bulk of the photometry and defines the blue and red magnitudes ($B_{J}$ and $R_{F}$), as this is the sky survey with the largest coverage on a precise, homogeneous photometric system. The only difference from the public version is that we removed the multiple entries discussed in section 4.2 of Lasker et al. (2008). This was done by keeping only one entry from any group of objects with position differences of less than 10 mas, preferring Tycho-2 or Sky2000 entries over others.

• PPMXL (Roeser et al. 2010): The Positions and Proper Motions ‘Extra Large’ Catalogue was produced from a combination of the USNO–B (Monet et al. 2003) and the Two Micron Sky Survey point source catalogue (Epchtein et al. 1999). This catalogue was included to provide magnitudes for those entries that did not have them in the previous catalogues.

In addition any objects in the Washington Double Star catalogue (Mason et al. 2010) or the Tycho Double Star catalogue (Fabricius et al. 2002) were indicated as probable members of a binary system.

The first version of the ASC was a subset of the IGSL and consequently was derived using the procedure in Smart and Nicastro (2014). In summary, we produced a master list of objects starting with the large faint catalogues, progressively adding other catalogues and extending the master list whenever entries from a new catalogue were unmatched. The catalogues of bright objects were then matched to this large master list, which resulted in mismatches of the bright objects to noise or to faint objects near the true bright objects. It was also found that the large Schmidt catalogues often had multiple entries of the same object in the overlap regions between plates; this can be seen in the sky plot of the PPMXL. These multiple entries, if they were bright enough, were included as ASC sources.

The first on-ground attitude reconstruction of Gaia is described in detail in the Gaia DR1 papers. The goal of this attitude reconstruction is to provide the attitude with an accuracy of 50 mas for the first year, when the ASC is the primary source of reference objects. Later in the mission it is planned to replace this catalogue with one produced by Gaia, with an expected accuracy of 5 mas. The reconstruction requires at least one 2D measurement per second and per field of view, which equates to a minimum density of 75 stars per square degree.

To be automatically assigned a 2D window a star must have $G<13$, but this does not provide enough calibration sources, especially near the Galactic poles. The compromise was to provide a list of faint calibration stars down to $G=13.4$, which sets the limiting magnitude of the ASC. Note that the limit of the ASC was originally set to $G=14.0$; a change of procedure, however, allowed a relaxation of that requirement to $G=13.4$.

The cross-matching radius in the first on-ground attitude reconstruction will be between 20 ${}^{\prime\prime}$ and 30 ${}^{\prime\prime}$, the precise value to be optimised during the commissioning phase. Hence we conservatively require that all ASC sources are isolated at the level of 40 ${}^{\prime\prime}$. This would potentially allow up to 8000 ASC entries per square degree without violating the isolation criterion. Following this consideration, in the original subset of the IGSL that constituted the ASC we just lowered the magnitude limit to reduce the number of stars to fewer than 1000 per square degree, and assumed that the isolation criterion would always be met at this density. However, because of the multiple entries, uncatalogued binary systems and the generally non-uniform distribution, it was found that the isolation requirement was violated in the ASC subset.
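The figure of about 8000 entries per square degree can be checked with a simple estimate. The sketch below assumes stars placed on a square grid with a spacing equal to the isolation distance; the text does not state how the number was obtained, so this is only one plausible reading.

```python
def max_isolated_density(isolation_arcsec):
    """Stars per square degree on a square grid with the given spacing."""
    stars_per_side = 3600.0 / isolation_arcsec  # one degree is 3600 arcsec
    return stars_per_side ** 2

print(max_isolated_density(40.0))  # 8100.0, i.e. roughly 8000 per square degree
```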

To address the isolation and duplicate-entry issues, the ASC was rebuilt from scratch using the catalogues listed above in the order given. The production of the ASC starts with all objects in the Hipparcos catalogue as a master list; the other catalogues are then input in turn and matched to this master list with a matching radius of 5 ${}^{\prime\prime}$. All entries from the input catalogue that are not matched are included as new master list objects. If more than one entry from the input catalogue matches the same master list object, only the closest is considered matched and a new entry is generated for each of the others.

In this way the master list grows with each included catalogue. Since the first catalogues are composed of bright objects, they are sparse and the chance of a mismatch between the input catalogues and the master list is reduced. The confusion at the bright end of the master list is in this way minimised. When the large, dense Schmidt-plate-based catalogues are included, there is still the possibility that non-real entries are matched to bright objects and that the real bright objects in the GSC2.3/PPMXL enter as new entries. However, the Schmidt data are only used to provide photometric information, and to clean up the ASC list we drop any objects that are not in UCAC4, Tycho-2, Sky2000 or the Hipparcos catalogue, under the assumption that the union of these catalogues is complete to fainter than the Gaia isolation limit of $G$=13.7.
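The incremental growth of the master list can be sketched as follows. The data layout (a list of dictionaries with a position and a list of contributing catalogues), the function names and the brute-force search are illustrative assumptions; a production implementation for catalogues of this size would need a spatial index and proper handling of the RA wrap-around.

```python
import math

def separation_arcsec(a, b):
    """Small-angle separation of two (ra, dec) positions given in degrees."""
    mean_dec = math.radians(0.5 * (a[1] + b[1]))
    d_ra = (a[0] - b[0]) * math.cos(mean_dec)  # ignores wrap-around at RA 0/360
    return 3600.0 * math.hypot(d_ra, a[1] - b[1])

def add_catalogue(master, catalogue, name, radius=5.0):
    """Match `catalogue` positions to the master list and append the rest."""
    n_existing = len(master)  # match only against objects already present
    claims = {}               # master index -> [(separation, catalogue index)]
    new_entries = []
    for j, pos in enumerate(catalogue):
        seps = [(separation_arcsec(master[i]["pos"], pos), i) for i in range(n_existing)]
        sep, i = min(seps, default=(math.inf, None))
        if sep <= radius:
            claims.setdefault(i, []).append((sep, j))
        else:
            new_entries.append(j)  # unmatched: becomes a new master object
    for i, claimants in claims.items():
        claimants.sort()
        master[i]["sources"].append(name)                # closest entry is the match
        new_entries.extend(j for _, j in claimants[1:])  # the others become new objects
    for j in sorted(new_entries):
        master.append({"pos": catalogue[j], "sources": [name]})
    return master
```

Calling `add_catalogue` once per input catalogue, in the order listed above and starting from a master list built from Hipparcos, reproduces the sequence described in the text.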

Once the master list was completed with the compilation of all the catalogues, we estimated the red $R_{F}$, blue $B_{J}$, Gaia $G$ and Gaia $G_{RVS}$ magnitudes using the relations and priorities in Smart and Nicastro (2014) with the photometry from the contributing catalogues. We then dropped any objects fainter than $G$=13.7. This compilation and these selection criteria result in 15 million objects. We assume all objects are stellar, then examine the objects one by one and record for each the number of neighbours within 40 ${}^{\prime\prime}$.

From this list we drop any star that (i) has a neighbour, (ii) has $G<7.0$ or $G>13.4$, or (iii) is in the Washington Double Star or Tycho Double Star catalogues.
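The final selection can be sketched as a filter over the master list after the cut at $G$=13.7. The field names (`ra`, `dec`, `g`, `double`) and the function names are hypothetical, and the quadratic neighbour search is only suitable for illustration; for 15 million objects a spatial index would be required.

```python
import math

def separation_arcsec(a, b):
    """Small-angle separation of two objects with 'ra' and 'dec' in degrees."""
    mean_dec = math.radians(0.5 * (a["dec"] + b["dec"]))
    d_ra = (a["ra"] - b["ra"]) * math.cos(mean_dec)  # ignores wrap-around at RA 0/360
    return 3600.0 * math.hypot(d_ra, a["dec"] - b["dec"])

def select_asc(objects, radius=40.0, g_bright=7.0, g_faint=13.4):
    """Keep isolated, non-double objects inside the ASC magnitude range."""
    selected = []
    for obj in objects:
        n_neighbours = sum(1 for other in objects
                           if other is not obj
                           and separation_arcsec(obj, other) <= radius)
        if n_neighbours > 0:
            continue  # (i) not isolated
        if obj["g"] < g_bright or obj["g"] > g_faint:
            continue  # (ii) outside the magnitude range
        if obj["double"]:
            continue  # (iii) listed in a double-star catalogue
        selected.append(obj)
    return selected
```

Note that the neighbour count uses all objects down to the isolation limit, so a star can be rejected because of a neighbour that is itself too faint to enter the ASC.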

### Contents

The Attitude Star Catalogue was made by combining 7 all-sky catalogues and selecting entries based on magnitude, isolation and astrometric precision criteria. The catalogue has 8 173 331 entries with estimates of the positions at 2000, proper motions and magnitudes (Gaia $G$, Gaia $G_{RVS}$, red $R_{F}$ and blue $B_{J}$) in the magnitude range $7.0<G<13.4$.

### Usage in Gaia processing

Throughout the commissioning phase and scientific mission of Gaia, the ASC is used as the astrometric reference in the first on-ground attitude reconstruction, OGA1, see Section 2.4.5. At some time the ASC as described above will be replaced by a similar star catalogue derived from Gaia observations, i.e. as an excerpt from one of the early Gaia catalogues.