# 7.3 Processing steps

## 7.3.1 MDB Integrator

Author(s): Alex Hutton

The operation of the MDB Integrator is relatively simple for data reduction cycle 01: its main task is to combine the data listed in Table 7.1.

The steps in the MDB Integrator can be summarised as follows:

1. The MDB Integrator begins by taking the CompleteSource table produced from the IGSL catalogue.

2. The source list provided by IDU, represented by the unique list of source ids in the IDU Match table, is read, and a new CompleteSource record is created for all source ids not already present in the CompleteSource table.

3. The IDU New Source records are read, and any available astrometric and photometric information is attached to the newly added CompleteSource records for these sources.

4. The information supplied by AGIS is used to update the astrometric information in the CompleteSource table for all sources for which AGIS provides an update. If AGIS provides a 2-parameter solution for a source, then the position of the source is updated in the CompleteSource table. If AGIS provides a 5-parameter solution then the position, parallax and proper motion values are updated. When an update is provided by AGIS, this replaces the existing information (from the IGSL or the IDU NewSource record).

5. The information supplied by PhotPipe is used to update the photometric information in the CompleteSource table for all sources for which PhotPipe provides an update. When an update is provided by PhotPipe, this replaces the existing information (from the IGSL or the IDU NewSource record).

6. The information supplied by CU7 variability processing is attached to the sources in the CompleteSource for which data is supplied.

7. The Integrator writes out the updated CompleteSource table, together with the Track records supplied by IDU.
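The steps above can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions, not the actual MDB Integrator code: the `CompleteSource` class, the `integrate` function, and the dictionary layouts of the inputs are all hypothetical, and writing out the result and the IDU Track records is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompleteSource:
    """Hypothetical, heavily simplified stand-in for a CompleteSource record."""
    source_id: int
    astrometry: Optional[dict] = None
    photometry: Optional[dict] = None
    variability: Optional[dict] = None


def integrate(igsl_sources, idu_match_ids, idu_new_sources, agis, photpipe, cu7):
    """Combine the inputs into one table keyed by source id (assumed layout)."""
    # Step 1: start from the CompleteSource table built from the IGSL.
    table = {s.source_id: s for s in igsl_sources}

    # Step 2: add an empty record for every IDU source id not yet present.
    new_ids = set(idu_match_ids).difference(table)
    for sid in new_ids:
        table[sid] = CompleteSource(sid)

    # Step 3: attach IDU New Source data to the newly added records only.
    for sid in new_ids:
        info = idu_new_sources.get(sid, {})
        table[sid].astrometry = info.get("astrometry")
        table[sid].photometry = info.get("photometry")

    # Step 4: AGIS overrides astrometry. Assumption: only the parameters
    # present in the solution (2 or 5 of them) are overwritten.
    for sid, solution in agis.items():
        if sid in table:
            table[sid].astrometry = {**(table[sid].astrometry or {}), **solution}

    # Step 5: PhotPipe replaces the existing photometry.
    for sid, phot in photpipe.items():
        if sid in table:
            table[sid].photometry = dict(phot)

    # Step 6: attach CU7 variability data where supplied.
    for sid, var in cu7.items():
        if sid in table:
            table[sid].variability = var

    # Step 7: the caller would write this table out.
    return table
```

The order of the steps matters in this sketch: because AGIS and PhotPipe are applied after the IGSL and IDU data, their values take precedence, as described in steps 4 and 5.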

The operation of the MDB Integrator is expected to become more complicated in future processing cycles, when different cyclic processes evaluate different values for the properties of the same source.

## 7.3.2 Ingestion in GACS

Author(s): Enrique Utrilla

CompleteSource is a convenient format for the automated processing of information received from several different DPCs, but also a complex data structure. For this reason, the first step of the ingestion into the Gaia Archive database is to convert the CompleteSources consolidated by MDB Integrator into a simpler and flatter data structure, gaia_source, which is more suitable for publication in a table format.

The conversion of CompleteSource into gaia_source performs the following operations:

• Mapping of names in internal DPAC conventions to names that are easier to understand.

• Conversion of units, e.g. from radians to milliarcseconds when appropriate.

• Generation of a random index that can be used to generate randomly distributed, repeatable subsets of the catalogue.

• Conversion of right ascension and declination from ICRS to ecliptic and galactic coordinates.

• Calculation of covariances between astrometric variables from the coefficients of the normal matrix.

• Calculation of the magnitude from the value of the flux.
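A few of the operations listed above can be illustrated with a minimal Python sketch. The function names are hypothetical, the photometric zero point is a placeholder argument and not a Gaia DR1 calibration value, and the seeded shuffle is only one possible way to obtain a repeatable random index; none of this is taken from the actual conversion code.

```python
import math
import random

# Milliarcseconds per radian: degrees per radian * 3600 arcsec * 1000 mas.
RAD_TO_MAS = math.degrees(1.0) * 3600.0 * 1000.0


def rad_to_mas(angle_rad):
    """Convert an angle from radians to milliarcseconds."""
    return angle_rad * RAD_TO_MAS


def flux_to_mag(flux, zero_point):
    """Pogson relation; zero_point is a placeholder, not a DR1 value."""
    return -2.5 * math.log10(flux) + zero_point


def assign_random_index(source_ids, seed=0):
    """Give every source a unique index in a random but repeatable order."""
    order = list(range(len(source_ids)))
    random.Random(seed).shuffle(order)  # same seed, same permutation
    return dict(zip(source_ids, order))
```

With such an index, selecting the sources whose index is below some value `n` yields a random subset of size `n` that is identical every time the selection is repeated.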

In a similar fashion other data items, such as CU7 variability data, are converted to a format suitable for publication.

The converted data is ingested into the GACS database, including fields that are only used during the validation process. When this validation is concluded, those fields are removed to publish only data items with the expected level of quality.

## 7.3.3 Crossmatch with external catalogues

Author(s): Paola M. Marrese and Silvia Marinoni

### Introduction

Gaia DR1 includes a precomputed crossmatch with large optical/near-infrared photometric surveys. The external catalogues are briefly described here, with particular attention to the characteristics that are important for the crossmatch. A subset of the matched external catalogues was also used by Gaia CU9 Validation and helped in validating the Gaia results.

The versions of the external catalogues included in Gaia DR1 are the ones that were used during the crossmatch and validation activities. They reflect the purpose they were created for, and therefore differ from the original external catalogues in several ways. First, they are not complete versions of the corresponding original catalogues: they include only a subset of the available fields. In addition, we often modified the original field names, the treatment of null values, and the units. We sometimes added new fields needed by the crossmatch, and we tried to homogenize the catalogues as far as possible. Modifications were in general applied to simplify and facilitate the use of the catalogue for crossmatch purposes. In some cases the external catalogues described here were obtained from a larger set of data (for example, SDSS DR9 was obtained from the photoObj FITS data).

### The external catalogues matched with Gaia

The following external catalogues were crossmatched with the Gaia DR1 catalogue:

• 2MASS PSC

• UCAC4

• GSC2.3

• SDSS DR9

• AllWISE

• PPMXL

• URAT-1

2MASS PSC

Reference paper: Skrutskie et al. (2006)

The 2MASS All-Sky Data Release contains Image and Catalog data covering 99.998% of the sky, derived from all northern and southern survey observations. The all-sky release products include a Point Source Catalog (PSC), containing positions and photometry for 470 992 970 objects, an Extended Source Catalog (XSC), containing positions, photometry and basic shape information for 1 647 599 resolved sources, most of which are galaxies, and the Image Atlas, containing over 4 121 439 J, H, and K${}_{s}$ FITS images covering the sky.

Total objects: 470 992 970 point sources (+ 1 647 599 extended)
Magnitude limit: J=16
Epoch of positions: 1997-2001
Average coordinates absolute error (stars): 70-80 mas (9 $<$ K${}_{s}$ $<$ 14) and 120 mas (K${}_{s}$ $<$ 9)
Average photometric accuracy: 5%
Completeness: $\sim$99% (J=16.1, H=15.5, K${}_{s}$=15.1, b$>$30 deg)
Bands: J, H, K${}_{s}$
Saturation limit: J=4.5, H=4, K${}_{s}$=3.5.

Source positions are reconstructed in the ICRS using the Tycho-2 reference catalogue. Comparisons of 2MASS with Tycho-2 and UCAC demonstrate that 2MASS positions are consistent with the ICRS with a net offset no larger than 15 mas. Position residuals of individual sources validate a typical position uncertainty for K${}_{s}$ $<$ 14 sources of less than 100 mas (rms).

The accuracy of position reconstruction will be slightly poorer near the declination ends of Survey Tiles, in regions with a low density of astrometric reference stars, and near the celestial poles where the telescope tracking was least stable. The degraded accuracy is reflected in the position uncertainties quoted in the PSC.

The position errors are at $1\sigma$ level.
The effective resolution is 5 arcsec.
The Julian Date has an accuracy of $\pm 30$ sec.
Covariances (correlation coefficients) are not available.

The primary areas of confusion are:
1) longitudes $\pm 75$ degrees from the Galactic centre and latitudes $\pm 1$ degree from the Galactic plane;
2) within an approximately $5$ degrees radius of the Galactic centre.

UCAC4

Reference paper: Zacharias et al. (2013)

Original catalogue: DVD sent by author.

UCAC4 is a compiled, all-sky star catalogue covering mainly the 8 to 16 magnitude range in a single bandpass between $V$ and $R$. Positional errors are about 15 to 20 mas for stars in the 10 to 14 mag range. Proper motions have been derived for most of the approximately 113 million stars, utilizing about 140 other star catalogues with a significant epoch difference to the UCAC CCD observations. These data are supplemented by:
- 2MASS photometric data for about 110 million stars, and
- 5-band ($B$,$V$,$g$,$r$,$i$) photometry from the APASS survey (AAVSO Photometric All-Sky Survey) for over 50 million stars.
All bright stars not observed with the astrograph have been added to UCAC4 from a set of Hipparcos and Tycho-2 stars. Thus UCAC4 should be complete from the brightest stars to about R=16.

The proper motions of bright stars are based on about 140 catalogues, including Hipparcos and Tycho, as well as all catalogues used for the Tycho-2 proper motion construction. Proper motions of faint stars are based on re-reductions of early epoch SPM data ($-$90 to about $-$20 deg Dec) and NPM (PMM scans of early epoch blue plates) for the remainder of the sky.

Observations were made in a single bandpass (579-642 nm), thus the UCAC magnitudes are between Johnson $V$ and $R$.

While calculating proper motions, no attempt was made to correct the data for parallaxes. This will lead to slightly inferior results for a few stars with high parallax if observations with very different parallax factors are involved. Errors in the proper motions of the bright stars (to $R \sim 12$) run from about 1 to 3 mas yr${}^{-1}$, benefiting from the large epoch spans involved. For the fainter stars using SPM and NPM data, typical errors are 2 to 6 mas yr${}^{-1}$. Not all stars in UCAC4 have proper motions.

Pixel Scale: 0.9 arcsec/pixel.
Effective resolution: 2.0 arcsec.

The astrometry provided in UCAC4 is on the Hipparcos system, i.e., the International Celestial Reference System (ICRS), as represented by the Tycho-2 catalogue. Positions in UCAC4 are given at the standard epoch of Julian date 2000.0, thus the UCAC4 is a compiled catalogue. In order to be able to calculate positional errors at any epoch, the central epoch, i.e., the weighted mean epoch of the data (UCAC + early epoch other catalogues) is given. At the central epoch (which varies from star to star and is also different for RA and Dec) the positional error has its smallest value: the one given in the catalogue for "sigma position". In most cases this central epoch will be close to the UCAC observational epoch due to the relatively large weight given to the UCAC observations. However, a fair number of stars have a vastly different mean epoch, ranging back to about 1947. The proper motions are given at the central epoch. Positional errors of stars increase according to the errors in the proper motions when going forward or backward in time from the central epoch. For objects without proper motions, the positions are at the central epoch (which in this case is the UCAC4 observation epoch). There are 4 982 212 stars without proper motions.
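The growth of the positional error away from the central epoch can be illustrated with a short Python sketch. It assumes the usual quadrature sum of the position error at the central epoch and the accumulated proper-motion error, applied separately to each coordinate because the central epoch differs for RA and Dec. The function and parameter names are hypothetical and are not UCAC4 column names.

```python
import math


def position_error_at_epoch(sigma_central_mas, sigma_pm_mas_per_yr,
                            central_epoch_yr, target_epoch_yr):
    """Positional error (mas) in one coordinate at target_epoch_yr.

    Assumes position and proper-motion errors add in quadrature when
    propagating from the central epoch.
    """
    dt = target_epoch_yr - central_epoch_yr
    return math.hypot(sigma_central_mas, sigma_pm_mas_per_yr * dt)
```

For example, a star with a 15 mas error at central epoch 2000.0 and a proper-motion error of 2 mas yr${}^{-1}$ would have a 25 mas error ten years later under this assumption, and exactly 15 mas at the central epoch itself.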

Since the publication of UCAC4 in August 2012, the authors have advised applying the following corrections:
- 2013 Mar 10, UCAC4 streak objects: Some objects in the UCAC4 catalogue are already classified as "streak objects". In an effort to identify artifacts in the catalogue this issue has been further investigated by others; see for example http://www.ap-i.net/skychart/en/news/ucac4_streak
- 2013 Feb: Data for a small number of high proper motion stars have been corrected
(http://www.usno.navy.mil/USNO/astrometry/optical-IR-prod/ucac).

We applied both suggested corrections to the original files. The resulting UCAC4 version has $113\,728\,883$ objects.

GSC2.3

Reference paper: Lasker et al. (2008)

Original catalogue: R. Smart, private communication

The Guide Star Catalog II (GSC-II) is an all-sky database of objects derived from the uncompressed Digitized Sky Surveys that the Space Telescope Science Institute has created from the Palomar and UK Schmidt survey plates and made available to the community. Like its predecessor (GSC-I), the GSC-II was primarily created to provide guide star information and observation planning support for Hubble Space Telescope. Two catalogues have already been extracted from the GSC-II database and released to the astronomical community. A magnitude-limited (Rf = 18.0) version, GSC2.2, was distributed soon after its production in 2001, while the GSC2.3 release has been available for general access since 2007. The GSC2.3 catalogue contains astrometry, photometry, and classification for 945 592 683 objects down to the magnitude limit of the plates. Positions are tied to the International Celestial Reference System; for stellar sources, the all-sky average absolute error per coordinate ranges from 0.2 to 0.28 arcsec depending on magnitude. When dealing with extended objects, astrometric errors are 20% worse in the case of galaxies and approximately a factor of 2 worse for blended images. Stellar photometry is determined to a 0.13-0.22 mag accuracy as a function of magnitude and photographic pass bands (Rf, Bj, In). Outside of the galactic plane, stellar classification is reliable to at least 90% confidence for magnitudes brighter than Rf = 19.5, and the catalogue is complete to Rf = 20.

SDSS DR9

Reference paper: Ahn et al. (2012)

The SDSS imaging camera took its first science quality data the night of September 19, 1998, and was the world’s most productive wide-field imaging facility until its last night of science quality data on November 18, 2009. In between, it took a total of around 35 000 square degrees of images, covering a unique footprint of 14 055 square degrees of sky. Through the BOSS & SEGUE Surveys, Data Release 10 does not include any new or updated imaging data, but includes all prior SDSS imaging data.

The version used here is the SDSS DR9 catalogue, primary objects only, extracted from the photoObj FITS files. There have been no changes in the photometric reduction since DR9 (i.e., in the DR10, DR11, and DR12 photoObj files). The calibrated object lists report positions, fluxes, and shapes of all objects detected at $>$5 sigma on the survey images.

The photoObj data has photometric and astrometric calibrations applied, and contains enough information to select unique objects and to perform quality cuts.

The $r$ photometric CCDs serve as the astrometric reference CCDs for the SDSS. That is, the positions for SDSS objects are based on the $r$ centroids and calibrations. The $r$ CCDs are calibrated by matching up bright stars detected by SDSS with the UCAC astrometric reference catalogues. Stars detected on the $r$ CCDs are matched directly with stars in the United States Naval Observatory CCD Astrograph Catalog (UCAC2; Zacharias et al. 2004), which has a precision of 70 mas at its catalogue limit of $r=16$, and systematic errors of less than 30 mas. UCAC2 extends up to around a declination of 41 degrees north. Outside the UCAC2 area we use an "internal" UCAC data release known as "r14". Together UCAC2 and r14 cover the whole sky. There are approximately 2-3 magnitudes of overlap between UCAC and unsaturated stars on the $r$ CCDs. The astrometric CCDs are not used. The $r$ CCDs are calibrated directly against the primary astrometric reference catalogue.

SDSS should be complete to magnitude $r=22$.

There are 23 945 objects with position errors in either RA or Dec larger than 10 arcsec (with maximum values around 14 degrees). For those objects a reliable match is very difficult, as there would be a huge number of neighbours. We thus decided to delete them. The number of objects in the SDSS DR9 version used for the crossmatch (XM) activities is thus 469 029 929.

AllWISE

Reference papers: Wright et al. (2010), Mainzer et al. (2011), Cutri et al. (2013)

The AllWISE program extends the work of the successful Wide-field Infrared Survey Explorer mission by combining data from the cryogenic and post-cryogenic survey phases to form the most comprehensive view of the mid-infrared sky currently available. AllWISE has produced a new Source Catalog and Image Atlas with enhanced sensitivity and accuracy compared with earlier WISE data releases. Advanced data processing for AllWISE exploits the two complete sky coverages to measure source motions for each Catalog source, and to compile a massive database of light curves for those objects.

The AllWISE Source Catalog contains accurate positions, motion measurements, photometry and ancillary information for 747 634 026 objects that were detected on the deep, coadded AllWISE Atlas Images. Once detected, source positions and fluxes were measured by fitting PSF templates simultaneously to the "stack" of all Single-exposure images in all WISE bands that cover their locations. The W1 and W2 depth-of-coverage is generally a factor of two greater than that for W3 and W4 in the AllWISE Source Catalog. AllWISE combined W1 and W2 Single-exposure images from the WISE 4-Band Cryo, 3-Band Cryo and NEOWISE Post-Cryo survey phases, and W3 and W4 images from the 4-Band Cryo phase only. The additional epoch of W1 and W2 coverage accentuates the weight of those two bands in determining source properties such as position and motion. The additional epoch of W1 and W2 Single-exposure observations fills in most low-coverage areas for the AllWISE Catalog, but there are still small gaps in the effective W3 and W4 coverage.

The AllWISE Source Catalog contains both point-like and resolved sources. The AllWISE Source Catalog is not a "Point Source" catalogue. It contains detections of point-like objects, such as stars and unresolved galaxies, as well as resolved sources such as close multiple stars, galaxies, and detections of sections of large nearby galaxies and clumps or filaments in Galactic nebulosity, as long as they meet the catalogue selection criteria.

Very bright stars suppress the detection of fainter sources in their vicinity. The AllWISE Source Catalog contains unreliable entries. The reliability of the AllWISE Source Catalog is estimated to be $>$99.9% for sources brighter than SNR=20 in unconfused regions of the sky. The fractional reliability decreases for fainter objects and in regions where there is less coverage. The original WISE astrometric requirements were with respect to the 2MASS catalogue, and that catalogue includes no proper motions to account for the decade between the 2MASS and WISE epochs. Tying the WISE solution directly to 2MASS meant that the effects of systematic proper motion shifts between the two catalogue epochs, which approached 200 mas in some sky positions, were imprinted on the All-Sky positions. AllWISE addresses this issue by making use of proper motion data from the UCAC4 catalogue to adjust 2MASS positions before they are used as reference stars. Limiting the reference stars used to those which have good quality UCAC4 proper motions reduced the number of reference stars available.

Most source positions are dominated by W1. Extremely red objects with the highest SNR flux measurements in the W3 and/or W4 bands may have a small astrometric bias with respect to bluer objects. Because of a small residual band-to-band offset that was not removed by the Multiframe Position Reconstruction improvements for AllWISE, the reconstructed position of rare, very red sources that are detected primarily in W3 and/or W4 may be offset from bluer sources by up to $\sim$70 mas in the in-scan (ecliptic longitude) direction. The sign of the offset is in the sense that the ecliptic longitude positions of very red sources will be slightly larger than bluer sources.

The sky coverage depth for sources in the AllWISE catalogue is approximately twice as large in W1 and W2 as it is in W3 and W4 and, as noted above, the additional epoch of W1 and W2 coverage accentuates the weight of those two bands in determining source properties such as position and motion. It was thus decided to use the average MJD of the W1 observations as the refEpoch for crossmatch purposes. When the W1 epoch is not available ($\sim$10 000 sources), the average MJD of the W2, W3 or W4 observations (in this order) was used as the refEpoch.
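The fallback order for the reference epoch can be written as a very small Python sketch. The function and argument names are hypothetical (they are not AllWISE column names), and a missing epoch is assumed to be represented by `None`.

```python
def choose_ref_epoch(w1_mjd_mean, w2_mjd_mean, w3_mjd_mean, w4_mjd_mean):
    """Return the first available mean MJD, in the order W1, W2, W3, W4.

    A missing epoch is assumed to be represented by None; None is returned
    if no band has an epoch.
    """
    for mjd in (w1_mjd_mean, w2_mjd_mean, w3_mjd_mean, w4_mjd_mean):
        if mjd is not None:
            return mjd
    return None
```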

PPMXL

Reference papers: PPMXL: Roeser et al. (2010); PPMX: Röser et al. (2008)

Original Catalogue: VO access: http://vo.uni-hd.de/ppmxl

PPMXL is a new determination of mean positions and proper motions on the ICRS system obtained by combining USNO-B1.0 and 2MASS astrometry. PPMXL aims to be complete from the brightest stars down to about $V \sim 20$ over the whole sky. The resulting typical individual mean errors of the proper motions range from 4 mas yr${}^{-1}$ to more than 10 mas yr${}^{-1}$ depending on observational history. The mean errors of positions at epoch 2000.0 are 80-120 mas if 2MASS astrometry could be used, and 150-300 mas otherwise. We also give correction tables to convert USNO-B1.0 observations of, e.g., minor planets to the ICRS system.

USNO-B1.0 contains more than a billion entries: stars and galaxies, and a number of artifacts. Spurious entries in USNO-B1.0 (that are caused by diffraction spikes and circular reflection halos around bright stars in the original imaging data) have been detected. These defects, numbering some 24 million or 2.3% of the catalogue objects, were removed. The final version of PPMXL contains some 900 million stars. An entry from USNO-B1.0 was kept whenever the maximum epoch difference between the observations was larger than 10 years. This somewhat arbitrary choice was guided by the idea to formally derive proper motions even if a star has only observations from 2MASS and the second epoch POSS, whereas no observations from the first epoch POSS are available. Because of this short epoch difference, these stars have large mean errors of proper motions, and they have to be used with care.

At its bright end, PPMXL is merged with PPMX. The stars of PPMX were searched in PPMXL using a cone with 1.5 arcsec radius. When no match was found, the respective PPMX star was added to the catalogue. This mainly happened in the case of bright stars. When a match has been found, the PPMX star is selected if the mean error of its proper motion is smaller than that of the PPMXL star, and vice versa. If a PPMX star is added to the catalogue, all PPMXL matches within 1.5 arcsec are deleted.
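The merging rule described above can be sketched in Python as follows. This is only a brute-force illustration, not the procedure used by the PPMXL authors: the dictionary keys (`ra`, `dec` in degrees, `pm_err`) and the function names are hypothetical, and when a PPMX star has several PPMXL matches the sketch assumes it must have a smaller proper-motion error than all of them to be selected.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Haversine separation; inputs in degrees, result in arcsec."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0


def merge_ppmx(ppmxl, ppmx, radius_arcsec=1.5):
    """Merge PPMX stars into a PPMXL list using a cone of radius_arcsec."""
    keep = [True] * len(ppmxl)
    added = []
    for star in ppmx:
        matches = [
            i for i, s in enumerate(ppmxl)
            if keep[i] and angular_sep_arcsec(
                star["ra"], star["dec"], s["ra"], s["dec"]) <= radius_arcsec
        ]
        # No match: all() over an empty list is True, so the star is added.
        # Match: the PPMX star wins only if its proper-motion error is smaller.
        if all(star["pm_err"] < ppmxl[i]["pm_err"] for i in matches):
            for i in matches:
                keep[i] = False  # delete PPMXL matches within the cone
            added.append(star)
    return [s for i, s in enumerate(ppmxl) if keep[i]] + added
```

A real implementation would use a spatial index instead of the quadratic search shown here, which is only practical for small test lists.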

PPMXL contains $910\,468\,710$ entries, including stars, galaxies, and bogus entries. Of these, 412 410 368 are in 2MASS, i.e., 2MASS is used to determine proper motions and the J, H, Ks magnitudes are given in the catalogue. In total, 6 268 118 stars are taken from PPMX, so PPMXL aims to be complete from the brightest stars down to about 20th magnitude in $V$.

The covariance matrix obtained with a least-squares adjustment gives (per coordinate and per star) the mean epoch, the mean error of position at mean epoch, and the mean error of proper motions. All these quantities are published in the catalogue. Mean errors of the positions at the reference epoch 2000.0 can be computed star by star. On average, the mean errors of position at 2000.0 are between 80 and 120 mas if 2MASS astrometry is available, and range from 150 mas to 300 mas otherwise.

PPMXL is a catalogue that is nominally on the ICRS system. It is linked to the Hipparcos catalogue, the optical representation of the ICRS, via Tycho-2 and PPMX.

According to the reference paper (Roeser et al. 2010), the PPMXL catalogue contains $910\,468\,710$ entries. The original catalogue available at http://vo.uni-hd.de/ppmxl contains $910\,468\,688$ entries. The CDS version contains $910\,469\,430$ entries. The version used for the crossmatch is consistent with the original catalogue downloaded from http://vo.uni-hd.de/ppmxl (i.e., $910\,468\,688$ entries).

There seems to be a small fraction of stars with extremely large magnitudes (up to $\sim$65.5).

URAT-1

Reference paper: Zacharias et al. (2015)

Original catalogue: CDS (ftp://cdsarc.u-strasbg.fr/pub/cats/I/329/)

URAT (USNO Robotic Astrometric Telescope) is a follow-up project to the successful UCAC project using the same astrograph but with a much larger focal plane array and a bandpass shifted further to the red. Longer integration times and more sensitive, backside CCDs allowed for a substantial increase in limiting magnitude, resulting in about 4-fold increase in the average number of stars per square degree as compared to UCAC. Additional observations with an objective grating largely extend the dynamic range to include observations of stars as bright as about 3rd magnitude. Multiple sky overlaps per year result in a significant improvement in positional precision as compared to UCAC.

URAT-1 is an observational catalogue at a mean epoch between 2012.3 and 2014.6; it covers the magnitude range 3 to 18.5 in $R$-band, with a positional precision of 5 to 40 mas. It covers most of the northern hemisphere and some areas down to $-$24.8 degrees in declination.

### Crossmatch results

The crossmatch results for a given external catalogue are presented in two different tables: BestNeighbour and Neighbourhood.

For each matched Gaia object, the BestNeighbour table contains a single entry: the neighbour with the highest value of the figure of merit (obtained with a likelihood-ratio implementation). The Neighbourhood table contains all good neighbours, i.e., objects whose position error ellipses overlap within a 5$\sigma$ confidence level with that of the given Gaia object. For the TGAS sub-sample the proper motions were used in the crossmatch computations. The BestNeighbour table includes the angular distance, the number of mates and the number of neighbours (listed in the Neighbourhood table). The mates of a given Gaia object are defined as other Gaia objects with the same best neighbour in the external catalogue. The presence of mates is allowed by the crossmatch algorithm we used, which is of the many-to-one kind. True mates should be objects resolved by Gaia which were unresolved in the external catalogue, which in general has a much lower angular resolution than Gaia. The Neighbourhood table contains the angular distance and the figure of merit (named score) for each good neighbour. The figure of merit depends strongly on the angular distance, but it also depends on the positional errors of both Gaia and the external catalogue and on the local density of the external catalogue.
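The relation between the two tables can be illustrated with a short Python sketch that derives BestNeighbour-like entries from Neighbourhood-like rows. The row layout and key names used here (`gaia_id`, `ext_id`, `ang_dist`, `score`, `number_of_neighbours`, `number_of_mates`) are illustrative assumptions, and the computation of the score itself is not shown.

```python
from collections import Counter, defaultdict


def build_best_neighbour(neighbourhood):
    """Derive one best-neighbour entry per Gaia object.

    neighbourhood: iterable of dicts with the assumed keys
    gaia_id, ext_id, ang_dist and score.
    """
    by_gaia = defaultdict(list)
    for row in neighbourhood:
        by_gaia[row["gaia_id"]].append(row)

    # Best neighbour: the good neighbour with the highest figure of merit.
    best = {gid: max(rows, key=lambda r: r["score"])
            for gid, rows in by_gaia.items()}

    # Mates: other Gaia objects sharing the same best neighbour.
    sharing = Counter(r["ext_id"] for r in best.values())

    return {
        gid: {
            "ext_id": r["ext_id"],
            "ang_dist": r["ang_dist"],
            "number_of_neighbours": len(by_gaia[gid]),
            "number_of_mates": sharing[r["ext_id"]] - 1,
        }
        for gid, r in best.items()
    }
```

Because the matching is many-to-one, two Gaia objects may point to the same external object; in this sketch each of them then reports one mate.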

Details on the crossmatch algorithm are given in Marrese et al. (2017).