10.7.4 Crossmatch with external catalogues

Author(s): Paola Maria Marrese, Silvia Marinoni, Michele Fabrizio, Giuseppe Altavilla


Gaia DR2 includes a pre-computed crossmatch with optical/near-infrared photometric and spectroscopic surveys. A subset of the matched external catalogues was also used by Gaia CU9 Validation and helped validate the Gaia results.

The external catalogues matched with Gaia DR2 all cover the optical/near-IR wavelength region (with the exception of AllWISE (Cutri et al. 2013), which extends into the mid-IR), are general surveys not restricted to a specific class of objects, and have a lower angular resolution than Gaia. The external catalogues are not homogeneous enough to be treated with exactly the same algorithm. We therefore broadly separated them into two groups, large dense surveys and sparse catalogues, and defined two slightly different algorithms for the two groups. An external catalogue is classified as a dense survey when, around the majority of its objects, it is possible to define a density that is both precise (i.e. based on a reasonable number of objects) and accurate (i.e. local).

Neither algorithm is symmetric: for dense surveys Gaia DR2 is the leading catalogue, while for sparse catalogues Gaia DR2 is the second catalogue. This means that for dense surveys the possible counterparts of each Gaia DR2 source are searched for in the external catalogue, while for sparse catalogues the possible counterparts of each external catalogue source are searched for in Gaia DR2. Both crossmatch algorithms are purely positional and use the Gaia DR2 positions, their errors and correlations and, when available, the parallaxes, proper motions and the corresponding errors and correlations.
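
The propagation formulae are not given here. As a minimal sketch only, assuming a linear proper-motion model that neglects parallax, perspective effects and error propagation (all of which the actual algorithm may treat differently), a Gaia DR2 position at its reference epoch J2015.5 could be moved to the epoch of an external catalogue as follows. The function names and dictionary keys are hypothetical; `pmra` is taken to be mu_alpha* = mu_alpha cos(dec), as in Gaia DR2.

```python
import math

GAIA_DR2_EPOCH = 2015.5     # Gaia DR2 reference epoch (Julian years)
MAS_TO_DEG = 1.0 / 3.6e6    # 1 degree = 3 600 000 mas

def propagate_linear(ra_deg, dec_deg, pmra_masyr, pmdec_masyr,
                     target_epoch, ref_epoch=GAIA_DR2_EPOCH):
    """Shift a position from ref_epoch to target_epoch along a straight line.

    Hypothetical helper: pmra_masyr is mu_alpha* (already multiplied by cos(dec)).
    Parallax, radial velocity and errors are ignored, and the approximation
    degrades close to the celestial poles.
    """
    dt = target_epoch - ref_epoch
    dec_new = dec_deg + pmdec_masyr * dt * MAS_TO_DEG
    # mu_alpha* is an angular rate on the sky; dividing by cos(dec) turns it into a rate in RA
    ra_new = ra_deg + pmra_masyr * dt * MAS_TO_DEG / math.cos(math.radians(dec_deg))
    return ra_new % 360.0, dec_new

def position_at_epoch(source, target_epoch):
    """source: dict with 'ra', 'dec' and optionally 'pmra', 'pmdec' (hypothetical keys)."""
    if source.get("pmra") is None or source.get("pmdec") is None:
        # 2-parameter solution: no proper motion, so the position is used as is
        return source["ra"], source["dec"]
    return propagate_linear(source["ra"], source["dec"],
                            source["pmra"], source["pmdec"], target_epoch)
```

With only two astrometric parameters the sketch leaves the position unchanged, which mirrors the distinction drawn by the gaiaAstrometricParams field in the crossmatch output.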

Details on the crossmatch algorithms are given in Marrese et al. 2018 (in preparation) and in Marrese et al. (2017).

The external catalogues matched with Gaia

The external catalogues are briefly described here, with particular attention to the characteristics that are important for the crossmatch. The versions of the external catalogues included in Gaia DR2 are the ones defined and used during the crossmatch and validation activities: they reflect the purpose they were created for and therefore differ from the original external catalogues in several ways. First of all, with the exception of Hipparcos-2 (van Leeuwen 2007) and Tycho-2 (Høg et al. 2000), they are not complete versions of the corresponding original catalogues: they include only a subset of the available fields. In addition, we often modified the original field names, the treatment of null values and the units. These modifications were in general applied to simplify the use of the catalogues for crossmatch purposes. In some cases the external catalogues described here were obtained from a larger set of data (e.g. SDSS DR9, Ahn et al. (2012), or Pan-STARRS1, Chambers et al. (2016b)).

The following external catalogues were classified as dense surveys and crossmatched with the Gaia DR2 catalogue:

  • GSC2.3


  • SDSS DR9

  • URAT-1

  • Pan-STARRS1



  • AllWISE

The following external catalogues were classified as sparse catalogues and crossmatched with the Gaia DR2 catalogue:

  • Hipparcos-2

  • Tycho-2

  • RAVE DR5

GSC 2.3

Reference paper: Lasker et al. (2008)

Original catalogue: R. Smart, private communication

The Guide Star Catalog II (GSC-II) is an all-sky database of objects derived from the uncompressed Digitized Sky Surveys that the Space Telescope Science Institute created from the Palomar and UK Schmidt survey plates and made available to the community. Like its predecessor (GSC-I), the GSC-II was primarily created to provide guide star information and observation planning support for the Hubble Space Telescope. Two catalogues have been extracted from the GSC-II database and released to the astronomical community: a magnitude-limited (Rf = 18.0) version, GSC2.2, was distributed soon after its production in 2001, while the GSC2.3 release has been available for general access since 2007. The GSC2.3 catalogue contains astrometry, photometry and classification for 945 592 683 objects down to the magnitude limit of the plates. Positions are tied to the International Celestial Reference System; for stellar sources, the all-sky average absolute error per coordinate ranges from 0.2 to 0.28 arcsec depending on magnitude. For extended objects, astrometric errors are 20% worse for galaxies and approximately a factor of 2 worse for blended images. Stellar photometry is accurate to 0.13-0.22 mag, depending on magnitude and photographic passband (Rf, Bj, In). Outside the Galactic plane, stellar classification is reliable to at least 90% confidence for magnitudes brighter than Rf = 19.5, and the catalogue is complete to Rf = 20.

There are 3 350 256 objects in GSC 2.3 with RA and Dec errors equal to 0, whereas errors on the coordinates are mandatory to run the crossmatch. For the sake of completeness, these objects were not deleted; instead, we assigned to them the largest position error found in the catalogue (1.6 arcsec).
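
The zero-error fix described above amounts to a single pass over the error columns. A minimal sketch, assuming the RA and Dec errors are held in two plain Python lists in arcsec, that an entry is patched only when both errors are 0, and that "the largest position error" is taken over both coordinates (the function name is hypothetical):

```python
def fill_zero_errors(ra_err, dec_err):
    """Give entries with zero RA and Dec errors the largest error in the catalogue."""
    max_err = max(max(ra_err), max(dec_err))
    out_ra, out_dec = [], []
    for e_ra, e_dec in zip(ra_err, dec_err):
        if e_ra == 0 and e_dec == 0:
            # keep the object, but make its position deliberately uncertain
            e_ra = e_dec = max_err
        out_ra.append(e_ra)
        out_dec.append(e_dec)
    return out_ra, out_dec
```

Assigning the maximum error rather than dropping the objects keeps them matchable, at the cost of a wider search region for each of them.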

PPMXL

Reference papers: PPMXL: Roeser et al. (2010), PPMX: Röser et al. (2008)

Original Catalogue: VO access http://vo.uni-hd.de/ppmxl

PPMXL is a new determination of mean positions and proper motions on the ICRS system, obtained by combining USNO-B1.0 and 2MASS astrometry. The typical individual mean errors of the proper motions range from 4 mas/yr to more than 10 mas/yr, depending on the observational history. The mean errors of the positions at epoch 2000.0 are 80-120 mas if 2MASS astrometry could be used, and 150-300 mas otherwise.

USNO-B1.0 contains more than a billion entries, including stars, galaxies and a number of artefacts. These defects, numbering some 24 million or 2.3% of the catalogue objects, were removed. The final version of PPMXL contains some 900 million stars. At its bright end, PPMXL is merged with PPMX: in total, 6 268 118 stars are taken from PPMX, so that PPMXL aims to be complete from the brightest stars down to about V = 20.

PPMXL is a catalogue that is nominally on the ICRS system. It is linked to the Hipparcos catalogue, the optical representation of the ICRS, via Tycho-2 and PPMX.

The covariance matrix obtained from a least-squares adjustment gives, per coordinate and per star, the mean epoch, the mean error of the position at the mean epoch and the mean error of the proper motion. All these quantities are published in the catalogue, so the mean errors of the positions at the reference epoch 2000.0 can be computed star by star. On average, they are between 80 and 120 mas if 2MASS astrometry is available, and range from 150 to 300 mas otherwise.
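
Since position and proper-motion errors are uncorrelated at the mean epoch (this is what defines the mean epoch of a least-squares solution), the per-coordinate position error at another epoch T follows from the published quantities as σ(T)² = σ(T_mean)² + ((T - T_mean) σ_μ)². A minimal sketch, with hypothetical argument names and example values chosen only for illustration:

```python
import math

def position_error_at_epoch(sigma_mean_mas, sigma_pm_masyr, mean_epoch, epoch=2000.0):
    """Per-coordinate position error (mas) propagated from the mean epoch to `epoch`.

    Assumes position and proper motion are uncorrelated at the mean epoch.
    """
    dt = epoch - mean_epoch                      # years
    return math.hypot(sigma_mean_mas, dt * sigma_pm_masyr)

# Illustrative values: 60 mas at a mean epoch of 1980.0 and 4 mas/yr on the proper motion
print(position_error_at_epoch(60.0, 4.0, 1980.0))   # 100.0
```

The further the mean epoch lies from 2000.0, the more the proper-motion error dominates, which is one reason why the epoch 2000.0 errors depend on the observational history of each star.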

According to the reference paper (Roeser et al. 2010), the PPMXL catalogue contains 910 468 710 entries. The original catalogue available through VO access contains 910 468 688 entries. The version used for the crossmatch is consistent with the latter one.

SDSS DR9

Reference paper: Ahn et al. (2012)

The SDSS imaging camera took its first science quality data the night of September 19, 1998, and was the world’s most productive wide-field imaging facility until its last night of science quality data on November 18, 2009. In between it took a total of around 35 000 square degrees of images, covering a unique footprint of 14 055 square degrees of sky.

The r photometric CCDs serve as the astrometric reference CCDs for the SDSS; that is, the positions of SDSS objects are based on the r centroids and calibrations. The r CCDs are calibrated by matching bright stars detected by SDSS with the UCAC astrometric reference catalogues. Stars detected on the r CCDs are matched directly with stars in the United States Naval Observatory CCD Astrograph Catalog (UCAC2, Zacharias et al. 2004), which has a precision of 70 mas at its catalogue limit of r = 16 and systematic errors of less than 30 mas. UCAC2 extends up to a declination of about 41 degrees north. Outside the UCAC2 area, SDSS DR9 uses an "internal" UCAC data release known as "r14". Together, UCAC2 and r14 cover the whole sky.

SDSS should be complete down to magnitude r = 22.

The SDSS DR9 catalogue matched with Gaia DR2 contains primary objects only, extracted from the photoObj FITS files. The photoObj data have photometric and astrometric calibrations applied and contain enough information to select unique objects and to perform quality cuts.

There are 23 945 objects with position errors in either RA or Dec larger than 10 arcsec (with maximum values of about 14 degrees). For these objects a reliable match is very difficult, as there would be a huge number of neighbours, so we decided to delete them. The number of objects in the SDSS DR9 version used for the crossmatch is thus 469 029 929.

There is also a number of stars with extremely faint magnitudes (fainter than magnitude 30); we kept these objects, even though they are quite unreliable.

The original documentation states that "objID needs to be cast as unsigned 64-bit, though in many files we waste a few bytes and write it as a string to avoid FITS compliance issues". Since the OriginalValid table was obtained from the photoObj FITS files, we kept objID as a CHAR(19) string.
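
Keeping objID as text is also a safeguard against precision loss: a 19-digit integer cannot be held exactly in a 64-bit floating-point number (53-bit mantissa), so any tool that silently reads the column as a float corrupts it. A minimal sketch of a safe conversion, with a hypothetical function name and a hypothetical objID value:

```python
UINT64_MAX = 2**64 - 1

def parse_objid(text):
    """Convert a CHAR(19) objID string to an exact integer in the unsigned 64-bit range."""
    value = int(text.strip())
    if not 0 <= value <= UINT64_MAX:
        raise ValueError(f"objID {text!r} does not fit in an unsigned 64-bit integer")
    return value

obj_id_text = "1237645879551066262"   # hypothetical 19-digit objID
exact = parse_objid(obj_id_text)
lossy = int(float(obj_id_text))       # float64 rounds to a multiple of 256 at this magnitude
print(exact == lossy)                 # False
```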

URAT-1

Reference paper: Zacharias et al. (2015)

Original catalogue: CDS (ftp://cdsarc.u-strasbg.fr/pub/cats/I/329/)

URAT (USNO Robotic Astrometric Telescope) is a follow-up project to the successful UCAC project, using the same astrograph but with a much larger focal plane array and a bandpass shifted further to the red. Longer integration times and more sensitive, back-illuminated CCDs allowed a substantial increase in limiting magnitude, resulting in an approximately 4-fold increase in the average number of stars per square degree compared with UCAC. Additional observations with an objective grating largely extend the dynamic range to include stars as bright as about 3rd magnitude. Multiple sky overlaps per year result in a significant improvement in positional precision compared with UCAC.

URAT-1 is an observational catalogue at a mean epoch between 2012.3 and 2014.6; it covers the magnitude range 3 to 18.5 in R-band, with a positional precision of 5 to 40 mas. It covers most of the northern hemisphere and some areas down to -24.8 degrees in declination.

Pan-STARRS1

Reference papers: Chambers et al. (2016b); Magnier et al. (2016a); Waters et al. (2016); Magnier et al. (2016c, b); Flewelling et al. (2016)

The Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) is a system for wide-field astronomical imaging developed and operated by the Institute for Astronomy at the University of Hawaii. Pan-STARRS1 (PS1) is the first part of Pan-STARRS to be completed and is the basis for Data Release 1 (DR1). The PS1 survey used a 1.8 m telescope and its 1.4 gigapixel camera to image the 30 000 square degrees of sky north of declination -30 degrees in five broadband filters (g, r, i, z, y), from magnitude 13 to magnitude 23.

The Pan-STARRS1 version prepared for the crossmatch with Gaia DR2 contains a filtered subsample of the 10 723 304 629 entries listed in the original ObjectThin table.
We used only the ObjectThin and MeanObject tables to build the Pan-STARRS1 table, which means that objects detected only in stack images are not included. The main reason for excluding objects detected in stack images is that their astrometry is not as good as that of the mean objects: "The stack positions (raStack, decStack) have considerably larger systematic astrometric errors than the mean epoch positions (raMean, decMean)." The astrometry of the MeanObject positions uses Gaia DR1 as the reference catalogue, while the stack positions use 2MASS.

In detail, we filtered out all objects for which:

  • nDetections = 1;

  • there is no good-quality data in Pan-STARRS (objInfoFlag 33554432 not set);

  • the mean astrometry could not be measured (objInfoFlag 524288 set);

  • the stack position was used for the mean astrometry (objInfoFlag 1048576 set);

  • the errors on all magnitudes are equal to 0 or to -999;

  • all magnitudes are set to -999;

  • the error on RA or Dec is greater than 1 arcsec.
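
The filter list above can be expressed as a single predicate. A minimal sketch, assuming each row has been flattened into a hypothetical `PS1Object` record (the field names are illustrative, not the original column names); only the flag values and thresholds are taken from the list:

```python
from dataclasses import dataclass
from typing import Sequence

# objInfoFlag bit values quoted in the filter list
GOOD_QUALITY = 33554432         # 2**25: good-quality data in Pan-STARRS
NO_MEAN_ASTROMETRY = 524288     # 2**19: mean astrometry could not be measured
STACK_USED_FOR_MEAN = 1048576   # 2**20: stack position used for mean astrometry
NULL = -999

@dataclass
class PS1Object:                # hypothetical flattened ObjectThin + MeanObject row
    n_detections: int
    obj_info_flag: int
    ra_err_arcsec: float
    dec_err_arcsec: float
    mags: Sequence[float]
    mag_errs: Sequence[float]

def keep(o: PS1Object) -> bool:
    """Return True if the object survives every filter in the list."""
    if o.n_detections == 1:
        return False
    if not o.obj_info_flag & GOOD_QUALITY:
        return False
    if o.obj_info_flag & (NO_MEAN_ASTROMETRY | STACK_USED_FOR_MEAN):
        return False
    if all(e in (0, NULL) for e in o.mag_errs):
        return False
    if all(m == NULL for m in o.mags):
        return False
    if o.ra_err_arcsec > 1.0 or o.dec_err_arcsec > 1.0:
        return False
    return True
```

Testing the flags with a bitwise AND works because each value is a single power of two, so one bit of objInfoFlag is examined at a time.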

The number of objects in Pan-STARRS1 version prepared for the crossmatch with Gaia DR2 is 2 264 263 282.

The Pan-STARRS1 version prepared for the crossmatch with Gaia DR2 contains only a subset of the columns available in the combined ObjectThin and MeanObject tables. A description of the original ObjectThin and MeanObject tables can be found at: https://outerspace.stsci.edu/display/PANSTARRS/PS1+Database+object+and+detection+tables

APASS DR9

Reference paper: Henden et al. (2016)

The AAVSO Photometric All Sky Survey (APASS) is designed to bridge the gap between the shallow two-bandpass Tycho-2 photometric catalogue, which is complete to V = 11, and the deeper but less spatially complete catalogues such as SDSS or Pan-STARRS. It can be used to calibrate a specific field, to obtain spectral information about single sources, to determine the reddening in a small area of the sky, or even to obtain current-epoch astrometry for rapidly moving objects.

The survey is being performed at two locations: near Weed, New Mexico, in the Northern Hemisphere, and at CTIO in the Southern Hemisphere. Each site hosts dual bore-sighted 20 cm telescopes on a single mount, designed to obtain two bandpasses simultaneously. Each telescope covers a 2.9×2.9 degree field at 2.57 arcsec/pixel; the main survey uses the B, V, g', r', i' filters and covers the magnitude range 10 < V < 17. A bright extension is under way, saturating at V = 7 and extending the wavelength coverage from u' to Y. The faint completeness limit is V = 16. APASS Data Release 9 contains approximately 62 million stars and covers about 99% of the sky.

There are some issues in the catalogue that should be taken into account when crossmatching it:

  • The APASS team will not provide star IDs until the final product, and suggests identifying stars by their RA and Dec.

  • There are a number of duplicate entries. These appear to be caused by the merging process, where poor astrometry in one field may cause two seed centroids to form for a single object.

  • There are a number of entries with 0.000 errors.

  • Centroiding in crowded fields is very poor; blends cause photometric as well as astrometric errors.

  • There are saturated stars in the catalogue, and the APASS team suggests not using sources brighter than V = 7.
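
Because APASS entries have no IDs and may be duplicated, candidate duplicates can only be found by position. A minimal brute-force sketch, assuming positions in degrees and a hypothetical 2 arcsec pairing radius (not a value taken from APASS or from the crossmatch); a production version would use a spatial index instead of comparing every pair:

```python
import math
from itertools import combinations

def ang_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula; inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def candidate_duplicates(entries, radius_arcsec=2.0):
    """Return index pairs of (ra, dec) entries closer than radius_arcsec (O(n^2))."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(entries), 2):
        if ang_sep_arcsec(a[0], a[1], b[0], b[1]) < radius_arcsec:
            pairs.append((i, j))
    return pairs
```

The haversine form is used because it stays numerically accurate at arcsecond separations, where the plain spherical law of cosines loses precision.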

Hipparcos-2

Reference paper: van Leeuwen (2007)

Original catalogue: CDS (ftp://cdsarc.u-strasbg.fr/pub/cats/I/311/)

A new reduction of the astrometric data produced by the Hipparcos mission has been published, claiming accuracies for nearly all stars brighter than magnitude Hp = 8 that are better, by up to a factor of 4, than in the original catalogue.
The formal errors on the parallaxes of the new catalogue are confirmed. The presence of a small amount of additional noise, though unlikely, cannot be ruled out.
The new reduction of the Hipparcos astrometric data provides an improvement by a factor of 2.2 in total weight compared with the catalogue published in 1997, and provides much improved data for a wide range of studies on stellar luminosities and local Galactic kinematics.

Hipparcos-2 contains 117 955 sources.

Tycho-2

Reference paper: Høg et al. (2000)

Original catalogue: CDS (ftp://cdsarc.u-strasbg.fr/pub/cats/I/259/)

The Tycho-2 Catalogue is an astrometric reference catalogue containing positions and proper motions as well as two-colour photometric data for the 2.5 million brightest stars in the sky. The Tycho-2 positions and magnitudes are based on precisely the same observations as the original Tycho Catalogue (hereafter Tycho-1) collected by the star mapper of the ESA Hipparcos satellite, but Tycho-2 is much bigger and slightly more precise, owing to a more advanced reduction technique. Components of double stars with separations down to 0.8 arcsec are included. Proper motions precise to about 2.5 mas/yr are given as derived from a comparison with the Astrographic Catalogue and 143 other ground-based astrometric catalogues, all reduced to the Hipparcos celestial coordinate system. Tycho-2 supersedes in most applications Tycho-1, as well as the ACT and TRC catalogues based on Tycho-1.

  • Total objects: 2 539 913

  • Magnitude limit: VT=11.5

  • Epoch of positions: 1989.85-1993.21 (brought in most cases to epoch 2000.0)

  • Average coordinates absolute error (stars): 7-60 mas

  • Average photometric accuracy: 0.013-0.10 mag

  • Completeness: 99% (V = 11), 90% (V = 11.5)

  • Bands: BT,VT

  • Saturation limit: BT= 2.1, VT=1.9

For crossmatch purposes, for stars with proper motions the reference epoch is 2000.0, while for stars without proper motions the coordinates are the observed Tycho ones and the reference epoch is set to (raObsEpoch + deObsEpoch)/2.0.
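
The epoch rule just described can be sketched as a small selection function. The dictionary keys other than raObsEpoch and deObsEpoch are hypothetical stand-ins for the corresponding Tycho-2 fields:

```python
def tycho2_xm_position(star):
    """Return (ra, dec, epoch) used for the crossmatch of a Tycho-2 star.

    `star` is a dict with hypothetical keys: 'pmra', 'pmdec' (None if absent),
    'ra_mean', 'dec_mean' (epoch 2000.0 position), 'ra_obs', 'dec_obs'
    (observed Tycho position) and 'raObsEpoch', 'deObsEpoch' (Julian years).
    """
    if star.get("pmra") is not None and star.get("pmdec") is not None:
        # stars with proper motions: position at the reference epoch 2000.0
        return star["ra_mean"], star["dec_mean"], 2000.0
    # stars without proper motions: observed position at the mean observation epoch
    epoch = (star["raObsEpoch"] + star["deObsEpoch"]) / 2.0
    return star["ra_obs"], star["dec_obs"], epoch
```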

2MASS PSC

Reference paper: Skrutskie et al. (2006)

The 2MASS All-Sky Data Release contains image and catalogue data covering 99.998% of the sky, derived from all northern and southern survey observations. The all-sky release products include a Point Source Catalog (PSC), containing positions and photometry for 470 992 970 objects; an Extended Source Catalog (XSC), containing positions, photometry and basic shape information for 1 647 599 resolved sources, most of which are galaxies; and the Image Atlas, containing 4 121 439 J, H and Ks FITS images covering the sky.

  • Total objects: 470 992 970 point sources

  • Magnitude limit: J=16

  • Epoch of positions: 1997-2001

  • Average coordinates absolute error (stars): 70-80 mas (9 < Ks < 14) and 120 mas (Ks < 9)

  • Average photometric accuracy: 5%

  • Completeness: 99% (J = 16.1, H = 15.5, Ks = 15.1, b > 30 deg)

  • Bands: J, H, Ks

  • Saturation limit: J=4.5, H=4, Ks=3.5.

Source positions are reconstructed in the ICRS using the Tycho-2 reference catalogue. Comparisons of 2MASS with Tycho-2 and UCAC demonstrate that 2MASS positions are consistent with the ICRS, with a net offset no larger than 15 mas. Position residuals of individual sources confirm a typical position uncertainty of less than 100 mas (rms) for sources with Ks < 14.

The accuracy of position reconstruction will be slightly poorer near the declination ends of Survey Tiles, in regions with a low density of astrometric reference stars, and near the celestial poles where the telescope tracking was least stable. The degraded accuracy is reflected in the position uncertainties quoted in the PSC.

  • The position errors are at 1σ level.

  • The effective resolution is 5 arcsec.

  • The Julian Date has an accuracy of ±30 sec.

  • Covariances (correlation coefficients) are not available.

The primary areas of confusion are:
1) within ±75 degrees of the Galactic centre in longitude and within ±1 degree of the Galactic plane in latitude;
2) within a radius of approximately 5 degrees around the Galactic centre.
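
The two confusion areas can be checked directly in Galactic coordinates. A minimal sketch, taking the boundaries in the list literally (the 5 degree radius is approximate in the text) and using a hypothetical function name; the angular distance to the Galactic centre (l = 0, b = 0) follows from cos d = cos b cos l:

```python
import math

def in_2mass_confusion_area(l_deg, b_deg):
    """True if Galactic (l, b), in degrees, falls in one of the listed confusion areas."""
    l = (l_deg + 180.0) % 360.0 - 180.0          # wrap longitude to [-180, 180)
    in_plane_strip = abs(l) <= 75.0 and abs(b_deg) <= 1.0
    # angular distance from the Galactic centre
    cos_d = math.cos(math.radians(b_deg)) * math.cos(math.radians(l))
    dist_deg = math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))
    return in_plane_strip or dist_deg <= 5.0
```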

RAVE DR5

Reference paper: Kunder et al. (2017)

Data Release 5 (DR5) of the Radial Velocity Experiment (RAVE) is the fifth data release from a magnitude-limited (9 < I < 12) survey of randomly selected stars in the southern hemisphere. The RAVE medium-resolution spectra (R = 7500), covering the Ca-triplet region (8410-8795 Å), span the complete time frame from the start of RAVE observations in 2003 to their completion in 2013. Radial velocities are presented from 520 781 spectra of 457 588 unique stars, of which 215 590 unique stars have parallaxes and proper motions from the Tycho-Gaia Astrometric Solution (TGAS) in Gaia DR1. For RAVE DR5, stellar parameters (effective temperature, surface gravity, overall metallicity) are computed using the RAVE DR4 stellar pipeline, but calibrated using recent K2 Campaign 1 seismic gravities and Gaia benchmark stars, as well as results obtained from high-resolution studies. Also included are temperatures from the Infrared Flux Method and a catalogue of red giant stars in the dereddened colour interval 0.50 < (J-Ks)0 < 0.85, for which the gravities were calibrated based on seismology alone. Further data products for subsamples of the RAVE stars include individual abundances for Mg, Al, Si, Ca, Ti, Fe and Ni, and distances derived using isochrones. This is the first RAVE data release in which an error spectrum was generated for each RAVE observation, so that realistic uncertainties and probability distribution functions are provided for the derived radial velocities and stellar parameters. The RAVE spectra were taken with the multi-object spectrograph 6dF (6 degree field) on the 1.2 m UK Schmidt Telescope of the Australian Astronomical Observatory (AAO). Each fibre has a diameter of 100 μm (6.7 arcsec on the sky) and can be placed accurately (to within 10 μm, or 0.7 arcsec) on star positions anywhere within the 6 degree diameter field.

Crossmatch output

For each external catalogue, the crossmatch results are presented in two separate tables: a BestNeighbour table, which lists the matched leading-catalogue objects with their best neighbours, and a Neighbourhood table, which includes all good neighbours of each matched object.

For each matched leading-catalogue object, the BestNeighbour table contains a single entry, i.e. the neighbour with the highest value of the figure of merit (FoM), while the Neighbourhood table contains all good neighbours, i.e. objects whose position error ellipses overlap with that of the leading-catalogue target within a 5σ confidence level. The leading catalogue is Gaia DR2 for dense surveys and the external catalogue for sparse catalogues.
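
The good-neighbour criterion and the choice of a best neighbour can be illustrated under strong simplifications. This minimal sketch assumes circularised errors, so that the two 5σ circles overlap when the separation is at most 5(σ_target + σ_candidate), and replaces the real FoM, which also uses local density, with a hypothetical stand-in in which the closest good neighbour scores highest. The tuple layout and function names are hypothetical:

```python
import math

def ang_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula; inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def good_neighbours(target, candidates, k=5.0):
    """target, candidates: (id, ra_deg, dec_deg, sigma_arcsec) with circularised errors."""
    hood = []
    for cand in candidates:
        d = ang_sep_arcsec(target[1], target[2], cand[1], cand[2])
        if d <= k * (target[3] + cand[3]):       # the two k-sigma circles overlap
            hood.append((cand[0], d))
    return hood

def best_neighbour(target, candidates, k=5.0):
    """Pick one good neighbour using a stand-in FoM: the closest one wins."""
    hood = good_neighbours(target, candidates, k)
    if not hood:
        return None
    return min(hood, key=lambda item: item[1])
```

Running the search from the leading catalogue outwards, as here, is what makes the procedure asymmetric: swapping the roles of the two catalogues changes which objects are targets and which are candidates.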

For dense surveys, the BestNeighbour table includes the angular distance, the number of mates, the number of neighbours (listed in the Neighbourhood table), the bestNeighbourMultiplicity and the gaiaAstrometricParams. The mates of a given Gaia object are defined as other Gaia objects with the same bestNeighbour in the external catalogue. Mates are allowed by the crossmatch algorithm we used, which is of the many-to-one kind. True mates should be objects resolved by Gaia that were unresolved in the external catalogue, since the external catalogues generally have a much lower angular resolution than Gaia. The best match of a Gaia source in an external catalogue is the external catalogue object with the highest FoM value. As the FoM is based on positional and density properties, more than one source in the external catalogue can have the same FoM value. Although a single best match is always chosen, the bestNeighbourMultiplicity indicates whether there were more "best" neighbours; these can be found in the Neighbourhood table. The gaiaAstrometricParams field indicates the number of astrometric parameters available for a given Gaia source. It is set to 2 when only RA and Dec were available, and to 5 when RA, Dec, pmRA, pmDec and parallax were available and thus used to propagate the Gaia source position to the epoch of the external catalogue coordinates.
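
The definitions of mates and of bestNeighbourMultiplicity can be made concrete with a small bookkeeping sketch. It assumes a hypothetical input of (gaia_id, ext_id, fom) good-neighbour pairs, that ties are broken by sorting the external IDs (the actual tie-breaking rule is not stated here), and that the multiplicity counts every neighbour sharing the top FoM, including the chosen one:

```python
from collections import Counter, defaultdict

def summarise_best_neighbours(matches):
    """matches: iterable of (gaia_id, ext_id, fom) good-neighbour pairs.

    Returns {gaia_id: (best_ext_id, number_of_mates, best_neighbour_multiplicity)}.
    """
    by_gaia = defaultdict(list)
    for gaia_id, ext_id, fom in matches:
        by_gaia[gaia_id].append((ext_id, fom))

    best, multiplicity = {}, {}
    for gaia_id, cands in by_gaia.items():
        top = max(f for _, f in cands)
        tied = sorted(e for e, f in cands if f == top)
        best[gaia_id] = tied[0]              # a single best neighbour is always kept
        multiplicity[gaia_id] = len(tied)    # how many neighbours shared the top FoM

    # mates: other Gaia sources whose best neighbour is the same external object
    shared = Counter(best.values())
    return {g: (best[g], shared[best[g]] - 1, multiplicity[g]) for g in best}
```

For example, two Gaia sources that both select the same external object each get one mate, which is the many-to-one situation expected when Gaia resolves a pair that the external survey did not.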

For sparse catalogues, the BestNeighbour table does not include the number of mates or the bestNeighbourMultiplicity. Mates are not defined, since the external catalogues have a lower resolution than Gaia and the algorithm used is of the one-to-one kind. The bestNeighbourMultiplicity is not needed, since no two different Gaia DR2 sources have exactly the same astrometry and thus the same FoM.

For both dense and sparse catalogues, the Neighbourhood table contains the angular distance, the FoM (named score) and the gaiaAstrometricParams for each good neighbour.

Details on the crossmatch output and results are given in Marrese et al. 2018 (in preparation).