9.2.3 Filtering and Ingestion in the Gaia Archive

Author(s): Enrique Utrilla

The records generated during the integration are stored in an internal format that is convenient for storage and processing but has a complex structure. For this reason, the first step of the ingestion into the Gaia Archive database is to convert these records into a simpler, flatter data structure, gaia_source (see Section 14.1.1), which is better suited to publication in table format.

The conversion of the integrated records into gaia_source performs the following operations:

  • mapping of names that follow internal DPAC conventions to names that are easier to understand,

  • conversion of units, e.g. from radians to milliarcseconds (mas) when appropriate,

  • generation of a random index that can be used to generate randomly distributed, repeatable subsets of the catalogue,

  • conversion of right ascension and declination from ICRS to ecliptic and galactic coordinates,

  • calculation of covariances between astrometric variables from the coefficients of the normal matrix,

  • calculation of the magnitude from the value of the flux,

  • etc.
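
Three of the operations above can be illustrated with a minimal Python sketch. It is not the DPAC implementation: the function names, the zero_point parameter and the seeded shuffle used to build the random index are assumptions made for illustration only, and the actual photometric zero point and index generation scheme are not specified in this section.

```python
import math
import random

# Milliarcseconds per radian: degrees per radian * 3600 arcsec * 1000 mas.
RAD_TO_MAS = math.degrees(1.0) * 3600.0 * 1000.0


def rad_to_mas(angle_rad: float) -> float:
    """Convert an angle from radians to milliarcseconds."""
    return angle_rad * RAD_TO_MAS


def flux_to_magnitude(flux: float, zero_point: float) -> float:
    """Standard flux-to-magnitude relation; zero_point is an assumed input."""
    if flux <= 0.0:
        raise ValueError("flux must be positive to compute a magnitude")
    return -2.5 * math.log10(flux) + zero_point


def assign_random_index(source_ids, seed=0):
    """Map each source id to a unique index in a repeatable pseudo-random order.

    Sorting first makes the result independent of the input order; the fixed
    seed makes it repeatable. This is a hypothetical scheme.
    """
    ordered = sorted(source_ids)
    indices = list(range(len(ordered)))
    random.Random(seed).shuffle(indices)
    return dict(zip(ordered, indices))
```

With such an index, selecting the rows whose index falls below a threshold N yields the same pseudo-random subset of N sources on every run.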

During this conversion, some preliminary quality checks are applied. Sources that meet any of the following criteria are discarded rather than converted, and are excluded from any further validation as candidates to be published in the archive:

  • Sources which did not converge to a solution in AGIS-02,

  • Sources whose astrometry was computed in AGIS-01 and did not get an update in AGIS-02,

  • Sources with fewer than five individual observations,

  • Sources with excess noise greater than 20 mas,

  • Sources for which the longest semi-major axis of the position error ellipse is greater than or equal to 100 mas,

  • Sources without G-band photometry generated by CU5 during cycle 2 and based on at least ten individual observations.
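
The criteria above amount to a single predicate per source. The following Python sketch shows one way to express it; the IntegratedSource class and all of its field names are hypothetical stand-ins for the internal record format, which is not described here. Note the different boundary conditions: excess noise is rejected only when strictly greater than 20 mas, while the semi-major axis is rejected at 100 mas or more.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class IntegratedSource:
    """Hypothetical, simplified view of an integrated source record."""
    converged_in_agis02: bool
    astrometry_from_agis01_only: bool  # computed in AGIS-01, not updated in AGIS-02
    n_observations: int
    excess_noise_mas: float
    position_error_semi_major_mas: float
    g_band_n_obs: Optional[int]  # None when no cycle 2 G-band photometry exists


def is_discarded(src: IntegratedSource) -> bool:
    """Return True if the source meets any of the rejection criteria."""
    return (
        not src.converged_in_agis02
        or src.astrometry_from_agis01_only
        or src.n_observations < 5
        or src.excess_noise_mas > 20.0
        or src.position_error_semi_major_mas >= 100.0
        or src.g_band_n_obs is None
        or src.g_band_n_obs < 10
    )


# Usage: a source sitting exactly on the accepted side of every threshold,
# and a variant that exceeds the excess noise limit.
example = IntegratedSource(True, False, 5, 20.0, 99.9, 10)
noisy = replace(example, excess_noise_mas=20.5)
```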

Additional filters are applied to remove partial data (e.g. BP/RP photometry) from some sources even if the sources themselves are tentatively allowed for publication.

In a similar fashion other data items, such as detailed photometry and variability data, are converted to a format suitable for publication, after removing entries related to sources filtered out during the conversion of gaia_source.
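
As a rough illustration of this step, auxiliary entries can be restricted to the sources that survived the gaia_source conversion with a simple membership test. The sketch below assumes that each entry is a dictionary carrying a source_id key; that layout and the function name are hypothetical.

```python
def keep_entries_for_surviving_sources(entries, surviving_source_ids):
    """Keep only the entries whose source passed the gaia_source filters."""
    allowed = set(surviving_source_ids)  # set gives O(1) membership tests
    return [entry for entry in entries if entry["source_id"] in allowed]
```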

The converted data are ingested into the Archive database, including fields that are only used during the validation process and are removed before the release. The validation then produces a new list of sources, and/or fields within sources, to clean up. Applying this list to the dataset ensures that only data items with the expected level of quality are published.
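
The final clean-up could be sketched as follows, under stated assumptions: rows are dictionaries keyed by field name, rejected sources are dropped entirely, individual fields flagged for a source are set to None rather than removed, and validation-only fields are stripped from every row. The function and parameter names are hypothetical, and the actual clean-up mechanism used for the Archive is not described in this section.

```python
def apply_cleanup(rows, rejected_source_ids, fields_to_blank, internal_fields):
    """Apply a validation clean-up list to converted rows.

    rows: iterable of dicts, each with a "source_id" key.
    rejected_source_ids: sources to drop entirely.
    fields_to_blank: dict mapping source_id to field names to set to None.
    internal_fields: validation-only field names to strip from every row.
    """
    rejected = set(rejected_source_ids)
    internal = set(internal_fields)
    cleaned = []
    for row in rows:
        source_id = row["source_id"]
        if source_id in rejected:
            continue  # the whole source is withheld from publication
        blank = set(fields_to_blank.get(source_id, ()))
        cleaned.append({
            name: (None if name in blank else value)
            for name, value in row.items()
            if name not in internal
        })
    return cleaned
```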