
Gaia Early Data Release 3 Documentation

7.2 Data Consolidation

7.2.3 Filtering and Ingestion in the Gaia Archive

Author(s): Enrique Utrilla

The records generated during the integration are stored in an internal format that is convenient for storage and processing but has a complex data structure. For this reason, the first step of the ingestion into the Gaia Archive database is to convert these records into a simpler, flatter data structure, gaia_source (see Section 13.1.1), which is better suited to publication in table format.

The conversion of the integrated records into gaia_source performs the following operations:

  • mapping of table and field names from internal DPAC conventions to labels that are easier to understand,

  • conversion of units, e.g. from radians to milliarcseconds (mas) when appropriate,

  • generation of a random index that can be used to generate randomly distributed, repeatable subsets of the catalogue,

  • conversion of right ascension and declination from ICRS to ecliptic and galactic coordinates,

  • calculation of covariances between astrometric variables from the coefficients of the normal matrix,

  • calculation of the magnitude from the value of the flux,

  • etc.
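Three of the operations listed above can be illustrated with a minimal Python sketch. It is not the Archive's actual ingestion code: the function names, the zero_point parameter and the seeded-shuffle approach to the random index are assumptions made for illustration only. The magnitude relation used is the standard one, m = -2.5 log10(flux) + zero point.

```python
import math
import random

# Milliarcseconds per radian: degrees per radian * 3600 arcsec * 1000 mas.
RAD_TO_MAS = math.degrees(1.0) * 3600.0 * 1000.0


def rad_to_mas(angle_rad):
    """Convert an angle from radians to milliarcseconds."""
    return angle_rad * RAD_TO_MAS


def flux_to_magnitude(flux, zero_point):
    """Standard flux-to-magnitude relation.

    zero_point is a caller-supplied value (hypothetical parameter);
    no specific Gaia zero point is assumed here.
    Returns None for non-positive fluxes, where the logarithm is undefined.
    """
    if flux <= 0.0:
        return None
    return -2.5 * math.log10(flux) + zero_point


def assign_random_index(n_sources, seed):
    """Return a repeatable random permutation of 0..n_sources-1.

    Element i is the random index assigned to the i-th source.
    Using a fixed seed makes the assignment reproducible (an assumed
    approach, not necessarily the one used by the Archive).
    """
    indices = list(range(n_sources))
    random.Random(seed).shuffle(indices)
    return indices
```

With an index of this kind, selecting the sources whose random index is below some value N yields a random subset of N sources that is the same every time the selection is repeated.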

During this conversion, some preliminary quality checks are applied. Sources that meet any of the following criteria are discarded rather than converted, and are excluded from any further validation as candidates to be published in the archive:

  • Sources which did not converge to a solution in AGIS-03,

  • Sources with fewer than five individual observations used for the astrometric solution,

  • Sources for which the longest semi-major axis of the position error ellipse is greater than or equal to 100 mas,

  • Sources marked as duplicates of another source that was selected into the catalogue.
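The four discard criteria above combine as a logical OR. The following sketch expresses them as a single predicate; the IntegratedSource class and its field names are hypothetical and do not correspond to actual DPAC or Archive field names.

```python
from dataclasses import dataclass


@dataclass
class IntegratedSource:
    """Hypothetical, simplified view of an integrated source record."""
    converged: bool                   # did the source converge in AGIS-03?
    n_astrometric_obs: int            # observations used in the astrometric solution
    pos_error_semi_major_mas: float   # longest semi-major axis of the position error ellipse, in mas
    is_duplicate: bool                # marked as duplicate of a selected source


def is_discarded(src):
    """Return True if the source meets any of the discard criteria."""
    return (
        not src.converged
        or src.n_astrometric_obs < 5
        or src.pos_error_semi_major_mas >= 100.0
        or src.is_duplicate
    )
```

Note that the threshold on the error ellipse is inclusive: a source with a semi-major axis of exactly 100 mas is discarded.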

Additional filters are applied to remove partial data (e.g. G/BP/RP photometry) from some sources even if the sources themselves are tentatively allowed for publication. Unlike in Gaia DR2, sources for which the G-band photometry is not considered suitable for publication have been kept.

As described in Section 6.1.1, the radial velocities computed in this cycle will be published in the next release. In the current release, the radial velocities already published in Gaia DR2 have been used instead, whenever possible. Some Gaia DR2 radial velocities have not been used because they had been assigned to a source that has disappeared in Gaia EDR3, e.g. because it was split into two or more separate sources or merged into another. Radial velocities found to be problematic since the release of Gaia DR2 were also excluded.
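The carry-over logic described above can be sketched as follows. This is an illustration under simplifying assumptions, not the actual procedure: it assumes a hypothetical one-to-one mapping from DR2 to EDR3 source identifiers in which sources that were split or merged simply have no entry, and a hypothetical set of DR2 identifiers flagged as problematic.

```python
def carry_over_radial_velocities(dr2_rv, dr2_to_edr3, problematic_dr2_ids):
    """Select the DR2 radial velocities that can be reused.

    dr2_rv:              dict of DR2 source id -> radial velocity
    dr2_to_edr3:         dict of DR2 source id -> EDR3 source id
                         (hypothetical; split or merged sources are absent)
    problematic_dr2_ids: set of DR2 source ids whose radial velocity
                         was found to be problematic (hypothetical input)

    Returns a dict of EDR3 source id -> radial velocity.
    """
    carried = {}
    for dr2_id, rv in dr2_rv.items():
        if dr2_id in problematic_dr2_ids:
            continue  # known-bad value, do not reuse
        edr3_id = dr2_to_edr3.get(dr2_id)
        if edr3_id is None:
            continue  # source no longer exists as such in EDR3
        carried[edr3_id] = rv
    return carried
```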

In the simulated data, sources with a parallax larger than 500 mas, closer than 1.3 parsecs, or with an excessively large radius were excluded.

The converted data is ingested into the Archive database, including fields that are used only during the validation process and are removed before the release. The validation produces a further list of sources, and of fields within sources, to be cleaned up; this list is applied to the dataset so that only data items with the expected level of quality are published.