13.2.3 Filtering and Ingestion in the Gaia Archive
Author(s): Enrique Utrilla
The records generated during the integration are stored in an internal format that is convenient for processing but is also a complex data structure. For this reason, the first step of the ingestion into the Gaia Archive database is to convert these records into a simpler and flatter data structure, gaia_source (see Section 20.1.1), which is more suitable for publication in a table format.
Typically this conversion involves the selection of fields to be published and the adaptation of format and units, such as the conversion of radians into degrees or milliarcseconds (mas), or the rearrangement of matrices into arrays. In some cases more complex transformations are applied, such as computing correlation matrices from the covariances, magnitudes from the fluxes, or the coordinates of the sources in galactic and ecliptic reference frames from the ICRS coordinates.
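As an illustration of the kind of conversions described above, the following is a minimal sketch, not the actual ingestion code. The function names are hypothetical, and the photometric zero point is a placeholder argument that the caller must supply; no real Gaia zero point is assumed here.

```python
import math

# Conversion factors from radians.
RAD_TO_DEG = 180.0 / math.pi
RAD_TO_MAS = RAD_TO_DEG * 3600.0 * 1000.0  # degrees -> arcseconds -> milliarcseconds


def rad_to_deg(angle_rad):
    """Convert an angle from radians to degrees."""
    return angle_rad * RAD_TO_DEG


def rad_to_mas(angle_rad):
    """Convert an angle from radians to milliarcseconds."""
    return angle_rad * RAD_TO_MAS


def covariance_to_correlation(cov):
    """Turn a square covariance matrix (list of lists) into a correlation matrix.

    Each element is divided by the product of the two corresponding
    standard deviations, so the diagonal becomes 1.
    """
    n = len(cov)
    sigma = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sigma[i] * sigma[j]) for j in range(n)] for i in range(n)]


def flux_to_magnitude(flux, zero_point):
    """Convert a positive flux into a magnitude for a caller-supplied zero point."""
    return -2.5 * math.log10(flux) + zero_point
```

For example, `covariance_to_correlation([[4.0, 2.0], [2.0, 9.0]])` gives off-diagonal elements of 2 / (2 × 3) ≈ 0.333.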
Additionally, some corrections and quality checks are applied. This may result in the elimination of some of the data of a record, or even of the complete record. Other sources are filtered out or modified based on lists of specific sourceIds supplied by the CUs that provided the original data or by the CU9 scientific validation team.
Finally, some checks are applied to ensure that the information presented in each table remains consistent after this filtering so that, for instance, no variability data are published for a source that has been filtered out of the main gaia_source catalogue. Please note that there are some exceptions to this rule, such as the Solar System Objects (SSO), which use a different set of source IDs, or the science_alerts, which include all the entries already published on the Gaia Photometric Science Alerts page up to cycle 3, even if the sources themselves have not passed the cut for publication in the archive in this release.
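The consistency rule and its exceptions can be sketched as follows. This is only an illustration under stated assumptions: the function name, the `exempt` flag, and the dictionary-based rows are hypothetical and do not describe the actual archive implementation.

```python
def enforce_consistency(rows, published_ids, exempt=False):
    """Drop rows whose source_id is not in the published gaia_source set.

    Tables flagged as exempt (for instance those that use a different
    set of source IDs) are passed through unchanged.
    """
    if exempt:
        return list(rows)
    return [row for row in rows if row["source_id"] in published_ids]


# Hypothetical example data: source 999 was filtered out of gaia_source.
published_ids = {101, 102}
vari_rows = [{"source_id": 101}, {"source_id": 999}]
alert_rows = [{"source_id": 999}]

kept_vari = enforce_consistency(vari_rows, published_ids)
kept_alerts = enforce_consistency(alert_rows, published_ids, exempt=True)
```

Here the variability row for source 999 is dropped, while the exempt alert row for the same source is retained.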
Conversion of gaia_source
A number of filters were applied to the set of integrated sources from cycle 3 in order to remove:
Sources that did not converge to a solution in AGIS-03,
Sources with fewer than five individual observations used for the astrometric solution in AGIS,
Sources for which the longest semi-major axis of the position error ellipse is greater than or equal to 100 mas,
Sources marked as duplicates of another source that was selected into the catalogue.
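The four criteria above can be combined into a single predicate. The sketch below is a hedged illustration: the record class and all of its field names are hypothetical and do not correspond to the internal integration format.

```python
from dataclasses import dataclass

MIN_ASTROMETRIC_OBS = 5      # fewer than five observations -> removed
MAX_SEMI_MAJOR_MAS = 100.0   # semi-major axis >= 100 mas -> removed


@dataclass
class IntegratedSource:
    """Hypothetical, simplified view of an integrated source record."""
    source_id: int
    converged: bool                # solution converged in AGIS
    n_astrometric_obs: int         # observations used in the astrometric solution
    semi_major_axis_mas: float     # longest semi-major axis of the position error ellipse
    duplicate_of_selected: bool    # duplicate of a source already selected


def passes_filters(src):
    """Return True if the source survives all four filters."""
    return (
        src.converged
        and src.n_astrometric_obs >= MIN_ASTROMETRIC_OBS
        and src.semi_major_axis_mas < MAX_SEMI_MAJOR_MAS
        and not src.duplicate_of_selected
    )


sources = [
    IntegratedSource(1, True, 12, 0.5, False),    # kept
    IntegratedSource(2, False, 12, 0.5, False),   # not converged
    IntegratedSource(3, True, 4, 0.5, False),     # too few observations
    IntegratedSource(4, True, 12, 100.0, False),  # error ellipse too large
    IntegratedSource(5, True, 12, 0.5, True),     # duplicate
]
kept_ids = [s.source_id for s in sources if passes_filters(s)]
```

Note that the 100 mas limit is exclusive for surviving sources, since the text removes sources at or above that value.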
Unlike in Gaia DR2, sources for which the G-band photometry is not available or is not considered suitable for publication have not been removed.
These are the same filters that were used in Gaia EDR3, so the list of sources is identical. All sources keep the same source ID as in Gaia EDR3, although the prefix of the associated designation changes from “Gaia EDR3” to “Gaia DR3”.
Unlike in Gaia EDR3 though, which was published using a filtered set of radial velocities already published in Gaia DR2, this release includes updated radial velocities from cycle 3 data.
Conversion of XP Mean Spectra
Of all the available spectra, the base list of candidates for publication consisted of those used in the processing of the published Astrophysical Parameters. Additionally, some other specifically requested spectra have been included, following the criteria defined in Section 5.3.6.
Of those candidate XP spectra, the following were removed from both xp_summary and xp_continuous_mean_spectrum:
Spectra of sources not in gaia_source
Spectra with fewer than 241 CCD observations in both the BP and RP bands.
An internally calibrated sampled spectrum (xp_sampled_mean_spectrum) was generated only for a subset of the spectra above. In particular, it was not generated for:
Spectra of sources of magnitude greater than 15
Spectra with fewer than 241 CCD observations in either the BP or RP bands.
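The difference between the two selections ("both" bands for the continuous spectra, "either" band for the sampled spectra) can be made explicit with a short sketch. The function and parameter names are hypothetical, and the code reads "both" literally as both band counts being below the threshold, which is an assumption about the intended meaning.

```python
MIN_CCD_OBS = 241        # threshold on the number of CCD observations per band
MAX_SAMPLED_MAG = 15.0   # sampled spectra are not generated above this magnitude


def keep_continuous(in_gaia_source, n_bp, n_rp):
    """Candidate stays in xp_summary and xp_continuous_mean_spectrum.

    Removed if the source is not in gaia_source, or if BOTH bands have
    fewer than MIN_CCD_OBS observations.
    """
    both_low = n_bp < MIN_CCD_OBS and n_rp < MIN_CCD_OBS
    return in_gaia_source and not both_low


def has_sampled(in_gaia_source, n_bp, n_rp, magnitude):
    """Candidate also gets an xp_sampled_mean_spectrum entry.

    Requires a published continuous spectrum, a magnitude of at most
    MAX_SAMPLED_MAG, and at least MIN_CCD_OBS observations in EACH band.
    """
    if not keep_continuous(in_gaia_source, n_bp, n_rp):
        return False
    either_low = n_bp < MIN_CCD_OBS or n_rp < MIN_CCD_OBS
    return magnitude <= MAX_SAMPLED_MAG and not either_low
```

Under this reading, a source with 300 BP and 200 RP observations keeps its continuous spectrum but receives no sampled spectrum.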
Conversion of RVS Mean Spectra
The RVS spectra that were eligible for publication were those that had been flagged as valid by the spectroscopic pipelines. Nevertheless, only a subset of around one million of those spectra was selected for publication in Gaia DR3. Please check the selection details in Section 6.5.2.
Please note that those selection criteria are not spatially uniform: the availability of reference catalogues, the SNR criterion and the selection function used result in some areas of the sky having a higher density of sources, while other regions contain noticeable patterns and gaps (see Figure 17.41 (left panel)).
Conversion of Variability data
The table vari_summary contains statistical information about both the sources identified as any of the variable types considered by the variability analysis (see Chapter 10), and about non-variable sources included in the Gaia Andromeda Photometric Survey (GAPS). These sources should also have data for each photometric observation made by Gaia up to cycle 3 in the table epoch_photometry. The sources identified as variable have an entry providing additional type-specific information in one or more of the other vari_* tables.
Additionally, for some of those sources, there are also detailed data on their radial velocity, both statistical (vari_rad_vel_statistics) and the corresponding epoch observations (vari_epoch_radial_velocity).
No publication filter has been applied to the remaining tables not mentioned in this section, other than the validity criteria of the originating pipelines.