
11.4.1 Summary of main validation results

Author(s): Morgan Fouesneau, Ulrike Heiter

Here, we summarise the main results, caveats, and known issues for each module. An overview of Apsis product types that are processed in several modules is given in Table 11.37.

DSC

provides 5 class probabilities (stars, physical binaries, quasars, galaxies, white dwarfs) using two independent classifiers (spectra and astrometry/photometry) and a combined one. DSC is primarily an extragalactic classifier and performs well on these classes. In the extragalactic tables, DSC provides two labels: one that yields a statistically "complete" sample and one that yields a "pure" selection. Validation tests on the pure selection show good agreement with the literature. The pure sample for quasars reaches at most 60% completeness, and only for $z=0.0-2.5$ and $G<19.25$. Both DSC quasar class labels are incomplete for low-redshift quasars in SDSS ($z_{\rm SDSS}\lesssim 0.4$), because a significant fraction of these sources are assigned to the galaxy class (this is expected in Gaia data).
The DSC probabilities depend on an adopted prior, which was set beforehand (Bailer-Jones et al. 2019). However, it is possible to update or change this prior using the catalogue values directly. Using known members of star clusters, we find a negligible number of misidentified QSOs and galaxies; however, DSC performs poorly on binaries and white dwarfs.
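Changing the prior amounts to a standard Bayesian re-weighting of the published class probabilities. The following is a minimal sketch of that re-weighting, not the DSC code itself; the function name `reweight_posteriors`, the reduced set of three classes, and all numerical values are hypothetical placeholders, and the prior actually adopted by DSC must be taken from the DSC documentation.

```python
def reweight_posteriors(posteriors, old_prior, new_prior):
    """Convert class posteriors computed under old_prior to new_prior.

    Each posterior is proportional to likelihood * prior, so dividing by
    the old prior, multiplying by the new one, and renormalising gives
    the posterior under the new prior.
    """
    unnormalised = {
        cls: posteriors[cls] * new_prior[cls] / old_prior[cls]
        for cls in posteriors
    }
    total = sum(unnormalised.values())
    return {cls: value / total for cls, value in unnormalised.items()}


# Hypothetical numbers for illustration only (NOT the DSC prior).
old_prior = {"star": 0.98, "quasar": 0.01, "galaxy": 0.01}
new_prior = {"star": 0.90, "quasar": 0.05, "galaxy": 0.05}
posteriors = {"star": 0.60, "quasar": 0.30, "galaxy": 0.10}

updated = reweight_posteriors(posteriors, old_prior, new_prior)
```

With these made-up numbers, raising the quasar prior increases the quasar probability of the source, as expected; the re-weighted probabilities still sum to one.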
GSP-Phot

determines the atmospheric parameters ($T_{\rm eff}$, $\log g$, $[\rm M/H]$), extinction and reddening ($A_{0}$, $A_{\rm G}$, $A_{\rm BP}$, $A_{\rm RP}$, $E(G_{\rm BP}-G_{\rm RP})$), distance, absolute $G$ magnitude, and radius of sources with $G<19$.
These generally show good agreement with external data; the median absolute deviations (MAD) of $T_{\rm eff}$ and $\log g$ are 119 K and 0.132 dex, respectively (with $\log g$ compared to seismic values of main-sequence stars). The CMD (colour-magnitude diagram) looks good and corrections for extinction work well. For stars with fractional parallax uncertainties above 25%, the CMD using GSP-Phot distances is qualitatively better than the one using parallaxes.
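A median offset and a MAD with respect to reference values can be computed as in the following sketch. It assumes that the MAD is taken about the median residual (the text does not state whether it is taken about the median or about zero), and the function name and the temperature values are hypothetical.

```python
from statistics import median


def median_offset_and_mad(estimates, references):
    """Return the median residual and the MAD about that median."""
    residuals = [est - ref for est, ref in zip(estimates, references)]
    offset = median(residuals)
    # Assumption: deviations are measured from the median residual.
    mad = median([abs(res - offset) for res in residuals])
    return offset, mad


# Hypothetical effective temperatures in K, for illustration only.
teff_estimated = [5800, 6100, 4900, 5200, 6500]
teff_reference = [5750, 6000, 5000, 5150, 6400]

offset, mad = median_offset_and_mad(teff_estimated, teff_reference)
```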
$[\rm M/H]$ estimates are generally too low by about 0.2 dex and exhibit systematic errors, so they should be used only with caution, and applying the empirical calibration should be considered.
Distance estimates exhibit outliers due to an overly harsh distance prior that suppresses large distances (an issue understood only after post-processing and validation). These outliers, mostly beyond $\sim$2 kpc (e.g. in clusters and among Cepheids), represent $\sim$25% of the results; they have significant parallax uncertainties, and their distances are underestimated. For stars with high parallax quality, the distance prior does not have such a strong impact and the resulting GSP-Phot distances are still reliable.
The median differences in $T_{\rm eff}$ , $\log g$ , and $[\rm M/H]$ for stars in common with GSP-Spec are 10 K, 0.01 dex, and 0.008 dex, respectively.
Validation confirms underestimated uncertainties and suggests that they should be increased by a factor of 1 to 4.
A comparison with solar analogues gives median $T_{\rm eff}$ offsets of –100 K and 10 K for the MARCS and PHOENIX libraries, respectively, and median $\log g$ offsets below 0.03 dex for both. Their mean metallicity is –0.2 dex. A comparison with benchmark stars is statistically good, but with a very large scatter; these are bright stars and probably suffer from systematics in the input data.
A comparison with clusters shows a mean $T_{\rm eff}$ offset of 34 $\pm$ 600 K (MAD = 400 K), with $T_{\rm eff}$ overestimated for giants and underestimated for dwarfs, and only 4% of outliers (with $>$50% discrepancies). A comparison with open clusters reveals a mean $[\rm M/H]$ offset of –0.2 $\pm$ 0.3 dex. GSP-Phot performs poorly (it overestimates $[\rm M/H]$) on stars with $[\rm M/H]<-1.5$ dex. The usual $T_{\rm eff}$-extinction degeneracy remains challenging. The average temperature correlates with the Galactic plane (which could additionally reflect the Gaia selection function), but the effect is most notable for hot stars with featureless spectra, which do not allow the effects of $T_{\rm eff}$ and $A_{0}$ to be distinguished.
GSP-Phot did not implement white dwarf models, so APs of these sources are of poor quality.
There is an accumulation of stars at the tip of the red giant branch due to low-SNR BP/RP spectra and parallax.
GSP-Spec

determines the atmospheric parameters, individual abundances and DIB parameters from mean RVS spectra of single stars for more than 5M sources whose spectra have SNR $>$ 20. The atmospheric parameters are derived using MatisseGauguin (MG, found in the astrophysical_parameters table) and ANN (in the astrophysical_parameters_supp table). The quality of the results is highly dependent on the radial velocity errors and on the parameters themselves. Recio-Blanco et al. (2023) detail the 41-character flagging system, which is also summarised in flags_gspspec.
Known issues: $\log g$ and [$\alpha$/Fe] from MG, as well as most of the abundances, exhibit small biases; these can, however, mostly be removed via an internal (Gaia self-consistent) calibration (Recio-Blanco et al. 2023).
Using the GSP-Spec quality flag chain is essential when working with these data (and therefore when estimating any possible bias).
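A selection on the flag chain can be sketched as below. This is only an illustration under stated assumptions: each of the 41 characters is taken to be a single digit, positions are counted from zero, and the positions and limits in `example_cut` are hypothetical. The meaning of each character, and which cuts are appropriate for a given use case, are defined in Recio-Blanco et al. (2023).

```python
def passes_flag_cut(flags, max_allowed):
    """Check a 41-character flag chain against per-position upper limits.

    flags       -- string of 41 digit characters (assumed format)
    max_allowed -- dict mapping zero-based position -> largest accepted value
    """
    if len(flags) != 41 or not flags.isdigit():
        raise ValueError("expected a chain of 41 digit characters")
    return all(int(flags[pos]) <= limit for pos, limit in max_allowed.items())


# Hypothetical cut: require the first three flags to be 0 and the
# seventh flag to be at most 1. These choices are illustrative only.
example_cut = {0: 0, 1: 0, 2: 0, 6: 1}

clean_chain = "0" * 41
flagged_chain = "0000002" + "0" * 34

keep_clean = passes_flag_cut(clean_chain, example_cut)
keep_flagged = passes_flag_cut(flagged_chain, example_cut)
```

Keeping the cut in a dictionary makes it easy to state explicitly, for each analysis, which flags were constrained and how strictly.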
FLAME

derives radius, ${\cal L}$ , gravitational redshift, mass, age and evolutionary stage based on GSP-Phot input (results for $\sim$ 280 million sources are in the astrophysical_parameters table) and based on GSP-Spec MG input (results for $\sim$ 5 million sources are in the astrophysical_parameters_supp table). The determination of the mass, age, and evolutionary stage assumes solar metallicity.
Comparisons of radius and ${\cal L}$ show agreement with the literature at the $<$ 3% level.
Comparisons with open clusters show overall agreement in age at the 30% level for main-sequence stars, but only at the 70% level for globular clusters (lower metallicity).
Masses are reliable in the 0.5 to 7.0 $M_{\odot}$ range for main-sequence stars (at the 3% level in clusters). However, the performance on giants is not as good. Within the 1–2 $M_{\odot}$ range, FLAME's giant mass estimates are consistent with the literature when accounting for FLAME's reported uncertainties. Outside this range, we do not find agreement with the literature and we recommend caution when using these values.
The uncertainties in FLAME mass and age are underestimated.
MSC

analyses the BP/RP spectra assuming they represent coeval binaries (two component stars with the same extinction, age, metallicity, and distance). It determines $T_{\rm eff}$ and $\log g$ of each component, together with their distance, extinction, and $[\rm M/H]$. We note that MSC imposes a Gaussian prior on $[\rm M/H]$ of $0.0\pm 0.2$. MSC also determined a flux ratio of the two components, but this was only used for validation and is not published.
$T_{\rm eff,1}$ shows a typical bias and MAD of $-$70 K and 140 K, respectively, compared to literature values for known binaries. Performance for $T_{\rm eff,2}$ is not as good. In general, $T_{\rm eff,1}$ is hotter than or equal to, and $T_{\rm eff,2}$ cooler than, the $T_{\rm eff}$ determined for their counterparts by other Apsis modules. This behaviour is expected under the assumption of two unresolved components of a binary.
MSC overestimates $\log g$ compared to the literature and other Apsis modules, but not significantly.
The distance estimate shows good agreement with the literature and with GSP-Phot, although there is a small positive bias compared to GSP-Phot, which is explained by the binary assumption.
ESP-CS

produces a chromospheric activity index from the Ca II infrared triplet lines in the RVS spectra for 2M stars. A small bias of +0.02 dex is found compared to an analysis with FEROS spectra, but this is within the reported uncertainties. The activity scale is robust, and one can identify young stars with mass accretion. A value between $0.01$ and $0.1$ nm indicates high magnetic activity or a moderate accretion rate. A value greater than $0.1$ nm most likely indicates a high rate of mass accretion. The distribution of results depends on whether GSP-Phot or GSP-Spec parameters were used as inputs; by default the GSP-Spec ones are adopted (both GSP-Spec and ESP-CS analyse the RVS spectra). The auxiliary parameter activityindex_espcs_input reports the source of the input parameters. More outliers and extreme values are naturally found with the GSP-Phot input values.
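The two thresholds quoted above translate into a simple rule of thumb, sketched here. The function name and the returned labels are hypothetical, and the treatment of values exactly at 0.01 nm or 0.1 nm is an assumption, since the text does not specify it. The rule ignores the reported uncertainties and should be read as a first-pass indication only.

```python
def interpret_activity_index(value_nm):
    """Map an activity index in nm to the qualitative regimes in the text."""
    if value_nm > 0.1:
        return "likely high mass accretion rate"
    if value_nm >= 0.01:
        # Assumption: both boundary values fall in the intermediate regime.
        return "high magnetic activity or moderate accretion rate"
    return "below the regimes discussed in the text"


labels = [interpret_activity_index(v) for v in (0.005, 0.05, 0.2)]
```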
ESP-HS

provides $T_{\rm eff}$, $\log g$, $A_{0}$, $A_{\rm G}$, $E(G_{\rm BP}-G_{\rm RP})$, and $v\sin i$ for A, B, and O stars (7500 K $<T_{\rm eff}<$ 50 000 K) with $G<17.65$, based on BP/RP and RVS spectra (BP/RP only for the fainter stars) and assuming solar metallicity. $T_{\rm eff}$ and $\log g$ are estimated to 10% and 0.2 dex for A and B stars, but this goal is not reached for O stars. $A_{0}$ is estimated to within 10%. $T_{\rm eff}$ agrees with LAMOST to +200 K. For the hotter stars, ESP-HS performs better than GSP-Phot owing to semi-empirical corrections applied in the processing, which reduce the $T_{\rm eff}$-extinction degeneracy. The $\log g$ estimates perform well across the full $T_{\rm eff}$ range. Horizontal branch stars can be identified in a Kiel diagram. Post-processing had a large impact on O-type stars: of the 612 known from the literature, only 186 have parameters corresponding to hot stars. Some stars with $T_{\rm eff}>50\,000$ K are most likely misclassified and in fact have $T_{\rm eff}<7500$ K. The published $v\sin i$ is a proxy for the broadening of the spectral lines and is a fitted parameter, along with $T_{\rm eff}$ and $\log g$.
ESP-ELS

identifies emission-line stars (planetary nebulae and Wolf-Rayet, WR, stars) and then classifies a subset of these into the following classes: Be, Herbig Ae/Be, T Tauri, active M dwarf, WC, WN, or PN, for stars with $G<17.65$. It also provides a pseudo-equivalent width (pEW) from H$\alpha$ for all sources that it processed. We note that the H$\alpha$ wavelength region (i.e. with the highest effective resolving power) is at the very blue edge of the RP passband. The APs from GSP-Phot were used for $T_{\rm eff}<5000$ K to choose the spectral template for defining the pEW (as well as for the classification of Be, Herbig Ae/Be, T Tauri, and active M dwarfs). This module also produces a spectral type tag for all processed sources (218 million). ESP-ELS identifies 128 new planetary nebulae, and 130 WC (WR carbon) and 200 WN (WR nitrogen) stars have been detected. Other types, such as carbon stars (“CSTAR” tag in the spectraltype_esphs field), have also been detected; see Gaia Collaboration et al. (2023c).
ESP-UCD

derives $T_{\rm eff}$ of ultra-cool dwarfs (UCDs). These are objects with $T_{\rm eff}\leq 2500$ K (spectral type later than M7). The $T_{\rm eff}$ estimates are based on Gaussian processes trained on observed spectra. There are some contaminating sources with poor astrometric solutions, probably highly extincted bright sources. Temperatures at the hot end may be underestimated with respect to literature values.
UGC

processes the BP/RP spectra of remote galaxies and estimates redshifts $z$ in the range 0.0 to 0.6 using a support vector machine trained on observed spectra of galaxies. It relies on the DSC classification of galaxies to select the sources it processes, and it produces a redshift with upper and lower prediction limits. The internal pre-release data set used for validation contained 1.3M sources; the median error on $z$ is 0.006, with a standard deviation of 0.029. UGC performance decreases with magnitude and with $z$; for the latter, this is mainly due to the limits of the training set. An overdensity of sources has been observed in colour-colour space, primarily associated with the Galactic centre and the Large and Small Magellanic Clouds (LMC/SMC). Most of these have been removed, but some could remain. There is also some contamination by QSOs (2% of sources have a QSO class in the SDSS catalogue), although the redshifts turn out to be reasonable for these sources. About 50 known high-redshift sources were assigned a low redshift. A suspiciously large peak of sources is seen in the redshift bin $0.0707-0.0709$; most of these are bright sources with a small redshift that is overestimated by UGC.
QSOC

determines redshifts of sources ($0.08<z<6.13$) that are classified as quasars by DSC. A comparison with the Milliquas 7.2 catalogue of Flesch (2021) showed that 90% of the common sources have an absolute redshift error $|\Delta z|<0.1$ if only sources with flags_qsoc $=0$ are considered. If only $G<20$ mag sources are considered, then 80% of the sources have $|\Delta z|<0.1$, independently of whether they raise warning flags or not. Nonetheless, it was found that the regions around redshift $z\approx 2$ and $0.9<z<1.3$ suffer from frequent emission-line mismatches in low-SNR BP/RP spectra. While the former is due to unavoidable confusion between the Ly$\alpha$ and C IV emission lines (their separation being comparable to that between the C IV and C III] emission lines), the latter is due to the fact that the C III] emission line is often the sole detectable line in the low-SNR regime, which increases the rate of mismatches. We also note that QSOC processed a significant fraction of contaminating stars, which is expected because we deliberately chose to process as many objects as possible, favouring completeness at the expense of purity.
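The effect of an emission-line mismatch on the redshift can be illustrated with simple arithmetic. The sketch below is not the QSOC algorithm; it only shows that the logarithmic separation of Ly$\alpha$ and C IV is comparable to that of C IV and C III], and what redshift results when one line is mistaken for another. The approximate rest wavelengths are assumptions taken from general knowledge rather than from this section, and the function name is hypothetical.

```python
import math

# Approximate rest wavelengths in nm (assumed values, not from this section).
REST_NM = {"Lya": 121.57, "CIV": 154.9, "CIII]": 190.9}


def spurious_redshift(z_true, true_line, assumed_line):
    """Redshift inferred when true_line is mistaken for assumed_line."""
    observed_nm = REST_NM[true_line] * (1.0 + z_true)
    return observed_nm / REST_NM[assumed_line] - 1.0


# Line separations on a logarithmic wavelength scale; a redshift shifts
# all lines by the same amount on this scale, which is why similar
# separations make the two line pairs easy to confuse.
log_sep_lya_civ = math.log(REST_NM["CIV"] / REST_NM["Lya"])
log_sep_civ_ciii = math.log(REST_NM["CIII]"] / REST_NM["CIV"])

# Example: C IV at a true redshift of 2 mistaken for C III].
z_wrong = spurious_redshift(2.0, "CIV", "CIII]")
```

With these wavelengths, the two logarithmic separations differ by less than 0.05, and the mismatch in the example moves a true redshift of 2 to a value between 1.4 and 1.5.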
OA

analyses 56 million sources with the lowest classification probabilities from DSC, in order to understand their origin and to provide feedback to DSC and the other Apsis modules. OA processing is triggered when a source does not obtain a probability higher than 0.999 in any of the DSC classes. This threshold was chosen in order to process approximately 56 million sources ($\sim$3–4%). OA uses unsupervised classification (self-organizing maps) to group objects with similar BP/RP spectra. It also reports statistics on their Gaia observables: magnitudes, latitude, parallaxes, number of transits, etc. These statistics are found in the oa_neuron_information table. A class label identifies the neuron to which a source belongs, and a distance to this neuron is defined (found in the astrophysical_parameters table). 36% of the neurons are high-quality ones (category 1); category 6 corresponds mostly to groups of spectra affected by artefacts or low SNR. As there is a lack of templates for physical binary objects, that class is lacking in Gaia DR3.
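The trigger condition described above can be written as a one-line test on the DSC class probabilities. This is a minimal sketch, assuming the probabilities of a source are available as a plain list; the function name and the example values are hypothetical.

```python
OA_THRESHOLD = 0.999  # threshold quoted in the text


def is_oa_candidate(class_probabilities, threshold=OA_THRESHOLD):
    """True if no DSC class probability is higher than the threshold."""
    return max(class_probabilities) <= threshold


# Hypothetical probability vectors for two sources (five classes each).
confident_source = [0.9995, 0.0002, 0.0001, 0.0001, 0.0001]
ambiguous_source = [0.70, 0.20, 0.05, 0.03, 0.02]

selected = [is_oa_candidate(p) for p in (confident_source, ambiguous_source)]
```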
TGE

provides an all-sky HEALPix map of the total Galactic extinction at four levels (6, 7, 8, and 9). It uses the GSP-Phot $A_{0}$ of extragalactic giants, i.e. giants distant enough to be outside the Galactic disk. TGE also produces an optimal HEALPix map. 98% of the HEALPix pixels have $\sigma_{A_0}<0.1$ or a relative error of $\leq$20%. At low Galactic latitudes, TGE extinctions are higher than in the Schlegel map, but not significantly. The map should be used with caution for $|b|<5^{\circ}$, where the extinction is not reliable; these HEALPix values were nevertheless not removed. The LMC and SMC are seen in the map (because GSP-Phot tags these as distant).
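To give a sense of the angular resolution of the four maps, the sketch below computes the number of pixels and the pixel area at each level. It assumes that "level" denotes the standard HEALPix resolution order $k$, with $N_{\rm side}=2^{k}$ and $12\,N_{\rm side}^{2}$ equal-area pixels over the sky; the function name is hypothetical.

```python
import math

FULL_SKY_DEG2 = 4.0 * math.pi * (180.0 / math.pi) ** 2  # about 41 253 deg^2


def healpix_properties(level):
    """Return (nside, npix, pixel area in deg^2) for a HEALPix level."""
    nside = 2 ** level
    npix = 12 * nside ** 2
    return nside, npix, FULL_SKY_DEG2 / npix


# Properties of the four levels mentioned in the text.
tge_levels = {level: healpix_properties(level) for level in (6, 7, 8, 9)}
```

Under this assumption, level 6 corresponds to 49 152 pixels of about 0.84 deg$^2$ each, and each additional level divides the pixel area by four.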

Table 11.37: Overview of Apsis products and types of objects that are processed in several modules.

	A	B	C	D	E	F	G	H	I	J	K	L	M
$T_{\rm eff}$		✓	✓		✓	✓	✓		✓
$\log g$		✓	✓		✓	✓			✓
${\cal L}$		✓	✓			✓			✓
FGK stars		✓	✓	✓		✓		✓
Chemical abundances		✓	✓
Hot stars		✓	✓	✓	✓			✓
Cool stars		✓	✓	✓		✓	✓	✓
Clusters		✓	✓	✓	✓			✓
Distances		✓
Galaxy	✓										✓	✓
Quasar	✓												✓
Extinction		✓	✓		✓				✓	✓		✓
Outliers	✓										✓

Notes. The column heading abbreviations are A=DSC, B=GSP-Phot, C=GSP-Spec, D=ESP-ELS, E=ESP-HS, F=ESP-CS, G=ESP-UCD, H=FLAME, I=MSC, J=TGE, K=OA, L=UGC, M=QSOC.
