8.4 Quality assessment and validation

8.4.2 Additional validation

We show the ‘theoretician’s’ Hertzsprung-Russell diagram, $T_{\rm eff}$ vs. ${\cal L}$ , in Figure 8.11a. There are many vertical stripes. When we use the Priam flag values 0100001 or 0100002 (see Table 8.1) to filter for the best data only, Figure 8.11b shows that there are still some stripes left. These are due to the unbalanced $T_{\rm eff}$ distribution in the training sample of ExtraTrees (see Section 8.3.1). Figure 8.11c shows that further vertical stripes are induced by sources which have bad Priam flags.

Figure 8.11: Hertzsprung-Russell diagram using all data (panel a), data with Priam flag 0100001 or 0100002 (panel b) and data with other Priam flag values (panel c).

As further validation of both $T_{\rm eff}$ and ${\cal L}$ we investigate the Gaia benchmark stars (Heiter et al. 2015). Unfortunately, most of the 34 Gaia benchmark stars are too bright for Gaia: we found only 16 of them to have photometry and only 6 of them to have astrometry in Gaia DR2 (see Table 8.4). Moreover, not all of them are published in Gaia DR2 because some were filtered out due to bad photometry or astrometry. Figure 8.12 shows that the RMS difference is $\sim$ 268 K in effective temperature and $\sim$ 13.1% in bolometric luminosity. This is well within the total error budget of our methods.

Table 8.4: Gaia benchmark stars (Heiter et al. 2015) that have been found in Gaia DR2. Note that the brighter a star, the more likely it is saturated in the astrometric field, thus leading to poor Gaia parallax measurements or corrupted photometry. Sources or values in red have been filtered out of Gaia DR2.

name	Gaia source ID	$G$ [mag]	parallax [mas]	Priam $T_{\textrm{eff}}$ [K]	FLAME $R$ [ $R_{\odot}$ ]	FLAME $L$ [ $L_{\odot}$ ]
61 Cyg A	1872047223895571456	4.71	–	$4327_{-124}^{+84}$	–	–
61 Cyg B	1872046574983497216	5.42	286.15 $\pm$ 0.06	$4194_{-214}^{+535}$	$0.58_{-0.13}^{+0.06}$	$0.09_{-0.01}^{+0.01}$
$\epsilon$ Eri	5164707970261629952	3.37	–	$4975_{-284}^{+496}$	–	–
$\epsilon$ Vir	3736865265439463424	2.45	30.56 $\pm$ 0.44	$4710_{-863}^{+1907}$	$14.75_{-7.3}^{+7.71}$	$98.29_{-1.59}^{+1.59}$
$\gamma$ Sge	1823067317695767552	2.74	12.42 $\pm$ 0.36	$4611_{-768}^{+1311}$	$35.58_{-15.03}^{+14.72}$	$485.95_{-15.78}^{+15.78}$
Gmb1830	4034171629042489344	6.18	–	$5300_{-97}^{+61}$	–	–
HD107328	3701693091058501632	4.52	–	$4480_{-120}^{+144}$	–	–
HD122563	3723554268436602368	5.86	–	$4704_{-117}^{+82}$	–	–
HD140283	6268770373590148096	7.02	–	$5796_{-247}^{+150}$	–	–
HD220009	2661005953843811328	4.58	–	$4366_{-106}^{+225}$	–	–
HD22879	3250489115708824064	6.52	38.20 $\pm$ 0.07	$5914_{-48}^{+37}$	$1.06_{-0.02}^{+0.02}$	$1.25_{-0.01}^{+0.01}$
HD49933	3113219383954556416	5.65	33.44 $\pm$ 0.09	$6589_{-136}^{+95}$	$1.45_{-0.05}^{+0.06}$	$3.57_{-0.02}^{+0.02}$
HD84937	615943806835727872	8.19	–	$6410_{-71}^{+175}$	–	–
$\xi$ Hya	3478394889480965120	3.17	–	$4760_{-148}^{+1015}$	–	–
$\mu$ Ara	5945941905576552448	4.90	–	$5813_{-145}^{+211}$	–	–
$\mu$ Leo	643819484616249984	3.42	30.65 $\pm$ 0.42	$4474_{-206}^{+134}$	$11.04_{-0.48}^{+0.93}$	$44.01_{-0.69}^{+0.69}$

Figure 8.12: Parameter estimation for Gaia benchmark stars (Heiter et al. 2015) for effective temperature (panel a) and bolometric luminosity (panel b). Sources in red have been filtered out of Gaia DR2.

We next investigate the $T_{\rm eff}$ estimate (the median of the ExtraTrees outputs) and the $T_{\rm eff}$ uncertainty estimates, which are the 16th and 84th percentiles of the ExtraTrees ensemble (see Section 8.3.1). For 34 231 test stars with literature estimates of $T_{\rm eff}$ , we find that 16 471 (48%) have a median value which is below the literature value. Likewise, we find that for 7 971 stars (23%), the 16th percentile is above the literature estimate, and for 7 368 stars (22%), the 84th percentile is below the literature value, so that only 78% of the literature values lie below the upper confidence level. Figure 8.13 shows that these numbers are inconsistent with the expected 16%, 50% and 84%. At face value, these results seem to suggest that our estimated uncertainty interval is too narrow. However, we need to keep in mind that the literature values we are comparing to also have errors. If we take the estimated literature uncertainties into account, we find in Andrae et al. (2018) that our estimated uncertainties coincide nicely with the actual errors.
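The percentile check described above can be sketched as follows. This is a minimal illustration on synthetic data, not the pipeline code: the function names, the Gaussian toy errors and all numerical values are assumptions made for the example. It counts how often the reference value lies below the 16th percentile, the median and the 84th percentile, once for error-free reference values and once for reference values that carry their own noise.

```python
import random

def coverage_fractions(lower, median, upper, reference):
    """Fractions of stars whose reference value lies below the 16th
    percentile, the median and the 84th percentile, respectively."""
    n = len(reference)
    below_lower = sum(r < lo for r, lo in zip(reference, lower)) / n
    below_median = sum(r < m for r, m in zip(reference, median)) / n
    below_upper = sum(r < up for r, up in zip(reference, upper)) / n
    return below_lower, below_median, below_upper

def simulate(n_stars, sigma_est, sigma_ref, seed=1):
    """Toy model (assumption): Gaussian estimate errors of width sigma_est,
    correctly quoted as a +/- sigma_est interval, compared against
    reference values that carry their own Gaussian error sigma_ref."""
    rng = random.Random(seed)
    lower, median, upper, reference = [], [], [], []
    for _ in range(n_stars):
        truth = rng.uniform(4000.0, 7000.0)  # hypothetical true Teff [K]
        est = rng.gauss(truth, sigma_est)
        median.append(est)
        lower.append(est - sigma_est)
        upper.append(est + sigma_est)
        reference.append(rng.gauss(truth, sigma_ref))
    return coverage_fractions(lower, median, upper, reference)

exact = simulate(20000, sigma_est=150.0, sigma_ref=0.0)
noisy = simulate(20000, sigma_est=150.0, sigma_ref=100.0)
print("perfect reference: %.2f %.2f %.2f" % exact)
print("noisy reference:   %.2f %.2f %.2f" % noisy)
```

In this toy setting the quoted interval is correct by construction, yet the noisy reference values push more stars outside it on both sides. This is the qualitative effect invoked above to explain why the raw fractions depart from 16% and 84%.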

Figure 8.13: Validation of uncertainty estimates of $T_{\rm eff}$ by comparison of rescaled beta distributions (solid lines) to expected percentiles (vertical dashed lines). Panel a: $\sim$ 23% of stars are below the lower (16%) confidence level. Panel b: $\sim$ 48% of stars are below the median. Panel c: $\sim$ 78% of stars are below the upper (84%) confidence level.

We turn now to the asymmetry of uncertainty estimates for $T_{\rm eff}$ . Figure 8.14a shows the distribution of the asymmetry $\frac{T_{\rm eff}^{\rm upper}-T_{\rm eff}}{T_{\rm eff}-T_{\rm eff}^{\rm lower}}$ for all sources with Priam flags 0100001 or 0100002 (see Table 8.1). While for about 57% of sources the upper and lower uncertainty intervals differ by less than a factor of 2, there are also about 2.5% of sources for which the difference is larger than a factor of 10. As is obvious from Figure 8.14b, boundary effects play a role here: if $T_{\rm eff}$ is close to the lower limit of 3000 K, then there is little room left for the 16th percentile, such that the lower uncertainty interval is ‘squeezed’. Likewise, near the upper limit of 10 000 K the 84th percentile is restricted, such that there the upper uncertainty interval is also squeezed. However, Figure 8.14b also shows that asymmetries larger than a factor of ten can arise at any other temperature, too. These extreme asymmetries appear to coincide with temperatures that are overrepresented in the ExtraTrees training sample, whereas the asymmetries are more moderate for temperatures where we have little training data.
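As a small worked example, the asymmetry ratio defined above can be evaluated for a few of the benchmark stars in Table 8.4. The helper names are hypothetical, and the sketch assumes that the tabulated upper and lower offsets correspond to the 84th and 16th percentiles.

```python
def asymmetry(teff, teff_lower, teff_upper):
    """Ratio of the upper to the lower uncertainty interval."""
    return (teff_upper - teff) / (teff - teff_lower)

def differs_by_more_than(ratio, factor):
    """True if the two intervals differ by more than `factor`,
    in either direction."""
    return ratio > factor or ratio < 1.0 / factor

# (median, minus, plus) in K, taken from Table 8.4
stars = {
    "HD22879": (5914, 48, 37),
    "xi Hya": (4760, 148, 1015),
    "HD84937": (6410, 71, 175),
}
ratios = {
    name: asymmetry(t, t - minus, t + plus)
    for name, (t, minus, plus) in stars.items()
}
for name, ratio in ratios.items():
    print(name, round(ratio, 2), differs_by_more_than(ratio, 2.0))
```

Checking both `ratio > factor` and `ratio < 1 / factor` treats a squeezed lower interval and a squeezed upper interval symmetrically.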

Figure 8.14: Asymmetry of uncertainty estimates for $T_{\rm eff}$ shown as a histogram (left panel), and as a function of the estimated $T_{\rm eff}$ (right panel). The grey histogram in the right panel shows the $T_{\rm eff}$ training distribution for ExtraTrees.

Concerning the degenerate extinction and reddening estimates, Andrae et al. (2018) discuss how degenerate estimates have been identified and filtered out. They only show the results for $A_{\rm G}$ . Figure 8.15 complements this by also showing the corresponding results for $E(G_{\rm BP}-G_{\rm RP})$ .

Figure 8.15: Identification of the most degenerate reddening estimates for high Galactic latitude stars ( $|b|>50$ ${}^{\circ}$ ). Panels a and b show the identification via the asymmetry of confidence intervals. Panels c and d show the identification via the lower confidence interval. A corresponding plot for $A_{\rm G}$ is provided in Andrae et al. (2018). Note that this plot contains data that have been filtered out of Gaia DR2.

As a demonstration of the usefulness of the additional filtering on extinction that has been applied to Gaia DR2, we investigate the relation between our $A_{G}$ estimate and the estimate of $A_{K}^{\textrm{WISE}}$ from APOGEE (Zasowski et al. 2013). We expect an approximate relation of $A_{G}\sim 6.36\cdot A_{K}^{\textrm{WISE}}$ , although this will strongly depend on the adopted extinction law and the intrinsic source SED. Figure 8.16a shows that without the additional filtering there is a prominent group of outliers with low $A_{K}^{\textrm{WISE}}$ but large $A_{G}$ . However, these are removed almost completely from the final Gaia DR2 sample, as shown in Figure 8.16b.

Figure 8.16: Comparison of $A_{G}$ estimate vs. APOGEE’s $A_{K}^{\textrm{WISE}}$ for 13 143 validation targets. Panel (a) shows the results for all stars with Priam flags 0100001 or 0100002 (see Table 8.1). Panel (b) shows the results for stars in Gaia DR2 with the additional filtering. The dashed line shows the expected approximate relation $A_{G}\sim 6.36\cdot A_{K}^{\textrm{WISE}}$ . Note that panel (a) cannot be reproduced from Gaia DR2 because extinction and reddening estimates have been removed for sources that violate Equations (8)–(11) in Andrae et al. (2018).

As mentioned in Section 8.3.2, there are no literature estimates of $A_{\rm G}$ or $E(G_{\rm BP}-G_{\rm RP})$ . And since the Gaia passbands are very broad (Jordi et al. 2010) and thus strongly dependent on the intrinsic source SED (see Figure 8.6), it is very hard to meaningfully compare estimates of $A_{\rm G}$ or $E(G_{\rm BP}-G_{\rm RP})$ to literature estimates of $A_{\rm V}$ or $E(B-V)$ . Nevertheless, in Figure 8.17 we compare our estimates of $A_{\rm G}$ to estimates of $A_{\rm V}$ from Rodrigues et al. (2014). There is a decent correlation, but there is a large scatter in our estimates of $A_{\rm G}$ . Similar results are seen when comparing to extinction and reddening estimates in the Kepler Input Catalog and estimates from Lallement et al. (2014).

Figure 8.17: Priam estimates of $A_{\rm G}$ vs. estimates of $A_{\rm V}$ from Rodrigues et al. (2014) using the Bayesian method (panel a) and the direct method (panel b).

Another important demonstration is that the extinction estimate $A_{\rm G}$ must approach zero for nearby stars. This is investigated in Figure 8.18. We can see many high-extinction stars at $\varpi<1$ mas, whereas for $\varpi>5$ mas the density profile is an exponential consistent with zero extinction and random noise, as explained in Section 6.5 of Andrae et al. (2018). Note that the relevant feature is not the sharp decline of the maximum $A_{\rm G}$ for $\varpi>5$ mas, which is only due to the rapid decline in the number of stars as the parallax increases. Instead, it is the distribution of $A_{\rm G}$ at fixed $\varpi$ that changes.

Figure 8.18: Priam estimates of $A_{\rm G}$ vs. parallax for $\varpi>0.5$ mas.

In Andrae et al. (2018), we argue that for high Galactic latitude stars ( $|b|>50$ ${}^{\circ}$ ) our extinction and reddening estimates are consistent with an exponential distribution. The exponential would be the maximum-entropy distribution for a non-negative random variate, i.e., the distribution that contains the minimum amount of information (e.g. Dowson and Wragg 2006). This suggests that for high Galactic latitude stars our extinction and reddening estimates are consistent with being pure noise (without systematics). This is further detailed in Figure 8.19. Panels a and b show that for $A_{\rm G}$ $\lesssim$ 1.3 mag and $E(G_{\rm BP}-G_{\rm RP})\lesssim$ 0.6 mag the exponential is a very good fit. However, for larger extinctions and reddenings, there are significant departures from the exponential, with our results having heavier tails than expected. In particular, as pointed out in Dowson and Wragg (2006), the exponential distribution must satisfy the relation $\langle x^{2}\rangle=2\langle x\rangle^{2}$ between first and second moments. This is tested in Figure 8.19c and d, where we compare the relation $\langle x^{2}\rangle=2\langle x\rangle^{2}$ to the first and second moments of $A_{\rm G}$ and $E(G_{\rm BP}-G_{\rm RP})$ , where the moments have been estimated from 1000 bootstrap samples drawn from the sample of high Galactic latitude stars. We can clearly see that the relation $\langle x^{2}\rangle=2\langle x\rangle^{2}$ for an exponential is not satisfied. More quantitatively, for both $A_{\rm G}$ and $E(G_{\rm BP}-G_{\rm RP})$ , the second moment is $\sim$ 14% too large compared to the value expected from the first moment or, alternatively, the first moment is $\sim$ 7% too small compared to the value expected from the second moment. This agrees with our previous observation that the tails of our estimates are too heavy compared to an exponential, because such heavy tails would affect the second moment more than the first.
However, our sample may also be contaminated. First, our removal of outliers with degenerate extinction and reddening estimates (see Figure 8.15) is not perfect, and outliers that are not caught by the selection criteria from Andrae et al. (2018) would lead to precisely this kind of heavy tail. Second, even at high Galactic latitudes ( $|b|>50^{\circ}$ ) there are still real dust structures that yield genuinely high estimates of $A_{\rm G}$ and $E(G_{\rm BP}-G_{\rm RP})$ , thereby also causing heavy tails. Given these two limitations in our test sample plus the fact that this deviation of 14% from $\langle x^{2}\rangle=2\langle x\rangle^{2}$ is not very large (although statistically significant), we still conclude that we are largely consistent with an exponential distribution. Consequently, having established the exponential as the maximum-entropy (minimum-information) distribution, we obtain global uncertainty estimates of $A_{\rm G}$ and $E(G_{\rm BP}-G_{\rm RP})$ from $\sqrt{\langle x^{2}\rangle}$ (“standard deviation with respect to $x=0$ ”) given the sample of high Galactic latitude stars.
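The moment test can be illustrated with a short bootstrap sketch. It is not the code used for Figure 8.19: the sample below is synthetic exponential noise with an assumed mean of 0.3 mag, the function names are hypothetical, and only 200 bootstrap resamples are drawn (instead of 1000) to keep the example fast. For a pure exponential the ratio $\langle x^{2}\rangle/(2\langle x\rangle^{2})$ should scatter around 1. The last lines convert a 14% excess in the second moment into the equivalent deficit in the first moment, which comes out at roughly 6 to 7%.

```python
import random

def moment_ratio(values):
    """Return <x^2> / (2 <x>^2), which equals 1 for an exponential."""
    n = len(values)
    first = sum(values) / n
    second = sum(v * v for v in values) / n
    return second / (2.0 * first * first)

def bootstrap_ratios(values, n_boot, rng):
    """Moment ratio for n_boot resamples drawn with replacement."""
    n = len(values)
    return [moment_ratio(rng.choices(values, k=n)) for _ in range(n_boot)]

rng = random.Random(42)
# Synthetic stand-in for the high-latitude sample (assumption):
# pure exponential noise with a mean of 0.3 mag.
sample = [rng.expovariate(1.0 / 0.3) for _ in range(5000)]
ratios = bootstrap_ratios(sample, n_boot=200, rng=rng)
mean_ratio = sum(ratios) / len(ratios)
print("mean moment ratio for exponential toy data: %.3f" % mean_ratio)

# Global uncertainty estimate: root mean square with respect to x = 0.
rms_about_zero = (sum(v * v for v in sample) / len(sample)) ** 0.5
print("sqrt(<x^2>) of toy data: %.3f mag" % rms_about_zero)

# A second moment that is 14% too large is equivalent to a first moment
# that is too small by a factor 1 / sqrt(1.14).
first_moment_factor = 1.0 / 1.14 ** 0.5
print("first-moment factor: %.3f" % first_moment_factor)
```

Applied to real data, a mean ratio near 1.14 would reproduce the excess quoted above.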

Figure 8.19: The exponential as the maximum-entropy distribution of $A_{\rm G}$ and $E(G_{\rm BP}-G_{\rm RP})$ for high Galactic latitude stars ( $|b|>50^{\circ}$ ). Panels a and b show the histograms of $A_{\rm G}$ and $E(G_{\rm BP}-G_{\rm RP})$ (black lines), compared to the exponential distribution having the same mean value (red lines). Panels c and d show the 1000 bootstrap-sample estimates of first and second moments for $A_{\rm G}$ and $E(G_{\rm BP}-G_{\rm RP})$ (black points) compared to the relation $\langle x^{2}\rangle=2\langle x\rangle^{2}$ of an exponential (solid black lines). (Andrae et al. 2018).

As is shown in Figure 8.20a-c, there are clear ‘fingers of god’ artefacts and ring-like structures in the Apsis results. The fingers of god clearly coincide with high $A_{\rm G}$ and $E(G_{\rm BP}-G_{\rm RP})$ , which suggests that these are genuine extinction features caused by foreground dust clouds. Comparison to the source density in Figure 8.20d shows that the fingers indeed correspond to a lack of sources, and comparison to Figure 8.20e shows that the fingers are also systematically redder, as expected from foreground extinction. The rings are most prominent in Figure 8.20f, where within $\sim$ 0.5 kpc the intrinsic colour is very red, getting bluer around $\sim$ 3 kpc and then turning redder again towards $\sim$ 5 kpc. These rings are most likely due to the dwarf-giant bimodality in the stellar distribution. Faint and cool dwarfs are only detected nearby, thus explaining the red ‘centre’. As we go further out, they become too faint to be detected and only bluer main sequence stars and red giants remain, thus causing the mean colour to become bluer. As we go out even further, the blue main sequence stars also eventually cease to be detectable and we are only left with the luminous red giants, thus causing the mean colour to become redder again. Thus the rings are caused in part by the magnitude limit (G $\leq$ 17) of our sample.

Figure 8.20: Fingers of god and solar-centric rings in Apsis results, shown as a projection on the Galactic plane using $1/\varpi$ as a distance estimate, for stars with G $\leq$ 17, better than 20% parallax uncertainties and clean Priam flags. The Sun is at $(0,0)$ . The panels are colour-coded by (a) $T_{\rm eff}$ , (b) $A_{\rm G}$ , (c) $E(G_{\rm BP}-G_{\rm RP})$ , (d) logarithmic source density, (e) observed $G_{\rm BP}-G_{\rm RP}$ , and (f) de-reddened $G_{\rm BP}-G_{\rm RP}$ colour. Black circles have radii of 0.5 kpc, 3 kpc, and 5 kpc.
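The projection used for such a map can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the plotting code behind Figure 8.20: the function names are hypothetical, the axis orientation (x towards the Galactic centre at $l=0$) is assumed, and the Priam flag selection is omitted.

```python
import math

def galactic_plane_xy(parallax_mas, l_deg, b_deg):
    """Project a star onto the Galactic plane with the Sun at (0, 0).
    The distance in kpc is taken as 1 / parallax[mas]."""
    distance_kpc = 1.0 / parallax_mas
    lon = math.radians(l_deg)
    lat = math.radians(b_deg)
    x = distance_kpc * math.cos(lat) * math.cos(lon)  # assumed: towards l = 0
    y = distance_kpc * math.cos(lat) * math.sin(lon)
    return x, y

def passes_cuts(g_mag, parallax_mas, parallax_error_mas):
    """Selection from the caption: G <= 17 and a parallax uncertainty
    better than 20%. The Priam flag cut is not modelled here."""
    return (
        g_mag <= 17.0
        and parallax_mas > 0.0
        and parallax_error_mas / parallax_mas < 0.2
    )

# Hypothetical star: parallax 2 mas, l = 90 deg, b = 0 deg
x, y = galactic_plane_xy(2.0, 90.0, 0.0)
print("x = %.3f kpc, y = %.3f kpc" % (x, y))
```

Because $1/\varpi$ is a noisy distance estimate, parallax errors smear stars along the line of sight, which is why the selection restricts the sample to sources with small relative parallax uncertainty.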

To validate the results for ${\cal R}$ and ${\cal L}$ we compare the derived radii and luminosities with those from a selection of external catalogues, whereby the targets are mostly bright ( $G<12$ ) and nearby ( $<1500$ pc). In Figure 8.21 the FLAME radius is compared to a compilation of asteroseismic and interferometric references as a function of literature radii and $T_{\rm eff}$ . For the less evolved stars ( ${\cal R}$ $<$ 3.0 ${\cal R}_{\odot}$ ) the differences are consistent with zero and the scatter is around 7%, consistent also with the radius uncertainties for this range (Andrae et al. 2018). The green triangles represent radii from automatic asteroseismic analysis using scaling relations, where typical uncertainties in the radius can be 5% and the actual values of the radii are rather sensitive to the input $T_{\rm eff}$ (see Chaplin et al. 2014). For this sample we also find a systematic trend in the radii which increases with decreasing $T_{\rm eff}$ . This suggests that the differences in the radii result from the different temperature scales used. The other stars in this less evolved sample range have been studied in much finer detail using interferometry or detailed asteroseismic analysis (black and blue stars). For these better studied stars, we do find much better agreement with the FLAME radii ( $<5\%$ ), with no significant differences as a function of $T_{\rm eff}$ .

The largest sample in Figure 8.21 is that from Vrard et al. (2016), who studied red giants using asteroseismic scaling relations. For the giants, typical uncertainties in the asteroseismic radii are on the order of 7–10% and can also show systematic differences due to the adopted $T_{\rm eff}$ . As explained earlier, we ignore interstellar extinction, and thus we expect the luminosities and radii to be systematically underestimated. This is particularly problematic for giants which are more distant, as extinction could be non-negligible. Figure 8.21 shows indeed that the FLAME radii are slightly underestimated. However, the large scatter is also a result of the differences in the $T_{\rm eff}$ scales, with Priam $T_{\rm eff}$ having values typically 1–5% cooler than the values used by Vrard et al. (2016).
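To give a feeling for how strongly a radius depends on the adopted temperature scale, the sketch below assumes that the radius follows from luminosity and $T_{\rm eff}$ through the Stefan-Boltzmann law, ${\cal L}=4\pi{\cal R}^{2}\sigma T_{\rm eff}^{4}$ , written in solar units. The function name and the solar effective temperature of 5772 K (the IAU 2015 nominal value) are assumptions of this example and may differ from the constants used in FLAME. The two test cases use the luminosities and temperatures listed in Table 8.4.

```python
TEFF_SUN_K = 5772.0  # IAU 2015 nominal solar Teff (assumed for this sketch)

def radius_rsun(lum_lsun, teff_k):
    """Radius in solar units from L = 4 pi R^2 sigma Teff^4,
    expressed relative to the Sun."""
    return lum_lsun ** 0.5 * (TEFF_SUN_K / teff_k) ** 2

# (FLAME L [Lsun], Priam Teff [K]) from Table 8.4
r_hd22879 = radius_rsun(1.25, 5914.0)
r_hd49933 = radius_rsun(3.57, 6589.0)
print("HD22879: %.2f Rsun" % r_hd22879)
print("HD49933: %.2f Rsun" % r_hd49933)

# At fixed luminosity, a Teff that is cooler by a fraction f inflates
# the radius by a factor (1 - f)**-2.
inflation = {f: (1.0 - f) ** -2 - 1.0 for f in (0.01, 0.05)}
for f, extra in inflation.items():
    print("Teff cooler by %.0f%% -> radius larger by %.1f%%"
          % (100 * f, 100 * extra))
```

Both radii come out close to the tabulated FLAME values. Holding the luminosity fixed is a simplification, since a change in $T_{\rm eff}$ would in practice also change the bolometric correction, but it shows that a temperature offset of a few per cent translates into a radius change of several per cent.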

As with the main sequence stars, the interferometric sample (blue stars) and the giants in NGC 6819 (black triangles), which were studied in much finer detail, show agreement with the FLAME radii to within the 10% level and no trend in their differences as a function of radius or $T_{\rm eff}$ .

Figure 8.21: Comparison of the FLAME radius with external data as a function of literature radius (left) and literature $T_{\rm eff}$ (right). $\Delta$ is defined as FLAME minus literature. The symbols indicate different literature sources: red squares are Vrard et al. (2016); green triangles are Chaplin et al. (2014); blue stars are Boyajian et al. (2016) and Ligi et al. (2016); black stars are Creevey et al. (2017); black triangles are members of NGC 6819 from Basu et al. (2011). (Andrae et al. 2018).

Further validation of the FLAME luminosity is provided by the Hertzsprung-Russell diagrams in Figure 8.22. The upper panels exhibit a sharp diagonal cut at the bottom. This is a result of the removal of stars with ${\cal R}<0.5$ ${\cal R}_{\odot}$ by the post-processing filtering. It is also evident from panel a (low Galactic latitudes) that where extinction is non-negligible, the $T_{\rm eff}$ and ${\cal L}$ can be underestimated. However, moving to regions of the sky where extinction is much less of an issue (high Galactic latitudes, panel b), we see that the main components of the Hertzsprung-Russell diagram are much more distinct and not contaminated. If we replace the abscissa by the dereddened colour and apply Equation 8.7 to correct for extinction, using $A_{\rm G}$ , the Hertzsprung-Russell diagrams (lower two panels) show quite narrow main sequence and giant regions, thus validating the Priam $E(G_{\rm BP}-G_{\rm RP})$ estimates.