11.3.14 Quasar Classifier (QSOC)
Author(s): Ludovic Delchambre
Goal
The Quasar Classifier (QSOC) module aims to determine the redshift, $z$, of the sources that are classified as quasars by the DSC module (see Section 11.3.2 for more details).
Inputs
The determination of the redshift by QSOC is based on

1.
the internally calibrated BP/RP spectra as sampled by SMSgen (see Section 11.3.1)

2.
the DSC Combmod probability for the selection of the sources to be processed, namely those having classprob_dsc_combmod_quasar $\ge 0.01$.

3.
BP/RP rest-frame quasar templates coming from the weighted principal component analysis (Delchambre 2015) of continuum-subtracted spectra of quasars from the twelfth data release of the Sloan Digital Sky Survey Quasar Catalog (Pâris et al. 2017, hereafter DR12Q) and simulated through the BP/RP simulator (see Section 11.2.3). See Delchambre et al. (2023) for more details on the way these templates were built.
Method
The determination of the redshift of quasars by QSOC is based on the fact that the redshift,
$$z=\frac{{\lambda}_{\mathrm{obs}}-{\lambda}_{\mathrm{rest}}}{{\lambda}_{\mathrm{rest}}}=\frac{{\lambda}_{\mathrm{obs}}}{{\lambda}_{\mathrm{rest}}}-1,$$  (11.41) 
turns into a simple offset once considered on a logarithmic wavelength scale
$$Z=\mathrm{ln}(z+1)=\mathrm{ln}{\lambda}_{\mathrm{obs}}-\mathrm{ln}{\lambda}_{\mathrm{rest}},$$  (11.42) 
where we assume that a given spectral feature standing at rest-frame wavelength ${\lambda}_{\mathrm{rest}}$ is observed at wavelength ${\lambda}_{\mathrm{obs}}$. Accordingly, given a set of rest-frame templates, $\bm{T}$, and an observation vector, $\bm{s}$, that are sampled on the same logarithmic wavelength scale, ${\lambda}_{i}={\lambda}_{0}{L}^{i}$ where ${\lambda}_{0}$ is a reference wavelength and $L$ is the logarithmic wavelength sampling we use (here $\mathrm{ln}L=0.001$ or equivalently $L\approx 1.001$), the problem of finding the optimal shift, $k$, between $\bm{T}$ and $\bm{s}$ can be formulated as a ${\chi}^{2}$ minimization problem through
$${\chi}^{2}(k)=\sum _{i}\frac{1}{{\sigma}_{i}^{2}}{\left({s}_{i}-\sum _{j}{a}_{j,k}{T}_{i+k,j}\right)}^{2}$$  (11.43) 
where ${\sigma}_{i}$ is the uncertainty on ${s}_{i}$ and the ${a}_{j,k}$ are the coefficients that fit $\bm{T}$ to $\bm{s}$ in a least-squares sense when a shift $k$ is applied to the templates. The redshift associated with the shift $k$ is then simply retrieved through $z={L}^{k}-1$. Sub-sampling precision is then obtained by fitting a quadratic curve in the vicinity of the selected shift. For reasons explained in Delchambre (2016) and in Delchambre et al. (2023), we compute a reversed and shifted version of this ${\chi}^{2}$,
$$ccf(k)=C-{\chi}^{2}(k),$$  (11.44) 
which we refer to as a cross-correlation function (CCF), where $C$ is a constant. Figure 11.73 illustrates the CCF of a quasar spectrum from the Sloan Digital Sky Survey (SDSS) against quasar templates coming from Delchambre (2018). Whereas no spectral features can be modelled at shifts associated with local minima, local maxima correspond to the fit of some of the template emission lines to the observed spectrum, while the global maximum usually corresponds to the most probable redshift. As the BP and RP spectra are distinct, we should also note that the effective CCF is actually the sum of two CCFs
$$ccf(k)={ccf}_{\mathrm{bp}}(k)+{ccf}_{\mathrm{rp}}(k),$$  (11.45) 
each associated with a given part of the spectrum.
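The shift search of Equations 11.43 and 11.44 can be illustrated with a minimal Python sketch. It is not the QSOC implementation: the function names are hypothetical, the constant $C$ is assumed here to be the ${\chi}^{2}$ of a model with no template contribution (the text only states that $C$ is a constant), only non-negative integer shifts are scanned, and the quadratic refinement uses a simple three-point parabola around the maximum.

```python
import numpy as np

LN_L = 0.001  # logarithmic sampling step, ln L, as quoted in the text


def chi2_at_shift(s, sigma, T, k):
    """Chi-square of Equation 11.43 for a single integer shift k >= 0.

    s, sigma : observed fluxes and their uncertainties (length n).
    T        : templates on the same logarithmic grid, shape (m, n_templates),
               with m >= n + k so that rows k .. k + n - 1 exist.
    """
    n = len(s)
    A = T[k:k + n, :] / sigma[:, None]         # shifted, weighted templates
    b = s / sigma                              # weighted observation
    a, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares coefficients a_{j,k}
    r = b - A @ a
    return float(r @ r)


def ccf_scan(s, sigma, T):
    """Return ccf(k) = C - chi2(k) for every admissible shift k."""
    n_shifts = T.shape[0] - len(s) + 1
    chi2 = np.array([chi2_at_shift(s, sigma, T, k) for k in range(n_shifts)])
    C = float(np.sum((s / sigma) ** 2))        # assumption: chi2 of a null model
    return C - chi2


def refine_peak(ccf):
    """Sub-sample position of the CCF maximum from a three-point parabola."""
    k0 = int(np.argmax(ccf))
    if k0 == 0 or k0 == len(ccf) - 1:
        return float(k0)                       # no neighbours on both sides
    y_m, y_0, y_p = ccf[k0 - 1], ccf[k0], ccf[k0 + 1]
    denom = y_m - 2.0 * y_0 + y_p
    if denom == 0.0:
        return float(k0)
    return k0 + 0.5 * (y_m - y_p) / denom


def shift_to_redshift(k):
    """z = L^k - 1, written with expm1 for accuracy at small shifts."""
    return float(np.expm1(k * LN_L))
```

In this sketch a BP and an RP spectrum would each go through `ccf_scan`, and the two resulting arrays would be added as in Equation 11.45 before the maximum is located.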
The shift that is selected by QSOC is the one that is associated with the highest score, as defined by
$$S(k)={w}_{0}\times {\left[{\chi}_{r}^{2}(k)\right]}^{p}+{w}_{1}\times {\left[{Z}_{\mathrm{score}}(k)\right]}^{p},$$  (11.46) 
where ${w}_{0}=0.71413$, ${w}_{1}=0.28587$ and $p=0.24365$ (see Delchambre et al. 2023, for details). In this last equation, ${\chi}_{r}^{2}(k)$ is the chi-square ratio, ccfratio_qsoc, defined as the ratio of the value of the CCF evaluated at $k$ to the maximum of the CCF,
$${\chi}_{r}^{2}(k)=\frac{ccf(k)}{{max}_{k}(ccf)}$$  (11.47) 
and ${Z}_{\mathrm{score}}(k)$, zscore_qsoc, is an indicator of the presence of quasar emission lines. A ${Z}_{\mathrm{score}}$ close to one indicates that all the emission lines that we expect at redshift $z$ are found in the spectra, while a single missing emission line often leads to a very low ${Z}_{\mathrm{score}}$ (see Delchambre et al. 2023, for details). In order to help the final user filter out potentially erroneous redshifts, we further define a binary processing flag, flags_qsoc, which is based on these quality indicators as well as on the number of spectral transits and the $G$ magnitude of the source.
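As an illustration of Equations 11.46 and 11.47, the following Python sketch ranks candidate shifts by their score. The function name is hypothetical, and the ${Z}_{\mathrm{score}}$ values are taken as a precomputed input in $[0,1]$ because their computation is only described in Delchambre et al. (2023). Clipping the CCF at zero before applying the fractional power is an added safeguard of this sketch, not something stated in the text.

```python
import numpy as np

# Weights and exponent quoted in the text for Equation 11.46.
W0, W1, P = 0.71413, 0.28587, 0.24365


def select_shift(ccf, zscore):
    """Return (best shift index, score array) from CCF and Z-score arrays.

    ccf    : values of the CCF for each candidate shift.
    zscore : Z-score for each candidate shift, assumed to lie in [0, 1].
    """
    ccf = np.clip(np.asarray(ccf, dtype=float), 0.0, None)
    zscore = np.asarray(zscore, dtype=float)
    chi2_ratio = ccf / np.max(ccf)                    # Equation 11.47
    score = W0 * chi2_ratio ** P + W1 * zscore ** P   # Equation 11.46
    return int(np.argmax(score)), score


# A shift with a slightly lower CCF can win if its emission lines are all found.
k_best, score = select_shift([1.0, 4.0, 3.9], [0.5, 0.01, 0.9])
```

Since ${w}_{0}+{w}_{1}=1$, a candidate that sits at the CCF maximum with ${Z}_{\mathrm{score}}=1$ reaches the maximal score of one.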
Finally, the redshift that is reported by QSOC is distributed as a lognormal distribution of mean $Z=\mathrm{ln}(z+1)$ and variance ${\sigma}_{Z}^{2}={\sigma}_{k}^{2}{\mathrm{ln}}^{2}L$ such that its lower and upper confidence intervals, respectively taken as its $0.15866$ and $0.84134$ quantiles, are given by
$${z}_{\mathrm{low}}=\mathrm{exp}(Z-{\sigma}_{Z})-1\mathit{\hspace{1em}\hspace{1em}}\text{and}\mathit{\hspace{1em}\hspace{1em}}{z}_{\mathrm{up}}=\mathrm{exp}(Z+{\sigma}_{Z})-1.$$  (11.48) 
More information on the QSOC method and on the computation of the ${Z}_{\mathrm{score}}$ can be found in Delchambre et al. (2023).
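The conversion from a shift and its uncertainty to a redshift and its confidence bounds (Equations 11.42 and 11.48) can be sketched as follows. The function and argument names are hypothetical; `sigma_k` stands for the uncertainty ${\sigma}_{k}$ on the shift, whose estimation is not described here.

```python
import math

LN_L = 0.001  # logarithmic sampling step, ln L, as quoted in the text


def redshift_interval(k, sigma_k):
    """Return (z, z_low, z_up) for a sub-sample shift k with uncertainty sigma_k."""
    Z = k * LN_L                  # Z = ln(z + 1) = k ln L
    sigma_Z = sigma_k * LN_L      # sigma_Z^2 = sigma_k^2 ln^2 L
    z = math.expm1(Z)
    z_low = math.expm1(Z - sigma_Z)
    z_up = math.expm1(Z + sigma_Z)
    return z, z_low, z_up


z, z_low, z_up = redshift_interval(k=1000.0, sigma_k=5.0)
```

The interval is symmetric in $Z$ but not in $z$: the upper bound lies slightly further from $z$ than the lower one.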
Outputs
The output of the QSOC module can be found in the following fields of the Gaia qso_candidates table:

1.
redshift_qsoc: The quasar redshift, $z$, from Equation 11.41

2.
redshift_qsoc_lower and redshift_qsoc_upper: The lower and upper confidence intervals, ${z}_{\mathrm{low}}$ and ${z}_{\mathrm{up}}$, corresponding to the 16% and 84% quantiles of $z$, respectively, as given by Equation 11.48

3.
ccfratio_qsoc: The chi-square ratio, ${\chi}_{r}^{2}$, from Equation 11.47

4.
zscore_qsoc: The ${Z}_{\mathrm{score}}$ from Equation 11.46

5.
flags_qsoc: The QSOC processing flags, ${z}_{\mathrm{warn}}$.
Scope
QSOC produces redshift predictions in the Gaia qso_candidates table only for sources with a DSC Combmod probability, classprob_dsc_combmod_quasar $\ge 0.01$, while having

•
flags_qsoc $\le $ 16, meaning that the BP/RP spectra are considered as reliable (i.e. flag Z_BADSPEC is not set such that flags_qsoc $<$ 16) or that the processing of this source raises no warning flag even though the spectrum was considered unreliable (i.e. flags_qsoc = 16). 
•
flags_qsoc $>$ 16 but added by other contributors to the Gaia QSO table, as indicated in the source_selection_flags field of the qso_candidates table.
Results
We provide here a summary of the QSOC performance and refer the user to Delchambre et al. (2023) for a more detailed analysis of these results. QSOC performance is assessed by comparing the predicted redshifts against values from the literature. For this purpose, we cross-matched 6,375,063 sources having redshift estimates from QSOC with 790,776 quasars having spectroscopically confirmed redshifts in the Milliquas 7.2 catalogue of Flesch (2021) (i.e. type = 'Q' in Milliquas). The $1^{\prime\prime}$ search radius we used yields 439,127 QSOC sources with literature redshifts. We should however stress that neither these redshifts nor the $G$ magnitudes of the cross-matched sources follow realistic distributions, as they inherit the selection and observational biases that are present in both the Milliquas catalogue and Gaia. Accordingly, the numbers reported here should be taken with caution. That being said, a direct comparison of these predictions shows that $279,850/439,127=63.73\%$ of the sources have an absolute error on the predicted redshift, $\mathrm{\Delta}z$, that is lower than 0.1, while this ratio rises to $89,107/91,320=97.58\%$ if only flags_qsoc = 0 sources are considered.
In Figure 11.74, we show the distribution of the logarithmic redshift error, $\mathrm{\Delta}Z=\mathrm{ln}(z+1)-\mathrm{ln}({z}_{\mathrm{true}}+1)$, between the QSOC redshift, $z$, and the literature redshift, ${z}_{\mathrm{true}}$, for the 439,127 sources we previously identified. The distribution of this logarithmic redshift error provides, in addition to the number of good predictions, a direct visualisation of the mismatches between common quasar emission lines. We can see that most of the predictions have $\mathrm{\Delta}Z\approx 0$ and are accordingly in good agreement with their literature values. The emission line mismatches mainly occur with respect to two specific emission lines: C iii] and Mg ii, while the most frequent mismatch occurs when the C iv emission line is misidentified as the Ly$\alpha $ emission line, for reasons explained in Delchambre et al. (2023). By requiring that flags_qsoc = 0, we can mitigate the effect of these emission line mismatches without affecting too much the central peak of correct predictions.
Considering now only $$ predictions, so as to isolate the central peak from Figure 11.74, and computing the normalized logarithmic error as $\mathrm{\Delta}Z/{\sigma}_{Z}$ where ${\sigma}_{Z}=[\mathrm{ln}({z}_{\mathrm{up}}+1)-\mathrm{ln}({z}_{\mathrm{low}}+1)]/2$, we can see that the distribution of $\mathrm{\Delta}Z/{\sigma}_{Z}$ approximately follows a Gaussian distribution of median 0.00744 and standard deviation, extrapolated from the interquartile range, of $1.05335$ ($0.002163\pm 1.139733$ if only flags_qsoc = 0 observations are considered). The normality of $\mathrm{\Delta}Z/{\sigma}_{Z}$ is expected from Section 11.3.14, though large tails are present that come from systematics and from a smooth background of random predictions associated with very low signal-to-noise ratio (SNR) observations.
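The normalized logarithmic error and its robust summary statistics can be computed along the following lines. This is a sketch under stated assumptions: the function name is hypothetical, any prior selection of the predictions is left to the caller, and the standard deviation is extrapolated from the interquartile range using the Gaussian relation $\mathrm{IQR}\approx 1.349\,\sigma$, which is one common way of doing so and may differ in detail from what was used for the quoted numbers.

```python
import numpy as np

IQR_OF_UNIT_GAUSSIAN = 1.3489795  # interquartile range of a unit Gaussian


def normalized_error_stats(z, z_low, z_up, z_true):
    """Return (median, robust standard deviation) of Delta Z / sigma_Z."""
    z, z_low, z_up, z_true = map(np.asarray, (z, z_low, z_up, z_true))
    delta_Z = np.log1p(z) - np.log1p(z_true)              # ln(z+1) - ln(z_true+1)
    sigma_Z = (np.log1p(z_up) - np.log1p(z_low)) / 2.0    # half-width in Z
    x = delta_Z / sigma_Z
    q25, q50, q75 = np.percentile(x, [25, 50, 75])
    return float(q50), float((q75 - q25) / IQR_OF_UNIT_GAUSSIAN)
```

For well-calibrated uncertainties, the returned pair should be close to (0, 1); the interquartile range is used instead of the sample standard deviation so that the large tails mentioned above have little influence.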
Finally, in Figure 11.76, we plot the fraction of sources with $$ with respect to literature/QSOC redshift and Gaia $G$ magnitude for all the 439,127 predictions found in the Milliquas catalogue and for those where we encounter no processing issues (i.e. flags_qsoc values $1$–$8$ are not set, such that flags_qsoc $=0$ or flags_qsoc $=16$). This plot can be regarded as a way to evaluate the purity/completeness of the QSOC predictions, as it directly provides the fraction of $$ observations in terms of predicted redshift (i.e. purity) and in terms of ‘true’ redshift, taken here to be the literature redshift (i.e. completeness). We can note an overall decrease in performance as we go to fainter objects, which is an obvious consequence of the generally lower SNR of these objects. Regarding the QSOC completeness (left part of Figure 11.76), we can note two problematic regions in $$ and around ${z}_{\mathrm{true}}\approx 2$ that are due to the fact that only the Mg ii emission line is covered in the BP/RP spectra of $$ quasars and to the misidentification of the C iv emission line as Ly$\alpha $ in ${z}_{\mathrm{true}}\approx 2$ quasars, as explained in Delchambre et al. (2023). The lower purity seen in quasars having QSOC redshift $$ or $z>4$ comes from the rarity of these very low/high redshift quasars, such that any false predictions towards these sparsely populated regions weigh heavily on the final fraction of observations. Again, appropriate cuts on flags_qsoc confine these wrong predictions to limited regions of the $G$ magnitude/redshift space, as seen by comparing the lower and upper panels of Figure 11.76.
Use
A few points of attention should be kept in mind while using QSOC predictions:

•
The set of sources processed by QSOC aims to be complete rather than pure (i.e. we deliberately set a very low threshold on the DSC Combmod probability of classprob_dsc_combmod_quasar $\ge 0.01$). As a consequence, we expect most of our processed sources to be stellar contaminants rather than genuine quasars, such that users interested in purer samples should consider using stricter selection rules such as classlabel_dsc_joint $=$'quasar' or those described in Gaia Collaboration et al. (2023b, Section 8).

•
QSOC is, by construction, designed to process type-I/core-dominated quasars with broad emission lines in the optical and accordingly yields only poor predictions on galaxies, type-II AGN and BL Lacertae/blazar objects, which would in any case have flags_qsoc $\ne 0$.

•
The presence of the sole Mg ii emission line in the observed BP/RP spectra of quasars at $$ can lead to a higher rate of degeneracy between redshifts in this particular range. Also, the mismatch between the C iv and Ly$\alpha $ emission lines in $z\approx 2$ quasars is a frequent case of misidentification. These degeneracies are more frequent amongst faint sources.

•
Because SMSgen, described in Section 11.3.1, does not provide covariance matrices on the integrated flux (see Creevey et al. 2023), the computed ${\chi}^{2}$, from Equation 11.43, is systematically underestimated and is consequently not published in Gaia DR3. The computed redshift and associated confidence intervals, ${z}_{\mathrm{low}}$ and ${z}_{\mathrm{up}}$ from Equation 11.48, though appropriately rescaled, might sporadically suffer from this limitation.

•
Requiring that flags_qsoc $=0$ may lead to a large number of predictions being discarded, especially at faint magnitudes. For users interested in $G\gtrsim 19$ mag quasars, we hence suggest using a less stringent cut of the form flags_qsoc $=0$ or flags_qsoc $=16$, where we encounter no processing issue (i.e. flags $1$–$8$ are not set) even when the BP/RP spectra are unreliable (i.e. flag $16$ can be set).
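The two cuts on flags_qsoc discussed above can be expressed as bit tests, as in the short Python sketch below. It assumes that flags_qsoc is a bit field in which the processing issues occupy the values 1, 2, 4 and 8 and Z_BADSPEC the value 16, which is consistent with the description given here but should be checked against the data model; the constant and function names are hypothetical.

```python
Z_BADSPEC = 16                   # value quoted in the text for unreliable BP/RP spectra
PROCESSING_ISSUE_MASK = 0b01111  # assumed: flag values 1, 2, 4 and 8


def is_strictly_clean(flags_qsoc):
    """Strict cut: no flag set at all (flags_qsoc = 0)."""
    return flags_qsoc == 0


def has_no_processing_issue(flags_qsoc):
    """Relaxed cut: flags 1-8 not set, Z_BADSPEC allowed (flags_qsoc = 0 or 16)."""
    return (flags_qsoc & PROCESSING_ISSUE_MASK) == 0


# Values in 0..31 that pass the relaxed cut.
accepted = [f for f in range(32) if has_no_processing_issue(f)]
```

Both functions use only `==` and `&`, so they can also be applied element-wise to an integer NumPy array of flags_qsoc values.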