10.3.3 Calibration models
Classification employed the supervised learning algorithms of Random Forest (Breiman 2001) and XGBoost (Chen and Guestrin 2016) from the H2O machine-learning platform (H2O.ai 2020). The classifiers were trained to identify a large number of classes and class groups, among which the ones that were published in Gaia DR3 were denoted by the following labels (defined in the vari_classifier_class_definition table): ACVCPMCPROAMROAPSXARI, ACYG, AGN, BCEP, BEGCASSDORWR, CEP, CV, DSCTGDORSXPHE, ECL, ELL, EP, GALAXY, LPV, MICROLENSING, RCB, RR, RS, S, SDB, SN, SOLARLIKE, SPB, SYST, WD, and YSO.
Galaxies were included in the training set mainly because of the artificial photometric variability detected by Gaia for extended objects (Holl et al. 2023a), as already noticed in Gaia DR2 (Clementini et al. 2019) and originally identified by S. Cheng and S. Koposov during the 2018 NYC Gaia Sprint (http://gaia.lol/2018NYC.html). To avoid confusion with galaxies associated with real variability, they were not published amongst the variability results but in the dedicated galaxy_candidates table, which identified them by the condition vari_best_class_name=‘GALAXY’ (with classification scores reported in vari_best_class_score).
Classes that were trained but not published in Gaia DR3, besides constant stars, included blue large-amplitude pulsators (Pietrukowicz et al. 2017), FK Comae Berenices-type variables, heartbeat stars, high mass X-ray binaries, poorly studied irregular variables, post-common envelope binaries (or pre-cataclysmic variables), protoplanetary nebulae embedding yellow supergiant post-AGB stars, PV Telescopii-type variables, strong reflection (re-radiation) in close binary systems, general sources with variable X-ray emission, and ZZ Leporis stars.
Classification models were built with 60 thousand sources from the classes mentioned above and a selection of attributes, which characterized the information contained in the time series and general source properties as follows:
the Abbe value of FoV-transit magnitudes in the G band (abbe_mag_g_fov);
the sample-size unbiased unweighted variance and kurtosis (central moments) of FoV-transit magnitudes in the G band, denoised assuming Gaussian uncertainties (Rimoldini 2014);
the duration of the time series from the first to the last FoV-transit observation in the G band (time_duration_g_fov);
the unweighted 95th percentile of magnitude changes per time interval between successive FoV transits in the G band;
the ratio between the sample-size biased unweighted standard deviation of FoV-transit magnitudes in the G band and the root-mean-square of their uncertainties (std_dev_over_rms_err_mag_g_fov);
the square root of the sample-size unbiased unweighted variance of FoV-transit magnitudes in the G band (std_dev_mag_g_fov);
the parallax value (parallax);
the Pearson correlation coefficient between the FoV-transit magnitudes in the G_BP and G_RP bands;
the sample-size unbiased unweighted skewness moment of FoV-transit magnitudes in the G band, standardized by the variance of such measurements (skewness_mag_g_fov);
the ratio of the third spectral shape coefficients in the G_BP and G_RP bands (Section 5.1.1);
parameters derived from the Least Squares periodogram (Section 10.2.3):
the top frequencies (corresponding to the highest amplitudes) in the frequency ranges 0.1–1 and 1–25 d⁻¹;
the signal detection efficiencies (from the difference between the maximum and mean periodogram amplitudes, divided by the standard deviation of such amplitudes) in the frequency ranges 0.1–1 and 1–25 d⁻¹;
the false alarm probabilities of the top frequencies in the frequency ranges 0.0007–0.1, 0.1–1, and 1–25 d⁻¹;
the highest amplitudes in the frequency ranges 0.0007–0.1, 0.1–1, and 1–25 d⁻¹.
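As an illustration only, several of the attributes listed above can be sketched in a few lines of Python. This is not the pipeline code: the function names are hypothetical, the Abbe value follows its usual definition (mean squared successive difference over twice the variance, for time-sorted magnitudes), and the use of the population standard deviation in the signal detection efficiency is an assumption, since the text does not state the normalisation.

```python
import math


def time_duration(times):
    """Span between the first and last observation (same unit as `times`)."""
    return max(times) - min(times)


def abbe_value(mags):
    """Abbe value of time-sorted magnitudes.

    Close to 1 for uncorrelated scatter, well below 1 for smooth trends.
    """
    n = len(mags)
    mean = sum(mags) / n
    sum_sq_dev = sum((m - mean) ** 2 for m in mags)
    sum_sq_succ = sum((b - a) ** 2 for a, b in zip(mags, mags[1:]))
    return n / (2.0 * (n - 1)) * sum_sq_succ / sum_sq_dev


def std_dev_unbiased(mags):
    """Square root of the sample-size unbiased (n - 1) unweighted variance."""
    n = len(mags)
    mean = sum(mags) / n
    return math.sqrt(sum((m - mean) ** 2 for m in mags) / (n - 1))


def std_dev_over_rms_err(mags, errs):
    """Biased (divide by n) standard deviation over the RMS of the uncertainties."""
    n = len(mags)
    mean = sum(mags) / n
    biased_std = math.sqrt(sum((m - mean) ** 2 for m in mags) / n)
    rms_err = math.sqrt(sum(e ** 2 for e in errs) / len(errs))
    return biased_std / rms_err


def signal_detection_efficiency(freqs, amps, f_min, f_max):
    """(max - mean) / std of periodogram amplitudes with f_min <= f <= f_max.

    Assumption: the population standard deviation (divide by n) is used.
    """
    sel = [a for f, a in zip(freqs, amps) if f_min <= f <= f_max]
    mean = sum(sel) / len(sel)
    std = math.sqrt(sum((a - mean) ** 2 for a in sel) / len(sel))
    return (max(sel) - mean) / std


if __name__ == "__main__":
    # Made-up toy values, not Gaia data.
    times = [0.0, 10.0, 25.0, 40.0]       # days
    mags = [15.0, 15.2, 15.0, 15.2]       # G-band magnitudes
    errs = [0.01, 0.01, 0.01, 0.01]       # magnitude uncertainties
    print(time_duration(times), abbe_value(mags), std_dev_over_rms_err(mags, errs))
```

A ratio of standard deviation to RMS uncertainty well above 1 indicates scatter in excess of the noise, which is why attributes of this kind help separate variable from constant sources.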