Gaia Data Release 3 documentation

10.3 General classification

10.3.3 Calibration models

Classification employed the supervised learning algorithms Random Forest (Breiman 2001) and XGBoost (Chen and Guestrin 2016), as implemented in the H2O machine-learning platform (H2O.ai 2020). The classifiers were trained to identify a large number of classes and class groups. Those published in Gaia DR3 are denoted by the following labels (defined in the vari_classifier_class_definition table): ACVCPMCPROAMROAPSXARI, ACYG, AGN, BCEP, BEGCASSDORWR, CEP, CV, DSCTGDORSXPHE, ECL, ELL, EP, GALAXY, LPV, MICROLENSING, RCB, RR, RS, S, SDB, SN, SOLAR_LIKE, SPB, SYST, WD, and YSO.

Galaxies were included in the training set mainly because of the artificial photometric variability detected by Gaia for extended objects (Holl et al. 2023a), as already noticed in Gaia DR2 (Clementini et al. 2019) and originally identified by S. Cheng and S. Koposov during the 2018 NYC Gaia Sprint (http://gaia.lol/2018NYC.html). To avoid confusion with galaxies associated with real variability, they were not published amongst the variability results but in the dedicated galaxy_candidates table, which identifies them by the condition vari_best_class_name='GALAXY' (with classification scores reported in vari_best_class_score).
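As an illustration of the selection condition just described, the following minimal sketch builds an ADQL query string for these sources. The helper name galaxy_variability_query, its parameters, and the gaiadr3 schema prefix are assumptions made for this example; only the table name, the two vari_best_class columns, and the 'GALAXY' condition come from the text above.

```python
def galaxy_variability_query(min_score=0.0, limit=100):
    """Build an ADQL query for galaxies flagged by the variability classifier.

    Hypothetical helper: the 'gaiadr3' schema prefix, the score threshold,
    and the row limit are illustrative assumptions, not part of the source.
    """
    return (
        f"SELECT TOP {int(limit)} source_id, vari_best_class_name, "
        "vari_best_class_score\n"
        "FROM gaiadr3.galaxy_candidates\n"
        "WHERE vari_best_class_name = 'GALAXY'\n"
        f"  AND vari_best_class_score >= {float(min_score)}\n"
        "ORDER BY vari_best_class_score DESC"
    )


query = galaxy_variability_query(min_score=0.5, limit=10)
print(query)
```

The resulting string could be submitted to any ADQL-capable client; the score threshold is optional and is shown only to illustrate the use of vari_best_class_score.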

Classes that were trained but not published in Gaia DR3, besides constant stars, included blue large-amplitude pulsators (Pietrukowicz et al. 2017), FK Comae Berenices-type variables, heartbeat stars, high mass X-ray binaries, poorly studied irregular variables, post-common envelope binaries (or pre-cataclysmic variables), protoplanetary nebulae embedding yellow supergiant post-AGB stars, PV Telescopii-type variables, strong reflection (re-radiation) in close binary systems, general sources with variable X-ray emission, and ZZ Leporis stars.

Classification models were built with 60 thousand sources from the classes mentioned above and a selection of attributes, which characterized the information contained in the time series and general source properties as follows:

  1. the Abbe value of FoV transit magnitudes in the G band (abbe_mag_g_fov);

  2. the astrometry-based luminosity, computed as parallax × 10^(0.2 × median_mag_g_fov - 2) (Arenou and Luri 1999);

  3. the possibly reddened colour index G_BP - G_RP, estimated by median_mag_bp - median_mag_rp;

  4. the possibly reddened colour index G - G_RP, estimated by median_mag_g_fov - median_mag_rp;

  5. the sample-size unbiased unweighted variance and kurtosis (central moments) of FoV-transit magnitudes in the G band, denoised assuming Gaussian uncertainties (Rimoldini 2014);

  6. the duration of the time series from the first to the last FoV transit observation in the G band (time_duration_g_fov);

  7. the unweighted 95th percentile of magnitude changes per time interval between successive FoV transits in the G band;

  8. the qso_variability and non_qso_variability parameters from Butler and Bloom (2011), computed from FoV-transit magnitudes in the G band, after adaptations to the Gaia data;

  9. the ratio between the sample-size biased unweighted standard deviation of FoV-transit magnitudes in the G band and the root-mean-square of their uncertainties (std_dev_over_rms_err_mag_g_fov);

  10. the square root of the sample-size unbiased unweighted variance of FoV-transit magnitudes in the G band (std_dev_mag_g_fov);

  11. the parallax value (parallax);

  12. the Pearson correlation coefficient from the magnitudes of FoV transits in the G_BP and G_RP bands;

  13. the sample-size unbiased unweighted skewness moment of FoV transit magnitudes in the G band, standardized by the variance of such measurements (skewness_mag_g_fov);

  14. the ratio of the third spectral shape coefficients in the G_BP and G_RP bands (Section 5.1.1);

  15. the ratio of the standard deviations in magnitude of FoV transits in the G_BP and G_RP bands, i.e., std_dev_mag_bp/std_dev_mag_rp;

  16. the single-band Stetson variability index (Stetson 1996) computed from the magnitudes of FoV transits in the G band, pairing observations within 0.1 days (stetson_mag_g_fov);

  17. a Wesenheit-like magnitude of FoV transits in the G band, estimated by the following expression: median_mag_g_fov - 2 × (median_mag_bp - median_mag_rp);

  18. parameters derived from the Least Square periodogram (Section 10.2.3):

    (a) the top frequencies (corresponding to the highest amplitudes) in the frequency ranges 0.1–1 and 1–25 d^-1;

    (b) the signal detection efficiencies (from the difference between the maximum and mean periodogram amplitudes, divided by the standard deviation of such amplitudes) in the frequency ranges 0.1–1 and 1–25 d^-1;

    (c) the false alarm probabilities of the top frequencies in the frequency ranges 0.0007–0.1, 0.1–1, and 1–25 d^-1;

    (d) the highest amplitudes in the frequency ranges 0.0007–0.1, 0.1–1, and 1–25 d^-1.
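
Several of the attributes listed above follow directly from the expressions given in the text. The minimal sketch below evaluates four of them (the Abbe value, the astrometry-based luminosity, the Wesenheit-like magnitude, and the signal detection efficiency) for toy inputs. It is not the pipeline's implementation. The function names, the Abbe normalisation n / (2 (n - 1)), the use of the population standard deviation in the signal detection efficiency, and the parallax unit (mas) are assumptions made for illustration.

```python
import math
import statistics


def abbe_value(mags):
    """Abbe value of a time-ordered magnitude series.

    Assumed form: n / (2 (n - 1)) * sum of squared successive differences
    divided by the sum of squared deviations from the mean.
    """
    n = len(mags)
    mean = statistics.fmean(mags)
    successive = sum((b - a) ** 2 for a, b in zip(mags, mags[1:]))
    spread = sum((m - mean) ** 2 for m in mags)
    return n / (2.0 * (n - 1)) * successive / spread


def astrometric_luminosity(parallax, median_mag_g_fov):
    """parallax * 10^(0.2 * median_mag_g_fov - 2); parallax assumed in mas."""
    return parallax * 10.0 ** (0.2 * median_mag_g_fov - 2.0)


def wesenheit_like(median_mag_g_fov, median_mag_bp, median_mag_rp):
    """median_mag_g_fov - 2 * (median_mag_bp - median_mag_rp)."""
    return median_mag_g_fov - 2.0 * (median_mag_bp - median_mag_rp)


def signal_detection_efficiency(amplitudes):
    """(max - mean) / standard deviation of periodogram amplitudes.

    The population standard deviation is an assumption; the text does not
    say which estimator is used.
    """
    peak = max(amplitudes)
    mean = statistics.fmean(amplitudes)
    return (peak - mean) / statistics.pstdev(amplitudes)


# Toy inputs, not real Gaia data.
toy_mags = [15.02, 15.10, 14.95, 15.20, 15.05]
print("Abbe value:", abbe_value(toy_mags))
print("Astrometry-based luminosity:", astrometric_luminosity(1.0, 10.0))
print("Wesenheit-like magnitude:", wesenheit_like(15.0, 15.5, 14.5))
print("SDE:", signal_detection_efficiency([1.0, 1.0, 1.0, 5.0]))
```

With this normalisation, a series with no correlation between successive points gives an Abbe value near 1, while smooth trends give values close to 0.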