10.3.4 Processing steps
The processing of classification included the following steps (see Rimoldini et al. 2023, for details):
1. cross-match of Gaia sources with variable objects from the literature, as described in Gavras et al. (2023);
2. ranking of literature catalogs according to their reliability, for each (sub)class;
3. definition of training sets:
    (a) selection of subsets of trustworthy representatives according to several diagnostics, maximising coverage across spatial, photometric, and astrometric parameters, for each (sub)class;
    (b) combination of the selected source subsets into different sets of classes and class groups (for training, validation, and test sets);
    (c) definition and selection of classification attributes;
4. definition of classifiers:
    (a) combination of (sub)classes according to physical and intrinsic classifier confusion, for multi-class classifiers;
    (b) combination of all but one class for binary one-vs-rest classifiers;
    (c) tuning of classifier hyperparameters;
    (d) creation of multiple multi-class and one-vs-rest classifiers using the aforementioned Random Forest and XGBoost methods;
    (e) creation of meta-classifiers to combine multi-class classifiers that used different methods, improving the overall performance for all classes (see the sketch after this list);
5. verification and validation of results:
    (a) selection of results from up to 12 classifiers for each class or class group;
    (b) comparison with the literature;
    (c) generation of an overall classification score (best_class_score) for the verified classifications;
6. selection of a single class for the less than 1 % of sources that were associated with multiple classes.
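Steps 4(d) and 4(e) can be illustrated with a small sketch: the code below trains a Random Forest and an XGBoost multi-class classifier on the same attribute set and combines them with a stacking meta-classifier. The synthetic data, the scikit-learn/xgboost classes, and every hyperparameter value are illustrative assumptions only; this does not reproduce the actual DPAC pipeline.

```python
# Minimal sketch of steps 4(d)-(e): train two multi-class classifiers
# (Random Forest and XGBoost) on the same attributes and combine them
# with a meta-classifier.  All data and settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Stand-in for the classification attributes of step 3(c): each row is a
# source, each column a photometric/astrometric/variability attribute.
X, y = make_classification(n_samples=5000, n_features=20, n_informative=12,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

base_classifiers = [
    ("random_forest", RandomForestClassifier(n_estimators=300, random_state=0)),
    ("xgboost", XGBClassifier(n_estimators=300, max_depth=6,
                              eval_metric="mlogloss", random_state=0)),
]

# The meta-classifier learns how to weight the class probabilities of the
# two base methods, aiming to improve on either classifier alone.
meta = StackingClassifier(estimators=base_classifiers,
                          final_estimator=LogisticRegression(max_iter=1000),
                          stack_method="predict_proba", cv=3)
meta.fit(X_train, y_train)
print("meta-classifier accuracy:", meta.score(X_test, y_test))
```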
As in Gaia DR2, each classified source was associated with a single class and score. Classification results are available in the vari_classifier_result table and, as mentioned in Section 10.3.3, for the special case of galaxies, in the galaxy_candidates table.
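As a usage illustration, classification results can be retrieved from the Gaia archive with an ADQL query. The sketch below assumes the astroquery package and the published gaiadr3.vari_classifier_result table; the class label 'RR' and the score cut of 0.5 are arbitrary example choices (consult the data model for the full list of class labels).

```python
# Retrieve published classification results from the Gaia archive
# (gaiadr3.vari_classifier_result) with astroquery.  The class label and
# score threshold below are illustrative choices only.
from astroquery.gaia import Gaia

query = """
SELECT TOP 100 source_id, best_class_name, best_class_score
FROM gaiadr3.vari_classifier_result
WHERE best_class_name = 'RR'
  AND best_class_score > 0.5
ORDER BY best_class_score DESC
"""
job = Gaia.launch_job(query)
results = job.get_results()
print(results)
```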
Classification score
The best_class_score field combines into a single number the probabilities returned by the different classifiers that a given source is of class best_class_name: it is the median of the normalized ranks of these probabilities, where each rank is computed within the corresponding classifier. The number of contributing classifiers varies from 1 to 12, depending on both class and source.
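One possible reading of this definition is sketched below: for each contributing classifier, the probability assigned to the source is ranked among the probabilities that the same classifier produced for all sources, the rank is normalized to (0, 1], and the median of these normalized ranks is taken. This is an illustrative interpretation with assumed inputs, not the DPAC implementation.

```python
# Illustrative sketch of how a score like best_class_score could be formed:
# rank the source's probability within each classifier's probability
# distribution, normalise the rank, and take the median across classifiers.
import numpy as np
from scipy.stats import rankdata


def combined_score(prob_per_classifier, all_probs_per_classifier):
    """prob_per_classifier[i]: probability the i-th classifier assigned to
    this source for the best class; all_probs_per_classifier[i]: probabilities
    the same classifier assigned to all sources (used to normalise the rank)."""
    normalized_ranks = []
    for p, all_p in zip(prob_per_classifier, all_probs_per_classifier):
        ranks = rankdata(np.append(all_p, p))          # average ranks, 1..N+1
        normalized_ranks.append(ranks[-1] / len(ranks))  # rank of p in (0, 1]
    return float(np.median(normalized_ranks))


# Example with three classifiers (all numbers are made up):
rng = np.random.default_rng(1)
all_probs = [rng.uniform(size=1000) for _ in range(3)]
print(combined_score([0.92, 0.85, 0.97], all_probs))
```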
Classification vs SOS modules
Several SOS work packages selected their input from general supervised classification results, as shown in Figure 10.1. This means that the results of these SOS modules are generally a subset of the classification results. However, there are two main reasons for exceptions to this rule:
1. Each SOS module accessed up to 12 classifiers (whose probabilities are not published) for potential contributions to a given class, applying custom classifier probability thresholds. These thresholds were sometimes so low that the general classification favoured a different class, or the general classification simply missed some of these candidates because its verification diagnostics were less sophisticated than those of the SOS module (a minimal sketch of such threshold-based selection follows at the end of this subsection).
2. Some SOS modules also investigated candidates from classes that were expected to cause partial confusion with their targeted class.
Both procedures improved the completeness of the dependent SOS work packages, although their selections deviated slightly from those of the general classifier.
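To make the first exception concrete, the hypothetical sketch below selects SOS input candidates whenever any of the (unpublished) classifier probabilities for the target class exceeds a deliberately low, module-specific threshold, and counts how many of them were not favoured by the general classification. All arrays, thresholds, and the stand-in for the general classification are synthetic assumptions.

```python
# Hypothetical sketch of the selection in item 1: an SOS module collects
# candidates whose probability for its target class exceeds a custom (often
# low) threshold in ANY of the available classifiers, regardless of which
# class the general classification finally assigned.  All numbers are
# illustrative; the per-classifier probabilities are not published.
import numpy as np

n_sources, n_classifiers = 10_000, 12
rng = np.random.default_rng(42)

# probs[i, j]: probability from classifier j that source i is of the target class.
probs = rng.beta(0.5, 5.0, size=(n_sources, n_classifiers))
# Crude stand-in for the class assigned by the general classification
# (True = target class favoured).
general_is_target = probs.mean(axis=1) > 0.5

threshold = 0.2                                   # custom, deliberately low
sos_candidates = (probs > threshold).any(axis=1)  # SOS input selection

extra = sos_candidates & ~general_is_target
print(f"SOS candidates: {sos_candidates.sum()}, "
      f"of which {extra.sum()} were not favoured by the general classifier")
```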