10.14.2 Properties of the input data
This catalogue started from the set of Gaia DR3 sources that the CU7 variability pipeline identified as variable, based on time-domain variability criteria applied to G-band observations (more precisely the ExtremeErrorCleaningMagnitudeDependent_FOV_G data), as described in Section 10.2. These variable sources were then classified by the same pipeline using several XGB and Random Forest classifiers, as described in Section 10.3.
We selected the sources that were classified as either BCEP, SPB, GDOR, DSCT|SXPHE, or pulsating pre-main-sequence stars (see type label definitions in the Gaia DR3 vari_classifier_class_definition table), with a classifier probability (unpublished) above a minimum threshold. In addition, we included sources that were classified as either ACV|CP|MCP|ROAM|ROAP|SXARI or BE|GCAS. Although these two composite variability classes are in principle not main-sequence pulsators, the confusion matrix generated from the training of the classifiers showed that upper main-sequence pulsators can erroneously end up in one of them. This is particularly true for the SPBs. The underlying reason is not only overlap in the classification attribute space, but also the fact that the relative proportions of the variability classes in the training set do not reflect their true relative proportions. As a result, although SPBs are more numerous than ACV|CP|MCP|ROAM|ROAP|SXARI stars, they often end up in that class because its training set was considerably larger.
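The selection and the effect of training-set proportions can be illustrated with a minimal Python sketch. It is not the CU7 pipeline code. The names `Candidate`, `select_candidates`, and `posterior` are hypothetical, the accepted-label set is an illustrative subset whose spellings are assumptions, and the threshold value is a placeholder, since the actual threshold is unpublished. The second part treats the training-set proportions as an implicit prior in Bayes' rule, which is a simplification of how XGB and Random Forest classifiers behave, and uses made-up numbers.

```python
from dataclasses import dataclass

# Illustrative subset of accepted class labels (spellings are assumptions).
ACCEPTED_CLASSES = {"BCEP", "SPB", "GDOR", "BE|GCAS"}
# Hypothetical threshold: the value actually used is unpublished.
MIN_PROBABILITY = 0.5


@dataclass
class Candidate:
    source_id: int
    best_class: str      # label assigned by the classifier
    probability: float   # classifier probability for that label


def select_candidates(candidates, accepted=ACCEPTED_CLASSES,
                      min_prob=MIN_PROBABILITY):
    """Keep sources with an accepted label and a probability above min_prob."""
    return [c for c in candidates
            if c.best_class in accepted and c.probability > min_prob]


def posterior(likelihoods, priors):
    """Bayes' rule: normalise likelihood * prior over all classes."""
    unnorm = {k: likelihoods[k] * priors[k] for k in likelihoods}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}


# Toy numbers (not from the source): an object whose attributes are twice
# as likely under SPB as under a competing class "OTHER".
likelihoods = {"SPB": 0.2, "OTHER": 0.1}
# Training set dominated by the competing class ...
training_priors = {"SPB": 0.2, "OTHER": 0.8}
# ... whereas SPBs are assumed to be the more numerous class on the sky.
sky_priors = {"SPB": 0.7, "OTHER": 0.3}

p_train = posterior(likelihoods, training_priors)["SPB"]
p_sky = posterior(likelihoods, sky_priors)["SPB"]
print(f"P(SPB) with training proportions: {p_train:.2f}")
print(f"P(SPB) with sky proportions:      {p_sky:.2f}")
```

With these toy numbers the same object is assigned to the competing class under the training proportions and to SPB under the assumed true proportions, which is the reason for also accepting sources from the two composite classes.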