6.5 Quality assessment and validation

6.5.1 Verification

Author(s): Leanne Guy, Isabelle Lecoeur-Taïbi, Jan Cuypers, Berry Holl, Lorenzo Rimoldini, Nami Mowlavi, Gisella Clementini, Laurent Eyer

Verification is the process of checking that the results meet the specifications. Below you will find an overview of the various verifications performed on the data processed as part of this release.


The distributions of various statistical parameters were inspected in detail, as described in section 6.2 of Eyer et al. (2017).

Variability detection

The selection of variable candidates was performed using a Random Forest classifier. The completeness and contamination rates obtained are around 92-93% and 7-8%, respectively, for both the variable and the constant objects of the OGLE-IV GSEP (Soszyński et al. 2012) training set, as shown in figure 18 of section 5.3 in Eyer et al. (2017).
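As an illustration of how such rates can be computed, the sketch below assumes the commonly used definitions: completeness is the fraction of true members of a class that the classifier assigns to that class, and contamination is the fraction of objects assigned to a class that do not belong to it. The function name and the label lists are hypothetical, and the counts are invented to fall in the quoted ranges; they are not the Gaia or OGLE-IV data.

```python
def completeness_and_contamination(true_labels, predicted_labels, target):
    """Return (completeness, contamination) for the class `target`.

    Assumed definitions (not taken from the source document):
      completeness  = TP / (TP + FN)
      contamination = FP / (TP + FP)
    """
    tp = fp = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == target and truth == target:
            tp += 1          # correctly assigned to the target class
        elif guess == target:
            fp += 1          # assigned to the target class by mistake
        elif truth == target:
            fn += 1          # true member of the target class that was missed
    completeness = tp / (tp + fn) if (tp + fn) else float("nan")
    contamination = fp / (tp + fp) if (tp + fp) else float("nan")
    return completeness, contamination


# Invented labels for illustration only: 100 variables and 100 constants.
truth = ["variable"] * 100 + ["constant"] * 100
guess = (["variable"] * 92 + ["constant"] * 8      # 8 variables missed
         + ["constant"] * 93 + ["variable"] * 7)   # 7 constants misclassified

var_comp, var_cont = completeness_and_contamination(truth, guess, "variable")
const_comp, const_cont = completeness_and_contamination(truth, guess, "constant")
print(f"variable: completeness={var_comp:.3f}, contamination={var_cont:.3f}")
print(f"constant: completeness={const_comp:.3f}, contamination={const_cont:.3f}")
```

With these invented counts, the variable class has a completeness of 0.92 and a contamination of 7/99 (about 0.07), and the constant class has a completeness of 0.93 and a contamination of 8/101 (about 0.08).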


Confusion matrices from 10-fold cross-validation on the training sets of the supervised classifiers (Gaussian Mixtures, Bayesian Networks, and Random Forests) are given in section 5.5.3 and figure 23 of Eyer et al. (2017). They show excellent performance for the target classes of RR Lyrae and Cepheid variables.
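The mechanics of a cross-validated confusion matrix can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: a nearest-centroid rule on a single feature stands in for the classifiers named above, and the feature values, class labels and function names are all hypothetical.

```python
import random
from collections import defaultdict


def k_fold_indices(n_items, k, rng):
    """Shuffle the indices 0..n_items-1 and split them into k folds."""
    indices = list(range(n_items))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_centroids(features, labels):
    """Stand-in 'training': the mean feature value of each class."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(features, labels):
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def cross_validated_confusion(features, labels, k=10, seed=0):
    """Accumulate (true class, predicted class) counts over k held-out folds."""
    rng = random.Random(seed)
    confusion = defaultdict(int)
    for held_out in k_fold_indices(len(features), k, rng):
        held = set(held_out)
        train = [i for i in range(len(features)) if i not in held]
        centroids = train_centroids([features[i] for i in train],
                                    [labels[i] for i in train])
        # Each object is predicted by a model that never saw it in training.
        for i in held_out:
            confusion[(labels[i], predict(centroids, features[i]))] += 1
    return dict(confusion)


# Invented toy data: one feature, two well-separated classes.
rng = random.Random(1)
features = ([rng.gauss(-0.3, 0.05) for _ in range(50)]
            + [rng.gauss(0.7, 0.05) for _ in range(50)])
labels = ["RRLYR"] * 50 + ["CEP"] * 50
confusion = cross_validated_confusion(features, labels)
for (truth, guess), count in sorted(confusion.items()):
    print(f"true={truth:6s} predicted={guess:6s} count={count}")
```

Because every object is held out exactly once, the entries of the matrix sum to the size of the training set, and the off-diagonal entries count the misclassifications.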

6.5.2 Validation

Author(s): Isabelle Lecoeur-Taïbi, Leanne Guy, Jan Cuypers, Berry Holl, Lorenzo Rimoldini, Nami Mowlavi, Gisella Clementini, Laurent Eyer

Validation is the process of assessing the scientific accuracy of the results by comparison against independent, reliable sources of knowledge. Below you will find an overview of the various validations performed on the data processed as part of this release.

Variability detection

For a given variability criterion, the analysis of the histogram of empirically computed p-values is a good way to validate the results of the classifier. Several variability criteria showed the expected behaviour.
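The reasoning behind this check is that, for constant sources with correctly estimated uncertainties, the p-values of a statistical test are uniformly distributed between 0 and 1, so their histogram should be flat, whereas truly variable sources pile up near 0. The sketch below illustrates the idea under simplifying assumptions that are not taken from the source: the variability criterion is a chi-square test against a known mean with a known Gaussian error, the number of observations is even (so that a closed-form survival function can be used), and all names and parameter values are hypothetical.

```python
import math
import random


def chi2_sf_even_dof(x, dof):
    """Survival function of the chi-square distribution for even `dof`.

    Uses the closed form exp(-x/2) * sum_{i<dof/2} (x/2)^i / i!.
    """
    if dof <= 0 or dof % 2:
        raise ValueError("this closed form requires a positive even dof")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def constant_source_p_value(rng, n_obs=20, sigma=0.01):
    """p-value of the chi-square test for one simulated constant source."""
    residuals = [rng.gauss(0.0, sigma) for _ in range(n_obs)]
    chi2 = sum((r / sigma) ** 2 for r in residuals)
    return chi2_sf_even_dof(chi2, n_obs)


def histogram(values, n_bins=10):
    """Counts of values in n_bins equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return counts


rng = random.Random(42)
p_values = [constant_source_p_value(rng) for _ in range(5000)]
counts = histogram(p_values)

# Compare the histogram with the flat expectation for constant sources.
expected = len(p_values) / len(counts)
uniformity_chi2 = sum((c - expected) ** 2 / expected for c in counts)
print(counts)
print(f"chi-square against a flat histogram: {uniformity_chi2:.1f}")
```

A histogram with a clear excess in the bins away from 0, or a deficit there, would point to misestimated uncertainties or an incorrectly calibrated criterion rather than to genuine variability.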


A first, limited validation study of the period recovery of the characterisation pipeline was based on 28 days of EPSL data for 384 variables from the OGLE-IV GSEP data (Soszyński et al. 2012) and is described in section 5.4 of Eyer et al. (2017). Even for this very limited time span, the results were as satisfactory as could be expected.

Further validation of the characterisation is presented in section 6.3 of Eyer et al. (2017), which gives details on the period recovery for the 2044 sources from the OGLE-IV GSEP crossmatch set of variables with EPSL and NSL data. Of these, 1940 sources (about 95%) have a correctly recovered period. For most of the others, either an alias is found or the non-recovery can be explained by an insufficient number of data points or by the odd distribution of the time series.
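A comparison of this kind can be sketched as a simple labelling rule. The tolerance, the harmonic check, the alias orders and the sampling frequency below are all assumptions made for illustration; the criteria actually applied are those described in Eyer et al. (2017), and the function name is hypothetical.

```python
def classify_period_recovery(p_rec, p_ref, f_sampling, rel_tol=0.01, max_order=2):
    """Label a recovered period against a reference period.

    Periods are in days and `f_sampling` (cycles per day) is an assumed
    dominant sampling frequency. Returns 'recovered', 'harmonic', 'alias'
    or 'not recovered'.
    """
    if p_rec <= 0 or p_ref <= 0:
        raise ValueError("periods must be positive")
    f_rec, f_ref = 1.0 / p_rec, 1.0 / p_ref

    def matches(f_candidate):
        # Relative agreement between recovered and candidate frequency.
        return abs(f_rec - f_candidate) <= rel_tol * f_candidate

    if matches(f_ref):
        return "recovered"
    if matches(2.0 * f_ref) or matches(0.5 * f_ref):
        return "harmonic"  # half or double the reference period
    for k in range(1, max_order + 1):
        # Aliases assumed at the reference frequency shifted by k * f_sampling.
        for f_alias in (abs(f_ref - k * f_sampling), f_ref + k * f_sampling):
            if f_alias > 0 and matches(f_alias):
                return "alias"
    return "not recovered"


# Invented examples with a reference period of 0.5 d and an assumed
# sampling frequency of 4 cycles per day.
labels = [classify_period_recovery(p, 0.5, f_sampling=4.0)
          for p in (0.5, 0.25, 1.0 / 6.0, 0.3)]
print(labels)

# Fraction quoted in the text: 1940 of 2044 periods correctly recovered.
recovered_fraction = 1940 / 2044
print(f"{recovered_fraction:.3f}")
```

For the four invented periods the labels are, in order, recovered, harmonic, alias and not recovered. The order of the tests matters: an exact match is checked first, so that a frequency coinciding with both the reference and one of its aliases is counted as recovered.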


An extensive comparison of classification attributes with respect to other surveys is discussed in section 8 of Eyer et al. (2017).

SOS Cepheids and RR Lyrae

Cepheids and RR Lyrae stars are known to obey period-luminosity and period-amplitude relationships that must be satisfied by the results of the SOS processing (figures 4, 10, 18 and 19 in Clementini et al. 2016). Likewise, the Fourier parameters describing their time series are not randomly distributed, but fill specific regions in given diagrams, such as those displayed in figures 21 and 22 of Clementini et al. (2016). These properties were used to validate the data processing results.

In addition, the light curves with the computed models superposed on them were visually checked.

The results of the SOS Cep&RRL pipeline processing have also been validated using existing ground-based catalogues of those objects in the field of the Large Magellanic Cloud. The list of those catalogues is given at the end of section 3.1 of Clementini et al. (2016), and a comparison of the Gaia data products for those stars with the data products available in the literature is given in section 4 of that paper. A completeness estimate of the number of Cepheids and RR Lyrae stars in the South Ecliptic Pole region published in this Gaia data release is given in section 7 of Eyer et al. (2017).