7.6.5 Quality assessment and validation

The selection criteria for short timescale suspected periodic candidates rely on the analysis of known constant and variable sources from the OGLE catalogues. To validate the analysis, sources from other catalogues of variable stars (Catalina, LINEAR, ASAS, AAVSO, etc.), as well as other resources from the literature, are crossmatched with the Gaia data using the Simbad crossmatch tool. Finally, visual inspection of candidate light-curves, together with complementary follow-up of some short period variable candidates, enabled us to further refine the selection criteria and clean the suspected short period sample.

Verification

By applying the preliminary short timescale selection criteria to all Gaia sources with G CCD photometry available, more than 20 FoV transits in G, and a G magnitude between 16.5 and 20 mag (the range in which the variogram detection criterion has been validated), 16 703 sources are selected as preliminary short period candidates. Visual inspection of the light-curves of a few hundred randomly selected examples made it possible to identify several unexpected and probably spurious behaviours, such as G light-curves switching between two discrete magnitude levels, or sources exhibiting incompatible behaviours in G, GBP and GRP.
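As an illustration only, the two numerical cuts quoted above (number of FoV transits and G magnitude range) can be sketched as a boolean mask. The function and argument names are hypothetical, the inclusiveness of the magnitude bounds is an assumption, and the variogram-based selection criteria themselves are not reproduced here.

```python
import numpy as np

def preselect(num_fov_transits, g_mag, min_transits=20, g_min=16.5, g_max=20.0):
    """Return a boolean mask of sources passing the cuts quoted in the text.

    Hypothetical helper: names and bound inclusiveness are assumptions.
    """
    n = np.asarray(num_fov_transits)
    g = np.asarray(g_mag, dtype=float)
    # "More than 20 FoV transits" is read as a strict inequality.
    enough_transits = n > min_transits
    # Magnitude range 16.5 to 20 mag; inclusive bounds are an assumption.
    in_mag_range = (g >= g_min) & (g <= g_max)
    return enough_transits & in_mag_range

# Four made-up sources: (transits, G magnitude)
mask = preselect([25, 20, 40, 31], [17.2, 18.0, 15.9, 19.8])
print(mask)  # [ True False False  True]
```

Only the first and last made-up sources pass: the second has exactly 20 transits, and the third is brighter than 16.5 mag.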

To filter out such spurious variability, it has been necessary to clean the sample based on the candidates’ environment on the sky (in a similar way to what is done by Wevers et al. 2018), removing e.g. candidates possibly contaminated by bright nearby sources.
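The exact environment criteria are not given in this section. As a rough sketch of the idea of flagging candidates with a bright nearby source, the following assumes a hypothetical search radius and magnitude margin; both values, and all function names, are placeholders rather than the ones actually used.

```python
import numpy as np

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (ra1, dec1, ra2, dec2)
    )
    h = (np.sin((dec2 - dec1) / 2.0) ** 2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2.0) ** 2)
    return np.degrees(2.0 * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))) * 3600.0

def has_bright_neighbour(cand_ra, cand_dec, cand_g, nb_ra, nb_dec, nb_g,
                         radius_arcsec=5.0, min_delta_mag=0.0):
    """True if any neighbour lies within the radius and is brighter than the
    candidate by more than min_delta_mag. Both thresholds are placeholders."""
    sep = angular_sep_arcsec(cand_ra, cand_dec, nb_ra, nb_dec)
    nb_g = np.asarray(nb_g, dtype=float)
    close = sep <= radius_arcsec
    brighter = nb_g < (cand_g - min_delta_mag)  # smaller magnitude = brighter
    return bool(np.any(close & brighter))

# Made-up neighbours around a G = 18 candidate at (10.0, -30.0) deg
nb_ra = [10.0005, 10.01, 10.0003]
nb_dec = [-30.0, -30.0, -30.0]
nb_g = [14.0, 13.0, 19.5]
flag = has_bright_neighbour(10.0, -30.0, 18.0, nb_ra, nb_dec, nb_g)
print(flag)  # True: the first neighbour is about 1.6 arcsec away and brighter
```

For a catalogue-scale search, a spatial index would be preferable to this brute-force comparison.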

An additional time series cleaning operator has also been applied, specific to the short timescale analysis and based on the expected amplitude of the variation in the G band, to remove any remaining GBP and GRP outliers.

Finally, extra cuts on the number of observations, skewness, median variogram ratio and correlation values in the G, GBP and GRP bands have efficiently excluded the remaining spurious variable candidates.
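Two of the statistics named above can be sketched as follows. This is a minimal illustration, assuming that "skewness" means the standard moment-based skewness of the magnitudes and that "correlation" means a Pearson coefficient between two bands sampled at the same transits; the actual definitions and thresholds are not stated here, so none are applied.

```python
import numpy as np

def skewness(x):
    """Moment-based sample skewness (third central moment over variance^1.5)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)

def band_correlation(mag_a, mag_b):
    """Pearson correlation between two bands observed at the same transits."""
    return float(np.corrcoef(mag_a, mag_b)[0, 1])

# Made-up magnitudes: one faint outlier in G, and BP tracking G exactly
g = np.array([18.0, 18.1, 18.0, 18.1, 18.6])
bp = g + 0.5
g_skew = skewness(g)            # positive: tail towards fainter magnitudes
r_g_bp = band_correlation(g, bp)  # close to 1 for perfectly correlated bands
print(g_skew > 0, round(r_g_bp, 6))
```

A genuine variable is expected to vary coherently across bands, which is why a low correlation between G, GBP and GRP can point to a spurious signal.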

Validation

At this stage, some further validation and black-listing of the short timescale candidate sources has been necessary.

First, a few tens of sources in the sample are reported as showing excess flux in GBP + GRP compared to G; these sources have been removed.

Additionally, a few hundred candidates overlap with the bona fide eclipsing binaries sample that the eclipsing binaries work-package provided to CU4 for further analysis and characterization (this analysis was performed as a test case, and its results were not made public for Gaia DR2). The publication of new eclipsing binaries identified and characterized from Gaia data is planned only from Data Release 3 onwards. Hence those few hundred sources are excluded from the published short timescale candidates list.

Finally, after applying all the filtering and refinements described in the previous and current sections, the published list of short timescale, suspected periodic candidates contains 3018 bona fide candidates. This list includes about 138 known variables from the literature catalogues used for quality assessment and validation, about three quarters of which are periodic variables with periods below 1 d. All the non-periodic variable and constant sources from these catalogues have been removed from the published short timescale suspected periodic candidates sample. Hence, the contamination of the sample by longer period variables is about 19%. However, those sources have periods of a few days and relatively high amplitudes: they are not short period variables per se, but their detection at the short timescale level is justified.

When compared to all the OGLE short period variables processed as part of the global short timescale variability search for Gaia DR2, the completeness of the published short timescale suspected periodic candidates sample is estimated at around 0.05%.

A further contamination estimate is made using the OGLE photometric database: the Gaia DR2 short timescale sample of 3018 sources is crossmatched with the OGLE catalogue in the Magellanic Clouds, then the OGLE and Gaia time series are compared to check whether the features observed in the latter are compatible with the former. From this analysis, the real contamination from spurious or non-periodic variability is estimated at around 10–20% in those regions.
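Completeness and contamination figures of this kind are ratios of counts: recovered known variables over all known variables, and incompatible sources over all crossmatched sources. The sketch below computes such a ratio with a simple binomial standard error. The error formula is an addition for illustration, not something stated in this section, and the counts in the example are invented placeholders, not the values behind the percentages quoted above.

```python
import math

def fraction_with_error(k, n):
    """Return (k / n, binomial standard error) for k successes out of n.

    Hypothetical helper; the binomial error is an illustrative assumption.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    p = k / n
    return p, math.sqrt(p * (1.0 - p) / n)

# Invented example: 15 of 100 crossmatched sources judged incompatible
contamination, err = fraction_with_error(15, 100)
print(f"contamination = {contamination:.2f} +/- {err:.2f}")
```

With small crossmatched samples the uncertainty on such a ratio is large, which is consistent with quoting contamination as a range rather than a single value.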

More details on the results, efficiency and quality of the Gaia DR2 short timescale analysis are available in Roelens et al. (2018).