
Gaia Data Release 3 documentation

14 Validation

14.5 Multidimensional analysis

Author(s): Shourya Khanna, Eduardo Balbinot, Amina Helmi

We perform Kullback–Leibler divergence (KLD) tests to check for correlations and clustering between observables. This allows us to identify combinations of observables (subspaces) that exhibit unexpected behaviour. For two-dimensional subspaces, the KLD is given by

$$\mathrm{KLD} = \int \mathrm{d}^2x \; p(\mathbf{x}) \, \log\left[\frac{p(\mathbf{x})}{q(\mathbf{x})}\right] \tag{14.27}$$

where $\mathbf{x}$ is the vector of observables spanning the subspace, $p(\mathbf{x})$ is the joint distribution of these observables in the dataset, and $q(\mathbf{x}) = \prod_i p_i(x_i)$ is the product of the marginalised 1D distributions of the individual observables.
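A minimal sketch of how the KLD of Eq. (14.27) can be estimated from a finite sample, assuming that both $p$ and $q$ are approximated with a regular 2D histogram. The helper name `kld_2d`, the bin count and the synthetic Gaussian data are hypothetical choices for illustration, not the procedure used for Gaia DR3. Because the bin areas cancel in the ratio $p/q$, the integral reduces to a sum over occupied cells of $P_{ij}\log[P_{ij}/(P_i P_j)]$, where $P_i$ and $P_j$ are the marginal cell probabilities.

```python
import numpy as np


def kld_2d(x, y, bins=30):
    """Hypothetical helper: histogram estimate of KLD = sum p * log(p / q).

    Assumption: p is a regular 2D histogram of (x, y) and q is the outer
    product of its 1D marginals; the binning is illustrative only.
    """
    # Joint histogram normalised to cell probabilities (sums to 1).
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = counts / counts.sum()
    # Marginal probabilities: sum the joint over the other axis.
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    # q is the product of the marginals: q[i, j] = p_x[i] * p_y[j].
    q_xy = np.outer(p_x, p_y)
    # Empty cells contribute nothing (0 * log 0 = 0); q > 0 wherever p > 0.
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / q_xy[mask])))


# Synthetic example (assumption: standard-normal observables, not Gaia data).
rng = np.random.default_rng(42)
n = 200_000
a = rng.standard_normal(n)
b = rng.standard_normal(n)
print(kld_2d(a, b))  # independent pair: close to zero

rho = 0.8
c = rho * a + np.sqrt(1 - rho**2) * b  # correlated with a, unit variance
print(kld_2d(a, c))  # correlated pair: clearly larger
```

The estimate depends on the binning, so KLD values computed this way are best compared between datasets that were binned identically.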

The KLD measures clustering in n-dimensional data. It is zero when the observables are independent, since then $p(\mathbf{x}) = q(\mathbf{x})$, and a higher (lower) KLD value means more (less) clustering. KLD values are only meaningful relative to one another: when comparing KLD values from different datasets, a 1-to-1 relation is expected if both datasets were drawn from the same underlying distributions.
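As a worked illustration (an assumption chosen for tractability, not a Gaia subspace), take two standardised observables $x, y$ following a bivariate Gaussian with correlation coefficient $\rho$. The KLD of Eq. (14.27) then has a closed form, which shows directly that stronger correlation gives a higher KLD and that independent observables ($\rho = 0$) give zero:

$$\begin{aligned}
\log\frac{p(x,y)}{q(x,y)} &= -\tfrac{1}{2}\log(1-\rho^2) - \frac{x^2 - 2\rho xy + y^2}{2(1-\rho^2)} + \frac{x^2 + y^2}{2}, \\
\langle x^2 \rangle = \langle y^2 \rangle = 1,\ \langle xy \rangle = \rho \;&\Rightarrow\; \mathrm{KLD} = -\tfrac{1}{2}\log(1-\rho^2) - 1 + 1 = -\tfrac{1}{2}\log(1-\rho^2).
\end{aligned}$$

For example, $\rho = 0.8$ gives $\mathrm{KLD} = -\tfrac{1}{2}\ln 0.36 \approx 0.511$, while $\rho = 0$ gives $\mathrm{KLD} = 0$.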