
Gaia Data Release 3 Documentation

13.2 Data Consolidation

13.2.4 Data partitioning

During the conversion process, the data are partitioned into separate files so that combining data from different tables becomes easier. This partitioning, already used in EDR3, is based on 3386 ranges of HEALPix level-8 indexes. These ranges were defined so that each of them (except for the last one) contained around 500,000 sources in the Gaia DR2 catalogue. For each table, there is one file for each of those ranges, identified by a suffix containing the first and the last level-8 indexes included in it. If a range contains no sources at all for a given table, no file is generated for it.

The HEALPix index is encoded in the source_id of each source. The main advantage is that the partitioning scheme can then be applied consistently in almost all tables that have a source_id:

  • It is guaranteed that the information about one specific source in two different tables will end up in files with the same suffix, so joins between tables can be performed independently for each of these suffixes.

  • The HEALPix level-8 index of any given source can be computed by right-shifting the value of source_id by 43 bits, which is equivalent to the integer division source_id / 2^43. This makes it simple to identify which file, if any, contains a specific source.

  • The HEALPix index identifies specific regions in the sky in the International Celestial Reference System (ICRS). Thus, spatial queries such as cone search can be implemented by computing which HEALPix level-8 pixels intersect with the area of interest, and selecting the range or ranges that contain those pixels.
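The lookup described in the list above can be sketched in a few lines of Python. This is only an illustration: the function names and the `EXAMPLE_RANGES` values are hypothetical placeholders, not the actual 3386 ranges, which have to be taken from the suffixes of the distributed files.

```python
from bisect import bisect_right

def healpix_level8(source_id: int) -> int:
    """Return the HEALPix level-8 index encoded in a source_id.

    A right shift by 43 bits is the same as integer division by 2**43.
    """
    return source_id >> 43

# Hypothetical (first, last) level-8 index ranges, sorted by first index.
# Placeholders only: the real ranges come from the file suffixes.
EXAMPLE_RANGES = [(0, 99), (100, 249), (400, 999)]

def find_range(hpx: int, ranges):
    """Return the (first, last) range containing hpx, or None if no file covers it."""
    starts = [first for first, _ in ranges]
    pos = bisect_right(starts, hpx) - 1  # last range starting at or before hpx
    if pos >= 0 and ranges[pos][0] <= hpx <= ranges[pos][1]:
        return ranges[pos]
    return None  # index falls in a gap, i.e. no file for it

# Build an illustrative source_id whose level-8 index is 150.
example_id = (150 << 43) + 12345
hpx = healpix_level8(example_id)
print(hpx, find_range(hpx, EXAMPLE_RANGES))
```

Because the same suffix is used across tables, the range returned for a source in one table identifies the file to open in any other table partitioned with this scheme.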

Please note that the source_id is assigned when the source record is created, but the position reported in the fields ra and dec of gaia_source can be modified later by IDU or AGIS. Thus, the HEALPix index derived from the source_id and the one derived from the reported position might differ in some cases, so spatial filtering based on the HEALPix index may require slightly increased margins.

There are some exceptions to this scheme. The tables that do not have a source_id field (e.g. total_galactic_extinction_map, total_galactic_extinction_map_opt, oa_neuron_information and oa_neuron_xp_spectra) are very small, and each of them is stored in a single file. Similarly, some tables that do have a source_id field but have few entries are stored in a single file, since otherwise each file would contain only a few entries. Examples of these tables are gaia_crf3_xm, science_alerts, alerts_mixedin_sourceids and the four NSS tables.

The last category comprises the SSO tables (sso_source, sso_observation and sso_reflectance_spectrum). These tables do contain a source_id field, but it follows a different format from the one used in the other tables and does not have a HEALPix index encoded. These three tables are split into 20 files using a modulo operation on the source_id excluding its last digit, so it is still guaranteed that all the information about a given source will be in files with the same suffix across all three tables.
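One possible reading of this modulo scheme is sketched below. The text does not specify how the sign of the source_id is handled or how the result maps to a file suffix, so the use of `abs()` and the function name `sso_bucket` are assumptions made purely for illustration.

```python
def sso_bucket(source_id: int, n_files: int = 20) -> int:
    """Hypothetical bucket index for an SSO source_id.

    Assumption: the last decimal digit is dropped from the absolute
    value of source_id before taking the modulo.
    """
    without_last_digit = abs(source_id) // 10
    return without_last_digit % n_files

# The bucket depends only on source_id, so the same source lands in the
# same bucket regardless of which of the three tables the row comes from.
print(sso_bucket(12345))
```

Whatever the exact formula, the point is that the bucket is a pure function of source_id, which is what keeps a given source in files with matching suffixes across the three tables.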