Gaia Early Data Release 3 documentation

1.3 Spacecraft and ground-segment status applicable to Gaia Early Data Release 3

1.3.5 Ground-segment status

Author(s): Rocio Guerra, Javier Castañeda, Chantal Panem, Francesca De Angeli, Krzysztof Nienartowicz, Rosario Messineo, Jos de Bruijne

This section describes the responsibilities and status of the six data processing centres that are involved in the Gaia data processing (Section 1.2.2). The Gaia data-processing centres face, and successfully cope with, several challenges:

  • A large number of elements and a large data volume to handle: dozens to hundreds of database tables, containing up to hundreds of billions of rows, populated from 26 billion field-of-view transits per year. Of order 1 petabyte (1000 terabytes) of processed data is foreseen to be released at the end of the nominal, five-year mission (not counting intermediate data), and this volume grows with extended mission operations;

  • A complex processing, based on an iterative, self-calibrating approach, with added boundary conditions imposed by temporal constraints and resource sharing: short delays for daily processing have to be balanced against deadlines for cyclic processing;

  • The cyclic chains process a growing amount of data from one data-processing cycle to the next, with significant added complexity in the software modules: this poses continuous challenges regarding the scalability, reliability, integrity, consistency, and robustness of both the software pipelines and the operational infrastructure on which they are executed.
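
As a rough cross-check of the scale quoted in the first item above, the figures can be combined directly. The sketch below uses only the numbers given in the text (26 billion transits per year, a five-year nominal mission, of order 1 petabyte released). The per-transit average is an illustrative derived quantity, not a documented figure, and the variable names are hypothetical.

```python
# Back-of-the-envelope scale estimate from the figures quoted in the text.
TRANSITS_PER_YEAR = 26_000_000_000   # field-of-view transits per year
NOMINAL_MISSION_YEARS = 5
RELEASED_BYTES = 10**15              # "of order 1 petabyte" (1000 terabytes)

total_transits = TRANSITS_PER_YEAR * NOMINAL_MISSION_YEARS

# Illustrative average only: released volume divided by the transit count.
bytes_per_transit = RELEASED_BYTES / total_transits

print(f"{total_transits:.2e} transits, ~{bytes_per_transit / 1000:.1f} kB per transit")
```

The nominal mission therefore corresponds to about 130 billion transits, which is consistent with tables holding "up to hundreds of billions of rows".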

Descriptions of how these challenges are dealt with in practice, as well as details on the specific hardware and software processing configurations, frameworks, workflows, etc., are beyond the scope of this documentation.

DPCE

Background

The Data Processing Centre at ESAC (DPCE) is part of the Gaia Science Operations Centre (SOC; Section 1.1.5) at ESAC, Madrid, Spain. DPCE operates software from:

  • CU1 (system architecture):

    • Main database (MDB; Section 1.2.3);

    • MOC interface task (MIT);

    • De-compression and calibration services (DCS);

    • Payload operations system (POS);

    • Gaia transfer system (GTS);

  • CU3 (astrometric processing):

    • Initial data treatment (IDT; Section 3.4.2);

    • First look (FL; Section 3.5.2);

    • Astrometric global iterative solution (AGIS; Section 4.4.2);

  • DPCE:

    • IDT/FL database (IDTFLDB);

    • Daily pipeline (DPL);

    • Gaia observing schedule tool (GOST).

DPCE has responsibility for interactions with the Mission Operations Centre (MOC; Section 1.1.5) regarding payload calibration and operations. As the central ‘hub’ in the Gaia science ground segment (SGS), DPCE is the interface between the MOC and other DPCs in DPAC (Section 1.2.2). DPCE tasks include science-data retrieval from MOC and distribution to the other DPCs (after IDT/FL processing) as well as distribution of the MDB, i.e., receiving processed data from the other DPCs and assembly of the subsequent version after DPAC processing. Data transfers make use of the internet-based Gaia transfer system (GTS).

DPCE performs both daily and cyclic processing (Section 1.2.3).

Daily processing

Daily processing at DPCE, based on the principal software systems MIT, IDT, and FL, aims to transform the raw observation data from the spacecraft into usable form for further processing and to provide an initial treatment into higher-level data. The input to daily processing is the telemetry stream from the spacecraft, with initial input data in the form of catalogues and calibration data. The daily processing started during the commissioning phase of Gaia and will continue until the end of extended spacecraft operations.

Cyclic processing

The cyclic processing at DPCE consists of running AGIS and the MDB:

  • AGIS produces the main astrometric solution for the mission (Section 4.4.2).

  • The MDB is the central repository for the Gaia mission data (Section 1.2.3). The MDB comprises a number of tools required for handling the input/output actions, assessing the data accountability, and checking the data consistency. In particular, the MDB Integrator collects all output data processed during the data-processing cycle plus previous inputs, and unifies them into a unique table of Gaia sources for publication.
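
The unification step performed by the MDB Integrator can be pictured with a minimal sketch. Everything below is hypothetical: the record layout, the `source_id` key, and the rule that outputs of the current cycle overwrite fields from previous inputs are assumptions made for illustration, not a description of the actual MDB software.

```python
def integrate(previous_inputs, cycle_outputs):
    """Merge per-source records into one table with a single row per source.

    Assumption (not from the documentation): records are dicts carrying a
    'source_id' key, and later records overwrite fields of earlier ones.
    """
    table = {}
    for record in list(previous_inputs) + list(cycle_outputs):
        row = table.setdefault(record["source_id"], {})
        row.update(record)
    return table


# Toy records with made-up values.
previous = [{"source_id": 1, "ra": 10.0}, {"source_id": 2, "ra": 20.0}]
outputs = [{"source_id": 1, "g_mag": 15.2}, {"source_id": 3, "ra": 30.0}]
sources = integrate(previous, outputs)
print(sorted(sources))
```

Keying the table on a source identifier is what guarantees one row per source, whichever processing system contributed the individual fields.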

DPCB

Background

The Data Processing Centre of Barcelona (DPCB) is embedded in the Gaia DPAC group at the University of Barcelona (UB) and the Institute for Space Studies of Catalonia (IEEC), in close cooperation with the Barcelona Supercomputing Centre (BSC) and the Consorci de Serveis Universitaris de Catalunya (CSUC), also in Barcelona, Spain. The operational DPCB hardware is provided by BSC, whereas the team at the UB/IEEC carries out the management, operations, development, and software tests. DPCB is responsible for the execution of:

  • Intermediate Data Updating (IDU), one of the major tasks in the cyclic processing, which is in charge of regenerating all the intermediate data as described in Section 3.4.2;

  • Simulations, specifically GASS and GOG, which emulate satellite telemetry and the final Gaia catalogue, respectively (Section 1.2.3). GASS simulations have been key in the (pre-launch) development and testing of software products across DPAC. GOG simulations are still being generated and are an essential part of the software validation and testing as well as the scientific validation of the various data products.

In addition to these operational activities, DPCB is also in charge of backing up the Gaia science telemetry archive as well as the main database (MDB, located at DPCE; Section 1.2.3), which acts as the central hub of all Gaia science and calibration data. Finally, DPCB also provides support to the development and testing of IDT and related products.

Through IDU, DPCB is responsible for the reprocessing of all the accumulated astrometric data collected from the spacecraft, adding the latest measurements, and (repeatedly) recomputing the IDU outputs using the latest calibrations to obtain improved scientific results. These improved results are the starting point for the following, iterative reduction loop, which includes AGIS and PhotPipe (Section 4.4.6). In addition, DPCB is also responsible for the integration of the IDU software into the execution environment of the MareNostrum supercomputer hosted at BSC. The design and implementation of IDU and its integration in DPCB presents a variety of challenges, covering not only the purely scientific problems that appear in any data-reduction process but also technical issues that arise during the processing of the large amount of data that Gaia generates. In particular, DPCB has developed an efficient and flexible execution framework, including tailored data-access routines, efficient data formats, and an autonomous application in charge of handling and checking the correctness of all input data entering or produced by IDU.

Gaia Early Data Release 3

For Gaia EDR3, the following IDU processes have been executed:

  • On-ground attitude reconstruction (OGA): provides an initial attitude to compute preliminary sky coordinates for all observations to allow matching them with catalogue sources;

  • Scene: predicts the CCD transit times of sources given an input catalogue and the spacecraft attitude;

  • Detection classifier: flags spurious detections which have to be ignored in the subsequent processes;

  • Cross-match: matches observations to sources;

  • CCD bias and astrophysical background (APB) calibrations;

  • Calibrations of the charge-injection (CI) and charge-release (CR) profiles, the bias non-uniformity, and the CCD saturation levels;

  • LSF/PSF calibration: determines the response of the astrometric instrument in the form of an LSF/PSF library for each of the CCDs and data acquisition modes;

  • Image-parameter determination (IPD): computes the location and flux estimations for each individual CCD window using the latest calibrations;

  • Validation: provides technical and scientific consistency checks of the IDU products.

These processes have been executed once after each data segment was closed (Table 1.7) and DPCB had received all associated data. The exceptions are the LSF/PSF and IPD tasks, which were executed a second time over the full data set to improve and secure the astrometric solution.

DPCC

Background

The Data Processing Centre at CNES (DPCC) is located at the Centre National d’Etudes Spatiales, Toulouse, France. It runs the spectroscopic and astrophysical-parameter processing chains as well as several object-processing chains. The latter cover non-single stars (NSSs), Solar-system objects (SSOs, such as comets and asteroids), and extended objects (EOs, such as quasars and unresolved galaxies), i.e., all objects that are not processed or identified in the regular astrometric, photometric, or spectroscopic chains. The processing includes both daily (Section 1.2.3) and cyclic tasks (Section 1.2.3), comprising eight different processing chains in total.

DPCC is responsible for all aspects related to the hardware used to process the science data on its cluster (from purchase to maintenance and system administration) as well as for the development and maintenance of the software infrastructure required to process and archive the input and output data. DPCC runs the scientific software modules developed within the responsible coordination units in DPAC and delivers the data back to the DPAC scientists and to the MDB at DPCE. DPCC plays a fundamental role in the validation, pre-integration, and integration of the scientific software modules into automated pipelines, up to the qualification of the overall system. All operational aspects, from data reception and delivery to pipeline operations, are also under DPCC responsibility.

The DPCC data reception chain automatically analyses and indexes the data that are received from DPCE on a daily basis as input to the daily processing chains. For each daily chain, DPCC publishes the main results, the log files, and the execution reports on a Web portal for analysis and monitoring by the DPAC scientists. For the cyclic processing chains, data is delivered from DPCE to DPCC at agreed times. After processing and scientific validation, the scientific results of the cyclic chains are transferred back to the MDB at DPCE. At each data-processing cycle, DPCC chains re-process all data collected by the spacecraft since the start of the nominal mission. Subsequent cycles therefore benefit from significant enhancements in the scientific algorithms allowing a continuous improvement of the knowledge and calibration of the data and hence of the scientific results.

Gaia Early Data Release 3

  • The object-processing chains process eclipsing binaries identified by the variability coordination unit (CU7) plus all objects such as non-single stars, Solar-system objects (SSOs), and extended objects that are not processed or identified in the astrometric, photometric, or spectroscopic coordination units. The SSO short-term chain (SSO-ST) has run on a (quasi-)daily basis to process newly-discovered Solar-system objects and to generate science alerts for ground-based follow-up through the Gaia follow-up network for Solar-system objects (FUN-SSO). Neither the SSO long-term chain (SSO-LT) nor any of the other object-processing chains has directly contributed to Gaia EDR3.

  • The spectroscopic processing chain processes and analyses the data obtained with the radial velocity spectrometer (RVS; Cropper et al. 2018). The goals of the spectroscopic processing system, which is detailed in Chapter 6, are:

    • to monitor the health of the RVS spectrometer and to calibrate its characteristics;

    • to provide radial and rotational velocities;

    • to issue variability and multiplicity diagnostics;

    • to alert on objects that require a rapid ground-based follow-up; and

    • to provide clean, calibrated spectra (epoch spectra as well as stacked, co-added spectra).

    The RVS bias non-uniformity calibration chain (UC1; Sartoretti et al. 2018) has run frequently, triggered by the reception of newly-acquired calibration data. The RVS daily calibration chain (UC2) has been running on a (quasi-)daily basis since the middle of the first data-processing cycle. It performs a basic daily calibration of the spectra with non-truncated windows (bias, wavelength, along-scan LSF, across-scan LSF, and straylight) and derives the radial velocities of the bright stars. The spectroscopic chain (Global; Section 6.1.3) has not contributed to Gaia EDR3; instead, Gaia EDR3 essentially contains a copy of the radial velocities that have been published in Gaia DR2.

  • The astrophysical-parameter processing chain (Apsis) classifies Gaia objects and estimates their astrophysical parameters. The Apsis chain has been run to generate inputs for several chains but has not directly contributed to Gaia EDR3.

DPCI

Background

The Gaia broad-band photometric and spectro-photometric BP/RP data are processed at the Data Processing Centre located at the Institute of Astronomy (IoA, University of Cambridge, United Kingdom). This centre is also referred to as DPCI. DPCI is responsible for all aspects related to the hardware used to process the data, from purchase to maintenance and system administration, as well as for the development of the software infrastructure required to run the scientific modules developed within the photometric coordination unit on the DPCI cluster. DPCI plays a fundamental role in the integration of those modules into the pipeline. All operational aspects, from data deliveries to pipeline operation, are under DPCI responsibility. The pipeline that processes the broad-band photometric and the spectro-photometric data, called PhotPipe, is explained in detail in Chapter 5.

During operations, data is received daily at DPCI from DPCE. The automatic data-handling system at DPCI records new deliveries and stores metadata into a local database. PhotPipe operates in cyclic mode, i.e., PhotPipe operations start when all data for a data segment has been received and when the results of IDU (image-parameter determination and cross-matching in particular; Section 3.4.2) and AGIS (attitude, calibrations, and source astrometry; Section 4.4.2) from the same cycle have been received. The processing in PhotPipe can be divided into the following steps:

  • Ingestion and pre-processing of data, including the computation of CCD bias corrections, heliotropic angles, predicted and extrapolated positions, and the creation of types optimised for the PhotPipe processing, by joining several inputs coming from different upstream systems;

  • BP/RP pre-processing and initial calibration, in particular background and along- and across-scan geometric calibrations;

  • Internal calibration of the BP/RP instrument model, taking into account the effect of variations in the flux response and line-spread function with time as well as across the BP/RP CCDs;

  • Internal calibration of the integrated fluxes, including the initialisation of the photometric internal reference system and all the internal calibrations required to remove all instrumental effects (time-link calibration, gate and window class-link calibration, and large- and small-scale geometric calibrations);

  • External calibration, creating the link between the internal photometric reference system, for both photometric and spectral data, and the absolute one, thus allowing comparisons of Gaia data with other catalogues;

  • Export of the data produced by PhotPipe to the MDB at DPCE for integration with results from other DPAC systems, for distribution of the data to downstream consumers within DPAC, and for creation of validated selections to be released to the public in formal data releases.
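
The start condition for PhotPipe described before the list of steps (all data for the segment received, plus IDU and AGIS results from the same cycle) can be sketched as a simple readiness check. The class, field, and function names below are hypothetical and only mirror the wording of the text; the actual DPCI data-handling system is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReceivedInputs:
    """Hypothetical bookkeeping of what has arrived at DPCI for one data segment."""
    segment_complete: bool             # all data for the data segment received
    idu_cycle: Optional[int] = None    # cycle of the received IDU results, if any
    agis_cycle: Optional[int] = None   # cycle of the received AGIS results, if any


def photpipe_can_start(inputs: ReceivedInputs, cycle: int) -> bool:
    """True only when segment data, IDU results, and AGIS results for `cycle` are all in."""
    return (
        inputs.segment_complete
        and inputs.idu_cycle == cycle
        and inputs.agis_cycle == cycle
    )


print(photpipe_can_start(ReceivedInputs(True, idu_cycle=3, agis_cycle=3), cycle=3))
print(photpipe_can_start(ReceivedInputs(True, idu_cycle=3), cycle=3))
```

Comparing the cycle numbers, rather than only checking that some IDU and AGIS results exist, reflects the requirement that the upstream results come from the same cycle.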

At each data-processing cycle, PhotPipe re-processes all data collected by Gaia since the start of the nominal mission. Subsequent cycles benefit from significant enhancements in the software and algorithms allowed by the continuously improving understanding of the data. The cyclic nature of the DPAC processing ensures that these improvements affect all the science data collected so far.

Gaia Early Data Release 3

The PhotPipe software used for Gaia EDR3 successfully handles all nominal cases of science windows, including unexpected TDI-gate / window-class configurations. Two cases are currently not treated: truncated windows and the rare, so-called complex TDI-gate cases, i.e., windows affected by two TDI gates. All processing in PhotPipe is based on field-of-view transits. The source mean photometry and low-resolution spectroscopy are produced by accumulating the calibrated epoch photometry and BP/RP spectra for all transits cross-matched to the same source (see Chapter 5 for details).
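
The idea of accumulating epoch photometry per source can be illustrated with a minimal sketch. It assumes an inverse-variance weighted mean purely as an example of such an accumulation; the actual procedure is the one described in Chapter 5. The tuple layout `(source_id, flux, flux_error)` and the function name are hypothetical.

```python
from collections import defaultdict


def mean_fluxes(transits):
    """Accumulate calibrated epoch fluxes of all transits matched to each source.

    Assumption: an inverse-variance weighted mean stands in for the real
    accumulation. Returns {source_id: (mean_flux, formal_error)}.
    """
    sum_w = defaultdict(float)
    sum_wf = defaultdict(float)
    for source_id, flux, flux_error in transits:
        weight = 1.0 / flux_error**2
        sum_w[source_id] += weight
        sum_wf[source_id] += weight * flux
    return {sid: (sum_wf[sid] / sum_w[sid], sum_w[sid] ** -0.5) for sid in sum_w}


# Toy transits with made-up values: (source_id, flux, flux_error).
transits = [(1, 100.0, 1.0), (1, 104.0, 1.0), (2, 50.0, 2.0)]
means = mean_fluxes(transits)
print(means)
```

Because only running sums are kept per source, transits can be processed in any order and in a single pass.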

DPCG

Background

The Data Processing Centre in Geneva (DPCG) is embedded in the Gaia DPAC group at the Department of Astronomy of the University of Geneva, Switzerland. DPCG runs the Integrated Variability Pipeline (IVP), which is the DPAC variability pipeline integrated into DPCG’s software and hardware infrastructure. During pipeline processing, data is visualised for monitoring and quality-control purposes. DPCG performs cyclic processing that depends on the input provided by the astrometric, object-processing, photometric, spectroscopic, and astrophysical-parameter systems. The overall DPCG processing aims at providing a characterisation and classification of the variability aspects of the celestial objects observed by Gaia. The information that DPCG provides comprises:

  • Reconstructed time series for all objects, also for non-variable objects;

  • Time-series statistics for all objects and for all time series;

  • Identification of variable objects;

  • Identification of periodic objects;

  • Characterisation of variable objects: significant period values, amplitudes, phases, and models;

  • Classification of the objects: a probability vector providing the probability per object to be of a given variability type;

  • Additional attributes that depend on the classification of the objects to a given variability type.

The IVP extracts attributes which are specific to the objects belonging to specific classification types. This output of the IVP is transferred to DPCE and integrated into the MDB (Section 1.2.3) from where it is used as input for further DPAC processing.

Gaia Early Data Release 3

The Integrated Variability Pipeline (IVP) is built in a modular fashion, and selected parts of the variability analysis can be included or excluded through a configuration file. During normal operations, all ‘scientific’ analyses are executed. Gaia EDR3 does not contain variability results.

DPCT

Background

The Data Processing Centre at Torino (DPCT) provides the infrastructure and operational support to the activities of the Astrometric Verification Unit (AVU) and the Italian participation in the Gaia data processing tasks. The AVU is responsible for the development and maintenance of the following software products:

  • AVU/AIM: the Astrometric Instrument Model data analysis software system, in charge of processing the telemetry of the astrometric data in order to monitor and analyse the astrometric-instrument response with time;

  • AVU/BAM: the Basic Angle Monitoring software system, in charge of processing the BAM device telemetry in order to monitor and analyse the BAM behaviour with time;

  • GSR: the mathematical and numerical framework of the Global Sphere Reconstruction, in charge of verifying the global astrometric results produced by AGIS.

Daily processing

The AVU/AIM daily pipeline has been running with the following modules: Ingestion, Raw Data Processing, Monitoring, Daily Calibration, Fine Selection, and Report and Monthly Diagnostics. The AVU/AIM processing strategy is based on time, with each AVU/AIM run defined on 24 hours of observed data. The AVU/AIM pipeline starts by selecting observations with window classes 0–2 (Table 1.2 and Figure 1.3). The Raw Data Processing module processes observations of all window classes and estimates the image parameters such as centroid and flux. In Gaia EDR3 processing, the AVU/AIM system used a PSF/LSF bootstrapping library including dedicated image-profile templates for each CCD, spectral-type bin, and window class. The Monitoring module is dedicated to extracting information on the instrument health, astrometric-instrument calibration parameters, image quality, and comparison between AVU/AIM and IDT outputs. The Daily Calibration module is devoted to the Gaia signal-profile reconstruction on a daily basis. Its workflow also includes diagnostics and validation functions. The calibration-related diagnostics include the image-moment variations over the focal plane. An automatic tool validates the reconstructed image profiles before they are used within the AVU/AIM chain. Depending on the scanning law and sky conditions, AVU/AIM treated between 2 and 11 million observations per day for Gaia EDR3 processing. In runs with more than 5 million observations, a filter was activated so that only the minimum number of observations needed to obtain results of adequate quality was processed in each bin; the bins are defined on several instrument and observation parameters as well as time intervals.
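
The filter applied to large AVU/AIM runs can be pictured as a per-bin cap that is only switched on above the 5-million-observation threshold. The sketch below is an illustration under stated assumptions: the binning function, the per-bin target, and the "first come, first kept" rule are hypothetical, since the text does not specify how observations are chosen within a bin.

```python
from collections import defaultdict

FILTER_THRESHOLD = 5_000_000  # observations per run, as quoted in the text


def select_observations(observations, bin_key, per_bin_target,
                        threshold=FILTER_THRESHOLD):
    """Return the observations to process for one run.

    At or below the threshold everything is kept. Above it, at most
    `per_bin_target` observations are kept per bin (assumed selection rule).
    """
    if len(observations) <= threshold:
        return list(observations)
    counts = defaultdict(int)
    selected = []
    for obs in observations:
        key = bin_key(obs)
        if counts[key] < per_bin_target:
            counts[key] += 1
            selected.append(obs)
    return selected


# Toy run: (ccd, window_class) pairs, with a tiny threshold to trigger the filter.
run = [("ccd1", 0), ("ccd1", 0), ("ccd1", 0), ("ccd2", 1)]
kept = select_observations(run, bin_key=lambda obs: obs, per_bin_target=2, threshold=3)
print(kept)
```

In a real configuration the bin key would combine the instrument parameters, observation parameters, and time interval mentioned in the text.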

The AVU/BAM daily pipeline has been running with the following modules: Ingestion, Pre-Processing, Raw Data Processing, Monitoring, Weekly Analysis, Calibration, and Extraction and Report. In the Raw Data Processing module, the following algorithms are running: Raw Data Processing, Gaiometro, Gaiometro2D, DFT, Chi Square, BAMBin, and comparison with IDT BamElementary. The AVU/BAM system has two run strategies named IDT and H24. In the IDT strategy, used from commissioning to December 2015 (covering Gaia DR1), an AVU/BAM run is defined when a transfer containing BAM data is received at DPCT. The processing is started automatically without any check on the input data. In the H24 strategy, an AVU/BAM run is defined based on 24 hours of data and the processing starts automatically when the data availability reaches a configurable threshold (e.g., 98%). The AVU/BAM system has been processing with the H24 strategy since December 2015 to produce AVU/BAM analyses at regular intervals.
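
The H24 trigger can be sketched as a threshold test on data availability. The text gives only the 24-hour run length and the example threshold of 98%; measuring availability as the fraction of the 24-hour window covered by received data is an assumption, and the function and parameter names are hypothetical.

```python
def h24_run_ready(received_seconds, threshold=0.98, window_seconds=24 * 3600):
    """True once the received data cover at least `threshold` of the 24-hour window.

    Assumption: availability is the covered fraction of the window; the
    threshold is configurable, with 98% used as the example from the text.
    """
    availability = received_seconds / window_seconds
    return availability >= threshold


print(h24_run_ready(85_000))  # about 98.4% of 86 400 s
print(h24_run_ready(80_000))  # about 92.6% of 86 400 s
```

By contrast, the earlier IDT strategy corresponds to starting a run on every transfer containing BAM data, without any such check on the input.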

The following list provides the main data types produced at DPCT and subsequently delivered to the MDB at DPCE during Gaia EDR3 processing:

  • BamElementaryT;

  • Bav;

  • CalibratedBav.

The output and findings of AVU/AIM and AVU/BAM, provided in the daily and periodic reports, have been used to check the instrument health by performing cross-checks with other DPAC systems providing the same instrument measurements.

Cyclic processing

The GSR pipeline is composed of the following modules: Ingestion, System Coefficient Generation, Solver, Solution Analysis, De-Rotation and Comparison, and Extraction and Report. The Ingestion step reads the billions of AstroElementaries and matches them to sources to populate the GSR data store. The System Coefficient Generation module calculates the parameter coefficients of the system of linearised equations to be solved to produce the GSR solution. The Solver module consists of the implementation of the LSQR algorithm for solving the system of linearised equations. The Solver finds a GSR solution while the Analysis module checks the exit status of the solution algorithm and provides an alert in case of problems revealed by the stopping conditions implemented in the LSQR algorithm. The De-Rotation and Comparison module converts the GSR solution into a format compatible with that of AGIS. It also ‘de-rotates’ the AGIS solution back into its internal reference frame to allow comparison with GSR. GSR results are collected in the final report. A comparison between the GSR and AGIS solutions is not part of Gaia EDR3.
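
To illustrate how a Solver step and the subsequent check of its exit status fit together, the sketch below solves a tiny sparse least-squares system and reports whether the stopping condition was met. It is not LSQR and not the GSR implementation: it uses the simpler CGLS iteration (conjugate gradients on the normal equations), which yields the same iterates as LSQR in exact arithmetic but is less robust numerically. The sparse-row layout, tolerance, and names are assumptions.

```python
import math


def cgls(rows, b, n_unknowns, tol=1e-12, max_iter=100):
    """Least-squares solution of A x = b via CGLS (a stand-in for LSQR).

    `rows` holds the sparse rows of A as {column_index: coefficient} dicts.
    Returns (x, converged, iterations).
    """
    def matvec(v):                      # A v
        return [sum(c * v[j] for j, c in row.items()) for row in rows]

    def rmatvec(u):                     # A^T u
        out = [0.0] * n_unknowns
        for row, ui in zip(rows, u):
            for j, c in row.items():
                out[j] += c * ui
        return out

    x = [0.0] * n_unknowns
    r = list(b)                         # residual b - A x for x = 0
    s = rmatvec(r)
    p = list(s)
    gamma = sum(v * v for v in s)
    norm0 = math.sqrt(gamma)
    if norm0 == 0.0:
        return x, True, 0
    for iteration in range(1, max_iter + 1):
        q = matvec(p)
        qq = sum(v * v for v in q)
        if qq == 0.0:                   # breakdown: cannot take another step
            return x, False, iteration
        alpha = gamma / qq
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(r)
        gamma_new = sum(v * v for v in s)
        if math.sqrt(gamma_new) <= tol * norm0:   # stopping condition on |A^T r|
            return x, True, iteration
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x, False, max_iter


# Toy system: x0 = 1, x1 = 2, x0 + x1 = 3.
rows = [{0: 1.0}, {1: 1.0}, {0: 1.0, 1: 1.0}]
x, converged, iterations = cgls(rows, [1.0, 2.0, 3.0], n_unknowns=2)
if not converged:
    print("alert: solver stopped without meeting its stopping condition")
print(x, converged, iterations)
```

Returning the convergence flag alongside the solution mirrors the division of labour in the text, where one module produces the solution and another inspects how the algorithm terminated.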

During Gaia EDR3 processing, the AVU/BAM cyclic pipeline has produced both BAM Fourier analyses (after detrending over 10- and 50-revolution time intervals with a third-order polynomial) and fringe-parameter estimations (after the application of a fringe-cleaning algorithm which identifies and masks pixels affected by cosmic rays).