11.1.4 Software framework, hardware, and operations
Author(s): Frédéric Pailler
The Apsis software is executed at the Data Processing Centre CNES (DPCC) in Toulouse, France, where a cluster of 250 computers is dedicated to Gaia (6000 cores, 40 TB of RAM, 7 PB of HDFS disk storage). The cluster resources are managed by a Hadoop system (Cloudera distribution) using Cascading and Map-Reduce, and are shared between the different software chains running at DPCC (CU4, CU6, CU8).
The software modules of Apsis are developed by several teams in Europe. To make the code compatible with the Hadoop cluster, it is integrated into a framework called SAGA, developed at DPCC. The integration is followed by a validation phase, including technical and scientific tests, to confirm that the results are as expected and that the execution time is compatible with the schedule.
The technical and scientific validation phase, which followed reception of the final operational data from the third data-processing cycle, began in September 2020. The operations to process the Apsis input data and generate the Apsis data products for Gaia DR3 were then performed between February and June 2021. For the source-based modules GSP-Phot, GSP-Spec, FLAME, MSC, ESP-ELS, ESP-HS, and ESP-UCD, the processing was done in magnitude bins of approximately 150 000 000 sources each. This bin size was chosen to optimise performance while minimising the time potentially lost to code crashes. The limits in magnitude were set to 16.62, 17.65, 18.25, 18.67, and 19.00 mag. Of these modules, only GSP-Phot processed up to , FLAME and MSC processed to , ESP-ELS and ESP-HS processed to , while GSP-Spec and ESP-CS only processed RVS data, which fall into the brightest bin (). These lower limits were imposed because of the limited time available for processing (and subsequent validation), so that the post-processed data products could be delivered to ESAC by June 2021. The production of the CU8 data for Gaia DR3 took a total of 1 021 219 CPU hours (2 290 cluster hours, or 93 days).
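As an illustration of the magnitude binning described above, the following minimal Python sketch assigns sources to bins using the five limits quoted in the text. It is not the SAGA or Apsis implementation: the function names `magnitude_bin` and `group_by_bin` are hypothetical, and the sketch assumes that each limit is the inclusive faint edge of its bin and that sources fainter than the last limit are not processed.

```python
from bisect import bisect_left

# Faint limits of the magnitude bins, in mag, as quoted in the text.
BIN_LIMITS = [16.62, 17.65, 18.25, 18.67, 19.00]


def magnitude_bin(mag):
    """Return the 0-based bin index for a magnitude.

    Bin 0 is the brightest bin. Each limit is treated as an inclusive
    faint edge (an assumption of this sketch). Returns None for sources
    fainter than the last limit.
    """
    index = bisect_left(BIN_LIMITS, mag)
    return index if index < len(BIN_LIMITS) else None


def group_by_bin(sources):
    """Group (source_id, mag) pairs into a dict of bin index -> source ids."""
    groups = {}
    for source_id, mag in sources:
        bin_index = magnitude_bin(mag)
        if bin_index is not None:
            groups.setdefault(bin_index, []).append(source_id)
    return groups
```

Processing one bin at a time in this way bounds the amount of work that has to be repeated if a run fails, which is the trade-off the text gives for choosing the bin size.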