8.1.2 Software framework, hardware, and operations

Author(s): Anne Jean-Antoine-Piccolo, Nicolas Mary

The software for Apsis is produced by teams in Heidelberg (Priam) and Nice (FLAME). The Apsis software is executed on the Gaia data by the DPCC (Data Processing Centre CNES) in Toulouse (Section 1.3.4). DPCC is in charge of the integration and operation of seven software products (referred to as ‘chains’) from three different Coordination Units. The processing comprises several operations, including the input and output of data and the generation of logs and execution reports. The entire process is managed by a top-level software system called SAGA. Apsis is run in parallel on a multi-core Hadoop cluster, with data stored in a distributed file system. The validation results are published on a web server (GaiaWeb) for download by the scientific software providers.

The final Apsis processing for Gaia DR2 took place in October 2017. The complete set of sources (1.6 billion with photometry), covering all Gaia magnitudes, was ingested into the system. From these, the 161 million sources brighter than G = 17 were identified and processed. This was done on 1000 cores (with 6 GB RAM per core) and took about 5000 hours of CPU time (around five hours of wall-clock time). The full Apsis system, which involves much more CPU-intensive processes, higher-dimensional input data (spectra), and of the order of one billion sources, will require significantly more resources and time.
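
The resource figures quoted above can be cross-checked with simple arithmetic. The following Python sketch is illustrative only: the function names are hypothetical and not part of Apsis or SAGA, and the wall-clock estimate assumes ideal parallel scaling across all cores, which a real cluster run only approximates.

```python
# Figures quoted in the text for the Gaia DR2 Apsis run.
N_SOURCES = 161e6    # sources brighter than G = 17
N_CORES = 1000       # cores used for the run
CPU_HOURS = 5000.0   # approximate total CPU time


def wall_clock_hours(cpu_hours, n_cores):
    """Wall-clock time under the assumption of perfect parallel scaling."""
    return cpu_hours / n_cores


def cpu_seconds_per_source(cpu_hours, n_sources):
    """Average CPU time spent per source, in seconds."""
    return cpu_hours * 3600.0 / n_sources


if __name__ == "__main__":
    print(f"wall clock: {wall_clock_hours(CPU_HOURS, N_CORES):.1f} h")
    print(f"per source: {cpu_seconds_per_source(CPU_HOURS, N_SOURCES):.3f} CPU s")
```

Under these assumptions, 5000 CPU hours spread over 1000 cores corresponds to the quoted five hours of wall-clock time, and the average cost is roughly a tenth of a CPU second per source.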