# 7.4.4 Processing steps

## Preamble

The first step of the analysis is the search for periodic behaviour in the RVs. The analysis is based on a periodogram (more precisely a frequencogram) which has the nature of a Fourier power spectrum. Since the time sampling of the Gaia satellite is particularly complex, the classical Fourier formalism is not applicable. To understand the limitations of the method, it is necessary to discuss the impact of the sampling and of a few characteristic time scales. Because of the spin of the satellite, a star is measured every 0.25 days; this holds for each of the two fields of view independently. Therefore, 0.25 d is the smallest regular, periodic time scale in the sampling, although gaps can be present at longer scales. The 0.25 d step should generate in the time series aliasing with a typical 2 d${}^{-1}$ step. However, the existence of two fields of view separated by 106$.\!\!^{\circ}$5 on the sky renders the situation more complex by adding a regularity that is not, properly speaking, periodic. The largest frequency to be explored has been chosen to be 4 d${}^{-1}$, a rather atypical choice.

Other periodic behaviours are generated by the 63 d precession of the spin axis of the satellite around the direction of a fictitious, nominal Sun (compensating for the orbit of the satellite around the Sun-Earth Lagrangian point L2) and by the one-year orbit of the satellite around the Sun, accompanying the Earth in its revolution (Gaia Collaboration et al. 2016b). These two effects generate regularities in the time sampling and, in addition, produce regular gaps that depend on the ecliptic latitude of the observed object. The time sampling is therefore complex and object dependent.

The total time span of the Gaia DR3 time series is at most 34 months; the largest periodicity accessible from the Fourier point of view thus roughly corresponds to a frequency of 1/1000 d${}^{-1}$. This value could actually be larger once the gaps generated by the actual sampling law are taken into account. It has been decided to limit the search for periodicities to periods up to 1.4 $\Delta T$, where $\Delta T$ is the actual total time span of the individual time series; the adopted frequency step is $0.01/\Delta T$. If the star indeed varies in a periodic way, the largest peak in the periodogram provides an estimate of the possible period, but definitely not a proof of its existence. A periodicity should be considered proven only if a minimum of three identical cycles is observed. Although the long periods are indicative, they are associated with large errors on the period owing to the small number of cycles included in the computation. Concerning the smallest periodicity accessible, we selected 4 d${}^{-1}$; here the situation is much more complex.
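The choices above fully determine the frequency grid of the search. The following is a minimal sketch of its construction, assuming times in days; the function name and packaging are ours, not the pipeline's:

```python
import numpy as np

def frequency_grid(delta_t, f_max=4.0):
    """Candidate frequency grid following the choices quoted in the text
    (illustrative packaging): lowest frequency 1 / (1.4 * delta_t),
    step 0.01 / delta_t, up to f_max = 4 / day."""
    f_min = 1.0 / (1.4 * delta_t)      # largest period probed: 1.4 * delta_t
    f_step = 0.01 / delta_t            # 100x oversampling of 1 / delta_t
    return np.arange(f_min, f_max + f_step, f_step)

freqs = frequency_grid(1000.0)         # e.g. a ~34-month time span in days
```

Note that the grid is object dependent: a shorter individual time span $\Delta T$ yields both a higher minimum frequency and a coarser step.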

When a periodicity is present in a time series, the periodogram exhibits a peak at the corresponding frequency (the progenitor), but other peaks are also present at the locations of the pseudo-aliases of the progenitor. Contrary to the case of even sampling, the actual sampling generates pseudo-aliases that are lower than the progenitor peak. However, owing to the combination of the aliasing with background noise or other phenomena, the choice of the true progenitor among the aliased peaks is never fully secure.

## Search for the periodicity

Owing to the uneven nature of the sampling, the periodogram of the time series has been computed up to 4 d${}^{-1}$ with a step of $0.01/\Delta T$, where $\Delta T$ is the total time spanned by the data. This represents an oversampling by a factor of one hundred, which gives direct access to the amplitude of the signal as the height of the peak. The method used is that of Heck-Manfroid-Mersch (Heck et al. 1985), which advantageously replaces the classical Fourier method in the case of uneven sampling. In principle, the highest peak is the progenitor one. This is particularly true for RV orbital motion, since most of the power is gathered in the fundamental frequency, which is not the case for eclipsing-binary light curves, for example. RV curves always present one single cycle per orbital period; therefore the peak at the fundamental frequency always gathers the majority of the power, as opposed to light curves, where very often most of the power lies in the peak at twice the fundamental frequency. In the latter case, the possible invisibility of one of the eclipses can be misleading for the determination of the period.
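For illustration, the idea of an amplitude-calibrated power spectrum on uneven sampling can be sketched with a plain discrete Fourier sum; this naive version does not reproduce the Heck-Manfroid-Mersch corrections for sampling-induced distortions, so it is a stand-in only:

```python
import numpy as np

def fourier_power(t, rv, freqs):
    """Naive Fourier-type amplitude spectrum for unevenly sampled RVs.

    Illustration only: the pipeline uses the Heck-Manfroid-Mersch
    periodogram (Heck et al. 1985).  Normalised so that a pure sinusoid
    of semi-amplitude K produces a peak of height close to K.
    """
    rv = rv - rv.mean()                               # remove systemic velocity
    phase = 2.0 * np.pi * freqs[:, None] * t[None, :]
    c = (rv * np.cos(phase)).sum(axis=1)
    s = (rv * np.sin(phase)).sum(axis=1)
    return 2.0 * np.hypot(c, s) / t.size

# synthetic check: a 10 d periodicity should produce a peak near 0.1 / day
t = np.sort(np.random.default_rng(1).uniform(0.0, 900.0, 60))
rv = 5.0 * np.sin(2.0 * np.pi * t / 10.0)
freqs = np.arange(0.01, 0.5, 1e-4)
best = freqs[np.argmax(fourier_power(t, rv, freqs))]
```

The direct reading of the amplitude from the peak height is what the hundredfold oversampling of the frequency step buys.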

The periodogram is explored and some 100 frequencies corresponding to the highest peaks are listed as candidate frequencies. This is more than the number of independent frequencies in the classical Fourier approach. For each of these 100 candidate frequencies, an eccentric Keplerian orbital model is fitted. At this stage, the pipeline offers the possibility to add a 101st candidate frequency coming, for example, from a photometric curve. In particular, this possibility is used for the subset of solutions of type EclipsingSpectro (see Section 7.7.1).
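The candidate-selection step can be sketched as follows; the peak finding here is deliberately simple and only illustrates the idea of keeping the $n$ highest local maxima (with $n=100$ as in the text):

```python
import numpy as np

def candidate_frequencies(freqs, power, n=100):
    """Frequencies of the n highest local maxima of a periodogram,
    listed as candidates (an illustrative sketch of the selection)."""
    # a local maximum is strictly higher than both of its neighbours
    peaks = np.flatnonzero((power[1:-1] > power[:-2]) &
                           (power[1:-1] > power[2:])) + 1
    order = np.argsort(power[peaks])[::-1]    # highest peaks first
    return freqs[peaks[order[:n]]]
```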

## Fit of the orbital solution

All stars that present bona fide RV variations are processed by the pipeline described here. The above-mentioned candidate frequencies correspond to signals with no constraint on the shape of the RV curve except the periodicity. Since the aim of the pipeline is to compute orbital solutions, it is worth restricting the possibilities to Keplerian motion. It is possible to expand the Keplerian motion into powers of the eccentricity, but the mathematical series converges only below a limiting eccentricity value; this method is therefore inconvenient at high eccentricities and introduces biases in the $e$ value. To avoid this problem, the pipeline performs the fit not in phase (defined by the candidate frequency being tested) but in true anomaly. In this domain, the RV curve to fit is a cosine. This of course requires the knowledge of $T_{0}$, $e$, and $\omega$. Therefore the following equation is used

 $\mathrm{RV}(t)\,=\,\gamma\,+\,K\left[\cos(v(t)+\omega)\,+\,e\,\cos(\omega)\,\right]$ (7.34)

where the periastron corresponds to $v=0$ and the maximum velocity to $v+\omega=0$. Since the eccentricity is a strongly non-linear parameter, the pipeline proceeds as follows. Within the two-parameter space $(e,T_{0})$, a value is adopted for each of the two parameters. Following Section 7.4.3, $v(t)$ is deduced from the adopted $e$ and $T_{0}$, and the above equation is developed into

 $\mathrm{RV}(t)\,=\,\gamma\,+\,K\,\cos{\omega}\,\left(\cos(v(t))+e\right)\,-\,K\,\sin{\omega}\,\sin(v(t))$ (7.35)

where $\alpha\,=\,K\,\cos{\omega}$ and $\beta\,=\,K\,\sin{\omega}$ appear in a linear way. The comparison of the model with the observations is performed by least squares, returning a $\chi^{2}$-type goodness-of-fit value. The two-parameter space $(e,T_{0})$ is then explored using a Levenberg-Marquardt procedure, with the $\chi^{2}$ considered as the objective function: for each pair of parameter values, the linear step is performed again, leading to a new $\chi^{2}$ value. Four tracks with different starting locations actually explore the parameter space, in order to avoid bad regions of the $\chi^{2}$ surface where the Levenberg-Marquardt procedure could get stuck. Ultimately the best solution is selected for the considered candidate frequency. This algorithm is quite robust and provides adequate fits for orbital curves up to eccentricities of 0.995, well beyond what is necessary for the majority of the objects. The procedure is executed for every candidate frequency and gives a $\chi^{2}$ of the fit that is translated into an $F2$ goodness-of-fit statistic; the smallest value identifies the best fit and the best set of parameters, including the period.

The algorithm is efficient at finding the best minimum, but the fit is performed in two steps, which is clearly an approximation concerning the cross-covariance terms between the two families of parameters (linear versus non-linear). Therefore, at the best minimum, the algorithm also performs a classical least-squares fit over all the parameters with all the non-diagonal terms taken into account. This has the advantage of providing sound estimates of the errors on the parameters. The use of a fit in true-anomaly space ensures that no analytical solution can present a bad behaviour in phase space. Zones of the phase diagram where no data are present (phase gaps) could nevertheless exist in some pathological cases.
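The inner, linear step of this two-step scheme can be sketched as below, assuming the conventions of Equation 7.35; the outer Levenberg-Marquardt exploration of $(e,T_{0})$ and the four starting tracks are omitted, and the function names are ours:

```python
import numpy as np

def true_anomaly(t, period, t0, e, n_iter=30):
    """True anomaly v(t): solve Kepler's equation M = E - e sin E for the
    eccentric anomaly E by Newton's method, then convert E to v."""
    m = 2.0 * np.pi * (((t - t0) / period) % 1.0)    # mean anomaly
    ecc_an = m.copy()
    for _ in range(n_iter):
        ecc_an -= (ecc_an - e * np.sin(ecc_an) - m) / (1.0 - e * np.cos(ecc_an))
    return 2.0 * np.arctan2(np.sqrt(1.0 + e) * np.sin(ecc_an / 2.0),
                            np.sqrt(1.0 - e) * np.cos(ecc_an / 2.0))

def linear_step(t, rv, period, t0, e):
    """Inner, linear step of the two-step fit: for fixed (P, T0, e), solve
    RV = gamma + alpha (cos v + e) - beta sin v by linear least squares,
    with alpha = K cos(omega) and beta = K sin(omega) as in Eq. 7.35."""
    v = true_anomaly(t, period, t0, e)
    design = np.column_stack([np.ones_like(t), np.cos(v) + e, -np.sin(v)])
    coef, *_ = np.linalg.lstsq(design, rv, rcond=None)
    gamma, alpha, beta = coef
    chi2 = float(((rv - design @ coef) ** 2).sum())
    return gamma, np.hypot(alpha, beta), np.arctan2(beta, alpha), chi2
```

For each trial $(e,T_{0})$ pair, only this cheap linear solve is repeated; the non-linear machinery never touches $\gamma$, $K$, or $\omega$ directly.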

The eccentricity is a parameter that suffers from a bias related to noise propagation: at small $e$, the RV noise could force a fitted value much larger than the true one. This effect is sometimes compensated by adopting $e=0$ for the solution. One frequently finds in the literature the application of a Lucy test that decides to go circular in low-amplitude and noisy cases. The present algorithm does not do that: it never decides that an orbit could be circular, and goes circular only when the eccentric model cannot be fitted. The results therefore remain sensitive to the bias but, on the other hand, present no local excess in the number of solutions at very small $e$. The adopted criterion is based on the nature of the covariance matrix to be inverted: if the matrix becomes singular, a circular-orbit fit is attempted instead (to decrease the number of parameters, since at small $e$ the $\omega$ is no longer well defined). When a circular orbit is selected, $T_{0}$ corresponds by definition to the maximum of velocity.
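The fallback logic can be sketched schematically as follows; the matrices here are generic stand-ins, not the pipeline's actual normal matrices, and the reduction to the circular model is only symbolised by dropping the degenerate parameter:

```python
import numpy as np

def solve_or_go_circular(normal_matrix, rhs):
    """Sketch of the fallback criterion: invert the normal matrix of the
    eccentric model; if it is singular (as happens when e -> 0 and omega
    becomes ill defined), solve the reduced, circular system instead."""
    try:
        return np.linalg.solve(normal_matrix, rhs), "eccentric"
    except np.linalg.LinAlgError:
        # drop the degenerate parameter and refit the circular model
        reduced = normal_matrix[:-1, :-1]
        return np.linalg.solve(reduced, rhs[:-1]), "circular"
```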

## Fit of trends

Not all the observed RV curves correspond to a true periodicity, nor to a pseudo-one. One widespread alternative is the existence of a mere trend: the data as a function of time can be represented by a polynomial in $t$. The code is built to accept polynomials of degree 1 to 4, but in Gaia DR3 only degrees 1 and 2 were encountered. Most of the objects entering the trend category are probably long-term periodic, but with a time scale longer than the present time span of the mission. The correct choice between an orbital solution and a trend is particularly difficult because the independent variable is time for the trends and phase for the well-covered orbits; since time enters the Keplerian equations in a non-linear way, the statistical test is not fully correct. The algorithm makes the choice by comparing the $\chi^{2}$ values of the two fits on the basis of a Snedecor F-test. The coefficients of the polynomial are included in the database, but the fit is performed in a Hipparcos-like formalism (Gosset et al. 2022).
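The comparison can be sketched as below. The polynomial fit is shown with a plain least-squares polynomial, whereas the pipeline works in a Hipparcos-like formalism (Gosset et al. 2022); the F-test is likewise schematic, with the critical value left to the caller:

```python
import numpy as np

def trend_chi2(t, rv, max_degree=4):
    """Chi^2 of polynomial-trend fits of degree 1..max_degree (the code
    accepts degrees 1 to 4).  Illustrative stand-in for the pipeline's
    Hipparcos-like trend fit."""
    return {deg: float(((rv - np.polyval(np.polyfit(t, rv, deg), t)) ** 2).sum())
            for deg in range(1, max_degree + 1)}

def prefer_orbit(chi2_orbit, dof_orbit, chi2_trend, dof_trend, f_critical):
    """Schematic Snedecor F-test between the two fits: compare the ratio
    of reduced chi^2 values to a caller-supplied critical value (the
    pipeline's exact construction is in Gosset et al. 2022)."""
    f_value = (chi2_trend / dof_trend) / (chi2_orbit / dof_orbit)
    return f_value > f_critical
```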

## Internal indications

Beyond the determination of the various parameters, the code also produces various flags, indications, and statistics that allow the user to gain a good understanding of the results and to select the solutions that could be more secure than others. One of these important statistics is the $F2$ goodness-of-fit; for the basic output, its maximum value has been set at $F2<3$. Other examples of such statistics are the period confidence, the largest continuous gap in phase authorised, the solution confidence, and the solution efficiency. Some of these statistics are fully internal to the spectroscopic orbit processing and are described in detail in Gosset et al. (2022).

It is common practice to apply a cut-off on the period confidence in order to avoid fitting nonexistent periods. For example, if a threshold of 0.99 is adopted, the periodicities having a confidence above this value are kept and the period is stored; for the remaining candidates, the period is not kept and a flag is raised indicating that the periodicity is not significant. For these solutions, the bit used is number 13, corresponding to the value 8192. These solutions should be discarded. The threshold can be modified for another full processing.
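In code, checking this flag amounts to testing bit 13 of the flag field (a sketch; the constant name is ours, and the full flag layout is documented with the NSS tables):

```python
# bit 13 of the solution flag marks "periodicity not significant"
NON_SIGNIFICANT_PERIOD = 1 << 13          # = 8192

def period_flagged(flags):
    """True if the solution carries the non-significant-period bit
    (illustrative; see the NSS table documentation for the flag field)."""
    return bool(flags & NON_SIGNIFICANT_PERIOD)
```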

For the DR3 processing, we kept all periods with confidence larger than 0.95, but we raised the flag 8192 for all orbital solutions with confidence between 0.95 and 0.99. Indeed these "low quality" solutions may still be combined with other NSS solution types.

In the DR3 post-processing filtering, we adopted a rather more complex approach, for various reasons described in Gosset et al. (2022): the threshold depends on the value of the tested period. For periods of less than one day, we filtered out all solutions with confidence smaller than 0.995. For periods larger than 10 d, we kept all solutions with a confidence larger than or equal to 0.95. In the intermediate range, we adopted a linear threshold given by $-0.045\,\log P\,+0.995$. Above these thresholds, the period is kept.
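The rule can be written compactly as follows (a sketch assuming $\log$ means $\log_{10}$ and a base-10 period in days; the exact conventions at the range boundaries are assumed):

```python
import math

def confidence_threshold(period_days):
    """DR3 post-processing threshold on the period confidence, as quoted
    in the text: 0.995 below P = 1 d, 0.95 above P = 10 d, and
    -0.045 log10(P) + 0.995 in between."""
    if period_days < 1.0:
        return 0.995
    if period_days > 10.0:
        return 0.95
    return -0.045 * math.log10(period_days) + 0.995

def period_kept(period_days, confidence):
    """A tested period survives the filter when its confidence reaches
    the period-dependent threshold (boundary conventions assumed)."""
    return confidence >= confidence_threshold(period_days)
```

The linear segment joins the two constant regimes continuously: it evaluates to 0.995 at $P=1$ d and to 0.95 at $P=10$ d.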

This means that there is a zone where the period is flagged as non-significant but is nevertheless kept, because the corresponding confidence is larger than 0.95. The user can play with these characteristics to design their own selection. If no significant peak is present, the SB1 classification is abandoned.

After the processing explained here, the solutions can be processed by the combiner, which can merge other (external) conditions with those mentioned here (see Section 7.7).