Gaia Python Tutorial: Cluster Analysis

Author: Deborah Baines

This tutorial has taken the Gaia Archive tutorial (located at the Gaia Archive, https://archives.esac.esa.int/gaia -> Help -> Tutorials -> Cluster analysis ) and adapted it to python. The tutorial uses the Gaia TAP+ (astroquery.gaia) module, http://astroquery.readthedocs.io/en/latest/gaia/gaia.html#module-astroquery.gaia .

This tutorial is focused on a possible scientific exploration exercise for a known cluster, the Pleiades (M45), using data from the Gaia Archive.

You can import and run this tutorial in your own Jupyter Notebook using this file: Download

First, we import all the required python modules:

In [1]:
import astropy.units as u
from astropy.coordinates.sky_coordinate import SkyCoord
from astropy.units import Quantity
from astroquery.gaia import Gaia
Created TAP+ (v1.0) - Connection:
	Host: gea.esac.esa.int
	Use HTTPS: True
	Port: 80
	SSL Port: 443
In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

# Suppress warnings. Comment this out if you wish to see the warning messages
import warnings
warnings.filterwarnings('ignore')

Do the following to load and look at the available Gaia table names:

In [3]:
from astroquery.gaia import Gaia
tables = Gaia.load_tables(only_names=True)
for table in (tables):
    print (table.get_qualified_name())
Retrieving tables...
Parsing tables...
Done.
public.dual
public.tycho2
public.igsl_source
public.hipparcos
public.hipparcos_newreduction
public.hubble_sc
public.igsl_source_catalog_ids
tap_schema.tables
tap_schema.keys
tap_schema.columns
tap_schema.schemas
tap_schema.key_columns
gaiadr1.phot_variable_time_series_gfov
gaiadr1.ppmxl_neighbourhood
gaiadr1.gsc23_neighbourhood
gaiadr1.ppmxl_best_neighbour
gaiadr1.sdss_dr9_neighbourhood
gaiadr1.rrlyrae
gaiadr1.allwise_neighbourhood
gaiadr1.gsc23_original_valid
gaiadr1.tmass_original_valid
gaiadr1.allwise_best_neighbour
gaiadr1.cepheid
gaiadr1.urat1_neighbourhood
gaiadr1.ppmxl_original_valid
gaiadr1.tmass_neighbourhood
gaiadr1.ucac4_best_neighbour
gaiadr1.ucac4_neighbourhood
gaiadr1.aux_qso_icrf2_match
gaiadr1.phot_variable_time_series_gfov_statistical_parameters
gaiadr1.sdssdr9_original_valid
gaiadr1.urat1_best_neighbour
gaiadr1.variable_summary
gaiadr1.ucac4_original_valid
gaiadr1.tmass_best_neighbour
gaiadr1.gsc23_best_neighbour
gaiadr1.gaia_source
gaiadr1.ext_phot_zero_point
gaiadr1.sdss_dr9_best_neighbour
gaiadr1.tgas_source
gaiadr1.urat1_original_valid
gaiadr1.allwise_original_valid

Next, we retrieve all the available data in the region of interest.

To do this we perform an asynchronous query (asynchronous rather than synchronous queries should be performed when retrieving more than 2000 rows) centred on the Pleides (coordinates: 56.75, +24.1167) with a search radius of 2 degrees and save the results to a file.

Note: The query to the archive is with ADQL (Astronomical Data Query Language). For a description of ADQL and more examples see the Gaia DR1 ADQL cookbook: https://gaia.ac.uk/data/gaia-data-release-1/adql-cookbook

In [4]:
job = Gaia.launch_job_async("SELECT * \
FROM gaiadr1.gaia_source \
WHERE CONTAINS(POINT('ICRS',gaiadr1.gaia_source.ra,gaiadr1.gaia_source.dec),CIRCLE('ICRS',56.75,24.1167,2))=1;" \
, dump_to_file=True)

print (job)
Launched query: 'SELECT * FROM gaiadr1.gaia_source WHERE CONTAINS(POINT('ICRS',gaiadr1.gaia_source.ra,gaiadr1.gaia_source.dec),CIRCLE('ICRS',56.75,24.1167,2))=1;'
Retrieving async. results...
Jobid: 1495017686224O
Phase: None
Owner: None
Output file: async_20170517124518.vot
Results: None

Inspect the output table and number of rows (around 1e5 results are found):

In [5]:
r = job.get_results()
print (r['source_id'])
    source_id    
-----------------
66926207631181184
66818318054203520
66917823855519360
66830859358837888
66809423175240448
66944761890240000
66980191076373760
66781621852927232
66827805636652928
66947545031024640
              ...
66649989694512256
65666785781176576
64014803920669568
64137880504644992
66542306274948224
64005909043397504
66689881351473664
66436615718993792
65645757620918912
66718434293042560
64103559419447296
Length = 98538 rows

To identify the cluster, create a proper motion plot of proper motion in RA (pmra) versus proper motion in DEC (pmdec) in the range pmra [-60,80] and pmdec [-120,30]:

In [6]:
plt.scatter(r['pmra'], r['pmdec'], color='r', alpha=0.3)
plt.xlim(-60,80)
plt.ylim(-120,30)

plt.show()

Perform another asynchronous query to filter the results by quality:

In [7]:
job2 = Gaia.launch_job_async("SELECT * \
FROM gaiadr1.gaia_source \
WHERE CONTAINS(POINT('ICRS',gaiadr1.gaia_source.ra,gaiadr1.gaia_source.dec),CIRCLE('ICRS',56.75,24.1167,2))=1 \
AND abs(pmra_error/pmra)<0.10 \
AND abs(pmdec_error/pmdec)<0.10 \
AND pmra IS NOT NULL AND abs(pmra)>0 \
AND pmdec IS NOT NULL AND abs(pmdec)>0;", dump_to_file=True)
Launched query: 'SELECT * FROM gaiadr1.gaia_source WHERE CONTAINS(POINT('ICRS',gaiadr1.gaia_source.ra,gaiadr1.gaia_source.dec),CIRCLE('ICRS',56.75,24.1167,2))=1 AND abs(pmra_error/pmra)<0.10 AND abs(pmdec_error/pmdec)<0.10 AND pmra IS NOT NULL AND abs(pmra)>0 AND pmdec IS NOT NULL AND abs(pmdec)>0;'
Retrieving async. results...

Again, inspect the output table and number of rows:

In [8]:
j = job2.get_results()
print (j['source_id']) 
    source_id    
-----------------
66623395256627712
66581957412169728
65614730777165824
64053561704835584
65828207832006400
66858862543722112
66863054431798272
66588520122169728
66657823714473856
66592437132341248
              ...
66912360656318208
66471215975411200
66729257611496704
64013704408496896
64013807487711360
66506331628024832
66715101399291392
66724447247218048
66610957031353856
66857281995760000
66570549979009280
Length = 218 rows

Plot these new filtered results on the same plot as the previous search:

In [9]:
plt.scatter(r['pmra'], r['pmdec'], color='r', alpha=0.3)
plt.scatter(j['pmra'], j['pmdec'], color='b', alpha=0.3)
plt.xlim(-60,80)
plt.ylim(-120,30)

plt.show()

Now we are going to take the candidate objects to be in the cluster. Based on the proper motion plot, we execute the same job with the following constraints on the proper motions in RA and DEC: pmra between 15 and 25, pmdec between -55 and -40:

In [10]:
job3 = Gaia.launch_job_async("SELECT * \
FROM gaiadr1.gaia_source \
WHERE CONTAINS(POINT('ICRS',gaiadr1.gaia_source.ra,gaiadr1.gaia_source.dec),CIRCLE('ICRS',56.75,24.1167,2))=1 \
AND abs(pmra_error/pmra)<0.10 \
AND abs(pmdec_error/pmdec)<0.10 \
AND pmra IS NOT NULL AND abs(pmra)>0 \
AND pmdec IS NOT NULL AND abs(pmdec)>0 \
AND pmra BETWEEN 15 AND 25 \
AND pmdec BETWEEN -55 AND -40;", dump_to_file=True)
Launched query: 'SELECT * FROM gaiadr1.gaia_source WHERE CONTAINS(POINT('ICRS',gaiadr1.gaia_source.ra,gaiadr1.gaia_source.dec),CIRCLE('ICRS',56.75,24.1167,2))=1 AND abs(pmra_error/pmra)<0.10 AND abs(pmdec_error/pmdec)<0.10 AND pmra IS NOT NULL AND abs(pmra)>0 AND pmdec IS NOT NULL AND abs(pmdec)>0 AND pmra BETWEEN 15 AND 25 AND pmdec BETWEEN -55 AND -40;'
Retrieving async. results...

Again, inspect the output table and number of rows, and call the job 'm45cluster':

In [11]:
m45cluster = job3.get_results() 
print (m45cluster['parallax']) 
     parallax     
    Angle[mas]    
------------------
7.4545648282310184
7.5239065408350996
6.9301093717323772
7.5788921365825894
 7.328286272889863
7.0233646883680159
8.1389671643305235
8.1445555234101779
7.3648468313130566
7.1694418609395765
               ...
7.2826513541337299
7.4681735932280162
 7.658985698496882
7.2023395286122085
8.2920957945484393
7.1461398221089798
7.8361028050064272
7.8762955633015945
 7.066701274479442
7.3330438493994405
Length = 106 rows

Plot these new filtered results on the same plot as the previous search:

In [12]:
plt.scatter(r['pmra'], r['pmdec'], color='r', alpha=0.3)
plt.scatter(j['pmra'], j['pmdec'], color='b', alpha=0.3)
plt.scatter(m45cluster['pmra'], m45cluster['pmdec'], color='g', alpha=0.3)
plt.xlim(-60,80)
plt.ylim(-120,30)

plt.show()

Calculate the average parallax and standard deviation of the parallax for the M45 cluster candidates:

In [13]:
avg_parallax = np.mean(m45cluster['parallax']) 
stddev_parallax = np.std(m45cluster['parallax']) 
print (avg_parallax, stddev_parallax)
7.4686695575 0.834822732559

Now, we want to add information from other catalogues, in this example from 2MASS. To do this we make use of the pre-computed cross-matched tables provided in the Gaia archive.

We obtain the 2MASS photometric data by using the Gaia - 2MASS cross-matched best neighbour table (gaiadr1.tmass_best_neighbour) to identify the sources and the 2MASS original table (gaiadr1.tmass_original_valid) to retrieve the photometry:

In [14]:
job4 = Gaia.launch_job_async("SELECT * \
FROM gaiadr1.gaia_source AS g, gaiadr1.tmass_best_neighbour AS tbest, gaiadr1.tmass_original_valid AS tmass \
WHERE g.source_id = tbest.source_id AND tbest.tmass_oid = tmass.tmass_oid \
AND CONTAINS(POINT('ICRS',g.ra,g.dec),CIRCLE('ICRS',56.75,24.1167,2))=1 \
AND abs(pmra_error/pmra)<0.10 \
AND abs(pmdec_error/pmdec)<0.10 \
AND pmra IS NOT NULL AND abs(pmra)>0 \
AND pmdec IS NOT NULL AND abs(pmdec)>0 \
AND pmra BETWEEN 15 AND 25 \
AND pmdec BETWEEN -55 AND -40;", dump_to_file=False)
Launched query: 'SELECT * FROM gaiadr1.gaia_source AS g, gaiadr1.tmass_best_neighbour AS tbest, gaiadr1.tmass_original_valid AS tmass WHERE g.source_id = tbest.source_id AND tbest.tmass_oid = tmass.tmass_oid AND CONTAINS(POINT('ICRS',g.ra,g.dec),CIRCLE('ICRS',56.75,24.1167,2))=1 AND abs(pmra_error/pmra)<0.10 AND abs(pmdec_error/pmdec)<0.10 AND pmra IS NOT NULL AND abs(pmra)>0 AND pmdec IS NOT NULL AND abs(pmdec)>0 AND pmra BETWEEN 15 AND 25 AND pmdec BETWEEN -55 AND -40;'
Retrieving async. results...
Query finished.

Finally, confirm the output table has Gaia and 2MASS photometry and check the number of rows in the table:

In [15]:
p = job4.get_results() 
print (p['phot_g_mean_mag', 'j_m', 'h_m', 'ks_m']) 
 phot_g_mean_mag        j_m            h_m            ks_m     
  Magnitude[mag]   Magnitude[mag] Magnitude[mag] Magnitude[mag]
------------------ -------------- -------------- --------------
10.757800849005008      9.6029997      9.2019997      9.0939999
6.0733019438444877      5.9679999      6.0510001      5.9759998
10.643169127592188      9.5349998      9.2189999      9.1370001
6.8275023878511796      6.6989999      6.7329998      6.6919999
 10.15654586177056      9.1169996          8.868      8.7580004
7.5371472521505787      7.2800002           7.29          7.257
8.1328652600027205      7.5879998      7.5279999      7.4699998
8.5268119167756886      7.8920002      7.7670002      7.7379999
9.0234235063721506      8.2360001          8.033      8.0019999
7.0246160054011426          6.848      6.9200001          6.895
               ...            ...            ...            ...
10.511033521483427      9.4770002      9.1400003      9.0719995
11.597470392962894         10.261          9.835      9.7299995
10.591447284586406      9.5310001      9.1920004      9.1230001
10.325731863675678      9.3500004          9.092      8.9960003
9.9103711452615997      8.9829998      8.7110004          8.632
9.3335117945827335          8.533      8.3290005      8.2819996
8.1126755868950937      7.6729999          7.599      7.5760002
6.3275477760376688      6.2280002      6.2480001          6.257
8.1878498252772722          7.526      7.3930001      7.3520002
11.149659997470195      9.9490004      9.5640001      9.4390001
Length = 106 rows

All of the above has been performed as an anonymous user to the Gaia archive. To log in to the archive, keep and share your results, see the following instructions: http://astroquery.readthedocs.io/en/latest/gaia/gaia.html#authenticated-access

Additional information

The above query to obtain the 2MASS catalogue data can also be performed by using an 'INNER JOIN' in the ADQL query. For example:

In [16]:
job5 = Gaia.launch_job_async("SELECT * \
FROM gaiadr1.gaia_source \
INNER JOIN gaiadr1.tmass_best_neighbour ON gaiadr1.gaia_source.source_id = gaiadr1.tmass_best_neighbour.source_id \
INNER JOIN gaiadr1.tmass_original_valid ON gaiadr1.tmass_original_valid.tmass_oid = gaiadr1.tmass_best_neighbour.tmass_oid \
WHERE CONTAINS(POINT('ICRS',gaiadr1.gaia_source.ra,gaiadr1.gaia_source.dec),CIRCLE('ICRS',56.75,24.1167,2))=1 \
AND abs(pmra_error/pmra)<0.10 \
AND abs(pmdec_error/pmdec)<0.10 \
AND pmra IS NOT NULL AND abs(pmra)>0 \
AND pmdec IS NOT NULL AND abs(pmdec)>0 \
AND pmra BETWEEN 15 AND 25 \
AND pmdec BETWEEN -55 AND -40;", dump_to_file=True)
Launched query: 'SELECT * FROM gaiadr1.gaia_source INNER JOIN gaiadr1.tmass_best_neighbour ON gaiadr1.gaia_source.source_id = gaiadr1.tmass_best_neighbour.source_id INNER JOIN gaiadr1.tmass_original_valid ON gaiadr1.tmass_original_valid.tmass_oid = gaiadr1.tmass_best_neighbour.tmass_oid WHERE CONTAINS(POINT('ICRS',gaiadr1.gaia_source.ra,gaiadr1.gaia_source.dec),CIRCLE('ICRS',56.75,24.1167,2))=1 AND abs(pmra_error/pmra)<0.10 AND abs(pmdec_error/pmdec)<0.10 AND pmra IS NOT NULL AND abs(pmra)>0 AND pmdec IS NOT NULL AND abs(pmdec)>0 AND pmra BETWEEN 15 AND 25 AND pmdec BETWEEN -55 AND -40;'
Retrieving async. results...

Confirm the output table has Gaia and 2MASS photometry and check the number of rows in the table is the same as above (106 rows):

In [17]:
test = job5.get_results() 
print (test['phot_g_mean_mag', 'j_m', 'h_m', 'ks_m']) 
 phot_g_mean_mag        j_m            h_m            ks_m     
  Magnitude[mag]   Magnitude[mag] Magnitude[mag] Magnitude[mag]
------------------ -------------- -------------- --------------
10.757800849005008      9.6029997      9.2019997      9.0939999
6.0733019438444877      5.9679999      6.0510001      5.9759998
10.643169127592188      9.5349998      9.2189999      9.1370001
6.8275023878511796      6.6989999      6.7329998      6.6919999
 10.15654586177056      9.1169996          8.868      8.7580004
7.5371472521505787      7.2800002           7.29          7.257
8.1328652600027205      7.5879998      7.5279999      7.4699998
8.5268119167756886      7.8920002      7.7670002      7.7379999
9.0234235063721506      8.2360001          8.033      8.0019999
7.0246160054011426          6.848      6.9200001          6.895
               ...            ...            ...            ...
10.511033521483427      9.4770002      9.1400003      9.0719995
11.597470392962894         10.261          9.835      9.7299995
10.591447284586406      9.5310001      9.1920004      9.1230001
10.325731863675678      9.3500004          9.092      8.9960003
9.9103711452615997      8.9829998      8.7110004          8.632
9.3335117945827335          8.533      8.3290005      8.2819996
8.1126755868950937      7.6729999          7.599      7.5760002
6.3275477760376688      6.2280002      6.2480001          6.257
8.1878498252772722          7.526      7.3930001      7.3520002
11.149659997470195      9.9490004      9.5640001      9.4390001
Length = 106 rows

Visually inspect the results are the same by plotting the same as above:

In [18]:
plt.scatter(r['pmra'], r['pmdec'], color='r', alpha=0.3)
plt.scatter(j['pmra'], j['pmdec'], color='b', alpha=0.3)
plt.scatter(m45cluster['pmra'], m45cluster['pmdec'], color='g', alpha=0.3)
plt.scatter(test['pmra'], test['pmdec'], color='y', alpha=0.3)
plt.xlim(-60,80)
plt.ylim(-120,30)

plt.show()