This year I would like students to explore the new Gaia data, released in September 2016 as Data Release 1 (DR1). The entire catalogue of about one billion sources can be downloaded directly from the Gaia Archive. However, those data are not in an easily readable and 'explorable' form and, besides, "only" about 2 million of the Gaia sources have measured parallaxes. So instead, I would like to introduce you to Topcat, a virtual observatory (VO) tool. This software allows you to access and select from any number of databases, and to manipulate or plot them. Follow the instructions on the website to download the software. This requires that you have Java; see here for more information. Then go through one (or more) of the on-line tutorials at your leisure (see Further Information on the Topcat page) to become familiar with the system. An extensive manual can be found here. You will need about 1 GB of disk space.
PART 1 (Plotting Gaia data)
Start Topcat, which opens the main window. Click on VO and then Table Access Protocol (TAP) Query. This opens up a new TAP Query window. Type in gaia as a keyword, hit return and then double-click on GAIA to see the options. Under gaiadr1, double-click on gaia_source. From the Columns tab you will see the list of columns that can be loaded. Notice that there is a region on the GUI (graphical user interface) called Service Capabilities. A maximum number of rows (3 million) and upload size (100 MB) are specified. There is also a Mode listed whose default is Synchronous. If this is changed to Asynchronous, then Topcat will keep querying the server in the event that there are issues with the connection.
If you now click on Examples (bottom of the GUI) you'll see some examples of how to load data sets. Clicking on Hints (top right tab) provides other examples. A summary document of ADQL (Astronomical Data Query Language) examples has also been prepared by Markus Demleitner of the University of Heidelberg. As a starter, try the command:
SELECT TOP 1000 * FROM gaia_source
and look at the result by double-clicking on the table in the Table List. This query picks the first 1000 sources in Gaia DR1. Explore the data as desired. When you are finished with it, you can delete the table by highlighting it in the Table List and hitting your Delete key.
Now let's be more specific. Try extracting all sources that have measured parallaxes, loading only a selected set of columns:
SELECT
TOP 3000000
source_id, ra, dec, parallax, parallax_error, pmra, pmra_error, pmdec, pmdec_error, phot_g_mean_mag, l, b
FROM gaia_source
WHERE
parallax > 0
Depending on the size of the data set, you may have to wait a bit. When the table has loaded, take a look at the result. Scroll down to the bottom and confirm that there are 2,026,210 rows and that all of them have values in the parallax column. Note that the table is sorted according to increasing parallax.
Click on the Sky plotting window icon of the main Topcat window and check that you are plotting RA and DEC positions of all the stars on a sphere. Add a caption to the image, change to a different colour, set to a density plot and flip the shades. Print the plot (FIGURE 1).
Now redo the plot in Galactic coordinates as an Aitoff projection. Colour this plot in whatever manner you find pleasing, show gridlines for every 2 hours of longitude and 15 degrees of latitude, ensuring that the coordinates are easily readable, and print the plot (FIGURE 2). There is an arc between approximately 0 and 2 hours of longitude and 0 and 20 degrees of latitude. By searching other information sources, write a few sentences explaining what this arc is. Does it correspond to more stars or fewer stars in your plot? If more, why? If fewer, why?
Under Axes, FOV, type in Galactic Center (GC) and click on Resolve to get its Equatorial coordinates. Click Submit to orient and zoom in on that region until you see a size of 0.25 degrees square and print the plot (FIGURE 3). How many stars did Gaia observe within this square region? How many steradians does this correspond to and what fraction of the entire sky is this?
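The solid-angle arithmetic can be checked outside Topcat with a few lines of Python. This is only a minimal sketch: the function names are my own, the 1 degree field is an illustration rather than the field asked for above, and the small-angle approximation is assumed, so that a square of side s (in radians) subtends roughly s squared steradians out of the 4 pi steradians of the whole sky.

```python
import math

def square_solid_angle_sr(side_deg):
    """Approximate solid angle (steradians) of a small square patch of sky.

    Assumes the small-angle limit, where the patch is effectively flat.
    """
    side_rad = math.radians(side_deg)
    return side_rad ** 2

def sky_fraction(solid_angle_sr):
    """Fraction of the full sky (4*pi steradians) covered by a solid angle."""
    return solid_angle_sr / (4.0 * math.pi)

# Illustrative field only: a square 1 degree on a side.
omega = square_solid_angle_sr(1.0)
print("solid angle [sr]:", omega)
print("fraction of sky :", sky_fraction(omega))
```

Substitute the side length of your own zoomed field to obtain the numbers for FIGURE 3.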
Note that left-clicking on any point in a plot will highlight the tabular entry that corresponds to that point.
PART 2 (The Malmquist Bias)
In this part, we want to manipulate the data set. From your Table Browser, right-click to see New Synthetic Column (green plus sign) and add a column to the table, calling it Distance, in kpc units. Enter a mathematical expression to convert from parallax to distance. Then add another column, calling it g_M (absolute g magnitude), and convert from apparent g magnitude to g_M using the distance that you've just calculated.
From the main window, click on the Plane Plot Window, and plot g_M as a function of Distance (FIGURE 4). Use a density plot with an informative colour scaling and appropriate axes scaling (log or linear). From the resulting plot, what is the most common stellar distance of the Gaia mission (approximately; I'm not asking for detailed statistics yet)? Mark on the distance axis the distances to the LMC and to our nearby companion, M 31.
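If you want to check the values in your new columns independently, the two conversions can be sketched in Python as follows. This is not Topcat expression syntax, and the function names are hypothetical. It assumes that the Gaia parallax column is in milliarcseconds (so that 1/parallax is a distance in kpc), that simply inverting the parallax is acceptable (it becomes biased when the parallax error is large), and that interstellar extinction is ignored.

```python
import math

def distance_kpc(parallax_mas):
    """Distance in kpc from a parallax in milliarcseconds (simple inversion)."""
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive to invert it")
    return 1.0 / parallax_mas

def absolute_magnitude(apparent_mag, dist_kpc):
    """Absolute magnitude from the distance modulus, ignoring extinction.

    M = m - 5 * log10(d / 10 pc), with d converted from kpc to pc.
    """
    dist_pc = dist_kpc * 1000.0
    return apparent_mag - 5.0 * math.log10(dist_pc / 10.0)

# Made-up star: parallax 10 mas, apparent g magnitude 10.
d = distance_kpc(10.0)            # 0.1 kpc, i.e. 100 pc
g_M = absolute_magnitude(10.0, d)
print(d, g_M)
```

Comparing one or two rows of your table against such a calculation is a quick way to catch a unit error (pc versus kpc) in the synthetic column expression.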
PART 3 (Subsets)
Create a new column which contains the errors in distance, call it Distance_error. Suppose we want to disregard any source for which the error in the distance is greater than the distance (or equivalently the error in the parallax is greater than the parallax). Click on the Display row subsets icon in the main window. In the newly launched window click on Define a new subset using algebraic expression (leftmost green cross) and specify the criterion that the distance error should be less than the distance. Call it Most-reliable-points.
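The equivalence stated above can be seen from first-order error propagation: with distance d = 1/parallax, the distance error is approximately parallax_error / parallax^2, so "distance error less than distance" reduces to "parallax error less than parallax". A minimal Python sketch, with hypothetical function names and parallaxes assumed to be in milliarcseconds:

```python
def distance_error_kpc(parallax_mas, parallax_error_mas):
    """First-order propagated error on d = 1/parallax, in kpc."""
    return parallax_error_mas / parallax_mas ** 2

def is_most_reliable(parallax_mas, parallax_error_mas):
    """True when the distance error is smaller than the distance itself."""
    dist = 1.0 / parallax_mas
    return distance_error_kpc(parallax_mas, parallax_error_mas) < dist

# Made-up examples: (parallax, parallax_error) in mas.
for plx, err in [(2.0, 0.5), (0.2, 0.3)]:
    same_as_parallax_cut = err < plx
    print(plx, err, is_most_reliable(plx, err), same_as_parallax_cut)
```

The last two columns printed agree for each row, which is the equivalence used to define the subset. The linear propagation is itself only a rough approximation when the fractional parallax error is large.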
Create a new plot of this subset alone (FIGURE 5), rescaling the plot to show all plotted data (blue cross icon in the Plane Plot window). How many sources are in this image and what fraction of the total does it correspond to? What is the upper distance 'cutoff'?
Zoom in on any region of the plot so that individual points can be seen (not blended). Click the Define a new row subset containing only currently visible points icon (third from left icon on the Plane Plot) and plot only this region. Give the subset a name and click on Add Subset. Plot this subset with its Distance error bars included (FIGURE 6).
PART 4 (3D-Plot)
Plot (again a density plot with a useful colour scheme) the Most-reliable-points subset in a 3D spherical coordinate system (sphere icon in the main menu) with coordinates of Galactic longitude, Galactic latitude, and Distance. The Z Center (Distance) will be set to zero by default (location of the Sun). Rotate the cube (hold down left mouse button to do so) and verify that the sky coverage is approximately uniform in all directions. Orient the image so that Galactic latitude is in the vertical direction (positive b up), a Galactic longitude of 180 degrees is at the right and Galactic longitude of 0 degrees is at the left (approx). (Recall that if you click on a point, you can see the associated tabular value highlighted in the table). Print the plot (FIGURE 7), labeling these axes. You should now see an asymmetry along the b = 0 axis. Suggest a reason as to why that would be the case.
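To make sense of the orientation of the cube, it can help to work out where a given (longitude, latitude, distance) triple should land. The sketch below converts to Sun-centred Cartesian coordinates under the usual convention, which I am assuming here: x points towards the Galactic Centre (l = 0), y towards l = 90 degrees, and z towards the north Galactic pole. The function name is my own, and no claim is made about how Topcat orients its axes internally.

```python
import math

def galactic_to_cartesian(l_deg, b_deg, dist_kpc):
    """Sun-centred Cartesian (x, y, z) in kpc from Galactic l, b and distance."""
    l = math.radians(l_deg)
    b = math.radians(b_deg)
    x = dist_kpc * math.cos(b) * math.cos(l)  # towards the Galactic Centre
    y = dist_kpc * math.cos(b) * math.sin(l)  # towards l = 90 degrees
    z = dist_kpc * math.sin(b)                # towards the north Galactic pole
    return x, y, z

# Made-up point: 1 kpc away, in the plane, directly away from the Galactic Centre.
print(galactic_to_cartesian(180.0, 0.0, 1.0))
```

A star at l = 180 degrees therefore has negative x, which is a handy check when deciding which side of your printed plot faces the Galactic Centre.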
PART 5 (Statistics)
Create 2 new subsets, both of which contain only the Most-reliable-points: one of these subsets will contain all points with longitudes from 270 degrees through 0 to 90 degrees, and the second all points with longitudes from 90 to 270 degrees. In the main window, click on the large Sigma icon (statistics). Compute the statistics for the Most-reliable-points subset, as well as for the two new subsets. Print out the tables (TABLES 1, 2 and 3) in an easily readable form. Is there any statistically significant difference between the mean distances of stars interior and exterior to the Sun?
Hand in your plots with details explaining what parameters you have used, together with your answers to the questions. Also hand in your 3 Tables as well as the first 50 rows of the Most-reliable-points subset table. Save your session, since we may use these data again later.
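One possible way to judge significance, sketched here in Python, is to compare the difference of the two means with its standard error, using the mean, standard deviation and number of rows for each subset. This is only one reasonable choice of test, not a prescribed method; the function names and the numbers in the example are made up, and the first helper shows that the "270 to 90 degrees" subset has to wrap through 0 degrees.

```python
import math

def in_inner_half(l_deg):
    """True for longitudes from 270 degrees through 0 to 90 degrees."""
    l = l_deg % 360.0
    return l >= 270.0 or l < 90.0

def z_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference of two means in units of its standard error."""
    std_err = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / std_err

# Made-up summary statistics (mean, SD, row count) for two subsets.
z = z_from_summary(3.0, 1.0, 3, 2.0, 1.0, 3)
print(in_inner_half(350.0), in_inner_half(180.0), z)
```

As a rough guide, a value of |z| well above 2 suggests that the difference is unlikely to be a chance fluctuation. With samples as large as these, even a small difference in the means can come out as formally significant, so comment on its size as well.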