t-SNE as a tool for studying clustering in the elemental abundance space

One of the main motivations for the GALAH survey is to measure abundances of many elements in a sufficiently large number of stars that some of them can be identified as having been born in the same cluster, even though every indication of this common origin has been lost except for the chemical fingerprint. Chemical tagging can reveal the connection between such stars, but state-of-the-art observations and analytical methods will be needed to actually perform this task.

t-SNE is a powerful tool for projecting multi-dimensional data into a smaller number of dimensions, where the data can be studied with methods that perform poorly in the original number of dimensions. In contrast to many other dimension-reduction methods, t-SNE produces a projection that is visually appealing and preserves the clustering of the original many-dimensional dataset in the reduced number of dimensions. The latter means that if there are distinct clusters in the many-dimensional space, they will remain distinct in the projected space, where they are easy to identify and compare to other clusters. This tool seems suitable for the search for chemical groups in the GALAH database. At this moment, abundances of 13 elements are available for about 200 000 stars. Even though the current dataset is one fifth the size of the planned final sample and only half of the targeted elements have calculated abundances, it is more than suitable for analysis with big-data-science methods like t-SNE.
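
The projection described above can be sketched on synthetic data. The example below is a minimal illustration, not the analysis applied to GALAH: it assumes scikit-learn's TSNE implementation, invents three chemical groups in a 13-element abundance space, and uses an arbitrary perplexity. All group means, scatters and variable names are hypothetical.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

n_elements = 13        # number of abundances per star, as in the current dataset
n_groups = 3           # hypothetical number of chemical groups
stars_per_group = 100  # hypothetical group size

# Synthetic abundances: each group has its own mean abundance pattern
# (spread of 0.3 dex between groups) and a small internal scatter (0.03 dex).
group_means = rng.normal(scale=0.3, size=(n_groups, n_elements))
group_id = np.repeat(np.arange(n_groups), stars_per_group)
abundances = group_means[group_id] + rng.normal(
    scale=0.03, size=(group_id.size, n_elements)
)

# Project the 13-dimensional abundance space onto two dimensions.
# The perplexity is an illustrative choice and would need tuning on real data.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(abundances)

print(embedding.shape)
```

Only the abundances enter the projection; group_id is kept aside so that the projected points can later be coloured or checked against the known grouping.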

The preliminary analysis shows between tens and a hundred distinct groups in the elemental abundance space, and some of them have quickly been identified as open clusters (M67 and the Pleiades). This indicates that chemical tagging works even when the method used to identify the groups relies on the elemental abundances alone and completely ignores stellar parameters, positions and kinematics. We plan to explore the possibilities t-SNE offers, improve the dimension-reduction projections, explore the other identified groups, and converge toward a case study or proof of concept based on the data we have at the moment. We do not expect to find many, if any, new groups, but the dataset contains plenty of known clusters and streams on which to test the proposed method.
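
One way to test the method against known clusters is to label dense groups in the projected space and check how well each group matches known membership. The sketch below assumes DBSCAN from scikit-learn as the group finder, which the text does not specify, and uses synthetic two-dimensional points as a stand-in for a real t-SNE projection. The function names, eps and min_samples values are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def recover_groups(projection, eps=1.5, min_samples=5):
    """Label dense groups in a 2-D projection; -1 marks unassigned stars."""
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(projection)


def group_purity(found, known):
    """For each found group, return the fraction of its members that share
    the most common known label (1.0 means a perfectly clean group)."""
    purity = {}
    for group in np.unique(found):
        if group == -1:  # skip stars not assigned to any group
            continue
        members = known[found == group]
        counts = np.bincount(members)
        purity[int(group)] = counts.max() / members.size
    return purity


rng = np.random.default_rng(1)

# Stand-in for a t-SNE projection: three tight, well-separated 2-D blobs,
# each playing the role of one known cluster (labels 0, 1, 2).
centres = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0]])
known = np.repeat(np.arange(3), 50)
projection = centres[known] + rng.normal(scale=0.5, size=(known.size, 2))

found = recover_groups(projection)
purity = group_purity(found, known)
print(purity)
```

With real data, the known labels would come from published membership lists of clusters and streams, and a low purity would flag groups that mix stars of different origin.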
