Machine learning for big data in galactic archaeology

by Loana Ramuz
Université de Strasbourg - Master 2 Astrophysics 2020



1.3 Machine learning for big data in Galactic archaeology

Machine learning is a branch of computer science that takes advantage of big data to identify links between objects or entities. It is often applied to image data but works with many other types of data, from text to sound waves, including tabular data. A machine learning model, here a neural network, can be seen as an algorithm that propagates knowledge learned from one dataset to other data of the same form. For example, Thomas et al. 2019 [3] built a neural network to distinguish dwarf and giant stars and estimate their metallicities and distances. This network was trained on the SEGUE spectrometric dataset and the Gaia DR2 and SDSS photometric datasets, meaning it learned to recover the metallicities given by SEGUE from the associated photometric data in Gaia and SDSS. It was then applied to Pan-STARRS, CFIS and Gaia data to estimate metallicities for objects where this information was unknown. This network obtained good metallicity estimates, with a precision of σ = 0.15 dex. For comparison, a non-machine-learning way of estimating metallicity is polynomial fitting, tested by Ibata et al. 2017b [4]; this technique obtained a precision of σ = 0.20 dex. This comparison suggests that machine learning is promising for Galactic archaeology.

The aim of this internship was to create a neural network to retrieve metallicity from photometric data. It builds on the work of Thomas et al., but instead of first distinguishing giants from dwarfs and then determining their metallicity, it tackles metallicity directly. To become acquainted with this technology, we first followed the introductory lessons from fast.ai, a Python library built on PyTorch that simplifies machine learning, which led to a first neural network trained on simulated data to test linear regression. Section 2 defines the vocabulary needed to understand our work and describes this first linear net as an example. We then moved to real data: this first simple net was adapted to the real situation and gave the first results, shown in Section 3. To get better results, we set aside the fast.ai modules and developed a more complex net adapted from the work of Diakogiannis et al. 2020 [5], with the most enlightening help of Foivos Diakogiannis himself. Once the neural net reached convergence, distances were computed, and we compared our results to those presented by Ivezić et al. 2008 [2] to map our Milky Way. Finally, we aim to develop an auto-encoder, meaning an adaptation of the net to a less complete photometric dataset that predicts spectrometric parameters and gives an estimate of the missing colour.

2 First test: a neural network for linear regression

Machine learning consists of algorithms that automatically learn to identify features in an input dataset and associate them with one or several outputs. It can serve two aims: classification, when the output is categorical, as done by Thomas et al. 2019 [3] to distinguish dwarf stars from giants, and regression, when the output is continuous, such as a metallicity value for our objects. In our situation, the net we created associates photometric values with a metallicity value, so it can be treated as a regression problem.
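
To make the regression setting concrete, here is a minimal sketch of a single linear unit trained by gradient descent on simulated data. It is not the fast.ai network built during the internship: it uses only the Python standard library, and the function names, the simulated slope and intercept, the noise level and the learning rate are all assumptions chosen for illustration. The scatter of the residuals is computed at the end as one common way to quote a precision; whether the σ values cited above follow exactly this definition is also an assumption.

```python
import random


def simulate(n, slope=2.0, intercept=-1.0, noise=0.1, seed=0):
    """Simulated data (hypothetical values): y = slope * x + intercept + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def train_linear(xs, ys, lr=0.1, epochs=500):
    """Fit y ~ w * x + b by full-batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = (w * x + b) - y          # prediction minus target
            grad_w += 2.0 * err * x / n    # d(MSE)/dw
            grad_b += 2.0 * err / n        # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def residual_scatter(xs, ys, w, b):
    """Standard deviation of the residuals between targets and predictions."""
    res = [y - (w * x + b) for x, y in zip(xs, ys)]
    mean = sum(res) / len(res)
    return (sum((r - mean) ** 2 for r in res) / len(res)) ** 0.5


xs, ys = simulate(200)
w, b = train_linear(xs, ys)
sigma = residual_scatter(xs, ys, w, b)
print(f"w = {w:.3f}, b = {b:.3f}, sigma = {sigma:.3f}")
```

In this toy case the learned weight and bias should approach the simulated slope and intercept, and the residual scatter should approach the injected noise level. A classification network would differ mainly in its output and loss: it would predict a category (for example dwarf or giant) rather than a continuous value.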

