UR: Using Machine Learning to Identify Transients in the DESI Survey – Astrobites

  • Lauren
  • April 15, 2021

The undergrad research series is where we feature the research that you’re doing. If you’ve missed the previous installments, you can find them under the “Undergraduate Research” category here.
Are you doing an REU this summer? Were you working on an astro research project during this past school year? If you, too, have been working on a project that you want to share, we want to hear from you! Think you’re up to the challenge of describing your research carefully and clearly to a broad audience, in only one paragraph? Then send us a summary of it!
You can share what you’re doing by clicking here and using the form provided to submit a brief (fewer than 200 words) write-up of your work. The target audience is one familiar with astrophysics but not necessarily your specific subfield, so write clearly and try to avoid jargon. Feel free to also include either a visual regarding your research or else a photo of yourself.
We look forward to hearing from you!
Amanda Wasserman
University of Rochester

Amanda Wasserman is a senior undergraduate Physics and Astronomy major at the University of Rochester. She carried out this research with Professor Segev BenZvi, and it will form the basis of her senior thesis.
Over the next five years, the Dark Energy Spectroscopic Instrument (DESI) will observe the spectra of 35 million galaxies and quasars. By chance, a small percentage of these galaxies will contain supernovae and other transients that are visible in the galactic spectra. I have worked to develop machine learning tools to identify and classify transients in galaxy spectra measured with DESI. The goal of my research is to create a Transient Identification Pipeline that automatically separates transient-contaminated spectra from plain galactic spectra. Classifying transient spectra will allow us to ensure correct estimates of the host redshifts (a measure of the distance to the galaxy) and to notify fellow collaborations of the astrophysical phenomena as they occur.
The algorithm we created is a multilabel convolutional neural network (CNN), a method that classifies inputs based on features in the data. It has four layers and was trained on a variety of simulated supernovae and hosts. The CNN takes preprocessed spectra as input and outputs the most likely classification, choosing among hosts and supernovae of types Ia, Ib, Ic, IIn, and IIp. The classifier performs exceptionally well on our simulated data: for the spectra that the CNN classifies with high certainty, we attain an accuracy of over 99%.
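To make the idea of "accuracy on spectra classified with high certainty" concrete, here is a minimal sketch in plain Python. It assumes the network's outputs can be read as per-class scores between 0 and 1 and that "high certainty" means the top score passes a cut. The post does not say how certainty is defined, so the function name, the class order, the threshold of 0.9, and the toy scores are all illustrative assumptions rather than the pipeline's actual values.

```python
# Hypothetical class order; the post lists these classes but not their indices.
CLASSES = ["Host", "Ia", "Ib", "Ic", "IIn", "IIp"]


def high_confidence_accuracy(scores, true_labels, threshold=0.9):
    """Return (accuracy, fraction kept) for spectra whose top score >= threshold.

    scores: one list of per-class scores per spectrum (e.g. sigmoid outputs).
    true_labels: the true class index of each spectrum.
    threshold: assumed cut on the top score; not a value taken from the post.
    """
    if not scores:
        raise ValueError("need at least one spectrum")
    kept = 0
    correct = 0
    for row, truth in zip(scores, true_labels):
        best = max(range(len(row)), key=row.__getitem__)  # index of top score
        if row[best] >= threshold:
            kept += 1
            correct += int(best == truth)
    accuracy = correct / kept if kept else float("nan")
    return accuracy, kept / len(scores)


# Toy scores invented for illustration only (not DESI data).
toy_scores = [
    [0.97, 0.01, 0.01, 0.00, 0.01, 0.00],  # confident Host
    [0.02, 0.95, 0.01, 0.01, 0.00, 0.01],  # confident Ia
    [0.40, 0.05, 0.45, 0.05, 0.03, 0.02],  # uncertain, dropped by the cut
    [0.01, 0.01, 0.02, 0.03, 0.92, 0.01],  # confident IIn
]
toy_truth = [0, 1, 0, 4]

acc, kept_fraction = high_confidence_accuracy(toy_scores, toy_truth)
print(f"accuracy on confident spectra: {acc:.2f}, fraction kept: {kept_fraction:.2f}")
```

Reporting the fraction of spectra that survive the cut alongside the accuracy matters, because a very strict threshold can push the accuracy up simply by discarding most of the sample.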
Our next goal for the project is to incorporate anomaly detection into the classifier to potentially identify new astrophysical phenomena. Additionally, DESI has just started to observe again after a hiatus due to COVID-19. As data comes in, we will adjust our pipeline to accommodate any differences between our modeled spectra and observed spectra. We look forward to applying our pipeline to real data and expect to find over 1,000 transients per year.
Figure 1: The confusion matrix (a metric for analyzing the accuracy of our algorithm) for our validation set of simulated spectra. The x-axis shows the label that the classifier predicts for each spectrum, and the y-axis shows its true label. The boxes along the diagonal show the fraction of spectra that have been correctly classified. Because our confusion matrix is strongly diagonal, we see that our algorithm is identifying transients and labeling them correctly.
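A confusion matrix like the one in Figure 1 can be built in a few lines. The sketch below assumes each row is normalized by the number of spectra of that true class, which is one common convention and matches the caption's reading of the diagonal as a fraction correctly classified. The function name and the toy labels are hypothetical and are not taken from the validation set.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Row-normalized confusion matrix: rows = true class, columns = predicted."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for truth, pred in zip(true_labels, predicted_labels):
        counts[truth][pred] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A class with no examples gets a row of zeros instead of dividing by zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Toy labels invented for illustration: 0 = Host, 1 = Ia, 2 = Ib.
toy_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
toy_pred = [0, 0, 0, 1, 1, 1, 1, 1, 2, 0]

cm = confusion_matrix(toy_true, toy_pred, n_classes=3)
for name, row in zip(["Host", "Ia", "Ib"], cm):
    print(name, [f"{value:.2f}" for value in row])
```

In this toy case the diagonal entries are the per-class fractions of correct predictions, and any off-diagonal entry shows which class a spectrum was mistaken for.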

Source: https://astrobites.org/2021/04/15/ml-transients/