Artificial intelligence is classifying real supernova explosions without the traditional use of spectra, thanks to a team of astronomers at the Center for Astrophysics | Harvard & Smithsonian. The complete data sets and resulting classifications are publicly available for open use.
By training a machine learning model to categorize supernovae based on their visible characteristics, the astronomers classified 2,315 supernovae in real data from the Pan-STARRS1 Medium Deep Survey with 82 percent accuracy, without the use of spectra.
The astronomers developed a software program that classifies different types of supernovae based on their light curves, or how their brightness changes over time. “We have approximately 2,500 supernovae with light curves from the Pan-STARRS1 Medium Deep Survey, and of those, 500 supernovae with spectra that can be used for classification,” said Griffin Hosseinzadeh, a postdoctoral researcher at the CfA and lead author on the first of two papers published in The Astrophysical Journal. “We trained the classifier using those 500 supernovae to classify the remaining supernovae where we were not able to observe the spectrum.”
Edo Berger, an astronomer at the CfA, explained that by asking the artificial intelligence to answer specific questions, the results become increasingly accurate. “The machine learning looks for a correlation with the original 500 spectroscopic labels. We ask it to compare the supernovae in different categories: color, rate of evolution, or brightness. By feeding it real existing knowledge, it leads to the highest accuracy, between 80 and 90 percent.”
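The idea of training on the spectroscopically labeled supernovae and then classifying the rest from light-curve features can be illustrated with a minimal sketch. This is not the team's software: it swaps in a simple nearest-centroid rule, and the feature values, the class names `class_a` and `class_b`, and the function names are all invented for illustration.

```python
from collections import defaultdict
from math import dist


def fit_centroids(features, labels):
    """Average the feature vectors of each labeled class."""
    grouped = defaultdict(list)
    for vector, label in zip(features, labels):
        grouped[label].append(vector)
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def classify(centroids, vector):
    """Return the class whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: dist(centroids[label], vector))


# Invented toy values, NOT survey data.
# Each tuple is (brightness, rate of evolution, color) in arbitrary units.
labeled_features = [
    (1.0, 0.9, 0.1),
    (0.9, 1.0, 0.0),
    (0.4, 0.3, 0.6),
    (0.5, 0.2, 0.7),
]
# Stand-ins for labels that would come from spectra (hypothetical names).
spectroscopic_labels = ["class_a", "class_a", "class_b", "class_b"]

# "Train" on the labeled subset, then classify objects without spectra.
centroids = fit_centroids(labeled_features, spectroscopic_labels)
unlabeled_features = [(0.95, 0.85, 0.05), (0.45, 0.25, 0.65)]
predictions = [classify(centroids, v) for v in unlabeled_features]
```

In this toy setup each unlabeled object is assigned to whichever labeled group it most resembles. A real classifier would use many more features and a more capable model, but the division of labor is the same: spectra supply the labels, and light curves supply the features.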
Although this is not the first machine learning project for supernova classification, it is the first time that astronomers have had access to a real data set large enough to train an artificial intelligence-based supernova classifier, making it possible to create machine learning algorithms without the use of simulations.
“If you make a simulated light curve, it means you are making an assumption about what supernovae will look like, and your classifier will then learn those assumptions as well,” said Hosseinzadeh. “Nature will always throw some additional complications in that you did not account for, meaning that your classifier will not do as well on real data as it did on simulated data. Because we used real data to train our classifiers, it means our measured accuracy is probably more representative of how our classifiers will perform on other surveys.”
As the classifier categorizes the supernovae, said Berger, “We will be able to study them both in retrospect and in real-time to pick out the most interesting events for detailed follow-up. We will use the algorithm to help us pick out the needles and also to look at the haystack.”
The project has implications not only for archival data, but also for data that will be collected by future telescopes. The Vera C. Rubin Observatory is expected to go online in 2023, and will lead to the discovery of millions of new supernovae each year. This presents both opportunities and challenges for astrophysicists, because limited telescope time restricts how many of those supernovae can be classified with spectra.
“When the Rubin Observatory goes online it will increase our discovery rate of supernovae by 100-fold, but our spectroscopic resources will not increase,” said Ashley Villar, a Simons Junior Fellow at Columbia University and lead author on the second of the two papers, adding that while roughly 10,000 supernovae are currently discovered each year, scientists only take spectra of about 10 percent of those objects. “If this holds true, it means that only 0.1 percent of supernovae discovered by the Rubin Observatory each year will get a spectroscopic label. The remaining 99.9 percent of data will be unusable without methods like ours.”
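The 0.1 percent figure follows directly from the numbers Villar cites. A short worked calculation, assuming (as the quote does) that the number of spectra taken each year stays fixed while discoveries grow 100-fold; the variable names are illustrative:

```python
# Figures quoted in the article.
current_discoveries_per_year = 10_000
spectra_percent_today = 10      # about 10 percent get spectra today
rubin_multiplier = 100          # discovery rate increases 100-fold

# Spectra taken per year today; assumed not to grow with Rubin.
spectra_per_year = current_discoveries_per_year * spectra_percent_today // 100

# Expected discoveries per year once Rubin is online.
rubin_discoveries_per_year = current_discoveries_per_year * rubin_multiplier

# Share of Rubin-era supernovae that would receive a spectroscopic label.
labeled_percent = 100 * spectra_per_year / rubin_discoveries_per_year
unlabeled_percent = 100 - labeled_percent
```

That is, about 1,000 spectra per year spread across roughly 1,000,000 discoveries gives 0.1 percent labeled and 99.9 percent unlabeled.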
Unlike past efforts, where data sets and classifications have been available to only a limited number of astronomers, the data sets from the new machine learning algorithm will be made publicly available. The astronomers have created easy-to-use, accessible software, and also released all of the data from the Pan-STARRS1 Medium Deep Survey along with the new classifications for use in other projects. Hosseinzadeh said, “It was really important to us that these projects be useful for the entire supernova community, not just for our group. There are so many projects that can be done with these data that we could never do them all ourselves.” Berger added, “These projects are open data for open science.”
This project was funded in part by a grant from the National Science Foundation (NSF) and the Harvard Data Science Initiative (HDSI).