Artificial Intelligence Develops an Ear for Birdsong

We can learn a lot from nature if we listen to it more—and scientists around the world are trying to do just that. From mountain peaks to ocean depths, biologists are increasingly planting audio recorders to unobtrusively eavesdrop on the groans, shrieks, whistles and songs of whales, elephants, bats and especially birds. This summer, for example, more than 2,000 electronic ears will record the soundscape of California’s Sierra Nevada mountain range, generating nearly a million hours of audio. To avoid spending multiple human lifetimes decoding it, researchers are relying on artificial intelligence.

Such recordings can create valuable snapshots of animal communities and help conservationists understand, in vivid detail, how policies and management practices affect an entire population. Gleaning data about the number of species and individuals in a region is just the beginning. The Sierra Nevada soundscape contains crucial information about how last year’s historic wildfires affected birds living in different habitats and ecological conditions across the area. The recordings could reveal how various animal populations weathered the catastrophe and which conservation measures help species rebound more effectively.

Such recordings can also capture details about interactions between individuals in larger groups. For example, how do mates find each other amid a cacophony of consorts? Scientists can additionally use sound to track shifts in migration timing or population ranges. Massive amounts of audio data are pouring in from research elsewhere as well: sound-based projects are underway to count insects, study the effects of light and noise pollution on avian communities, track endangered species, and trigger alerts when recorders detect noise from illegal poaching or logging activities.

“Audio data is a real treasure trove because it contains vast amounts of information,” says ecologist Connor Wood, a Cornell University postdoctoral researcher, who is leading the Sierra Nevada project. “We just need to think creatively about how to share and access [that information].” The challenge is pressing because it takes humans a long time to extract useful insights from recordings. Fortunately, the latest generation of machine-learning AI systems, which can identify animal species from their calls, can crunch thousands of hours of data in less than a day.

“Machine learning has been the big game changer for us,” says Laurel Symes, assistant director of the Cornell Lab of Ornithology’s Center for Conservation Bioacoustics. She studies acoustic communication in animals, including crickets, frogs, bats and birds, and has amassed many months of recordings of katydids (famously vocal long-horned grasshoppers that are an essential part of the food web) in the rain forests of central Panama. Patterns of breeding activity and seasonal population variation are hidden in this audio, but analyzing it is enormously time-consuming: it took Symes and three of her colleagues 600 hours of work to classify various katydid species from just 10 recorded hours of sound. But a machine-learning algorithm her team is developing, called KatydID, performed the same task while its human creators “went out for a beer,” Symes says.
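To put the katydid figures in proportion, the arithmetic can be sketched in a few lines of Python. The only inputs are the two numbers quoted above (600 hours of work, 10 hours of recordings). The function name is made up for illustration, and the sketch assumes the 600 hours is the team's combined effort, which the account above does not spell out.

```python
def labor_hours_per_recorded_hour(work_hours, recorded_hours):
    """Hours of human effort spent for each hour of recorded sound."""
    return work_hours / recorded_hours


# Figures quoted for the manual katydid classification.
# Assumption: 600 hours is the combined total for the four annotators.
ratio = labor_hours_per_recorded_hour(work_hours=600, recorded_hours=10)
print(f"{ratio:.0f} hours of human work per recorded hour")
```

Under that assumption the ratio comes to 60 to 1, which is why months of recordings are out of reach for manual annotation.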

Machine-learning setups like KatydID are self-learning systems that use a neural network—“a really, really rough approximation of the human brain,” explains Stefan Kahl, a machine-learning expert at Cornell’s Center for Conservation Bioacoustics and Chemnitz University of Technology in Germany. He built BirdNET, one of the most popular avian-sound-recognition systems used today. Wood’s team will rely on BirdNET to analyze the Sierra Nevada recordings, and other researchers are using it to document the effects of light and noise pollution on the dawn chorus in France’s Brière Regional Natural Park.

Such systems start by analyzing many inputs—for instance, hundreds of recorded bird calls, each “labeled” with its corresponding species. The neural network then teaches itself which features can be used to associate an input (in this case, a bird’s call) with a label (the bird’s identity). With millions of extremely subtle features often involved, humans cannot even know what most of them are.
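The idea of learning from labeled examples can be illustrated with a deliberately tiny sketch. This is not how BirdNET or KatydID is built. It trains a single artificial neuron (logistic regression) on two hand-picked measurements per call, whereas a real system discovers its own features from the audio. The species names, the measurements and the function names are all invented for illustration.

```python
import math

# Hypothetical labeled calls: (mean pitch in kHz, call length in seconds).
# The values are made up for this sketch, not measured from real birds.
LABELED_CALLS = [
    ((2.0, 0.20), "species A"), ((2.1, 0.25), "species A"),
    ((1.9, 0.15), "species A"), ((2.2, 0.20), "species A"),
    ((4.0, 0.80), "species B"), ((3.9, 0.75), "species B"),
    ((4.1, 0.85), "species B"), ((3.8, 0.80), "species B"),
]


def train(examples, positive_label, epochs=200, learning_rate=0.5):
    """Fit one artificial neuron to the labeled examples by gradient descent."""
    count = len(examples)
    n_features = len(examples[0][0])
    # Center each feature on its average so training starts balanced.
    means = [sum(x[i] for x, _ in examples) / count for i in range(n_features)]
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in examples:
            centered = [f - m for f, m in zip(features, means)]
            target = 1.0 if label == positive_label else 0.0
            z = bias + sum(w * c for w, c in zip(weights, centered))
            predicted = 1.0 / (1.0 + math.exp(-z))
            error = predicted - target
            for i in range(n_features):
                grad_w[i] += error * centered[i]
            grad_b += error
        # Nudge the weights in the direction that reduces the average error.
        weights = [w - learning_rate * g / count for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / count
    return means, weights, bias


def predict(model, features, positive_label, negative_label):
    """Return the label whose side of the learned boundary the call falls on."""
    means, weights, bias = model
    z = bias + sum(w * (f - m) for w, f, m in zip(weights, features, means))
    return positive_label if z > 0 else negative_label


model = train(LABELED_CALLS, positive_label="species B")
print(predict(model, (4.05, 0.90), "species B", "species A"))
print(predict(model, (1.95, 0.10), "species B", "species A"))
```

Nobody tells the neuron that higher, longer calls belong to species B. It infers that association from the labels alone, which is the same principle a full neural network applies to far more features.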

Older versions of detection software were semi-automatic. They scanned spectrograms—visual depictions of an audio signal—for established features such as frequency range and duration to identify a bird by its song. This works well for some species. The song of the northern cardinal, for example, consistently begins with a few long notes that rise in pitch, followed by quick, short notes with a distinct dip in pitch. It can easily be identified from a spectrogram, much like a composed song can be recognized from sheet music. But other avian calls are more complex and varied and can confound older systems. “You need much more than just signatures to identify the species,” Kahl says. Many birds have more than one song, and like other animals, they often have regional “dialects.” A white-crowned sparrow from Washington State sounds very different from its Californian cousins. Machine-learning systems can recognize such nuances. “Let’s say there’s an as yet unreleased Beatles song that is put out today. You’ve never heard the melody or the lyrics before, but you know it’s a Beatles song because that’s what they sound like,” Kahl explains. “That’s what these programs learn to do, too.”
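The signature-matching approach of the older software can be sketched roughly as follows. The sketch computes a simple spectrogram, then checks whether the loudest frequency stays inside an expected band for an expected length of time. It does not reproduce any particular product. The function names, the thresholds and the synthetic test tones are assumptions chosen for illustration, and the slow hand-written Fourier transform is used only to keep the example free of external libraries.

```python
import cmath
import math


def spectrogram(samples, frame_size=128):
    """Magnitude spectrogram from a naive DFT over non-overlapping frames."""
    twiddles = [cmath.exp(-2j * math.pi * m / frame_size) for m in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        frames.append([
            abs(sum(x * twiddles[(k * n) % frame_size] for n, x in enumerate(frame)))
            for k in range(frame_size // 2)
        ])
    return frames


def matches_signature(samples, sample_rate, low_hz, high_hz,
                      min_seconds, max_seconds,
                      frame_size=128, min_magnitude=32.0):
    """True if the longest loud, in-band stretch has the expected duration."""
    bin_hz = sample_rate / frame_size
    longest = current = 0
    for magnitudes in spectrogram(samples, frame_size):
        peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
        in_band = low_hz <= peak_bin * bin_hz <= high_hz
        loud = magnitudes[peak_bin] >= min_magnitude
        current = current + 1 if (in_band and loud) else 0
        longest = max(longest, current)
    duration = longest * frame_size / sample_rate
    return min_seconds <= duration <= max_seconds


def tone(freq_hz, seconds, sample_rate=8000):
    """Synthetic stand-in for a call: a pure sine tone."""
    count = int(seconds * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]


def silence(seconds, sample_rate=8000):
    return [0.0] * int(seconds * sample_rate)
```

With a made-up signature of 2,500 to 3,500 hertz lasting 0.3 to 0.8 second, a half-second tone at 3,000 hertz passes the check, while the same tone at 1,000 hertz, or one lasting only 0.05 second, does not. A fixed rule like this has no way to accommodate a second song or a regional dialect, which is the limitation Kahl describes.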

These systems have, in fact, benefitted from recent advances in human-speech- and music-recognition technology. In collaboration with Andrew Farnsworth of the Cornell Lab of Ornithology, experts at New York University’s Music and Audio Research Laboratory drew on their musical experience to build a bird-call-identification system called BirdVox. It detects and identifies birds migrating at night and distinguishes birdsong from background noises, including frog and insect calls, human ground and air transport, and sources such as wind and rain—all of which can be surprisingly loud and variable.

How well each system learns depends a great deal on the quantity of available prelabeled recordings. A wealth of such data already exists for common birds. Kahl estimates about 4.2 million recordings are available online for 10,000 species. But most of the 3,000-odd species BirdNET can identify are found in Europe and North America, and BirdVox further narrows its focus to the songs of U.S.-based birds.

“In other places, for rarer species or ones that don’t have well-classified data, [BirdNET] doesn’t work as well,” says India-based ecologist V. V. Robin. He is hot on the trail of the Jerdon’s courser, a critically endangered nocturnal bird that has not been officially spotted for about a decade. Robin and his collaborators have placed recorders in a southern Indian wildlife sanctuary to try to capture its call. Since 2009 he has also been recording birds in the hills of the Western Ghats, a global biodiversity hotspot in the same part of the country. These recordings are painstakingly annotated to train locally developed machine-learning algorithms.

Citizen scientists can also help fill gaps in the birdsong repository. BirdNET powers a smartphone app that has been a big hit with amateur birders. They record snippets of audio and submit them to the app, which tells them the singer’s species—and adds the recording to the researchers’ database. More than 300,000 recordings have been coming in daily, Kahl says.

These machine-learning algorithms still have room for improvement. Although they analyze audio much more quickly than humans, they still lag behind in sifting through overlapping sounds to home in on a signal of interest. Some researchers see this as the next problem for AI to tackle. Even the current imperfect versions, however, enable sweeping projects that would be far too time-consuming for humans to tackle alone. “As ecologists,” Wood says, “tools like BirdNET allow us to dream big.”
