Can Artificial Intelligence really help us talk to the animals?

A California-based organisation hopes to use machine learning to decipher animal communication on a global scale. However, there are many who have their doubts about the initiative.
Aza Raskin is the co-founder and president of the Earth Species Project (ESP), a non-profit organisation with the audacious goal of deciphering non-human communication using machine learning, making all the knowledge publicly available, and in doing so deepening our bond with other living species and helping to protect them. An album of whale songs released in 1970 galvanised the movement that led to the ban on commercial whaling. What could an animal version of Google Translate produce?
ESP says its strategy is distinctive because it focuses on deciphering the communication of all species rather than just one. Raskin agrees that social species like primates, whales, and dolphins are more likely to engage in complex, symbolic communication, but the ultimate objective is to create tools that can be used across the animal kingdom. Raskin declares, “We don’t care about species. The methods we create are applicable to all of life, from worms to whales.”
According to Raskin, research has shown that machine learning can be used to translate between different, sometimes distant human languages without the need for any prior knowledge. This is the “motivating intuition” for ESP.
The first step in this process is an algorithm that represents words as points in a geometric space. In this multidimensional representation, the distance and direction between points (words) describe how they relate to one another in meaning (their semantic relationship). For instance, “king” relates to “man” with the same distance and direction that “queen” relates to “woman.” (The mapping is not done by understanding what the words mean but rather by examining, for instance, how frequently they appear next to one another.)
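The king/queen relationship can be illustrated with a toy example. This is only a minimal sketch: the three-dimensional vectors below are hand-picked assumptions, not embeddings learned from text, and the `nearest` helper is a hypothetical name introduced for illustration. Real systems learn hundreds of dimensions from how often words appear together.

```python
import numpy as np

# Hypothetical, hand-picked 3-D "embeddings" (NOT learned from data).
# The axes loosely stand for: royalty, maleness, femaleness.
embeddings = {
    "king":  np.array([1.0, 1.0, 0.0]),
    "queen": np.array([1.0, 0.0, 1.0]),
    "man":   np.array([0.0, 1.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0]),
    "apple": np.array([0.2, 0.1, 0.1]),
}


def nearest(vector, exclude=()):
    """Return the word whose point lies closest (Euclidean distance) to `vector`."""
    candidates = {w: v for w, v in embeddings.items() if w not in exclude}
    return min(candidates, key=lambda w: np.linalg.norm(candidates[w] - vector))


# The step from "man" to "king" has the same distance and direction
# as the step from "woman" to "queen".
offset_king = embeddings["king"] - embeddings["man"]
offset_queen = embeddings["queen"] - embeddings["woman"]

# Applying the "man -> king" step to "woman" lands on (or near) "queen".
analogy = nearest(
    embeddings["king"] - embeddings["man"] + embeddings["woman"],
    exclude=("king", "man", "woman"),
)
print(analogy)
```

In this toy space the two offsets are identical by construction; in learned embeddings they are only approximately equal.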
Later, it was discovered that these “shapes” are consistent across languages. Then, in 2017, two research groups working independently found a technique that made translation possible by aligning the shapes. To get from English to Urdu, align the two shapes and find the Urdu point that is closest to the English word’s point. Raskin claims that most words can be translated properly this way.
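The idea of aligning two shapes and then looking up the nearest point can be sketched as follows. This is an illustration under loud assumptions, not the method of the 2017 work: it uses orthogonal Procrustes alignment with a handful of known word pairs, which is a simpler, supervised stand-in for techniques that reportedly needed no prior knowledge. The data are synthetic random points, and the "target language" is just a rotated copy of the "source language".

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for word vectors: 6 "source-language" words in 3-D.
n_words, dim, n_seed = 6, 3, 4
source = rng.normal(size=(n_words, dim))

# Assumption: the "target-language" shape is the same shape, rotated.
# A random orthogonal matrix is obtained from a QR decomposition.
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
target = source @ rotation

# Orthogonal Procrustes: find the orthogonal map that best carries the first
# `n_seed` source points onto their known target counterparts.
u, _, vt = np.linalg.svd(source[:n_seed].T @ target[:n_seed])
mapping = u @ vt


def translate(index):
    """Map source word `index` into the target space and return the nearest target index."""
    projected = source[index] @ mapping
    distances = np.linalg.norm(target - projected, axis=1)
    return int(np.argmin(distances))


# Words 4 and 5 were never used to fit the mapping, yet they should still
# land on their own counterparts once the shapes are aligned.
translations = [translate(i) for i in range(n_words)]
print(translations)
```

Because the synthetic target is an exact rotation of the source, the alignment here is perfect; with real languages the shapes only roughly match, which is consistent with the claim that most, not all, words translate well.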
ESP’s goal is to build these kinds of representations of animal communication, working both on individual species and on many species at once, and then to explore questions such as whether there is overlap with the universal human shape. According to Raskin, we don’t know how animals experience the world, but some appear to share emotions with us and may even communicate about them with other members of their species. Raskin says he is not sure which will be more fantastic: the parts where the shapes overlap and we can directly communicate or translate, or the parts where we can’t.

Animals also communicate non-verbally, he continues. Bees, for instance, use their “waggle dance” to tell other bees the location of a flower. It will be necessary to translate across these different modes of communication as well.
Raskin agrees that the objective is “like travelling to the moon,” but the idea is not to get there all at once. Instead, ESP’s roadmap involves solving a series of smaller problems on the way to the greater objective. This should lead to general tools that can help researchers who are trying to use AI to unlock the secrets of the species they study.
For instance, a recent paper published by ESP (and shared with the public) tackled the so-called “cocktail party problem” in animal communication: the difficulty of identifying which individual in a group of the same animals is vocalising in a noisy social setting.
Raskin claims that no one has ever completed this end-to-end detangling of animal sound. The AI-based model created by ESP, which was tested on bat vocalisations, macaque coo calls, and dolphin signature whistles, performed best when the calls came from the individuals the model had been trained on; however, with larger datasets, it was able to separate mixtures of calls from animals that were not in the training cohort.
Another project uses humpback whales as a test species for generating novel animal calls with AI. The calls are created by breaking vocalisations into micro-phonemes, which are discrete units of sound lasting one-tenth of a second, and can then be played back to the animals to observe how they react. Raskin claims that if AI can distinguish between random and semantically significant changes, it will bring us closer to meaningful communication. It amounts to having AI speak the language even though we don’t yet understand it.
Another project aims to develop an algorithm that determines how many call types a species has by applying self-supervised machine learning, which identifies patterns without requiring human experts to label the data. In an early test case, the system will mine audio recordings made by a team led by Christian Rutz, a professor of biology at the University of St Andrews, to produce an inventory of the vocal repertoire of the Hawaiian crow. According to Rutz, the species can make and use tools for foraging and is thought to have a significantly more complex set of vocalisations than other crow species.
Rutz is particularly enthusiastic about the project’s conservation potential. The Hawaiian crow is severely endangered and survives only in captivity, where it is being bred for reintroduction to the wild. It is hoped that by comparing recordings made at different times, it will be possible to determine whether the species’ call repertoire is being eroded in captivity. For example, certain alarm calls may have been lost, which could affect its reintroduction. That loss might be addressed with intervention. Rutz asserts that the technology “may provide a step change in our capacity to help these birds come back from the edge” and that identifying and categorising the sounds manually would be labour-intensive and error-prone.
Another project aims to automatically decipher the functional meaning of vocalisations. It is being pursued in the lab of Ari Friedlaender, a professor of ocean sciences at the University of California, Santa Cruz. The lab runs one of the biggest tagging programmes in the world and studies how wild marine animals, which are impossible to observe directly, behave underwater. The animals are fitted with small electronic “biologging” devices that record their location, type of movement, and even what they see (the devices can incorporate video cameras). The lab also has data from sound recorders placed strategically underwater.

ESP’s goal is first to apply self-supervised machine learning to the tag data to determine automatically what an animal is doing (such as eating, sleeping, moving, or socialising), and then to add the audio data to see whether calls associated with that behaviour can be given functional meaning. (Playback trials, together with calls that have already been decoded, could then be used to verify the results.) The method will be applied first to humpback whale data, since the lab has tagged multiple members of the same group, making it possible to see how signals are sent and received. Friedlaender says he had “reached the ceiling” in terms of what could be extracted from the data with the methods at hand. “Our aim is that the work ESP can undertake will bring fresh insights,” he said.
However, not everyone is as optimistic about the potential of AI to accomplish such lofty goals. Robert Seyfarth is an emeritus psychology professor at the University of Pennsylvania who has spent more than 40 years researching social behaviour and vocal communication in monkeys in their natural environment. While he thinks machine learning can be useful for some problems, such as identifying an animal’s vocal repertoire, he is sceptical that it will offer much in terms of understanding the meaning and purpose of vocalisations.
He argues that the problem is that although many animals live in sophisticated, complex societies, their repertoire of sounds is far smaller than that of humans. As a result, the same sound can mean different things in different settings, and the only way to determine meaning is to understand the context: who the calling individual is, their relationships with others, their position in the hierarchy, and the individuals they have interacted with. In Seyfarth’s opinion, these AI techniques are simply insufficient; you have to go out and watch the animals.
The idea that animal communication resembles human communication in any significant sense is also contested. It is one thing to apply computer-based analyses to human language, with which we are so familiar, says Seyfarth. But applying them to other animals can be “very different”. According to Kevin Coffey, a neuroscientist at the University of Washington and co-creator of the DeepSqueak algorithm, “it is a fascinating notion, but it is a significant reach.”
Raskin recognises that AI might not be sufficient on its own to enable interspecies communication. However, he makes reference to studies that have revealed that many animals interact in ways that are “more intricate than humans have ever dreamed.” Our inability to obtain enough data and analyse it comprehensively, as well as our own restricted vision, have been the major roadblocks. He explains, “These are the instruments that enable us to remove the human spectacles and comprehend whole communication networks.”
This article was written by Zoë Corbyn, a journalist at The Guardian / The Observer, and published on www.theguardian.com on 31 July 2022.