How the human talent for charades helps explain the origin of language

Language gives us the power to describe, virtually without limit, the countless entities, actions, properties, and relations that compose our experience, real and imagined. But what is the origin of this power? What gave rise to humankind’s ability to use words to convey meanings?

Traditionally, scholars interested in this question have focused on trying to explain language as an arbitrary symbolic code. If you take an introductory course in linguistics, you are certain to learn the foundational doctrine known as ‘the arbitrariness of the sign’, laid out in the early 20th century by the Swiss linguist Ferdinand de Saussure. This principle states that words are meaningful simply as a matter of convention. As the psychologists Steven Pinker and Paul Bloom have explained it: ‘There is no reason for you to call a dog dog rather than cat except for the fact that everyone else is doing it.’ The corollary of arbitrariness is that the forms of words do not bear any resemblance to their meaning; that is, they are not iconic. ‘The word “salt” is not salty nor granular; “dog” is not “canine”; “whale” is a small word for a large object; “microorganism” is the reverse,’ observed the linguist Charles Hockett in his classic article ‘The Origins of Speech’ (1960).

But this raises a conundrum – what is known in philosophy as ‘the symbol grounding problem’. If words are arbitrary and purely a matter of convention, then how did they come to be established in the first place? In practical terms: how did our ancestors create the original words? This is a challenging question to answer. Scientists have little direct knowledge of the prehistoric origins of today’s approximately 7,000 spoken languages, which lie at least tens of thousands of years in the past. We do, however, know an increasing amount about how people create and develop new sign languages.

Sign languages – which are articulated primarily by visible gestures of the hands, body and face – turn out to be far more common than previously realised, with a roughly estimated 200 such languages used by deaf and hearing people around the globe. Crucially, sign languages are, absolutely, languages, every bit as complex and expressive as their spoken counterparts. And sign languages are much younger than spoken languages, just a few hundred years old at most, making their origins more transparent. Indeed, within just the past few decades, scientists have actually observed the early formation of entirely new sign languages – a process that happens spontaneously when deaf people who are deprived of a sign language have the opportunity to live together and communicate freely with each other.

So how do they do it? How do deaf people first establish a shared set of meaningful signs? Their solution is an intuitive one. Without access to a sign language, deaf people communicate in essentially the same way that people do when they travel to a place where they don’t speak any of the local languages, or when they play a game of ‘charades’. When people must communicate without words, the strategy is universal: we act out our meaning, pantomiming actions and using our hands and bodies to depict the sizes, shapes, and spatial relationships of referents. The first signers of Nicaraguan Sign Language, for example, appear to have established the sign for ‘watermelon’ by first pantomiming the action of holding and eating a slice of the fruit, and then the action of spitting out a seed, using their index finger to trace its imaginary path from the signer’s mouth to the ground. Once understanding is achieved – a recognisable mapping between form and meaning – signers can turn a pantomime into a conventional symbol that is shared within their community.

Key to this process of forming new symbols is the use of iconicity – the creation of signs that are intrinsically meaningful because they somehow resemble what they are intended to mean. Iconicity, that connection between form and meaning, is a powerful force for communication, enabling people to understand each other across linguistic divides. Notably, iconicity is not limited to gesturing; it pervades our graphic communication too. Traffic signs, food packaging, emojis, instruction manuals, maps… wherever there are people communicating, you will find iconicity. What’s more, the ability to create and understand iconicity appears to be a distinctly human capability. There is scant evidence of iconic symbol formation in the communication of other animals. (Although there are some illuminating exceptions in language-trained and human-enculturated apes such as the gorilla Koko and the chimpanzee Viki.)

These facts regarding the ubiquity and the uniqueness of iconicity in human communication might seem at odds with the reigning theory that speech is an arbitrary code that is characteristically lacking in iconicity, in which the sounds we produce do not resemble what they mean. How could it be that iconicity is such a defining feature of human expression, except when we speak? A common (and, as we will see, wrong) explanation for this is that the voice does not offer the potential for iconicity that exists in visual media such as gesture and drawing. The idea is that our voices might be able to connect form and meaning in limited instances – say, imitating an animal’s sound to refer to the animal – but that for wider iconic representations of our experience, our voices are mostly useless.

As it turns out, this idea had, until recently, never been put to rigorous scientific test. However, a series of studies by my collaborators and me shows that, in fact, iconic vocalisations can be a powerful way for people to communicate when they lack a common language. This could help explain how the forms of spoken words were first devised.

Our investigation began with a contest in which we invited the contestants to record a set of vocal sounds to communicate 30 different meanings. These included an array of concepts that might have been relevant to the lives of our Palaeolithic ancestors: living entities such as ‘child’ and ‘deer’, objects like ‘knife’ and ‘fruit’, actions like ‘cook’ and ‘hide’, properties like ‘dull’ and ‘big’, as well as the quantifiers ‘one’ and ‘many’ and the demonstratives ‘this’ and ‘that’. The winner of the contest was determined by how well listeners could guess the intended meanings of the sounds based on a set of written options. Critically, the sounds that contestants submitted had to be non-linguistic – no actual words were allowed, a rule that also excluded conventional onomatopoeias. And although contestants might have imitated animal sounds in a few instances (growling for ‘tiger’ or hissing for ‘snake’), such direct vocal imitations were not obvious for most of the items.


[Audio examples: the sounds participants made to communicate ‘Big’, ‘Cut’ and ‘Good’]


Listeners were remarkably good at interpreting the meanings of the sounds – significantly better than chance for each of the meanings we tested. Yet, this study had a limitation: all of the contestants and listeners were speakers of English. Thus, it was possible that listeners’ success relied on some cultural knowledge that they shared with the vocalisers. The crucial test would require that we determine whether the vocalisations were also understandable to listeners from completely different cultural and linguistic backgrounds.

That was our next step. In a follow-up study, our international team of linguists and psychologists tested the vocalisations with listeners from around the world in two different comprehension experiments. The first involved an internet survey translated into 25 different languages. In this experiment, participants listened to each vocalisation from the English speakers and guessed the meaning, choosing from among six written words. Guessing accuracy for the different groups ranged from 74 per cent for English speakers to 52 per cent for Thai speakers – well above the chance rate of 17 per cent. In a second experiment, we tested the vocalisations with populations living in predominantly oral societies, including, for example, Portuguese speakers living in the Amazon forest of Brazil and Daakie speakers in a village in the South Pacific island country of Vanuatu. Participants responded by pointing at one of a set of 12 printed images. Here, the Portuguese speakers registered 34 per cent accuracy and Daakie speakers 43 per cent. Far from perfect, but well above the chance rate of 8 per cent. Thus, findings from both experiments showed that, no matter what language they spoke, participants were able to interpret the vocalisations with a notable degree of accuracy.
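The chance rates quoted above follow from simple arithmetic, sketched here under the assumption that a guesser with no information picks uniformly at random among the options offered (six written words in the first experiment, 12 images in the second). The function names are illustrative only and do not come from the studies; the accuracy figures are the ones reported in the text.

```python
from fractions import Fraction


def chance_rate(n_options: int) -> Fraction:
    """Probability of a correct answer when guessing uniformly among n_options.

    Assumption: every option is equally likely to be picked by a guesser
    who has no information about the intended meaning.
    """
    if n_options < 1:
        raise ValueError("n_options must be at least 1")
    return Fraction(1, n_options)


def times_above_chance(accuracy_percent: int, n_options: int) -> Fraction:
    """How many times higher the observed accuracy is than the chance rate."""
    return Fraction(accuracy_percent, 100) / chance_rate(n_options)


# Chance rates as whole percentages, matching the figures in the text.
print(round(float(chance_rate(6) * 100)))   # 17
print(round(float(chance_rate(12) * 100)))  # 8

# Reported accuracies relative to chance (values taken from the text).
for label, accuracy, options in [
    ("English, 6 options", 74, 6),
    ("Thai, 6 options", 52, 6),
    ("Portuguese, 12 options", 34, 12),
    ("Daakie, 12 options", 43, 12),
]:
    ratio = float(times_above_chance(accuracy, options))
    print(f"{label}: {ratio:.2f} times chance")
```

On this reckoning, even the lowest-scoring group in each experiment guessed correctly at roughly three to four times the rate expected from random guessing. This says nothing about statistical significance, which depends on sample sizes not given here.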



Remarkably, people are also able to use their voices to communicate about things that ostensibly have nothing to do with sound. In our online survey, we also tested the well-documented ‘bouba/kiki effect’. Participants listened to recordings of someone saying one of two made-up words, either ‘bouba’ or ‘kiki’, as they viewed two shapes side by side, one rounded and one pointy. After listening to each word, they selected which of the shapes they thought better matched the sound of the word. You can probably guess the results: the great majority of participants around the world matched ‘bouba’ with the rounded shape and ‘kiki’ with the pointy one. There is, apparently, a widely recognisable resemblance between the sounds of these words and the corresponding shapes.

Taken together, these studies show that, just like in gesture and drawing, there is considerable potential for iconicity in vocalisation after all. Modern words might look arbitrary through the lens of classic linguistic analysis, their origins obscured by many thousands of years of historical development. But dig far enough back in time and there is at least a possibility that the forms of many spoken words began – like the symbols of sign languages – as iconic representations of their meanings. (Indeed, we don’t need to look back into prehistory: linguists are documenting widespread evidence of iconicity in modern spoken languages, too.)

This route to the formation of new spoken words is still active today. Consider the recently coined English word cheugy – widely used by TikTokers – which, according to Urban Dictionary, means ‘The opposite of trendy’. Credited with creating the new word, Gaby Rasson explained to The New York Times: ‘It was a category that didn’t exist … There was a missing word that was on the edge of my tongue and nothing to describe it and “cheugy” came to me. How it sounded fit the meaning.’ It is possible that many thousands of years ago our ancestors did something very similar when they created the first spoken words.

Turning back to the question of what gave rise to our unique ability for linguistic communication: although the complete answer is undoubtedly complex, scientific findings suggest that our capacity for iconic communication has played a critical role. Whether it is with gesture, drawing or the sound of our voice, humans are virtuosos at the game of charades. Without this special talent, language – that vastly flexible symbolic system capable of expressing just about anything – would likely never have gotten off the ground.


This article, published 8 December 2021, was written by Marcus Perlman, a lecturer in English language and linguistics at the University of Birmingham, UK.