What the World’s Smallest Language Says About the Human Brain
What Shakespeare, a blowfish, and a 137-word language can teach us about how we perceive the world
In the hit television show The Office, there is an episode in which Kevin, whose endearing lack of complexity often leads to misunderstandings, comes up with an idea to save time throughout his day by reducing the number of words he uses in his speech. Instead of saying “I’m going to take my lunch break,” he says “Me go eat now,” and so on.
Of course, this leads him into all sorts of hilarious misunderstandings. But comedy aside, the idea is an interesting one. Why do we have so many seemingly redundant words in our languages? Why is our speech so full of articles, prepositions, and the like?
After all, while we English speakers are familiar with the articles “a,” “an,” and “the,” these are completely absent in a language like Russian. At the same time, a Portuguese speaker would perhaps find it strange that both “I eat” and “you eat” use the same verb conjugation in English, instead of requiring a different form to indicate a different subject. In fact, a language like Mandarin Chinese has no verb conjugations in the sense with which we are familiar. So if you want to say “I ate a meal,” you must say something like “I eat a meal yesterday,” so that through context alone the listener can deduce exactly when you did the eating.
This raises the question: if we took all of the simplest aspects across different languages (no articles, no conjugations, and so on) and attempted to condense them into one language, would it still be a fully functioning language? And going a step further, how few words, and how little grammatical complexity, can a language get away with while remaining fully functional?
Measuring Vocabulary Sizes
Before we can even begin to study a theoretical minimum for human language, we must first measure the vocabularies of languages that already exist.
It turns out that this is far more complex than one might initially assume.
Imagine that we use the largest available dictionary for each language to gauge the size of its vocabulary. By that measure, the English language would contain 795,606 words, while Portuguese would hold around 818,000. The largest would indisputably be Tamil, whose biggest dictionary contains a whopping 1,533,669 words.
However, these “largest dictionaries” have been shown to be grossly overinflated. For instance, the largest English dictionary referenced above is the online Wiktionary, and while it is indeed the largest English dictionary, it is far from accurate. Users are allowed to create multiple entries for the same word, so many of the entries on the site are duplicates, which dramatically inflates the count.
So what if we use a more traditional, academic dictionary to measure the size of a language?
If we use the Oxford English Dictionary, for instance, then the English language would contain 171,146 words, along with 47,156 obsolete words, while similar conservative scholarly estimates of the Portuguese language indicate that it would contain between 171,000 and 250,000 words.
However, if we go even further, it seems that even these are somewhat inflated as well.
Technically, the number of words that exist in each language may indeed fall within the ranges mentioned above, between two and three hundred thousand, but in practice a large percentage of these words are either highly specific technical terms (like “quantization”) or archaic words found only in history and older literature (like “thee” and “thou”).
If the vast majority of the words in a formal dictionary fall into those two categories, then how many words do most of us know and use on a daily basis? In other words, how big is the average “mental dictionary” of a native speaker?
Of course, the estimates here can vary wildly as well, depending on factors like overall education, literacy, and career field.
Linguists have attempted to answer this question for quite some time, and although no estimate is definitive, research in the area indicates that most literate speakers of a language have a passive vocabulary of anywhere from 12,000 to 50,000 words.
By passive vocabulary, I mean that one may not actively use the word in speech or writing, but one can easily recognize it when it pops up. For instance, most of us don’t go around saying the word “orthodontist” every day, but we do immediately recognize it when we see it. In other words, it is a part of our passive, rather than active, vocabulary.
On the other hand, active vocabulary naturally tends to be somewhat smaller than its passive counterpart.
C. C. Cheng, emeritus professor of computational linguistics at the University of Illinois, has estimated that the human brain has a maximum storage capacity of about 8,000 lexical items available for active recall. In other words, most people will not have an active vocabulary above 8,000 words, although their passive vocabulary is likely significantly larger.
However, this is certainly not a hard rule. William Shakespeare used a vocabulary of around 33,000 unique words throughout his writings, which indicates a nearly unfathomable command of language and vocabulary.
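Figures like these depend heavily on how you count. A surface-form count and a headword count can differ substantially for the very same text, as this minimal Python sketch shows (the sample sentence and the tiny hand-written lemma table are my own, purely for illustration):

```python
# A minimal sketch of why "vocabulary size" is hard to pin down.
# The text and the lemma table below are invented for illustration.
import re

text = "Thou art weary. You are weary. He eats, she ate, they eat."

# Naive count: every distinct surface form is a "word".
tokens = re.findall(r"[a-z']+", text.lower())
surface_forms = set(tokens)
print(len(surface_forms))  # counts "eats", "ate", and "eat" as three words

# A lemmatized count collapses inflections into one dictionary headword.
# This tiny hand-written table stands in for a real lemmatizer.
lemmas = {"eats": "eat", "ate": "eat", "art": "be", "are": "be", "is": "be"}
headwords = {lemmas.get(tok, tok) for tok in tokens}
print(len(headwords))      # the same text now has a smaller "vocabulary"
```

Real corpus studies face the same choice at scale: whether “eat,” “eats,” and “ate” count as three words or one can swing an author’s measured vocabulary by thousands.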
So, ultimately, to make any claim about the actual vocabulary sizes of different languages is at best difficult and at worst nearly impossible. While we can make rough estimations, there are certainly no absolutes.
However, what can be said is this: nearly all languages which exist in the world today have massive vocabularies, which contain at the very least tens of thousands of words, if not hundreds of thousands.
Now that we have a rough range for the size of lexicons across the globe, the next step is to ask: what is the smallest possible size a language can be?
Toki Pona
If you happen to Google “What is the smallest language in the world?”, chances are you will immediately be shown the artificial language known as “Toki Pona.”
The language is a constructed one, made in 2001 by Canadian linguist Sonja Lang. She originally set out to create a language based on minimalism and simplicity. Lang specifically based her design on the Sapir-Whorf hypothesis in linguistics, which posits that language actually alters how we perceive the world around us.
Based on this hypothesis, Lang argued that a simpler, easier-to-learn language should cause its speakers to feel happier and less bogged down in their everyday lives (without going into too much detail, this was also likely influenced by her Taoist philosophical perspective).
Thus, she created Toki Pona, which means “good language” or “simple language.”
The language has only 137 words, although in recent years its online community of speakers has added new ones, bringing the estimated total currently in use to around 260.
Lang managed to create such a small language by first and foremost eliminating all unnecessary grammatical forms. There are no verb conjugations, no case system, and so on. Words maintain their form whether they are used as a verb, a noun, or an adjective. This, of course, drastically reduces the number of word forms the language needs.
Second, she reduced the lexicon by assigning one word to many meanings. For instance, the word for “hard” can also mean “stone,” “rock,” “metal,” “hard object,” “solid,” “firm,” “mineral,” “stiff,” or “inflexible.” The biggest reason Toki Pona can have such a small vocabulary is precisely that it reduces each word to a core concept or feeling, which context and qualifiers then narrow into a more concrete meaning.
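This composition strategy is easy to model. The sketch below is a toy Python dictionary of my own design; the core words are real Toki Pona vocabulary and the compounds are commonly cited usages, but the glosses are rough approximations:

```python
# A toy model of how a tiny lexicon stretches via context and qualifiers.
# Core words and compounds are commonly cited Toki Pona usages; the
# dictionary structure and glosses are my own illustrative sketch.
core = {
    "telo": "liquid / water",
    "nasa": "strange / intoxicating",
    "tomo": "enclosed structure / house",
    "tawa": "moving / toward",
    "jan":  "person",
    "pona": "good / simple",
}

# Head word first, modifier after: broad concepts narrow into specifics.
compounds = {
    ("telo", "nasa"): "alcohol (literally 'strange water')",
    ("tomo", "tawa"): "car (literally 'moving structure')",
    ("jan",  "pona"): "friend (literally 'good person')",
}

for (head, mod), meaning in compounds.items():
    print(f"{head} {mod}: {core[head]} + {core[mod]} -> {meaning}")
```

The trade-off is visible even in this tiny example: the listener must already share enough context to know that a “moving structure” is a car and not, say, an elevator.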
Thanks to this simplification process and its grammatical simplicity, the language is certainly the smallest one in existence. And for the more skeptically inclined, let me assure you that it is, indeed, a real language. It is estimated that a few hundred people do indeed speak it fluently, and it even has its own writing system. In fact, there have even been a few artists who have written full-length songs in the language!
But, you may ask yourself, could you write a novel in the language? What about a scientific paper? A philosophical essay?
The answer seems to be not exactly.
In 2024, Lang did in fact translate the novel The Wonderful Wizard of Oz into Toki Pona. However, I want to point out that it is not so much translated as abridged into the language. The original novel uses a vocabulary exceeding 1,500 unique words, more than five times the entire vocabulary of the constructed language, even counting its expanded lexicon.
You see, while Toki Pona is an undeniably fascinating and unique look into ultra-minimalist languages, it does not seem that it could indefinitely retain its small vocabulary size, at least not if it became widely used. As I mentioned earlier, even a little more than 20 years after its creation, the language has already begun to grow in vocabulary size!
Even beyond this, the main issue with Toki Pona is its inherent vagueness, which can be seen in the very name of the language: it can mean both “good language” and “simple language.”
Much like the comedic blunders and misunderstandings that Kevin ran into in The Office, languages like Toki Pona can and often do run into similar problems. This is not to say that the language is somehow “fake” or “wrong”; it simply only works in certain contexts.
Specifically, a language like Toki Pona is perfectly workable for in-person conversations, where clarifying questions can be asked and context can always be used to pin down what a speaker means.
But in a scientific paper, a medical report, or a full-length novel, the most minimalist languages would likely require a massive amount of extra material just for the reader to understand otherwise vague words.
Even deeper than this, though, can we show that it is nearly impossible for a natural language to ever tend towards vocabulary reduction and minimal simplicity? I believe that the answer is a resounding yes.
The Blowfish Effect
It’s a well-documented fact that humans tend to organize new information hierarchically, sorting it into categories.
But more specifically, when we learn a new word, we tend to automatically assume that it is the most specific word possible, rather than the most general.
For example, imagine you are shown a collection of picture frames and told that some of them are called an “abomasum.” Which ones would you assume they are?
Most likely, you wouldn’t assume that all of the picture frames are an abomasum. Instead, you would likely look for some specific type of frame: one with an unusual color, a strange shape, or some other very particular qualifier that you assume you simply haven’t encountered before.
The reason for this is known in linguistics as the Blowfish Effect.
If you show people, even young children, a collection of different fish and tell them that some of those fish are called by an unfamiliar word (like “abomasum,” which actually refers to the fourth stomach of a cow or similar animal and has nothing to do with picture frames or fish), they will usually assume that the new word refers to a very specific category of fish rather than to fish in general.
The reason for this, linguists and cognitive scientists believe, is that the human brain is hard-wired to be linguistically specific.
The world around us is complex and detailed, and — in order to accurately categorize and analyze it — we must also be detailed and highly specific.
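One way cognitive scientists have made this bias precise is the “size principle” from Bayesian models of word learning (in the spirit of Xu and Tenenbaum’s work): a narrow category assigns each observed example a higher probability than a broad one, so even a few examples rapidly favor the specific hypothesis. The toy Python sketch below uses invented category sizes purely to illustrate the arithmetic:

```python
# A toy version of the Bayesian "size principle" from word-learning
# models: hypotheses that cover fewer things give each observed example
# a higher probability. All category sizes here are invented numbers.

hypotheses = {
    "all fish":      100,  # hypothetical number of kinds the word could cover
    "spiky fish":     10,
    "blowfish only":   2,
}

observation_count = 3  # the learner hears the new word applied three times

posterior = {}
for name, size in hypotheses.items():
    # Likelihood of drawing each example from the category: 1 / size.
    # With a flat prior, the posterior is proportional to the likelihood.
    posterior[name] = (1.0 / size) ** observation_count

total = sum(posterior.values())
for name, score in posterior.items():
    print(f"{name}: {score / total:.3f}")
```

After just three consistent examples, the narrowest hypothesis dominates, which matches the intuition that learners home in on specific meanings quickly.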
Ultimately, this is why we see a constant, nearly unstoppable historical phenomenon in which each generation of speakers creates and uses new words and phrases.
Being specific in our speech seems to be a deep human desire. To avoid ambiguity, to relay our thoughts and feelings accurately, and to understand the speech of others just as well: these appear to be the ultimate goals of human language.
Final insights and key takeaways
While languages can and do vary dramatically in their vocabulary sizes, there does seem to be a natural limit on how simple a language can be.
Human beings are linguistic creatures, with a natural propensity for detailed descriptions and complex storytelling. We inherently thrive on making sure that our listeners understand us clearly, and that the risk of confusion stays at a tolerable minimum.
So while many languages do tend towards simpler linguistic features, like the relatively simple grammar of English or the lack of articles in Russian, this does not mean that languages move ever further towards simplicity.
On the contrary, if anything, modern linguistics has shown us that languages tend to diversify and sprout new, more specific vocabulary with each generation, despite certain simplifications in grammar and structure over time.
While Toki Pona is a fascinating case study of linguistic minimalism, I do not believe that it — or a similar language — could ever take the place of a fully complex and rich language like English, Portuguese, or Tamil.
Language is inherently complex, and will likely be that way for as long as humans exist.