An edited version of this article was published by the Tech for Good Institute. It was written for a South East Asian audience using Singapore as an illustration of a universal point.
Artificial Intelligence (AI), as we have come to think of it since the launch of ChatGPT, is built on Large Language Models (LLMs). In other words, language is at the heart of AI, and yet we tend to give it scant attention. Unless we re-examine our assumptions about language, we are in danger of missing both the greatest good and the greatest harm that AI technology presents.
We tend to assume that language is simply about communication within a people group. Currently, we focus on two language issues with AI: translation (enabling dialogue between groups) and training on diverse corpora (meeting the needs of separate groups). Both of these are enormously valuable endeavours, but is that all language is?
More than one hundred and fifty years ago, the person who articulated the logic that underpins the technology that has given rise to AI-enabled translation declared: “That language is an instrument of human reason, and not merely a medium for the expression of thought, is a truth generally admitted.”[1] Before it is a means of communication, language is a means of thought.
In fact, language even shapes what we perceive before we start thinking about it. Perception is not just a matter of receiving data through our senses; it is the product of our brain's processing of that data. “Complex mechanisms in the brain filter the incoming sensory information and shape the representation of the world in our minds … Perception is highly selective; the brain constantly decides what information is important enough to reach our consciousness.”[2]
Consider South East Asia’s popular blue pea flower. Malay speakers saw a distinctive “bunga biru” (blue flower) and gave it the name “telang.” English speakers recognised it as part of the "pea" family and distinguished it as a “blue pea.” Chinese speakers thought the delicate petals looked like a butterfly’s wings and called it 蝶豆 (dié dòu, "butterfly pea"). Tamil speakers thought the flower was shaped like a seashell and called it சங்குப்பூ (sangu-poo, "shell flower") or காக்கட்டான் (kakkattan, "mussel creeper"). Different languages reflect a diversity of perspectives on the same object.
Many streets in South East Asia are called “Jalan” something, which is the Bahasa word for “street.” But “jalan” can also refer to the “community” around a thoroughfare (like "Jalan Besar" in Singapore). Alternatively, it can mean “to go”, “means” or “behaviour.” The words “go”, “street”, “community”, “means” and “behaviour” are semantically unrelated in English, so it is easy for us to miss the connection between them. By contrast, switching to Bahasa highlights the inseparability of what we do, where we do it, with whom we do it, how we do it, and how it affects others.
Expressing the same idea in a different language changes the emphasis, connections, and structure. We feel differently, we respond differently and we act differently as a result. Today's interdependent, globalised world presents many challenges that require us to look at what is in front of us from multiple perspectives. Shifting between languages allows us to do that. And AI enables us to make that shift, even with languages we cannot speak.
The world is concerned about AI bias, yet no-one is addressing the biggest bias of all: that most AI is built on just two languages – English and Mandarin. That creates a narrow perspective on any context to which we apply the AI technology. To avoid becoming inbred clones of one another, limited by the tech giants in either the US or China, we need to embrace linguistic diversity. AI has the potential to help us with that, but only if we are intentional about it.
Singapore is the only country in the world with multiple national languages, each from a different language family. The nation is already investing in multilingual AI projects like SEA-LION (South East Asian Languages In One Network). Now it is time for us 'ordinary' (non-technical) people to rethink our understanding of language.
It begins with breaking down the “them” and “us” barrier. We are accustomed to thinking about Malay, Chinese and Tamil as “heritage” languages, spoken by the ancestors of different ethnic groups within the Singapore population and therefore belonging to those groups. Today, however, few Singaporeans think of "home" as the place their grandparents came from. The question is not so much “where did they come from” but rather “where do we come from?”
For those who answer “from Singapore,” the practical concerns are: “what do we have as Singaporeans?” and “what are we going to do with it?” Heritage is thus less about looking back to where we came from separately and more about looking forward to where we are going together. In that sense, Mandarin is as much a part of the Indian or Malay Singaporean’s heritage as it is of the Chinese Singaporean’s. It is no longer “their” language but one of “ours.”
If I asked you to draw a mountain – like Fuji or the Matterhorn – you would almost certainly draw a triangular shape. This is the most obvious characteristic of those mountains to the rest of the world, but it is also the one thing which anyone actually on those mountains cannot see.
This paradox was beautifully and succinctly captured 1000 years ago by 苏轼 (Sū Shì, 1037-1101) in a poem called 题西林壁 (tí xī lín bì, “Inscription [on the] Wall [of the Temple of the] Western Woods”). The poet describes his experience of hiking up Mount Lu and noticing how the shape of the mountain continually changed depending on where he was on the path. The kicker in the final line says that the only time he really could not see what the mountain looked like at all was when he was on top of it. The irony is that we need the eyes of others – of the “outsider” – to see ourselves clearly.
Having four national languages from four different language families is thus a treasure for Singapore both to safeguard and to share.[3] Those languages provide four different perspectives and encourage four different thought processes. This is key both to shaping AI for the good and to avoiding the threat it poses, not just for Singapore itself but for the ASEAN region and the wider world. It provides a model for all to learn from.
[1] Boole, G. (1854). An Investigation of the Laws of Thought on Which are Founded the Mathematical Theories of Logic and Probabilities. Project Gutenberg. http://www.gutenberg.org/ebooks/151
[2] Dwarakanath, A. & Panagiotaropoulos, T. (2023, April 23). How the brain decides what we perceive. Max-Planck Gesellschaft. Retrieved from https://www.mpg.de/20170692/how-the-brain-decides-what-we-perceive
[3] This angle is being celebrated and developed by Linguafour at https://linguafour.beehiiv.com