The Road to Asian Algorithmic Autonomy: The Rise of the SEA-LION AI Project and Sovereign AI Systems

Mar 22

Introduction

As of 2026, AI has evolved from a persistent buzzword into a tool used by hundreds of millions of people every day. Whether in media, research, or logistics, its reach has been nothing short of ubiquitous. However, although its users span different geographic regions, cultural values, and languages, it has become apparent that mainstream AI models are limited in how well they reflect that global diversity. A tool used for so many personalized needs therefore prompts a question: can it be adapted for every audience it serves?

Historically, artificial intelligence has been rooted in American soil. Its early development is widely traced to Dartmouth College and the work of mathematics professor and computer scientist John McCarthy, and the following decades featured countless experiments and projects, from the ELIZA chatbot at MIT in the 1960s to IBM’s Deep Blue machine in the 1990s. In recent years, the pivotal turning point was the 2022 release of ChatGPT, built on a large language model (LLM), by San Francisco-based technology company OpenAI. It soon gained traction worldwide alongside subsequent American models such as Gemini, Llama, Grok, and Claude.

In the early years of worldwide development, the United States remained the leading power, producing forty notable models in 2024 compared with China’s fifteen and Europe’s three. More recently, however, the push to build stronger regional models has grown. On December 4th, 2023, an announcement was made at the Singapore Conference on AI (SCAI). The news echoed through the walls of the National Gallery that evening, opening two days of keynote speeches and collaborative workshops among AI researchers: Lawrence Wong, then Singapore’s Deputy Prime Minister, announced the creation of SEA-LION.

SEA-LION, short for the Southeast Asian Languages In One Network project, was initiated to combat Western bias in modern AI systems such as Llama, where Southeast Asian languages made up just 0.5% of the training data. On its formal announcement that December, the project received 70 million Singapore dollars (roughly 50 million USD) from the National Research Foundation, in hopes of reducing the region’s reliance on tools unaligned with regional values.

To that end, its organizers, a group of institutes partnered with AI Singapore, have rethought several characteristics of conventional AI models, chiefly how they handle linguistic diversity and cultural understanding.

SEA-LION’s Pursuit of Durable Sovereign Software Systems

In 2026, Southeast Asia has a population of roughly 700 million, or about 8.5% of the world’s population, and in the coming years the region aims to bring itself in line with current technological standards. Wong underscored this ambition in the 2026 Singapore Budget address, stating that the region could not let “uncertainty paralyze [it]” in the age of new technology.

Given SEA-LION’s focus on Southeast Asia, its researchers built their LLMs’ datasets from regional languages, including Malay, Indonesian, Filipino, Vietnamese, Thai, Lao, Tamil, Khmer, and Chinese in addition to English. Following publication under MIT licenses, AI Singapore used the datasets to create two LLMs: Llama-SEA-LION-8B-IT, built on Meta’s Llama, and Gemma-SEA-LION-9B-IT, built on Google’s Gemma.

Broadly, these models pursue two goals: to be more accessible to people of diverse linguistic backgrounds, and to capture the linguistic nuances of their cultures that Western-built AI systems tend to miss. For instance, a Western model may mishandle honorifics and hierarchy markers, such as Khun, Pi, and Nong in Thai, when responding to a user’s scenario, or it may interpret regional cultural terms by their literal textbook definitions rather than in context.

These concerns are supported by the findings of a group of researchers from the University of Washington, the University of British Columbia, and Stanford University, who assessed frontier LLMs’ cultural understanding using CulturalBench, a benchmark of human-written cultural questions. Some models had accuracy scores as low as 21.8%, illustrating the gap between dominant AI systems and individual cultural contexts. A related gap, between branding and technical reality, was illustrated by India’s sovereign AI initiative, Krutrim AI.

Despite being branded as a native platform for Indian users, Krutrim drew criticism because, when asked about its origins, the model stated that it had been created by OpenAI, the American company. Furthermore, when the model was given a portion of the nation’s civil service exam, the UPSC, with questions on Indian history, economics, public affairs, and geography, it scored only 41 marks against a passing score of 75.41. The result highlights the need for regional models to demonstrate real contextual understanding if they are to be effective and successful.

To strengthen the LLMs’ regional emphasis while mitigating reasoning errors, the SEA-LION models were trained on two hundred billion tokens of text reflecting regional cultural cues and values. This work is assessed through SEA-HELM (Southeast Asian Holistic Evaluation of Language Models), an evaluation suite for the region’s languages. Continuing that development, the SEA-LION team released SEA-Guard in October 2025. It serves as a content filter for the SEA-LION LLMs, reducing general toxic language and sexually explicit material as well as content touching on culturally sensitive topics and taboos, particularly those specific to Southeast Asian nations that conventional AI systems could overlook.

Finally, since SEA-LION’s initial announcement at the Singapore Conference on AI in 2023, its LLMs have been adopted by Southeast Asian institutions in hopes of promoting localized AI services that reflect the region’s diverse linguistic and cultural landscape. One of the most notable adoptions to date is by the GoTo Group, the largest technology company in Indonesia. GoTo adopted SEA-LION LLMs to support the Sahabat-AI system used in its AI voice assistant, serving users who speak Indonesian, Javanese, Sundanese, and English. Beyond direct adoption, the SEA-LION project has also drawn the attention of global technology companies that aim to combine regional intelligence with global support, such as IBM, Google, and Alibaba Cloud.

Beyond Southeast Asia, other regions are also attempting to break away from the dominance of Western-powered AI systems. For instance, Saudi Arabia’s Crown Prince Mohammed bin Salman announced the HUMAIN project as part of the country’s Vision 2030, aiming to establish one of the world’s largest AI companies. Since its launch in the spring of 2025, the project has gained the support of global companies, including Amazon Web Services (AWS) and NVIDIA. In January 2026, it also reached a 1.2 billion USD agreement with the country’s development fund, the National Infrastructure Fund, securing both global and domestic backing for its future expansion.

While the Gulf initiative’s goals differ somewhat from Singapore’s, particularly in Saudi Arabia’s primary motive of increasing technological capability within the Public Investment Fund (PIF) as part of Vision 2030, there are similarities. In August 2025, HUMAIN released ALLaM 34B, described as the first Arabic-centered LLM, which was created to recognize the diverse dialects of the Arabic language as well as its cultural context.

The two projects differ in technique, however. SEA-LION initially emphasized tokenization, the process by which a model handles human language by breaking phrases and words into smaller segments, and built a vocabulary of more than 256,000 unique tokens for its LLMs with Southeast Asian languages in mind. ALLaM’s developers instead used a methodology modeled on second-language acquisition, gradually expanding the vocabulary to limit the share of out-of-vocabulary words the models encounter. Ultimately, despite differing methodologies and broad goals, HUMAIN and SEA-LION share the common aim of breaking away from Western AI dependency and opening new channels of innovation and collaboration across Asian markets.
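The idea of gradually expanding a vocabulary to shrink the out-of-vocabulary share can be illustrated with a toy sketch. It is not the method used by either project: real systems work with subword tokenizers rather than whole words, and the corpus, starting vocabulary, stage size, and function names below are all hypothetical, chosen only to make the arithmetic visible.

```python
from collections import Counter


def oov_rate(words, vocab):
    """Fraction of words in the text that the vocabulary does not cover."""
    if not words:
        return 0.0
    missing = sum(1 for w in words if w not in vocab)
    return missing / len(words)


def expand_vocab(words, vocab, stage_size):
    """Return a copy of vocab with the most frequent uncovered words added."""
    counts = Counter(w for w in words if w not in vocab)
    new_words = [w for w, _ in counts.most_common(stage_size)]
    return vocab | set(new_words)


# Hypothetical toy corpus, split on whitespace for simplicity.
corpus = "saya makan nasi saya minum air kamu makan nasi goreng".split()

# Hypothetical starting vocabulary that covers only two words.
vocab = {"saya", "makan"}

# Expand in three stages, recording the out-of-vocabulary rate each time.
rates = [oov_rate(corpus, vocab)]
for _ in range(3):
    vocab = expand_vocab(corpus, vocab, stage_size=2)
    rates.append(oov_rate(corpus, vocab))

print(rates)  # [0.6, 0.3, 0.1, 0.0]
```

Each stage adds only a couple of new entries, so the uncovered share falls step by step instead of all at once, which is the intuition behind expanding a vocabulary progressively.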

To conclude, the AI race is unlikely to de-escalate in the near future, given increasing geopolitical competition and the desire for enhanced production capabilities across sectors. Yet beyond the ways researchers will improve models’ contextual reasoning and responses, the geography of AI development will define the future leaders in technology. Ultimately, it will fall to governing bodies worldwide to assess their capacity to establish sovereign systems that reflect their own regional needs across disciplines. Of equal importance is whether those systems can match the technical accuracy and durability of the currently dominant models.

Bibliography

"Introducing SEA-Guard: A Specialized Safety Model for AI in Southeast Asia." SEA-LION.AI, 17 Oct. 2025, sea-lion.ai/blog/sea-guard-safety-model/.

Maslej, Nestor, et al. "Artificial Intelligence Index Report 2024." AI Index Steering Committee, Institute for Human-Centered AI, Stanford University, Apr. 2024, hai.stanford.edu/ai-index/2024-ai-index-report.

Maslej, Nestor, et al. "Artificial Intelligence Index Report 2025." AI Index Steering Committee, Institute for Human-Centered AI, Stanford University, Apr. 2025, https://doi.org/10.48550/ARXIV.2504.07139.

Ng, Raymond, et al. "SEA-LION: Southeast Asian Languages in One Network." arXiv, 2025, https://doi.org/10.48550/ARXIV.2504.05747.

Noor, Elina, and Binya Kanitroj. "Speaking in Code: Contextualizing Large Language Models in Southeast Asia." Carnegie Endowment for International Peace, 6 Jan. 2025, carnegieendowment.org/research/2025/01/speaking-in-code-contextualizing-large-language-models-in-southeast-asia.

Pandey, Mohit. "Krutrim Fails UPSC Exam." Analytics India Magazine, 23 Dec. 2025, analyticsindiamag.com/ai-features/krutrim-fails-upsc-exam.

"Sahabat AI." SEA-LION.AI, 20 Nov. 2025, sea-lion.ai/case-study/sahabat-ai/.

"Tech Company Humain to Launch Allam, First Saudi-developed Arabic AI Model." Arab News, 13 Aug. 2025, www.arabnews.com/node/2611747/saudi-arabia.

"Why Ola's Krutrim is Showing OpenAI as its Creator." Analytics India Magazine, 23 Dec. 2025, analyticsindiamag.com/ai-news-updates/why-olas-krutrim-is-showing-openai-as-its-creator/.

Zhu, Jianqing, et al. "Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary Expansion." arXiv, 2024, https://doi.org/10.48550/ARXIV.2412.12310.