SEA-LION: a family of multilingual large language models tailored for Southeast Asia (SEA).
The models are pretrained and instruction-tuned for the SEA region.
Llama3 8B CPT SEA-LIONv2 Instruct is fine-tuned on 150,000 instruction-completion pairs.
Developed by AI Singapore and funded by Singapore's National Research Foundation (NRF).
Languages: English, Indonesian, Thai, Vietnamese, Tamil.
Evaluated on BHASA, IFEval, and MT-Bench benchmarks.
The models can hallucinate and may generate irrelevant content.
The models are not aligned for safety; users need to put their own safety measures in place.
Fine-tuning ran on 8x A100-40GB GPUs using LoRA, a parameter-efficient fine-tuning method.
The datasets were verified for high quality and for commercially permissive licensing.
Contributions from researchers and developers are welcome.
Support from the National Research Foundation and the National University of Singapore (NUS) is acknowledged.
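
As a rough illustration of the LoRA fine-tuning mentioned above, the sketch below shows the general idea in plain Python: the pretrained weight matrix W stays frozen, and only two small factors A and B are trained, giving an effective weight of W + (alpha / rank) * B·A. The helper names, the rank of 8, the scaling factor, and the 4096-wide layer are illustrative assumptions, not settings reported for SEA-LION.

```python
# Minimal LoRA sketch in plain Python (no ML libraries required).
# All names and numbers are illustrative assumptions, not SEA-LION's actual settings.

def full_params(d_in, d_out):
    """Parameter count of a dense d_out x d_in weight matrix."""
    return d_in * d_out

def lora_params(d_in, d_out, rank):
    """Trainable parameters in the factors A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]

def lora_effective_weight(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * B @ A, the weight used after LoRA adaptation."""
    scale = alpha / rank
    delta = matmul(b, a)  # low-rank update with the same shape as W
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]

if __name__ == "__main__":
    d, r = 4096, 8  # hypothetical layer width and LoRA rank
    print(full_params(d, d))     # 16777216 weights in the frozen matrix
    print(lora_params(d, d, r))  # 65536 trainable weights in A and B

    # Tiny 2x2 example with rank 1 to show the update itself.
    w = [[1, 0], [0, 1]]
    a = [[3, 4]]    # 1 x 2
    b = [[1], [2]]  # 2 x 1
    print(lora_effective_weight(w, a, b, alpha=2, rank=1))
```

Under these assumed sizes, the trainable factors hold well under 1% of the parameters of the matrix they adapt, which is the general reason LoRA fits on modest GPU memory.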