Papers
Paper to cite:
If you use material from our tutorial, please cite the following paper:
Firoj Alam, Shammur Absar Chowdhury, Sabri Boughorbel, and Maram Hasanain. “LLMs for Low Resource Languages in Multilingual, Multimodal and Dialectal Settings.” In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts, edited by Mohsen Mesgar and Sharid Loáiciga, 27–33. St. Julian’s, Malta: Association for Computational Linguistics, 2024. https://aclanthology.org/2024.eacl-tutorials.5
@inproceedings{alam-etal-2024-llms,
title = "{LLM}s for Low Resource Languages in Multilingual, Multimodal and Dialectal Settings",
author = "Alam, Firoj and
Chowdhury, Shammur Absar and
Boughorbel, Sabri and
Hasanain, Maram",
editor = "Mesgar, Mohsen and
Lo{\'a}iciga, Sharid",
booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts",
month = mar,
year = "2024",
address = "St. Julian{'}s, Malta",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.eacl-tutorials.5",
pages = "27--33",
abstract = "The recent breakthroughs in Artificial Intelligence (AI) can be attributed to the remarkable performance of Large Language Models (LLMs) across a spectrum of research areas (e.g., machine translation, question-answering, automatic speech recognition, text-to-speech generation) and application domains (e.g., business, law, healthcare, education, and psychology). The success of these LLMs largely depends on specific training techniques, most notably instruction tuning, RLHF, and subsequent prompting to achieve the desired output. As the development of such LLMs continues to increase in both closed and open settings, evaluation has become crucial for understanding their generalization capabilities across different tasks, modalities, languages, and dialects. This evaluation process is tightly coupled with prompting, which plays a key role in obtaining better outputs. There have been attempts to evaluate such models focusing on diverse tasks, languages, and dialects, which suggests that the capabilities of LLMs are still limited to medium-to-low-resource languages due to the lack of representative datasets. The tutorial offers an overview of this emerging research area. We explore the capabilities of LLMs in terms of their performance, zero- and few-shot settings, fine-tuning, instruction tuning, and closed vs. open models with a special emphasis on low-resource settings. In addition to LLMs for standard NLP tasks, we will focus on speech and multimodality.",
}
Papers to read:
The tutorial drew on the following papers. The list is not exhaustive; we could not cover all relevant work.
Introduction
- Joshi, Pratik, Sebastin Santy, Amar Budhiraja, Kalika Bali, and Monojit Choudhury. “The State and Fate of Linguistic Diversity and Inclusion in the NLP World.” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 6282–6293. 2020. https://aclanthology.org/2020.acl-main.560/
- Zhao, Wayne Xin, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min et al. “A Survey of Large Language Models.” arXiv preprint arXiv:2303.18223 (2023). https://arxiv.org/abs/2303.18223
- Bubeck, Sébastien, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee et al. “Sparks of Artificial General Intelligence: Early Experiments with GPT-4.” arXiv preprint arXiv:2303.12712 (2023). https://arxiv.org/abs/2303.12712
- Liang, Percy, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang et al. “Holistic Evaluation of Language Models.” arXiv preprint arXiv:2211.09110 (2022). https://arxiv.org/abs/2211.09110
- Abdelali, Ahmed, Hamdy Mubarak, Shammur Chowdhury, Maram Hasanain, Basel Mousi, Sabri Boughorbel, Samir Abdaljalil, Yassine El Kheir, Daniel Izham, Fahim Dalvi, Majd Hawasly, Nizi Nazar, Youssef Elshahawy, Ahmed Ali, Nadir Durrani, Natasa Milic-Frayling, and Firoj Alam. “LAraBench: Benchmarking Arabic AI with Large Language Models.” In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), edited by Yvette Graham and Matthew Purver, 487–520. St. Julian’s, Malta: Association for Computational Linguistics, 2024. https://aclanthology.org/2024.eacl-long.30
- Ahuja, Kabir, Harshita Diddee, Rishav Hada, Millicent Ochieng, Krithika Ramesh, Prachi Jain, Akshay Nambi, Tanuja Ganu, Sameer Segal, Mohamed Ahmed, Kalika Bali, and Sunayana Sitaram. “MEGA: Multilingual Evaluation of Generative AI.” In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, edited by Houda Bouamor, Juan Pino, and Kalika Bali, 4232–4267. Singapore: Association for Computational Linguistics, 2023. https://aclanthology.org/2023.emnlp-main.258. doi:10.18653/v1/2023.emnlp-main.258
- Bang, Yejin, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, Quyet V. Do, Yan Xu, and Pascale Fung. “A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity.” In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), edited by Jong C. Park, Yuki Arase, Baotian Hu, Wei Lu, Derry Wijaya, Ayu Purwarianti, and Adila Alfa Krisnadhi, 675–718. Nusa Dua, Bali: Association for Computational Linguistics, 2023. https://aclanthology.org/2023.ijcnlp-main.45
- Lai, Viet, Nghia Ngo, Amir Pouran Ben Veyseh, Hiếu Mẫn, Franck Dernoncourt, Trung Bui, and Thien Nguyen. “ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning.” In Findings of the Association for Computational Linguistics: EMNLP 2023, 13171–13189. 2023. https://aclanthology.org/2023.findings-emnlp.878
Models and their capabilities for low-resource languages
- Wei, Xiangpeng, Haoran Wei, Huan Lin, Tianhao Li, Pei Zhang, Xingzhang Ren, Mei Li et al. “PolyLM: An Open Source Polyglot Large Language Model.” arXiv preprint arXiv:2307.06018 (2023).
- Nguyen, Xuan-Phi, Wenxuan Zhang, Xin Li, Mahani Aljunied, Qingyu Tan, Liying Cheng, Guanzheng Chen et al. “SeaLLMs – Large Language Models for Southeast Asia.” arXiv preprint arXiv:2312.00738 (2023).
- Üstün, Ahmet, Viraat Aryabumi, Zheng-Xin Yong, Wei-Yin Ko, Daniel D’souza, Gbemileke Onilude, Neel Bhandari et al. “Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model.” arXiv preprint arXiv:2402.07827 (2024).
- Abadji, Julien, Pedro Ortiz Suarez, Laurent Romary, and Benoît Sagot. “Towards a Cleaner Document-Oriented Multilingual Crawled Corpus.” In Proceedings of the Thirteenth Language Resources and Evaluation Conference, 4344–4355. 2022.
- Xue, Linting, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel. “mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer.” In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 483–498. 2021.
- Nguyen, Thuat, Chien Van Nguyen, Viet Dac Lai, Hieu Man, Nghia Trung Ngo, Franck Dernoncourt, Ryan A. Rossi, and Thien Huu Nguyen. “CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages.” arXiv preprint arXiv:2309.09400 (2023).
- Soldaini, Luca, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin et al. “Dolma: An Open Corpus of Three Trillion Tokens for Language Model Pretraining Research.” arXiv preprint arXiv:2402.00159 (2024).
- Kudugunta, Sneha, Isaac Caswell, Biao Zhang, Xavier Garcia, Derrick Xin, Aditya Kusupati, Romi Stella, Ankur Bapna, and Orhan Firat. “MADLAD-400: A Multilingual and Document-Level Large Audited Dataset.” Advances in Neural Information Processing Systems 36 (2024).
- Zhang, Duzhen, Yahan Yu, Chenxing Li, Jiahua Dong, Dan Su, Chenhui Chu, and Dong Yu. “MM-LLMs: Recent Advances in MultiModal Large Language Models.” arXiv preprint arXiv:2401.13601 (2024).
- Gemini Team, Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut et al. “Gemini: A Family of Highly Capable Multimodal Models.” arXiv preprint arXiv:2312.11805 (2023).
- Yang, Zhengyuan, Linjie Li, Kevin Lin, Jianfeng Wang, Chung-Ching Lin, Zicheng Liu, and Lijuan Wang. “The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision).” arXiv preprint arXiv:2309.17421 (2023).
- McKinzie, Brandon, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah et al. “MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training.” arXiv preprint arXiv:2403.09611 (2024).
- Wu, Shengqiong, Hao Fei, Leigang Qu, Wei Ji, and Tat-Seng Chua. “NExT-GPT: Any-to-Any Multimodal LLM.” arXiv preprint arXiv:2309.05519 (2023).
- Zhan, Jun, Junqi Dai, Jiasheng Ye, Yunhua Zhou, Dong Zhang, Zhigeng Liu, Xin Zhang et al. “AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling.” arXiv preprint arXiv:2402.12226 (2024).
- Zhang, Dong, Shimin Li, Xin Zhang, Jun Zhan, Pengyu Wang, Yaqian Zhou, and Xipeng Qiu. “SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities.” arXiv preprint arXiv:2305.11000 (2023).
- Zhang, Xin, Dong Zhang, Shimin Li, Yaqian Zhou, and Xipeng Qiu. “SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models.” https://openreview.net/pdf?id=AF9Q8Vip84
- Défossez, Alexandre, Jade Copet, Gabriel Synnaeve, and Yossi Adi. “High Fidelity Neural Audio Compression.” Transactions on Machine Learning Research (2023).
- Shi, Jiatong, Dan Berrebbi, William Chen, Ho-Lam Chung, En-Pei Hu, Wei Ping Huang, Xuankai Chang et al. “ML-SUPERB: Multilingual Speech Universal PERformance Benchmark.”
- Radford, Alec, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. “Robust Speech Recognition via Large-Scale Weak Supervision.” In International Conference on Machine Learning, 28492–28518. PMLR, 2023.
- Zhang, Yu, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen et al. “Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages.” arXiv preprint arXiv:2303.01037 (2023).
Prompting + Benchmarking tools
- Wei, Jason, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V. Le, and Denny Zhou. “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” Advances in Neural Information Processing Systems 35 (2022): 24824–24837.
- Kojima, Takeshi, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. “Large Language Models are Zero-Shot Reasoners.” Advances in Neural Information Processing Systems 35 (2022): 22199–22213.
- Yao, Shunyu, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. “Tree of Thoughts: Deliberate Problem Solving with Large Language Models.” Advances in Neural Information Processing Systems 36 (2024).
- Besta, Maciej, Nils Blach, Ales Kubicek, Robert Gerstenberger, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann et al. “Graph of Thoughts: Solving Elaborate Problems with Large Language Models.” arXiv preprint arXiv:2308.09687 (2023).
- Huang, Haoyang, Tianyi Tang, Dongdong Zhang, Wayne Xin Zhao, Ting Song, Yan Xia, and Furu Wei. “Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting.” In Findings of the Association for Computational Linguistics: EMNLP 2023, 12365–12394. 2023.
- Qin, Libo, Qiguang Chen, Fuxuan Wei, Shijue Huang, and Wanxiang Che. “Cross-lingual Prompting: Improving Zero-shot Chain-of-Thought Reasoning across Languages.” In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2695–2709. 2023.
- Ranaldi, Leonardo, and Fabio Massimo Zanzotto. “Empowering Multi-Step Reasoning across Languages via Tree-of-Thoughts.” arXiv preprint arXiv:2311.08097 (2023).
- Madaan, Aman, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon et al. “Self-Refine: Iterative Refinement with Self-Feedback.” Advances in Neural Information Processing Systems 36 (2024).
- Brown, Tom, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan et al. “Language Models are Few-Shot Learners.” Advances in Neural Information Processing Systems 33 (2020): 1877–1901.
- Gao, Yunfan, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, and Haofen Wang. “Retrieval-Augmented Generation for Large Language Models: A Survey.” arXiv preprint arXiv:2312.10997 (2023).
- Guo, Jiaxian, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, Dacheng Tao, and Steven Hoi. “From Images to Textual Prompts: Zero-Shot Visual Question Answering with Frozen Large Language Models.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10867–10877. 2023.
Other Related Aspects (Cultural Bias, Misinformation, Hallucination, Jailbreaking, …)
- Bach, Stephen H., Victor Sanh, Zheng-Xin Yong, Albert Webson, Colin Raffel, Nihal V. Nayak, Abheesht Sharma et al. “PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts.” arXiv preprint arXiv:2202.01279 (2022).
- Dalvi, Fahim, Maram Hasanain, Sabri Boughorbel, Basel Mousi, Samir Abdaljalil, Nizi Nazar, Ahmed Abdelali, Shammur Absar Chowdhury, Hamdy Mubarak, and Ahmed Ali. “LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking.” In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, edited by Nikolaos Aletras and Orphee De Clercq, 214–222. St. Julian’s, Malta: Association for Computational Linguistics, 2024. https://aclanthology.org/2024.eacl-demo.23
- Wu, Zhenyu, Yaoxiang Wang, Jiacheng Ye, Zhiyong Wu, Jiangtao Feng, Jingjing Xu, and Yu Qiao. “OpenICL: An Open-Source Framework for In-context Learning.” In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 489–498. 2023.
- Gao, Leo, Jonathan Tow, Stella Biderman, Sid Black, Anthony DiPofi, Charles Foster, Laurence Golding et al. “A Framework for Few-Shot Language Model Evaluation.” Version v0.0.1, September 2021.
- Zhu, Kaijie, Qinlin Zhao, Hao Chen, Jindong Wang, and Xing Xie. “PromptBench: A Unified Library for Evaluation of Large Language Models.” arXiv preprint arXiv:2312.07910 (2023).
- Zheng, Lianmin, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin et al. “Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena.” Advances in Neural Information Processing Systems 36 (2024).
- Naous, Tarek, Michael J. Ryan, and Wei Xu. “Having Beer after Prayer? Measuring Cultural Bias in Large Language Models.” arXiv preprint arXiv:2305.14456 (2023).
- AlKhamissi, Badr, Muhammad ElNokrashy, Mai AlKhamissi, and Mona Diab. “Investigating Cultural Alignment of Large Language Models.” arXiv preprint arXiv:2402.13231 (2024).
- Chen, Canyu, and Kai Shu. “Combating Misinformation in the Age of LLMs: Opportunities and Challenges.” arXiv preprint arXiv:2311.05656 (2023).
- Chen, Canyu, and Kai Shu. “Can LLM-Generated Misinformation Be Detected?” In The Twelfth International Conference on Learning Representations. 2023.
- Rawte, Vipula, Amit Sheth, and Amitava Das. “A Survey of Hallucination in Large Foundation Models.” arXiv preprint arXiv:2309.05922 (2023).
- Li, Junyi, Xiaoxue Cheng, Wayne Xin Zhao, Jian-Yun Nie, and Ji-Rong Wen. “HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models.” In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 6449–6464. 2023.
- Li, Yifan, Yifan Du, Kun Zhou, Jinpeng Wang, Wayne Xin Zhao, and Ji-Rong Wen. “Evaluating Object Hallucination in Large Vision-Language Models.” In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 292–305. 2023.
- Huang, Lei, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen et al. “A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions.” arXiv preprint arXiv:2311.05232 (2023).
- Kang, Haoqiang, Terra Blevins, and Luke Zettlemoyer. “Comparing Hallucination Detection Metrics for Multilingual Generation.” arXiv preprint arXiv:2402.10496 (2024).
- Yong, Zheng Xin, Cristina Menghini, and Stephen Bach. “Low-Resource Languages Jailbreak GPT-4.” In Socially Responsible Language Modelling Research. 2023.
- Luccioni, Alexandra Sasha, Sylvain Viguier, and Anne-Laure Ligozat. “Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model.” Journal of Machine Learning Research 24, no. 253 (2023): 1–15.