[1] Department of Big Data in Health Science, School of Public Health and the Second Affiliated Hospital, Zhejiang University, Hangzhou 310058, Zhejiang, China
[2] Department of Linguistics, Northwestern University, Evanston 60208, IL, USA
[3] Data Science and Analytics Thrust, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511400, Guangdong, China
[4] School of Electronic and Computer Engineering, Peking University, Shenzhen 518055, Guangdong, China
[5] Department of Biomedical Informatics, Harvard Medical School, Boston 02115, USA
[6] Language Technology Lab, University of Cambridge, Cambridge CB3 9DA, UK
[7] The Key Laboratory of Intelligent Preventive Medicine of Zhejiang Province, Hangzhou 310058, Zhejiang, China

Abstract

While large pre-trained models have transformed the field of natural language processing (NLP), their high training cost and low cross-lingual availability prevent the new advances from being equally shared by users across all languages, especially the less widely spoken ones. To promote equal opportunities for all language speakers in NLP research and to reduce energy consumption for sustainability, this study proposes GreenPLM, an effective and energy-efficient framework that uses bilingual lexicons to directly translate language models of one language into other languages at (almost) no additional cost. We validate this approach in 18 languages and show that the framework is comparable to, if not better than, other heuristics trained at high cost. In addition, when given a small computational budget (2.5%), the framework outperforms the original monolingual language models in six out of seven tested languages. This approach can be easily implemented, and we will soon release language models in 50 languages translated from English.

1 Introduction

According to estimates, the computation costs of deep learning architectures expanded 300,000-fold from 2012 to 2018, running counter to green AI principles. Figure 1 compares PLMs from BERT-base peters-etal-2018-deep to PaLM (mainstream models with significant performance boosts) in terms of model size, computational cost, economic cost, and environmental cost; it shows ever-increasing trends strubell-etal-2019-energy in the CO2 emissions and financial requirements of pre-training popular PLMs. For example, pre-training a BERT-base model costs $3,751 to $12,571 in cloud computing, which is unaffordable for personal use and for most researchers in developing countries. Since carbon neutrality is a crucial and urgent goal in sustainable development sdg30, this trend strongly suggests the need for green and sustainable NLP research.

Figure 1: Trends of Complexity and Training Cost in Mainstream PLMs with Significant Performance Boosts

Beyond computational cost, the limited amount of monolingual text impedes the use of large PLMs in low-resource languages and stands in the way of equal language research opportunities. Languages with more speakers have larger corpora available for PLM training, which leaves only a tiny fraction of the approximately 7,000 languages worldwide ebrahimi-kann-2021-adapt with monolingual PLMs of their own. Table 1 lists the amount of training material used for monolingual BERT models in several languages and shows a striking inequality: English, for instance, had 45 TB of monolingual text available for training GPT-3, 4,608 times the size of the Hindi BERT pre-training corpus. Inequality is also driven by social and financial factors such as internet infrastructure, education, and computational resources. Self-supervised pre-training therefore exaggerates the disparity between high- and low-resource languages, making it particularly difficult for the benefits of modern NLP to be shared equally among all language speakers.

To address the above issues, we propose GreenPLM, a generalizable framework that translates monolingual PLMs into new languages at almost no additional cost.
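To make the core idea concrete, below is a minimal sketch of what translating a PLM's embedding layer through a bilingual lexicon could look like. It assumes a simple word-to-word lexicon and a plain embedding matrix; the function name `translate_embeddings` and the data formats are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
# Illustrative sketch (not the paper's exact algorithm): reusing a
# monolingual PLM's embedding vectors for a new language by mapping
# target-language words to source-language words via a bilingual lexicon.
from typing import Dict, List

def translate_embeddings(
    src_vocab: Dict[str, int],          # source-language token -> row index
    src_embeddings: List[List[float]],  # embedding matrix, one row per token
    lexicon: Dict[str, str],            # target word -> source word (hypothetical format)
) -> Dict[str, List[float]]:
    """Map each target-language word to the embedding of its source translation.

    Words missing from the lexicon are skipped here; a real system would
    need a fallback (e.g., subword segmentation or random initialization).
    """
    tgt_embeddings = {}
    for tgt_word, src_word in lexicon.items():
        idx = src_vocab.get(src_word)
        if idx is not None:
            tgt_embeddings[tgt_word] = src_embeddings[idx]
    return tgt_embeddings

# Toy usage: a three-word English "PLM" translated into Spanish.
src_vocab = {"cat": 0, "dog": 1, "house": 2}
src_embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
lexicon = {"gato": "cat", "perro": "dog", "casa": "house"}
print(translate_embeddings(src_vocab, src_embeddings, lexicon))
```

Under these assumptions, only the vocabulary and embedding rows are remapped while the rest of the model is reused unchanged, so the per-language cost is a dictionary lookup per vocabulary entry rather than a pre-training run, consistent with the "(almost) no additional cost" claim.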