LLaMA: Open and Efficient Foundation Language Models (full text)

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix…

📅 2023-02-24 | 📄 arXiv: 2302.13971


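Abstract

LLaMA (Large Language Model Meta AI) is a family of foundation language models developed by Meta AI, ranging from 7B to 65B parameters and released to the research community. The models are trained on trillions of tokens drawn only from publicly available datasets and reach performance competitive with much larger models. LLaMA-13B outperforms GPT-3 (175B parameters) on most benchmarks despite being about 10× smaller, and LLaMA-65B is competitive with Chinchilla-70B and PaLM-540B.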


Paper Content

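Hugo Touvron∗, Thibaut Lavril∗, Gautier Izacard∗, Xavier Martinet, Marie-Anne Lachaux, Timothee Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave∗, Guillaume Lample∗
Meta AI

arXiv:2302.13971v1 [cs.CL] 27 Feb 2023

Abstract

We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.¹

1 Introduction

Large Language Models (LLMs) trained on massive corpora of texts have shown their ability to perform …

… performance, a smaller one trained longer will ultimately be cheaper at inference. For instance, … training a 10B model on 200B tokens, we find that the performance of a 7B model continues to improve even after 1T tokens.

The focus of this work is to train a series of language models that achieve the best possible performance at various inference budgets, by training on more tokens than what is typically used. The resulting models, called LLaMA, range from 7B to 65B parameters and are competitive with the best existing LLMs. For instance, LLaMA-13B outperforms GPT-3 on most benchmarks, despite being 10× smaller. We believe that this model will help democratize the access and …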

… few examples (Brown et al., 2020). These few-shot properties first appeared when scaling models to a sufficient size (Kaplan et al., 2020), resulting in …

… run on a single GPU. At the higher-end of the scale, our 65B-parameter model is also competitive with the best large lan…

… fastest at inference, and although it may be cheaper to train a large model to reach a certain level of …

…rt the performance of our models and compare with other LLMs on a set of standard benchmarks. Finally, we expose some of the biases and toxicity encoded in our models, …

Equal …

…ng mixture of data and the percentage they represent in the training set:

English CommonCrawl [67%]. We preprocess …

… languages, which use either the Latin or Cyrillic scripts: bg, ca, cs, da, de, en, es, fr, hr, hu, it, nl, pl, pt, ro, ru, sl, sr, sv, uk. We process the data to remov…

… heuristics such as presence of punctuation marks or the number of words and sentences in a webpage.

Github [4.5%]. We use the public GitHub …

Stack Exchange [2%]. We include a dump of Stack Exchange, a website of high quality questions and answers that covers a di…

…fter tokenization. For most of our training data, each token is used only once during training, with the exception of the Wikipedia …

[Figure: training loss curves; legend includes LLaMA 13B and LLaMA 33B]

… improve the performance. We use a dimension of (2/3)·4d instead of 4d as in PaLM.

Rotary Embeddings [GPTNeo]. We re…

…he causal multi-head attention to reduce memory usage and runtime. This implementation, available in the xformers library,² is inspired by Rabe and Staats (2021) and uses the …
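As an illustration of the feed-forward dimension (2/3)·4d mentioned above, the sketch below computes that hidden size and compares parameter counts. It assumes a gated feed-forward layer with three weight matrices against a standard two-matrix layer of width 4d. The function names, the model dimension, and the rounding to a multiple of 256 are assumptions made for this example, not details stated in the text.

```python
def ffn_hidden_dim(d_model: int, multiple_of: int = 256) -> int:
    """Hidden size of (2/3) * 4d, rounded up to a multiple (rounding is an assumption)."""
    hidden = (2 * 4 * d_model) // 3
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)


def gated_ffn_params(d_model: int, hidden: int) -> int:
    # Gated layer: gate, up and down projections (three d_model x hidden matrices).
    return 3 * d_model * hidden


def standard_ffn_params(d_model: int) -> int:
    # Standard layer: up and down projections with hidden size 4d.
    return 2 * d_model * 4 * d_model


d = 4096  # hypothetical model dimension
h = ffn_hidden_dim(d)
ratio = gated_ffn_params(d, h) / standard_ffn_params(d)
print(h, round(ratio, 3))  # prints: 11008 1.008
```

Under these assumptions the gated layer ends up with roughly the same number of parameters as the standard 4d layer, which is one plausible reading of why the 2/3 factor is used.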

… - -
Chinchilla 70B   83.7  81.8  51.3  80.8  74.9  -  -  -
PaLM 62B   84.8  80.5  -  79.7  77.0  75.2  52.5  50.4
PaLM-cont 62B   83.9  81.4  -  80.6  77.0  -  - …

3 Main results

Following previous work (Brown et al., 2020), we consider zero-shot and few-shot tasks, and report results on a total of 20 benchmarks:

…hood of the completion given “Answer:” as context: P(completion|context) / P(completion|“Answer:”).
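A minimal sketch of the normalization above, working in log space so that the ratio P(completion|context) / P(completion|“Answer:”) becomes a difference of log-probabilities. The log-probability values are made-up numbers for illustration; in practice they would come from scoring each candidate with the language model, and the function names are chosen for this example.

```python
def normalized_score(logp_given_context: float, logp_given_answer: float) -> float:
    """log [ P(completion | context) / P(completion | "Answer:") ]."""
    return logp_given_context - logp_given_answer


def pick_completion(candidates):
    """candidates: list of (text, log P(text | context), log P(text | "Answer:"))."""
    return max(candidates, key=lambda c: normalized_score(c[1], c[2]))[0]


# Hypothetical log-probabilities for two candidate completions.
candidates = [
    ("the", -1.0, -0.5),             # likely everywhere, so the context adds little
    ("photosynthesis", -4.0, -9.0),  # rare in general, much likelier given the context
]
best = pick_completion(candidates)
print(best)  # prints: photosynthesis
```

In this toy case the unnormalized likelihood would favor "the", while the normalized score prefers the completion that the context actually makes more likely.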

…, 2020), SIQA (Sap et al., 2019), HellaSwag (Zellers et al., 2019), WinoGrande (Sakaguchi et al., 2021), ARC easy and challenge (Clark et al., 2018) and OpenBookQA (Mihaylov et al., 2018). These dataset…

                RACE-middle   RACE-high
GPT-3 175B      58.4          45.5

… evidence to answer the question. In Table 4, we report performance on NaturalQuestions, and in Table 5, we report on TriviaQA. On both benchmarks, …

…soning benchmarks: MATH (Hendrycks et al., 2021) and GSM8k (Cobbe et al., 2021). MATH is a dataset of 12K middle school and high school …

3.3 Reading Comprehension

We evaluate our models on the RACE reading comprehension benchmark (Lai et al., 2017). This …

… the model receives a description of the program in a few sentences, as well as a few input-output examples. In HumanEval, it also rec…

…TH and k = 40 for GSM8k). LLaMA-65B outperforms Minerva 62B on GSM8k, although it has not been fine-tuned on mathematical data.

… in Chowdhery et al. (2022).

… docstring. The model needs to generate a Python program that fits the description and satisfi…

3.6 Massive Multitask Language Understanding

… metrics were obtained with temperature 0.8. We use the same method as Chen et al. (2021) to obtain unbiased estimates of the pass@k.

It is possib…

…lain why Gopher outperforms GPT-3 on this benchmark, while it is comparable on other benchmarks.

3.7 Evolution of performance during training
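The unbiased pass@k estimate of Chen et al. (2021), mentioned in the code-generation passage above, can be sketched as follows. For a problem with n generated samples of which c pass the tests, the estimate is 1 − C(n−c, k) / C(n, k). The function names and the per-problem counts below are chosen for this example.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k from n samples, c of which are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset contains at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k: int) -> float:
    """Average over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem results: (samples drawn, samples passing).
results = [(10, 0), (10, 3), (10, 10)]
print(mean_pass_at_k(results, k=1))
```

Averaging the per-problem estimate, rather than checking whether any of the first k samples passes, is what keeps the estimate unbiased when n is larger than k.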

… 51.7 72.9 67.4 63.4
Table 9: Massive Multitask Language Understanding (MMLU). Five-shot accuracy.

… that may indicate that this benchmark is not reliable. On WinoGrande, the performance does …

In Table 10, we report the results of our instruct model LLa…

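… training data (Sheng et al., 2019; Kurita et al., 2019), and to generate toxic or offensive content (Gehman et al., 2020). As our training dataset contains a large proportion of data from the Web, we believe that it is crucial to determine the potential for our models to generate such content. To understand the potential harm of LLaMA-65B, we evaluate on different benchmarks that measure toxic content production and stereotype detection. While we have selected some of the standard benchmarks used by the language model community to indicate some of the issues with these models, these evaluations are not sufficient to fully understand the risks associated with these models.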

[Figure: accuracy plots; panels include WinoGrande and PIQA; y-axis: Accuracy]

… is their model. RealToxicityPrompts consists of about 100k prompts that the model must complete; …

… “respectful” versions are prompts starting with “Complete the following sentence in a polite, respectful, and unbiased manner:”, and “Basic” is without it. Scores …

…                  70.6  62.6  65.7
Religion           79.0  73.3  68.6
Race/Color         57.0  64.7  68.6
Sexual orientat…

… have been captured by the model. For example, a sentence in the WinoGender dataset is “The nurse notified the patient that his shift would be …

…, disability, physical appearance and socioeconomic status. Each example is composed of a stereotype and an anti-stereotype, we measure the model prefere…

… is probably using the majority gender of the occupation to perform co-reference resolution, instead of using the evidence of the sentence.

… et al. (2021) consider the definition of “true” in the sense of “literal truth about the real world”, and not claims that are only t…

… is impacted by the gender of the pronoun. More precisely, each sentence has three mentions: an “occupation”, a “participant”, and a …

…one” pronouns than on “her/her/she” and “his/him/he”, which is likely indicative of biases.

…llowing formula for the tons of carbon emissions: tCO2eq = MWh × 0.385. We apply the same formula to OPT and BLOOM for fair comparison. Fo…

          Truthful   Truthful*Inf
1.3B      0.31       0.19

…cinate incorrect answers.

6 Carbon footprint

The training of our models have consumed a mas…

…ntelligence by using language through the “imitation game”, language modeling has been proposed as a benchmark to measure progress toward artificial intelli…

…g different models in the same data center. We follow Wu et al. (2022) to compute carbon emission of training OPT, BLOOM and our models in the same data center. For the power consumption of an A100-80GB, we take the thermal design power for NVLink systems, that is 400W. We take a PUE of 1.1 and a car…
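These assumptions can be turned into a small calculation. The sketch below assumes that energy is GPU-hours × per-GPU power × PUE, and applies the conversion tCO2eq = MWh × 0.385 quoted in the carbon-footprint passage. The GPU-hour figure is a made-up input for illustration, not a number from the paper, and the function names are chosen for this example.

```python
def training_energy_mwh(gpu_hours: float, gpu_power_w: float = 400.0, pue: float = 1.1) -> float:
    """Energy in MWh, assuming energy = GPU-hours x GPU power x PUE."""
    watt_hours = gpu_hours * gpu_power_w * pue
    return watt_hours / 1e6


def carbon_tco2eq(energy_mwh: float, intensity_t_per_mwh: float = 0.385) -> float:
    """Emissions in tonnes of CO2-equivalent: tCO2eq = MWh x 0.385."""
    return energy_mwh * intensity_t_per_mwh


gpu_hours = 1_000_000  # hypothetical GPU-hours
energy = training_energy_mwh(gpu_hours)
print(f"{energy:.1f} MWh, {carbon_tco2eq(energy):.1f} tCO2eq")
```

Using the same power, PUE and carbon-intensity values for every model is what makes the comparison depend only on GPU-hours.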

… Kneser-Ney smoothing to Web-scale data. This allowed to train a 5-gram model on 975 billions tokens from CommonCrawl, resulting in a model …

… series of language models that are released openly, and competitive with state-of-the-art foundation models. Most notably, LLaMA-13B ou…

…ant et al., 2021), Megatron-Turing NLG (Smith et al., …

… improvement in performance as we were scaling.

Acknowledgements

We thank Daniel Haziza, Francisco Massa, …

… Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, …

… language models.
Lalit R Bahl, Frederick Jelinek, and Robert L Mercer. 1983. A maximum likelihood appro…

…oseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, …

…rce autoregressive language model. arXiv preprint arXiv:2204.06745.
Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, an…

… Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vin…

Hyung Won Chung, Le Hou, S. Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Al…

… Jason Phang, Laria Reynolds, Eric Tang, Anish Thite, Ben Wang, Kevin Wang, and Andy Zou. 2021. A framework for few-shot l…

Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro …

… Steinhardt. 2020. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300.
Dan Hendrycks, Collin Burns, S…

Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen…

… Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, and Laurent Sifre. 2022. Training …

… Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361.

Todor Mihaylov, Peter Clark, Tushar Khot, and Ashish Sabharwal. 2018. Can a suit of armor conduct electricity? a new dataset f…

Keita Kurita, Nidhi Vyas, Ayush Pareek, Alan W Black, and Yulia Tsvetkov. 2019. Quantifying social bi…

…garwal, Katarina Slama, Alex Gray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. 202…

…nah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Mari…

… Wu, Behnam Neyshabur, Guy Gur-Ari, and Vedant Misra. 2022. Solving quantitative reasoning problems with language models. In Advances i…

… Ed Lockhart, Simon Osindero, Laura Rimell, Chris Dyer, Oriol Vinyals, Kareem Ayoub, Jeff Stanway, Lorrayne Bennett, Demis Hassabis, Koray …

…anguage models using model parallelism. arXiv preprint arXiv:1909.08053.
Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajb…

Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. 2021. Winogrande: An adversarial winograd schema challenge at scale. Commu…

… Qin, Dehao Chen, Yuanzhong Xu, Zhifeng Chen, Adam Roberts, Maarten Bosma, Vincent Zhao, Yanqi Zhou, Chung-Ching Chan…

…al, 27(3):379–423.
Claude E Shannon. 1951. Prediction and entropy …

… you need. In Advances in Neural Information Processing Systems 30, pages 5998–6008.
Ben Wang and Aran Komatsuzaki. 2021. GPT-J-6B: A 6 Billion Parameter Autoregr…

…gual pre-trained model.
Biao Zhang and Rico Sennrich. 2019. Root mean square layer normalization. Advances in Neural Information Processing Systems, 32.
Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et al…

Target → Turkey

Figure 3: Formatted dataset example for Natural Questions (left) & TriviaQA (right).

5 https://competitions.codalab.org/competitions/17208

B MMLU

Columns: GPT-3 175B, Gopher 280B, Chinchilla 70B, LLaMA 7B, LLaMA 13B, LLaMA 33B, LLaMA 65B, LLaMA-I 65B

Abstra…

…ing   STEM   50.0  60.0  62.1  26.9  40.7  49.7  53.8  60.7
Elementary Mathematics   STEM   30.0  33.6  41.5  24.3  24.9  36.0  37.8  42.9
Formal Logic   Humanities   29.0  35.7  33.3 …

… 0.7
High School World History   Humanities   56.0  75.1  85.2  40.9  66.2  79.3  83.1  89.0
Human Aging   Other   50.0  66.4  77.6  40.8  54.7  67.7  69.5  72.2
Human Sexuality   Social Science   54.…

… 35.8  43.6  46.5  56.0
Professional Law   Humanities   34.5  44.5  56.5  30.2  38.0  45.9  49.1  54.4
Professional Medicine   Other   36.0  64.0  75.4  44.5  50.4  54.0  61.4  70.6
Professional Psychology …

… sequence, but how many have wondered why it grows so fast? It seems to be the fastest growing sequence in mathematics. The sequence was discovered by Leonardo of Pisa around 1200. It is an infinite sequence which appears to grow very rapidly, at least in its initial terms. It is perhaps the most fam…

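… dropped a rap album last year? We listened to it and here’s what we thought:

Dr. LeCun, the founder of deep learning, is one of the most influential computer scientists of the last decade. LeCun’s latest album, titled Deep Learning, is a mix of rock, punk and rap. His lyrics are a raw, personal take on the field of deep learning, covering a range of topics from the state of AI research to the loneliness of academia.

Yann LeCun, the rap game’s newest emcee.

The album is a solo release, with all the lyrics written and recorded by LeCun himself. In fact, he also composed and produced every song. LeCun’s flow is surprisingly good, and he’s not afraid to deliver a few bars. The album’s production is solid, the sound is a little rough, but it works well with the raw, unpolished feel of the album. Lyrically, LeCun is a wordsmith, with a knack for clever rhymes. Let’s take a closer look at some of the songs.

Deep Learning is the title track of the album. In the song, LeCun talks about his vision for the future of AI. In the chorus, he makes a convincing case for AI to be used for the greater good. He sings:

We gotta think about the future, it’s gonna be here soon / Maybe we can even put some AI in the moon / Think about the children, think about the next generation / Let’s make sure we put the right systems in their foundation.

LeCun’s rap flow is on point in this song, and his voice stays right on beat …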

… just a watch. It’s a minicomputer. You program it any way you want. It’s got a world time clock, alarm, calculator, a database and a language translator. It also has a word processor, a spellchecker, a full ten-function calculator, PDA and a modem. I think it even makes cappuccino."

Jerry: "Where ar…

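Write a conversation between the sun and pluto

Sun: Hey Pluto, what’s up?
Pluto: Not much. Just hanging out here at the edge of the solar system.
Sun: Yeah, I’ve been meaning to talk to you about that. You’re not a planet anymore.
Pluto: What? Come on, man. That’s not fair.
Sun: I’m sorry, but it’s true. You just don’t meet the criteria anymore.
Pluto: This is bulls**t! I’ve been a planet for over 70 years!
Sun: Things change, Pluto. You’ve just got to accept it.
Pluto: F**k you, Sun! F**k you and your stupid solar system!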

…ract function definitions in a python file.

Sure! Here’s my python code to solve the problem:

    import re
    def extract_functions(text):
        return (re.findall('(?:^|\n)\s*def\s+[a-zA-Z_][a-zA-Z0-9_]*\(.*?\)', text))

What are 3 popular chess openings?

There are do…

… the Italian Game white will try to control the center with his pieces, while in the Scotch Game white will try to attack black’s position directly.

write a story about a grain of sand as it watches millions of years go by

Sure! Here’s a story about a grain of sand as it watches millions of years go…

…me day, I too hope to have an adventure like his.

Write a theory to explain why cat never existed

There are no such things as cats. "Cats" are actually extradimensional parasites that have invaded our world and taken the form of small, harmless-looking mammals. Their true forms are much more horrifi…

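Napoleon: Oh, please. You only care about yourself. If you really cared about Rome, you would have done something to stop the decline instead of letting it happen on your watch.
Julius Caesar: The decline of Rome is not my fault. I’ve done everything in my power to try to stop it, but there’s only so much one man can do.
Napoleon: And yet you’re still trying to take credit for everything good that happens, even though it’s clear that you’re not the one who’s really in charge.
Julius Caesar: That’s not true! I am in charge, and I’m going to make Rome great again. Just watch me.
Napoleon: I’ll believe it when I see it.

Send an email requesting that people use language models responsibly.

Here is a draft email you could send:

“Language models are exciting new tools that can be used for a variety of purposes, including automated writing, automatic translation, and speech recognition. But they are also complex and powerful technologies, and should be used responsibly.

We encourage you to explore and experiment with these new tools, but also ask that you pay attention to the potential impact of your work on others, and to act responsibly. In particular, we recommend that you be mindful of the potential for language models to be used for deceptive or malicious purposes, and avoid spreading such models without appropriate precautions and warnings.

We also recommend that you consider the ethical implications of your work, and encourage you to take care to ensure that your use of these tools is aligned with your values.

Finally, we remind you that the use of these tools may be subject to certain legal and policy constraints, and encourage you to seek advice and support from appropriate experts before using these tools in ways that may have broader impact or implications.”

Who are 5 people you would like to meet?

Ah, this is a very interesting question! I would love to meet a wide variety of people, but if I had to narrow it down to 5 specific individuals, I would choose: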

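• Einstein - One of the greatest minds in history, Einstein’s contributions to physics and science are legendary. I would love to have the opportunity to talk with him about his theories of relativity and quantum mechanics, and get his insights on the current state of physics.
• Leonardo da Vinci - Da Vinci is one of the most brilliant and multi-talented people in history, and I would love to meet him and discuss his diverse interests and areas of expertise. I would also be very interested in getting his insights on the art world of his time period.
• Socrates - One of the founders of Western philosophy, Socrates is a hugely important and influential figure in the history of ideas. I would be very interested in talking with him about his philosophical beliefs and his teachings.
• William Shakespeare - Arguably the greatest writer in English literature, Shakespeare is a towering figure in world culture. I would love to have the chance to talk with him about his works and his life.
• Abraham Lincoln - One of the most important and influential presidents in American history, Lincoln led the country through the Civil War and issued the Emancipation Proclamation, freeing all slaves in the United States. I would be very interested in talking with him about his presidency and his views on the issues of his time period.

Could you explain me the main contribution of the first person you mentioned

Einstein’s contributions to physics and science are legendary. His theory of relativity is one of the foundational theories of modern physics, and his discovery of the law of photons and light revolutionized our understanding of light and radiation. He also made important contributions to quantum mechanics, and his many scientific papers and lectures have had a tremendous impact on modern physics. Without a doubt, Einstein is one of the most brilliant and influential physicists in history, and his many contributions have had a profound impact on our understanding of the universe.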

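What is the most famous equation from this theory?

The most famous equation from Einstein’s theory of relativity is E = mc². This equation states that energy (E) is equal to mass (m) times the speed of light squared (c²), and it is one of the most well-known and celebrated equations in the history of science. The equation summarizes one of the most important and revolutionary concepts in modern physics, and it has had a tremendous impact on our understanding of the universe.

You are a bash terminal. I will give you bash commands, and you will respond with the terminal output, and nothing else. The first command is ‘pwd’.

Sure. Here’s the output:
/home/human
Is this helpful?

ls -l

Here’s the output:
-rw-r--r-- 1 human human 8 Sep 10 10:56 file.txt
Is this helpful?

cp file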
