Meta AI released Llama 3, the latest generation of their open-source large language model (LLM) family. The model is available in 8B and 70B parameter sizes, each with a base and an instruction-tuned variant. Llama 3 outperforms other LLMs of the same parameter size on standard LLM benchmarks.
Meta calls the new generation a "major leap" over Llama 2. There are several architecture changes, including a tokenizer with a larger vocabulary that encodes text more efficiently and a more efficient grouped query attention (GQA) mechanism. Llama 3 is trained on 15T tokens of publicly available text data, 7x more than Llama 2. The instruction-tuned variant was trained with a combination of methods, including proximal policy optimization (PPO) and direct preference optimization (DPO), which improved the model's performance on coding and reasoning tasks. Along with the models, Meta released new safety tools, including Code Shield, a filter for detecting insecure code generated by Llama 3. According to Meta,
The text-based models we are releasing today are the first in the Llama 3 collection of models. Our goal in the near future is to make Llama 3 multilingual and multimodal, have longer context, and continue to improve overall performance across core LLM capabilities such as reasoning and coding.
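The GQA mechanism mentioned above reduces the memory cost of the key/value cache at inference time by letting groups of query heads share a single key/value head. The sketch below is a rough illustration of the idea in PyTorch rather than Meta's implementation; the head counts mirror the published Llama 3 8B configuration (32 query heads sharing 8 key/value heads), while the function name and shapes are illustrative.

```python
import torch

def grouped_query_attention(q, k, v, n_kv_heads):
    """Minimal GQA sketch (illustrative, not Meta's code).
    q: (batch, n_heads, seq, head_dim)
    k, v: (batch, n_kv_heads, seq, head_dim) -- fewer key/value heads than query heads
    """
    batch, n_heads, seq, head_dim = q.shape
    group_size = n_heads // n_kv_heads          # query heads per shared k/v head
    # Repeat each k/v head so every query head in a group attends to the same keys/values
    k = k.repeat_interleave(group_size, dim=1)  # -> (batch, n_heads, seq, head_dim)
    v = v.repeat_interleave(group_size, dim=1)
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    # Causal mask: each position may only attend to earlier positions
    mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Example: 32 query heads sharing 8 key/value heads, head dimension 128
q = torch.randn(1, 32, 16, 128)
k = torch.randn(1, 8, 16, 128)
v = torch.randn(1, 8, 16, 128)
out = grouped_query_attention(q, k, v, n_kv_heads=8)
print(out.shape)  # torch.Size([1, 32, 16, 128])
```

Because only 8 key/value heads need to be cached instead of 32, the key/value cache shrinks by a factor of four, which is where the inference-efficiency gain comes from.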
Meta released the first generation of LLaMA (Large Language Model Meta AI) in early 2023, then followed it with Llama 2 and Code Llama. Those models showed performance comparable to LLMs with 10x as many parameters, such as GPT-3 and Google's PaLM. The models are released under a "bespoke commercial license" that restricts the number of monthly active users Llama-based apps can support.
Behind Llama 3's state-of-the-art performance are the training dataset and the amount of training compute. Meta collected "data of the highest quality," using Llama 2 to train a set of text classifiers that filter out low-quality data. The research team also found that training the models on far more data than the Chinchilla-optimal amount brought continued performance gains.
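To put that in perspective, a back-of-the-envelope calculation using the commonly cited Chinchilla heuristic of roughly 20 training tokens per parameter (an approximation drawn from the Chinchilla paper's analysis, not a figure from Meta) shows how far past the "compute-optimal" point the 8B model was trained:

```python
# Rough comparison with the ~20 tokens-per-parameter Chinchilla heuristic
# (the 20x ratio is the commonly cited approximation, not a number from Meta).
params_8b = 8e9
chinchilla_tokens = 20 * params_8b        # ~160B tokens would be "compute-optimal"
llama3_tokens = 15e12                     # 15T tokens actually used
print(llama3_tokens / chinchilla_tokens)  # ~94x the Chinchilla-optimal data budget
```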
In the first week after the Llama 3 release, Meta claimed that the weights were downloaded "over 1.2 million times," and that third-party developers had trained "over 600 derivative models" and made them available on Hugging Face. Other third-party contributions include extending the model's context window. Meta also claims it is currently training a version of Llama 3 with more than 400B parameters, using its 24K-GPU Grand Teton clusters.
In a discussion about Llama 3 on Hacker News, one user pointed out that Meta's performance evaluation did not compare the model to GPT-4 or to Claude Opus. Another user explained:
They didn't compare against the best models because they were trying to do "in class" comparisons, and the 70B model is in the same class as Sonnet (which they do compare against) and GPT3.5 (which is much worse than Sonnet). If they're beating Sonnet that means they're going to be within stabbing distance of Opus and GPT-4 for most tasks, with the only major difference probably arising in extremely difficult reasoning benchmarks. Since Llama is open source, we're going to see fine tunes and LoRAs though, unlike Opus.
Meta currently requires users to submit an access request before downloading the model weights. The model is also available on AWS, GCP, and Azure, and Meta has integrated Llama 3 into its Meta AI assistant.
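Once a request is approved, the weights can also be pulled from the Hugging Face Hub as a gated repository. The following is a minimal sketch using the transformers library; the model ID is the public Hugging Face name for the instruction-tuned 8B variant, and the prompt and generation settings are illustrative:

```python
# Minimal sketch: running the instruction-tuned 8B model via Hugging Face transformers.
# Assumes access to the gated "meta-llama/Meta-Llama-3-8B-Instruct" repo has been granted
# and a GPU with sufficient memory is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize what grouped query attention does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Strip the prompt tokens and print only the generated reply
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```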