Meta Releases Code Generation Model Code Llama 70B, Nearing GPT-3.5 Performance

Code Llama 70B is Meta's new code generation AI model. Thanks to its 70 billion parameters, it is "the largest and best-performing model in the Code Llama family", Meta says.

Previous versions of Code Llama included models of varying sizes, from 7 billion up to 34 billion parameters. Announcing the new model, Mark Zuckerberg remarked that the larger parameter count is not the only improvement Code Llama 70B brings to the Llama family of models, and added that he looks forward to including those improvements in Llama 3 as well.

On the HumanEval benchmark, Code Llama 70B scores higher than Code Llama 34B, at 65.2 vs. 51.8, but still lower than GPT-4, which leads with a score of 85.4. As a further comparison, GPT-3.5 scores 72.3. The MBPP benchmark gives similar results.
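To make the comparison concrete, the short sketch below computes the point gaps between Code Llama 70B and the other models, using only the HumanEval scores quoted above. The dictionary and the `gap_to` helper are illustrative names, not part of any benchmark tooling.

```python
# HumanEval scores as quoted in this article (higher is better).
scores = {
    "Code Llama 34B": 51.8,
    "Code Llama 70B": 65.2,
    "GPT-3.5": 72.3,
    "GPT-4": 85.4,
}


def gap_to(baseline: str, other: str) -> float:
    """Return how many points `other` scores above `baseline`, rounded to 0.1."""
    return round(scores[other] - scores[baseline], 1)


if __name__ == "__main__":
    for name in scores:
        if name != "Code Llama 70B":
            gap = gap_to("Code Llama 70B", name)
            print(f"{name}: {gap:+.1f} points relative to Code Llama 70B")
```

By these figures, the step from 34B to 70B (13.4 points) is larger than the remaining distance to GPT-3.5 (7.1 points), while GPT-4 is still about 20 points ahead.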

While Meta's effort is in line with OpenAI's finding that language model capability scales with the number of model parameters, the complexity of training and hosting these models has prompted the creation of "small language models", such as the recent 1.9 billion parameter model Stable LM 2 by Stability AI, which has comparable performance to Code Llama 7B at 60% of its size.

On Hacker News, a few commenters raised concerns about whether models like Code Llama 70B can be run in any useful way for development purposes, given their size and energy consumption. Others pointed out that there are several ways to run those models and that 64GB of RAM is enough to run the 70B model quantized at 4-bit.

"A 70B model is quite accessible; just rent a data center GPU hourly. There are easy deployment services that are getting better all the time. Smaller models can be derived from the big ones to run on a MacBook running Apple Silicon. While the compute won't be a match for Nvidia hardware, a MacBook can pack 128GB of RAM and run enormous models - albeit slowly."
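The 64GB figure from the discussion can be sanity-checked with back-of-the-envelope arithmetic. The sketch below estimates the memory needed for the weights alone at a few precisions; it assumes exactly 70 billion parameters and a uniform number of bits per weight, and it ignores the overhead of real quantization formats (per-block scales), the KV cache, and activations, so actual usage will be somewhat higher. The function name is illustrative.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes.

    Assumes every parameter is stored with the same number of bits and
    ignores runtime overhead such as the KV cache and activations.
    """
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 1e9


N_PARAMS = 70e9  # assumed parameter count for Code Llama 70B

if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB")
```

Under these assumptions, 4-bit weights take roughly 35 GB, which is consistent with the claim that a 64GB machine can hold the quantized model with room left for the runtime overhead, while 16-bit weights (about 140 GB) would not fit.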

Code Llama 70B is currently available on Hugging Face, along with its previous versions. On Ollama, you can download the 4-bit quantized version.

Built on top of Llama 2, Code Llama 70B comes in three variants: a general foundational model, a version specialized for Python, and another fine-tuned to understand instructions given in natural language. This follows the same scheme as the initial version of Code Llama, released last August.

Code Llama 70B is open source, with its inference code hosted on GitHub, while the model itself is available for download upon acceptance of Meta's license.
