EleutherAI Open-Sources Six Billion Parameter GPT-3 Clone GPT-J

A team of researchers from EleutherAI has open-sourced GPT-J, a six-billion-parameter natural language processing (NLP) AI model based on GPT-3. The model was trained on an 800GB open-source text dataset and has performance comparable to a GPT-3 model of similar size.

Developer Aran Komatsuzaki announced the release on his blog. The model was trained on EleutherAI's Pile dataset using a Google Cloud TPU v3-256; training took approximately five weeks. On common NLP benchmark tasks, GPT-J achieves an accuracy similar to OpenAI's published results for their 6.7B parameter version of GPT-3. EleutherAI's release includes the model code, pre-trained weight files, a Colab notebook, and a demo website. According to Komatsuzaki,

GPT-J is the best-performing publicly available Transformer [language model] in terms of zero-shot performance on various [down-stream] tasks.

OpenAI first published a paper on generative pre-trained transformers (GPT), an unsupervised learning model that achieved state-of-the-art results on several NLP tasks, in 2018. In early 2019, OpenAI announced a 1.5B parameter model called GPT-2. OpenAI initially declined to release the largest trained model, citing "concerns about malicious applications of the technology," but did release the model later that year. Last year, OpenAI announced a 175B parameter model, GPT-3, but again did not release the trained model files. Instead, OpenAI provided an API that allows developers to integrate the model into their code via web service calls.
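
For comparison with the open release described here, access to GPT-3 goes through that hosted web API rather than downloadable weights. Below is a minimal sketch of such a call using Python's requests library, assuming OpenAI's completions endpoint; the model name, prompt, and OPENAI_API_KEY environment variable are illustrative, and an API key is required.

    import os
    import requests

    # Send a prompt to OpenAI's hosted completions endpoint; the model itself
    # never leaves OpenAI's servers, in contrast to GPT-J's downloadable weights.
    response = requests.post(
        "https://api.openai.com/v1/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": "davinci", "prompt": "Once upon a time", "max_tokens": 40},
    )
    print(response.json()["choices"][0]["text"])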

EleutherAI, a "decentralized grassroots collective of volunteer researchers," released their first implementation of a GPT-like system, the 2.7B parameter GPT-Neo model, in March 2021. GPT-Neo was implemented in TensorFlow and trained on TPUs using the model-parallelism library Mesh TensorFlow. The team also began developing GPT-NeoX, a GPU-based implementation that uses Microsoft's DeepSpeed; although the code is open-sourced, there are currently no model files available.

The latest model, GPT-J, was trained using a new library, Mesh-Transformer-JAX. The library uses Google's JAX linear algebra framework, instead of a dedicated deep-learning framework such as TensorFlow. Komatsuzaki claims that GPT-J provides "more flexible and faster inference than Tensorflow," and that developing the model took much less time than previous projects. Compared to the 2.7B parameter GPT-Neo model, GPT-J shows a 125% improvement in training efficiency.
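
To give a sense of the programming model, the sketch below applies JAX's just-in-time compilation to a toy scaled dot-product attention function; the function and shapes are illustrative and are not taken from the Mesh-Transformer-JAX codebase.

    import jax
    import jax.numpy as jnp

    def attention(q, k, v):
        # Scaled dot-product attention, the core operation in a Transformer layer.
        scores = q @ k.T / jnp.sqrt(q.shape[-1])
        weights = jax.nn.softmax(scores, axis=-1)
        return weights @ v

    # jax.jit traces the function once and compiles it with XLA, so repeated
    # calls run as a fused kernel on CPU, GPU, or TPU.
    fast_attention = jax.jit(attention)

    key = jax.random.PRNGKey(0)
    q = jax.random.normal(key, (128, 64))
    k = jax.random.normal(key, (128, 64))
    v = jax.random.normal(key, (128, 64))
    print(fast_attention(q, k, v).shape)  # (128, 64)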

In response to concerns about the misuse of its models, EleutherAI co-founder Connor Leahy posted a justification of the release on the organization's blog. Leahy noted that GPT-like models are "simple and theoretically straight-forward," making it infeasible to keep the technology out of the hands of bad actors. Instead, EleutherAI's goal is to enable more widespread safety research, especially for "low-resource" researchers. Leahy also pointed out that many well-funded organizations have already trained even larger models than GPT-3, including Microsoft, NVIDIA, and Google.

In a Twitter discussion about the release, a user asked about the hardware requirements for running the model. Komatsuzaki replied

For inference, in principle you can modify the code to run it on any hardware that can hold a bit more than 12GB of memory. Best throughput can be achieved with TPUs, in which case you can just run as is. Fine-tuning is more demanding: you need at least TPU v3-8 to do that.
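
As a concrete illustration of that memory figure, the sketch below loads GPT-J in half precision, where six billion 16-bit parameters come to roughly 12GB of weights. It uses the Hugging Face Transformers port of the model rather than EleutherAI's own JAX code, so the model identifier and GPU assumption are specific to that port.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the GPT-J weights in float16 (roughly 12GB) via the Hugging Face
    # Transformers port; a GPU with enough memory to hold the weights is assumed.
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
    model = AutoModelForCausalLM.from_pretrained(
        "EleutherAI/gpt-j-6B", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "EleutherAI's GPT-J is"
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))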

The GPT-J code and models are available on GitHub. EleutherAI's website hosts an interactive demo of the model's text generation capabilities.
