EleutherAI Open-Sources Six Billion Parameter GPT-3 Clone GPT-J

A team of researchers from EleutherAI has open-sourced GPT-J, a six-billion-parameter natural language processing (NLP) AI model based on GPT-3. The model was trained on an 800GB open-source text dataset and achieves performance comparable to a GPT-3 model of similar size.

Developer Aran Komatsuzaki announced the release on his blog. The model was trained on EleutherAI's Pile dataset using Google Cloud's v3-256 TPUs; training took approximately five weeks. On common NLP benchmark tasks, GPT-J achieves accuracy similar to OpenAI's published results for their 6.7B-parameter version of GPT-3. EleutherAI's release includes the model code, pre-trained weight files, a Colab notebook, and a demo website. According to Komatsuzaki,

GPT-J is the best-performing publicly available Transformer [language model] in terms of zero-shot performance on various [down-stream] tasks.
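As a quick illustration of what zero-shot generation with the released weights looks like in practice, the sketch below uses the Hugging Face Transformers library; the "EleutherAI/gpt-j-6B" checkpoint name and the Transformers route are assumptions made for this example, since the official release itself ships as Mesh-Transformer-JAX code, weight files, and a Colab notebook.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name; the official artifacts are the Mesh-Transformer-JAX weights
model_name = "EleutherAI/gpt-j-6B"

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32  # fp16 halves weight memory on GPU

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=dtype).to(device)
model.eval()

# Zero-shot: no fine-tuning, just a prompt
prompt = "EleutherAI has released GPT-J, a six billion parameter language model that"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))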

OpenAI first published a paper on generative pre-trained transformers (GPT), an unsupervised learning model that achieved state-of-the-art results on several NLP tasks, in 2018. In early 2019, OpenAI announced a 1.5B parameter model called GPT-2. OpenAI initially declined to release the largest trained model, citing "concerns about malicious applications of the technology," but did release the model later that year. Last year, OpenAI announced a 175B parameter model, GPT-3, but again did not release the trained model files. Instead, OpenAI provided an API that allows developers to integrate the model into their code via web service calls.

EleutherAI, a "decentralized grassroots collective of volunteer researchers," released their first implementation of a GPT-like system, the 2.7B parameter GPT⁠-⁠Neo model, in March 2021. GPT-Neo was implemented in TensorFlow and trained on TPUs using the parallel library Mesh TensorFlow. The team also began developing GPT-NeoX, a GPU-based implementation that uses Microsoft's DeepSpeed; although the code is open-sourced, there are currently no model files available.

The latest model, GPT-J, was trained using a new library, Mesh-Transformer-JAX. The library uses Google's JAX linear algebra framework instead of a dedicated deep-learning framework such as TensorFlow. Komatsuzaki claims that GPT-J provides "more flexible and faster inference than Tensorflow," and that developing the model took much less time than previous projects. Compared to the 2.7B-parameter GPT-Neo model, GPT-J shows a 125% improvement in training efficiency.
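To give a sense of the programming model, the sketch below shows a single jit-compiled causal self-attention step written directly against JAX's NumPy-style API; the layer sizes and parameter names are invented for illustration and are not taken from the Mesh-Transformer-JAX codebase.

import jax
import jax.numpy as jnp

def causal_attention(params, x):
    # x: (seq_len, d_model); params holds the query/key/value projections
    q, k, v = x @ params["wq"], x @ params["wk"], x @ params["wv"]
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    # Mask out future positions so each token attends only to its past
    mask = jnp.tril(jnp.ones((x.shape[0], x.shape[0]), dtype=bool))
    scores = jnp.where(mask, scores, -1e9)
    return jax.nn.softmax(scores, axis=-1) @ v

# jax.jit traces the function once and compiles it with XLA, which is what lets
# JAX run fast on TPUs without a heavier deep-learning framework on top
fast_attention = jax.jit(causal_attention)

key = jax.random.PRNGKey(0)
d_model, seq_len = 64, 16
params = {name: 0.02 * jax.random.normal(key, (d_model, d_model))
          for name in ("wq", "wk", "wv")}
x = jax.random.normal(key, (seq_len, d_model))
out = fast_attention(params, x)  # shape (seq_len, d_model)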

In response to concerns about the misuse of its models, EleutherAI co-founder Connor Leahy posted a justification of the release on the organization's blog. Leahy noted that GPT-like models are "simple and theoretically straight-forward," making it infeasible to keep the technology out of the hands of bad actors. Instead, EleutherAI's goal is to enable more widespread safety research, especially for "low-resource" researchers. Leahy also pointed out that many well-funded organizations have already trained even larger models than GPT-3, including Microsoft, NVIDIA, and Google.

In a Twitter discussion about the release, a user asked about the hardware requirements for running the model. Komatsuzaki replied:

For inference, in principle you can modify the code to run it on any hardware that can hold a bit more than 12GB of memory. Best throughput can be achieved with TPUs, in which case you can just run as is. Fine-tuning is more demanding: you need at least TPU v3-8 to do that.
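That figure is roughly what one would expect from simply holding six billion parameters in 16-bit precision; the back-of-the-envelope estimate below uses rounded numbers and is not taken from the thread.

# Rough lower bound for holding GPT-J's weights at inference time; activations,
# the tokenizer, and framework overhead all add to this.
n_params = 6e9            # roughly six billion parameters
bytes_per_param = 2       # 16-bit (half precision) weights
weights_gb = n_params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB just for the weights")   # ~12 GB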

The GPT-J code and models are available on GitHub. EleutherAI's website hosts an interactive demo of the model's text generation capabilities.
