
Tesla Introduces D1 Dojo Chip to Train AI Models


Tesla introduced the Tesla D1, a new chip designed specifically for artificial-intelligence training, capable of delivering 362 TFLOPS of compute at BF16/CFP8 precision. The chip was announced at Tesla's recent AI Day event.

The Tesla D1 contains a total of 354 training nodes that form a network of functional units, interconnected to create a massive chip. Each functional unit includes a quad-core, 64-bit ISA CPU with a custom design specialized for transpositions, compilations, broadcasts, and link traversal. The CPU adopts a superscalar implementation, with 4-wide scalar and 2-wide vector pipelines.

This new Tesla silicon is manufactured on a 7 nm process, packs 50 billion transistors, and occupies a die area of 645 mm², making it smaller than the GA100 GPU used in the NVIDIA A100 accelerator, which measures 826 mm².

Each functional unit has 1.25 MB of SRAM and 512 GB/s of bandwidth in each direction across the on-chip network. The D1 chips are joined in multichip modules of 25 units each, which Tesla calls training tiles.
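Some back-of-the-envelope arithmetic falls out of the figures above. The per-node estimate below assumes the 362 TFLOPS chip figure is spread evenly across the 354 training nodes; all input values are taken directly from the article.

```python
# Figures quoted in the article
CHIP_TFLOPS_BF16 = 362        # whole-chip BF16/CFP8 throughput
TRAINING_NODES = 354          # functional units per D1 die
SRAM_PER_NODE_MB = 1.25       # local SRAM per training node
TRANSISTORS = 50_000_000_000  # 50 billion transistors
DIE_AREA_MM2 = 645            # die area in mm^2

# Derived estimates (assume even distribution across nodes)
tflops_per_node = CHIP_TFLOPS_BF16 / TRAINING_NODES
total_sram_mb = TRAINING_NODES * SRAM_PER_NODE_MB
density_mtr_per_mm2 = TRANSISTORS / DIE_AREA_MM2 / 1e6

print(f"~{tflops_per_node:.2f} TFLOPS per training node")   # ~1.02
print(f"{total_sram_mb:.1f} MB of on-die SRAM in total")    # 442.5
print(f"~{density_mtr_per_mm2:.1f} MTr/mm^2 density")       # ~77.5
```

At roughly 1 TFLOPS and 1.25 MB of SRAM per node, each functional unit is closer to a small, self-contained compute tile than to a conventional CPU core.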

Tesla claims its Dojo chip will process computer-vision data four times faster than existing systems, which would help the company bring its self-driving system to full autonomy. However, Tesla has not yet completed the two most difficult technological feats: the tile-to-tile interconnect and the software. Each tile has more external bandwidth than the highest-end networking switches, and to achieve this Tesla developed custom interconnects. Tesla says the first Dojo cluster will be running by next year.

The same technology that undergirds Tesla’s cars will drive the forthcoming Tesla Bot, which is intended to perform mundane tasks like grocery shopping or assembly-line work. Its design spec calls for 45-pound carrying capacity, “human-level hands,” and a top speed of 5 miles per hour (so humans can outrun it).

IBM’s Telum processor is a recently announced competitor to the Tesla D1. Telum is IBM’s first processor with on-chip AI acceleration, which allows clients to run deep-learning inference at scale. IBM claims the on-chip accelerator enables the system to conduct inference at very high speed.

IBM’s Telum targets fraud detection during the early stages of transaction processing, while Tesla’s Dojo is aimed mainly at computer vision for camera-based self-driving. And while Telum is a conventional single-die design, Dojo goes against industry standards: the chips are designed to connect to one another directly, without any glue logic between them.

The most powerful supercomputer in the world, Fugaku, lives at the RIKEN Center for Computational Science in Japan. Its measured peak is 442,010 TFLOPS, and its theoretical peak is 537,212 TFLOPS. Dojo, Tesla said, could end up breaking the exaflop barrier, something that no supercomputing company, university, or government has yet managed.

Dojo is made up of just 10 cabinets, which also makes it the smallest supercomputer in the world by physical size; Fugaku, by contrast, comprises 256 cabinets. If Tesla were to add 54 cabinets to Dojo V1, for a total of 64 cabinets, Dojo would surpass Fugaku in compute performance.
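The 64-cabinet claim implies a per-cabinet throughput target. The sketch below derives it from the figures quoted in the article, under the (simplifying) assumption that performance scales linearly with cabinet count:

```python
# Cabinet counts and Fugaku's theoretical peak, as quoted above
TARGET_CABINETS = 64               # 10 Dojo V1 cabinets + 54 more
FUGAKU_THEORETICAL_TFLOPS = 537_212

# TFLOPS each Dojo cabinet would need to deliver, assuming linear scaling,
# for 64 cabinets to match Fugaku's theoretical peak
needed_per_cabinet = FUGAKU_THEORETICAL_TFLOPS / TARGET_CABINETS
print(f"~{needed_per_cabinet:,.0f} TFLOPS per cabinet")  # ~8,394
```

That works out to roughly 8,400 TFLOPS per cabinet, or the equivalent of a few dozen D1 chips' worth of BF16 throughput per cabinet.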
