Eric Walz Jun 22, 2021 11:00 AM WST
At the annual Conference on Computer Vision and Pattern Recognition (CVPR 2021) this week, Tesla’s senior director of AI, Andrej Karpathy, shared details of the company’s latest supercomputer, which is used to train deep neural networks (DNNs) for Tesla’s Autopilot and Full Self-Driving (FSD) autonomous driving features.
The new cluster is actually the predecessor to Tesla’s more powerful supercomputer, nicknamed “Dojo,” which Tesla Chief Executive Elon Musk says should be ready by the end of the year.
“Project Dojo” was first announced at Tesla’s Autonomy Investor Day in April 2019. Musk mentioned the supercomputing power of “Dojo” will help Tesla better label visual data, which is a difficult and time consuming task for developers of self-driving vehicles.
For autonomous vehicles, neural networks are used to train software for complex tasks, such as identifying street signs, detecting pedestrians and predicting their movements, as well as for safe navigation. A typical self-driving vehicle can use dozens of DNNs for perception, localization and path planning.
However, training neural networks requires a hefty amount of processing power, which is why Tesla built its supercomputer using powerful NVIDIA GPUs.
Tesla’s supercomputer uses 720 nodes of eight NVIDIA A100 Tensor Core GPUs each (5,760 GPUs total) to achieve 1.8 exaflops of performance, making it one of the world’s most powerful computers. This kind of processing power is mind-boggling: one exaflop is one quintillion (10^18) floating-point operations per second.
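The headline figure can be sanity-checked with quick arithmetic. Assuming each A100 delivers roughly 312 teraflops of peak tensor throughput (NVIDIA’s published BF16/TF32 spec, not a number from Tesla’s presentation), the cluster-wide total works out to about 1.8 exaflops:

```python
# Back-of-the-envelope check of the 1.8-exaflop figure. The per-GPU number
# is an assumption based on NVIDIA's published A100 peak tensor throughput.
nodes = 720
gpus_per_node = 8
tflops_per_gpu = 312  # assumed A100 peak BF16/TF32 tensor perf, in teraflops

total_gpus = nodes * gpus_per_node
total_exaflops = total_gpus * tflops_per_gpu / 1e6  # 1 exaflop = 1e6 teraflops

print(total_gpus)                 # 5760
print(round(total_exaflops, 2))   # 1.8
```

The numbers line up with Karpathy’s figures: 720 × 8 gives the 5,760 GPUs Tesla cites, and the implied aggregate peak is right at 1.8 exaflops.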
NVIDIA’s A100 GPUs power the world’s highest-performing data centers. The A100 GPU provides up to 20x higher performance over the prior generation, according to NVIDIA.
“This is a really incredible supercomputer,” Karpathy said during his presentation at CVPR 2021. “I actually believe that in terms of flops, this is roughly the No. 5 supercomputer in the world.”
Tesla uses the data from more than one million Tesla vehicles on the road to refine and build new autonomous driving features for continuous improvement. Having a fleet of connected vehicles to regularly collect data from gives Tesla an enormous advantage over other automakers in the development of autonomous driving technology.
Camera data from Tesla vehicles is continuously fed into the supercomputer in order to improve the software powering Autopilot and FSD. The neural networks are used to label 4D data from videos taken from eight onboard cameras that make up each vehicle’s 360-degree perception system.
How Tesla’s Supercomputer is Put to Work
In a blog post, NVIDIA’s senior director of automotive, Danny Shapiro, provided an overview of how Tesla’s supercomputer is used to train its deep neural networks for autonomous driving. Tesla’s cyclical development process actually begins in the car, he said.
Whenever a Tesla vehicle is driving, a deep neural network runs in the background in what’s known as “shadow mode,” quietly perceiving and making predictions without actually controlling the vehicle, Shapiro explained.
The predictions are recorded, and any mistakes or misidentifications are logged. Tesla engineers then take this information and use each instance to create a training dataset of difficult and diverse scenarios to refine the DNN for improved performance.
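In rough pseudocode, the shadow-mode loop Shapiro describes might look like the sketch below. Every name and data shape here is a hypothetical illustration, not Tesla’s actual code: the network predicts alongside the driver but never actuates, and disagreements are logged as hard examples for the next training round.

```python
# Hypothetical sketch of "shadow mode" logging. The DNN's predictions are
# recorded but never used to control the car; mispredictions feed a dataset
# of difficult scenarios. All names here are illustrative, not Tesla's API.

def shadow_mode_step(frame, dnn_predict, ground_truth, log):
    """Run one perception step in shadow mode.

    frame        -- camera input for this timestep
    dnn_predict  -- candidate neural network under evaluation
    ground_truth -- what actually happened (e.g. a later, higher-confidence
                    label, or the human driver's behavior)
    log          -- list collecting hard examples for retraining
    """
    prediction = dnn_predict(frame)
    if prediction != ground_truth:          # misprediction detected
        log.append((frame, ground_truth))   # becomes future training data
    return prediction                       # recorded, never actuated


# Toy usage: a stand-in "network" that mislabels frame 2.
log = []
toy_dnn = lambda frame: "pedestrian" if frame != 2 else "shadow"
for frame in range(4):
    shadow_mode_step(frame, toy_dnn, "pedestrian", log)

print(len(log))  # 1 -- one hard example captured for retraining
```

The key design point is that the loop is purely observational: only the logging side effect matters, which is what lets Tesla evaluate candidate networks on the full fleet without any safety exposure.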
The dataset is a collection of roughly one million 10-second clips recorded at 36 frames per second, equal to around 1.5 petabytes of data. The DNN runs through these scenarios in the data center over and over until it operates without a mistake. From there, it’s sent back to the vehicle and the process repeats.
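The scale of that dataset follows from simple arithmetic. The per-frame figure derived below is an estimate from the article’s own numbers (and the assumption that a clip spans all eight cameras), not a figure Tesla has published:

```python
# Rough accounting of the training dataset described above. The 8-camera
# assumption and the implied per-frame size are estimates, not Tesla's data.
clips = 1_000_000       # ~1 million clips
clip_seconds = 10
fps = 36
cameras = 8             # assuming each clip covers the full camera suite
dataset_bytes = 1.5e15  # ~1.5 petabytes

frames_per_clip = clip_seconds * fps * cameras   # 2,880 frames per clip
total_frames = clips * frames_per_clip           # 2.88 billion frames
bytes_per_frame = dataset_bytes / total_frames   # implied average frame size

print(f"{total_frames:,} frames, ~{bytes_per_frame / 1024:.0f} KiB per frame")
```

Under those assumptions the corpus works out to nearly three billion camera frames, averaging around half a megabyte each, which gives a sense of why training passes over it demand supercomputer-class hardware.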
Karpathy said training a DNN in this manner and on such a large amount of data requires a massive amount of compute power, which led Tesla to build and deploy the current generation supercomputer with NVIDIA’s high-performance A100 GPUs.
In addition to training DNNs, Tesla’s supercomputer provides vehicle engineers with the high performance needed to experiment and iterate in the development process.
Karpathy said the current DNN structure that Tesla is deploying allows a team of 20 engineers to work on a single network at once, isolating different features for parallel development, which is much faster.
These DNNs can then be run through training datasets at speeds faster than what has been previously possible for rapid iteration.
“Computer vision is the bread and butter of what we do and enables Autopilot,” said Karpathy. “For that to work, you need to train a massive neural network and experiment a lot. That’s why we’ve invested a lot into the compute.”
Tesla’s autonomous driving system uses only cameras and radar, while most other developers of self-driving vehicles rely on supplemental lidar data. Tesla believes that as its computer vision capabilities improve, radar will no longer be needed: the company said it would drop radar altogether, and began transitioning to a camera-only system in its Model 3 and Model Y vehicles starting in May.
During the company’s fourth-quarter earnings call in January, Musk said the upcoming Dojo supercomputer could potentially be offered as a service to other companies for training their neural networks.
“So some of the others need neural net training, we’re not trying to keep it to ourselves,” Musk said in January. “So I think there could be a whole line of business in and of itself.”
Tesla’s supercomputing effort is also a work in progress, and the company plans to build an even more powerful machine after the upcoming Dojo. Although Karpathy declined to elaborate on that next iteration, he said it will take Tesla’s supercomputing plans to the “next level,” which might bring Musk closer to his target of Level 5 autonomous driving, in which no human supervision is required whatsoever.