NVIDIA CEO Jensen Huang ended his GTC 2024 keynote presentation backed by images of the various humanoid robots on the market that are powered by the Jetson Orin computer. | Credit: The Robot Report
To be effective and commercially viable, humanoid robots will need a full stack of technologies for everything from locomotion and perception to manipulation. Developers of artificial intelligence and humanoids are using NVIDIA tools, from the edge to the cloud.
At NVIDIA's GPU Technology Conference (GTC) in March, CEO Jensen Huang appeared on stage with several humanoids in development using the company's technology. For instance, Figure AI last month unveiled its Figure 02 robot, which used NVIDIA graphics processing units (GPUs) and Omniverse to autonomously conduct tasks in a trial at BMW.
"Developing autonomous humanoid robots requires the fusion of three computers: NVIDIA DGX for AI training, NVIDIA Omniverse for simulation, and NVIDIA Jetson in the robot," explained Deepu Talla, vice president of robotics and edge computing at NVIDIA, which will be participating in RoboBusiness 2024.
Talla shared with The Robot Report his perspective on the race to build humanoids and how developers can benefit from NVIDIA's offerings.
What do you think of the potential for humanoids, and why have they captured so much attention?
Talla: There's the market need - everyone understands the current labor shortages and the need to automate jobs that are dangerous. In fact, if you look at the trajectory of humanoids, the field has moved away from teams trying to solve mostly mechatronics problems and toward general-purpose robot intelligence.
There are also two inflection points. The first is that generative AI and the new way of training algorithms hold a lot of promise. From CNNs [convolutional neural networks] to deep learning, the slope is going up.
The second inflection point is the work on digital twins and the industrial metaverse. We've been working on Omniverse for well over 15 years, and in the past year or so, it has reached reasonable maturity.
The journey over the next several years is to create digital twins faster, use ray tracing and reinforcement learning, and bridge the sim-to-real gap. NVIDIA is a platform company - we're not building robots, but we're enabling thousands of companies building robots, simulation, and software.
Is NVIDIA working directly with developers of humanoids?
Talla: We have the good fortune of engaging with every robotics and AI company on the planet. When we first started talking about robotics a decade ago, it was in the context of the computer brain and NVIDIA Jetson.
Today, robots need the three computers, starting with that onboard brain, which must support functional safety, run AI at low power, and offer more and more acceleration.
There's also the computer for training the AI, with the DGX infrastructure. Then, there's the computer in the middle. We're seeing use grow exponentially for OVX and Omniverse for simulation, robot learning and virtual worlds.
Why is simulation so important for training humanoid robots?
Talla: It's faster, cheaper, and safer for any task. In the past, the main challenge was accuracy. We're starting to see its application in humanoids for perception, navigation, actuation, and gripping, in addition to locomotion and functional safety.
The one thing everyone says they're working on - general-purpose intelligence - hasn't been solved, but we now have a chance to enable progress.
Isn't that a lot of problems to solve at once? How do you help tie perception to motion?
Talla: Going back a year or two, we were focusing on perception for anything that needs to move, from industrial robot arms to mobile robots and, ultimately, humanoids.
Speaking of foundation models, how do the latest AI models support humanoid developers?
Talla: At GTC this year, we talked about Project GR00T, a general-purpose foundation model for cognition. Think of it like Llama 3 for humanoid robots.
NVIDIA is partnering with many humanoid companies so they can fine-tune their systems for their environments.
At SIGGRAPH, we discussed how to generate the data needed to build this general-purpose model. It's a big challenge. ChatGPT has the Internet as its source for language, but how do you do this for humanoids?
As we embarked on this model, we recognized the need to create more tools. Developers can use our simulation environment and fine-tune the model, or they can train their own robot models.
Everyone needs to be able to easily generate synthetic data to augment real-world data. It's all about training and testing.
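To make the synthetic data point concrete, the sketch below shows the general pattern of domain-randomized data generation with Omniverse Replicator inside Isaac Sim. It is an illustrative example loosely following NVIDIA's published tutorials, not a workflow Talla described; the object choices, counts, and output settings are assumptions.

```python
# Illustrative sketch of domain-randomized synthetic data generation with
# Omniverse Replicator (runs inside Isaac Sim / Omniverse; the props, counts,
# and output settings below are assumptions for demonstration).
import omni.replicator.core as rep

with rep.new_layer():
    camera = rep.create.camera(position=(0, 0, 2))
    render_product = rep.create.render_product(camera, (1024, 1024))

    # Simple labeled props standing in for task-relevant objects.
    cubes = rep.create.cube(semantics=[("class", "cube")], count=5)
    spheres = rep.create.sphere(semantics=[("class", "sphere")], count=5)

    # Each frame, scatter the props to new poses to diversify the dataset.
    with rep.trigger.on_frame(num_frames=100):
        with rep.create.group([cubes, spheres]):
            rep.modify.pose(
                position=rep.distribution.uniform((-0.5, -0.5, 0.0), (0.5, 0.5, 0.5)),
                rotation=rep.distribution.uniform((0, 0, 0), (0, 0, 360)),
            )

    # Write RGB images and 2D bounding boxes for downstream training.
    writer = rep.WriterRegistry.get("BasicWriter")
    writer.initialize(output_dir="_synthetic_out", rgb=True, bounding_box_2d_tight=True)
    writer.attach([render_product])
```

The generated images and labels can then augment real-world recordings when training and testing perception models.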
Given NVIDIA's experience in simulation, what kind of boost does it offer developers?
Talla: We've created assets for different environments, such as kitchens or warehouses. The RoboCasa NIM makes it easy to import different objects into these generated environments.
Companies must train their robots to act in these environments, so they can have the algorithms learn from human demonstrations. But they want much more data, covering different angles and trajectories.
Another method for training humanoids is teleoperation. NVIDIA is building developer tooling for this, and we have other tooling for actuation with multiple digits. Many robot grippers have only two fingers or suction cups, but humanoids need more dexterity to be useful for households or elder care.
We bring all these tools together in Isaac Sim to make them easier to use. As developers build their robot models, they can pick whatever makes sense.
You mention NIMs - what are they?
Talla: NVIDIA Inference Microservices, or NIMs, are containerized models and services that are easier to consume and already performance-optimized with the necessary runtime libraries.
Since each developer might focus on something different, such as perception or locomotion, we help them with workflows for each of the three computers for humanoids.
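As an illustration of how a NIM is typically consumed, the sketch below queries a locally deployed large language model NIM through its OpenAI-compatible endpoint. The port and model name are placeholders, and robotics-specific NIMs may expose different interfaces.

```python
# Minimal sketch of calling a deployed LLM NIM through its OpenAI-compatible
# endpoint; the base URL and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # placeholder model served by the NIM
    messages=[{"role": "user", "content": "List the steps to restock a shelf."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```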
How does NVIDIA determine what capabilities to build itself and what to leave for developers?
Talla: Our first principle is to do only as much as necessary. We looked at the whole industry and asked, "What is a fundamental problem?"
For manipulation, we studied motion planning and found it was cumbersome. We applied CUDA parallel processing and created cuMotion to accelerate motion planning.
We're doing a lot, but there are so many domain-specific things that we're not doing, such as picking. We want to let the ecosystem innovate on top of that.
Some companies want to build their own models. Others might have something that solves a specific problem in a better way.
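For a sense of what the GPU-accelerated motion planning mentioned above looks like in practice, here is a minimal sketch using cuRobo, the library that underpins NVIDIA's cuMotion. The calls are paraphrased from cuRobo's published examples, and the robot and world configuration file names are assumptions.

```python
# Hedged sketch of GPU-accelerated motion planning with cuRobo; API usage is
# paraphrased from cuRobo's examples, and the config file names are assumptions.
import torch

from curobo.types.math import Pose
from curobo.types.state import JointState
from curobo.wrap.reach.motion_gen import MotionGen, MotionGenConfig

# Load an assumed robot description and collision world shipped with cuRobo.
config = MotionGenConfig.load_from_robot_config(
    "franka.yml", "collision_table.yml", interpolation_dt=0.01
)
motion_gen = MotionGen(config)
motion_gen.warmup()  # pre-build CUDA graphs for fast repeated planning

# Start from the robot's retract pose and plan to a Cartesian goal
# given as (x, y, z, qw, qx, qy, qz).
start_state = JointState.from_position(motion_gen.get_retract_config().view(1, -1))
goal_pose = Pose.from_list([0.4, 0.0, 0.4, 1.0, 0.0, 0.0, 0.0])

result = motion_gen.plan_single(start_state, goal_pose)
if result.success.item():
    trajectory = result.get_interpolated_plan()  # time-parameterized joint trajectory
```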
What has NVIDIA learned from its robotics customers?
Talla: There are so many problems to solve, and we can't boil the ocean. We sit down with our partners to determine what's the most urgent problem to solve.
For some, it could be AI for perception or manipulation, while others might want an environment to train algorithms with synthetic data generation.
We want people to be more aware of the three-computer model, and NVIDIA works with all the other tools in the industry. We're not trying to replace ROS, physics engines such as MuJoCo and Drake, or Gazebo for simulation.
We're also adding more workflows to Isaac Lab and Omniverse to simplify robotics development.
We've heard a lot of promises on the imminent arrival of humanoid robots in industrial and other settings. What timeframes do you think are realistic?
Talla: The market needs humanoid development to accelerate significantly. Developers are not solving problems for automotive or semiconductor manufacturing, which are already heavily automated.
I'm talking about all of the midlevel industries, where it's too complicated to put robots. Young people don't want to do those tasks, just as people have migrated from farms to cities.
Now that NVIDIA is providing the tools for success with our Humanoid Robot Developer Program, innovation is only going to accelerate. But deployments will be in a phased manner.
It's obvious why big factories and warehouses are the first places where we'll see humanoids. They're controlled environments where robots can be functionally safe, but the market opportunity is much greater.
It's an inside-out approach versus an outside-in approach. There are 100 million cars and billions of phones; if robots become safe and affordable, the pace of adoption will grow in the same way.
At the same time, skepticism is healthy. Our experience with autonomous vehicles is that if they're 99.999% trustworthy, that's not enough. If anything, because they move slower, humanoids in the home don't have to get to that level to be useful and safe.
RoboBusiness 2024, which will be on Oct. 16 and 17 in Santa Clara, Calif., will offer opportunities to learn more from NVIDIA. Amit Goel, head of robotics and edge AI ecosystem at NVIDIA, will participate in a keynote panel on "Driving the Future of Robotics Innovation."
Also on Day 1 of the event, Sandra Skaff, senior strategic alliances and ecosystem manager for robotics at NVIDIA, will be part of a panel on "Generative AI's Impact on Robotics."
In addition to robotics innovation, RoboBusiness will focus on investments and business topics related to running a robotics company. It will also include more than 60 speakers, over 100 exhibitors and demos on the expo floor, 10+ hours of dedicated networking time, the Pitchfire Robotics Startup Competition, a Women in Robotics Luncheon, and more.
Thousands of robotics practitioners from around the world will convene at the Santa Clara Convention Center, so register now to attend!