Google calls RoboCat a robotic agent: essentially an AI-powered software program that acts as the robot's “brain”. What sets it apart from traditional robots is that RoboCat is more “universal” and is capable of self-improvement.
DeepMind claims that RoboCat is the world’s first robotic AI agent that can solve and adapt to multiple tasks, and that it can do so across a range of different real robots. “The rapid learning ability of RoboCat reduces the need for human supervision and training, and is an important step towards creating general-purpose robots.”
Two core technologies support the development of universal robots
(1) Self-generated training data
According to DeepMind, RoboCat can learn to control a robotic arm to complete a variety of tasks from as few as 100 or so demonstrations, and then iterate on data it generates itself. This matters because progress in building general-purpose robots has been slow, in part because gathering real-world training data takes time.
In a DeepMind demonstration video, RoboCat already learns autonomously to control a robotic arm and complete tasks such as “loop”, “building blocks” and “grabbing fruit”. These tasks may seem simple, but they test the arm’s precision, understanding, and ability to solve shape-matching puzzles. RoboCat’s success rate on new tasks has risen from 36% to 74%.
Remarkably, RoboCat had never before seen either the arm it operates or the task it is asked to perform.
This kind of “universal learning ability” is of great significance for accelerating research in robotics. DeepMind believes that RoboCat’s ability to learn independently, improve itself quickly, and adapt rapidly to different hardware will play an important role in the development of a new generation of general-purpose robot AI agents.
(2) Built on a multimodal model
One of the key technologies used in RoboCat is a multimodal model called Gato, which means “cat” in Spanish and is one of the reasons for the name “RoboCat”.
The Gato model can process language, images, and actions in both simulated and physical environments. The researchers combined Gato’s architecture with a large training dataset containing 100-1,000 demonstrations of tasks completed by various robotic arms.
Combining the original dataset with the data generated during new training, RoboCat’s dataset will contain millions of training trajectories. The more new tasks it learns, the better it becomes at learning and solving further new tasks.
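The cycle described above (learn a new task from a small set of demonstrations, practice to generate more data, then fold everything back into the training set) can be sketched in a few lines of Python. This is only a toy illustration, not DeepMind's implementation: the ToyAgent class, its single "skill" number, the 0.001 learning rate, the episode counts, and the task names are all hypothetical, and no real Gato model or robot is involved.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyAgent:
    """Hypothetical stand-in for a policy; 'skill' is a number in [0, 1]."""
    skill: float = 0.0

    def fine_tune(self, trajectories):
        # Toy rule (an assumption): every trajectory nudges skill
        # toward 1.0, with diminishing returns.
        for _ in trajectories:
            self.skill += (1.0 - self.skill) * 0.001

    def practice(self, task, episodes, rng):
        # Self-generated data: roll out the current policy and
        # record every attempt, successful or not.
        return [{"task": task, "success": rng.random() < self.skill}
                for _ in range(episodes)]


def self_improvement_cycle(dataset, task, n_demos=100, n_practice=1000, seed=0):
    """One round: demonstrations -> specialist -> practice -> bigger dataset."""
    rng = random.Random(seed)
    # 1. Collect a small number of human demonstrations of the new task.
    demos = [{"task": task, "success": True} for _ in range(n_demos)]
    # 2. Fine-tune a specialist on the existing data plus the demonstrations.
    specialist = ToyAgent()
    specialist.fine_tune(dataset + demos)
    # 3. Let the specialist practice, producing self-generated trajectories.
    self_generated = specialist.practice(task, n_practice, rng)
    # 4. Merge everything and train the next generalist on the larger dataset.
    dataset = dataset + demos + self_generated
    next_agent = ToyAgent()
    next_agent.fine_tune(dataset)
    return dataset, next_agent


dataset, skills = [], []
for task in ["task_a", "task_b", "task_c"]:  # hypothetical task names
    dataset, agent = self_improvement_cycle(dataset, task)
    skills.append(agent.skill)
print(len(dataset), [round(s, 3) for s in skills])
```

In this toy version the dataset grows with every new task and the agent's skill rises with the size of the dataset, which mirrors the article's point that each newly learned task makes the next one easier.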