
AI experts establish the “North Star” for the domestic robotics field

A Stanford AI team creates benchmarks for 100 everyday household tasks for robot assistants, creating a path for more useful agents.
To be useful in the home, robots must have a combination of situational and physical awareness and capability. | Photo by Miriam Doerr

Robots that do everything from helping people get dressed in the morning to washing (and putting away) the dishes have been a dream for as long as people have uttered the words “artificial intelligence.”

But in a field where the state of the art currently falls far short of that level of sophistication, a fundamental challenge has emerged: what will “success” even look like, should the day come when robots are able to perform these key tasks to human standards?

To do these mundane but surprisingly complex tasks, a robot must be able to perceive, reason, and operate with full awareness not only of its own physical dimensions and capabilities but also of the world and objects around it. In robotics, this combination of situational and physical awareness and capability is known as embodied AI.

Now, a multidisciplinary team of researchers at Stanford University has released the Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments (BEHAVIOR). It is a catalog of the physical and intellectual details of 100 everyday household tasks — washing dishes, picking up toys, cleaning floors, etc. — and an implementation of those tasks in multiple simulated homes. A paper describing BEHAVIOR was recently accepted to the Conference on Robot Learning (CoRL).

BEHAVIOR pairs a set of realistic, varied, and complex activities with a new logical and symbolic language, a fully functional 3-D simulator with a virtual reality interface, and a set of success metrics drawn from the performance of humans doing the same tasks in virtual reality. Taken as a whole, BEHAVIOR delivers a breadth of tasks and a level of detail in each task's description that were previously unavailable in AI.

“While any one of those tasks is already highly complex in its own right, imagine the challenge of creating a single robot that can do all of these things,” says Jiajun Wu, assistant professor of computer science and a senior author on the paper. “Creating these benchmarks now, before the field has evolved too far, will help to set up potential common goals for the community.”

A Monumental Task

Imagine the multiple problems a robot has to overcome to achieve a simple task like cleaning a countertop. The robot not only has to perceive and understand what a countertop is, where to find it, that it needs cleaning, and the counter’s physical dimensions, but also what tools and products are best used to clean it and how to coordinate its motions to get it clean. The robot would then have to determine the best course of action, step by step, needed to clean the counter. It even requires a complex understanding of things humans think nothing of, such as what tools or materials are “soakable” and how to detect and declare a countertop “clean.” In BEHAVIOR, this level of complexity is achieved in 100 activities performed in multiple different simulated houses.
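To make the idea of declaring a countertop “clean” concrete, here is a minimal Python sketch of how such a goal might be expressed as a check over symbolic object states. It is only an illustration under stated assumptions: the `SimObject` class, the state names ("dusty", "stained", "soaked"), and the `wipe` rule are all hypothetical and are not taken from BEHAVIOR or its task-definition language.

```python
from dataclasses import dataclass, field


@dataclass
class SimObject:
    """A hypothetical object in a simulated home, described by symbolic states."""
    name: str
    states: set = field(default_factory=set)  # e.g. {"dusty", "soaked"}


def is_clean(obj: SimObject) -> bool:
    # Assumption: "clean" means no dirt-like state remains on the object.
    return not ({"dusty", "stained"} & obj.states)


def goal_satisfied(objects: list) -> bool:
    # The task counts as done only when at least one countertop exists
    # and every countertop is clean.
    counters = [o for o in objects if o.name.startswith("countertop")]
    return bool(counters) and all(is_clean(o) for o in counters)


def wipe(surface: SimObject, tool: SimObject) -> None:
    # Assumption: dust comes off with any cloth, stains only with a soaked one.
    surface.states.discard("dusty")
    if "soaked" in tool.states:
        surface.states.discard("stained")


counter = SimObject("countertop_1", {"dusty", "stained"})
rag = SimObject("rag_1")

wipe(counter, rag)
print(goal_satisfied([counter, rag]))  # False: the stain survives a dry rag

rag.states.add("soaked")
wipe(counter, rag)
print(goal_satisfied([counter, rag]))  # True: no dirt-like state remains
```

Even in this toy version, the agent cannot succeed by wiping alone: it first has to work out that the rag must be soaked, which is the kind of implicit knowledge the paragraph above describes.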

Each of these steps (navigation, search, grasping, cleaning, evaluating) may require hours or even days of training in simulation to be learned — far beyond the capabilities of current autonomous robots.

“Deciding the best way to achieve a goal based on what the robot perceives and knows about the environment and about its own capabilities is an important aspect in BEHAVIOR,” says Roberto Martin-Martin, a postdoctoral scholar in computer science who worked on the planning aspects of the benchmark. “It requires not only an understanding of the environment and what needs to be done, but in what order they need to be done to achieve a task. All this for 100 tasks in different environments!”
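The ordering problem described in the quote can be illustrated with a small Python sketch that sorts the steps of one task by their prerequisites. The step names and the dependency table are hypothetical, chosen only for illustration, and the topological sort from Python's standard `graphlib` module stands in for whatever planner a real agent would use; it is not the benchmark's method.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisites: each step maps to the steps that must come first.
PREREQUISITES = {
    "navigate_to_kitchen": set(),
    "find_rag": {"navigate_to_kitchen"},
    "grasp_rag": {"find_rag"},
    "soak_rag": {"grasp_rag"},
    "wipe_countertop": {"soak_rag"},
    "check_clean": {"wipe_countertop"},
}


def plan_order(prerequisites: dict) -> list:
    """Return the steps in an order that respects every prerequisite."""
    return list(TopologicalSorter(prerequisites).static_order())


for step in plan_order(PREREQUISITES):
    print(step)
```

A fixed dependency table like this is the easy case. In the setting the quote describes, the agent has to infer these dependencies from what it perceives, and they change from one simulated home to the next.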

Sim to Real

In creating the BEHAVIOR benchmark, the team, led by Fei-Fei Li, a computer scientist and co-director of the Stanford Institute for Human-Centered AI, together with experts from computer science, psychology, and neuroscience, has established a “North Star”: a visual reference point by which to gauge the success of future AI solutions. The benchmark might also be used to develop and train robotic assistants in virtual environments that are then migrated to operate in physical ones, a paradigm known in the field as “sim to real.”

“Making this leap from simulation to the real world is a non-trivial thing, but there have been a lot of promising results in training robots in simulation and then putting that same algorithm into a physical robot,” says co-author Sanjana Srivastava, a doctoral candidate in computer science who specializes in the task definition aspects of the benchmark.

“I got involved specifically to see how far we can push simulation technology,” says co-author Michael Lingelbach, a doctoral candidate in neuroscience. “Sim to real is a big area in robotic research and one we’d like to see develop more fully. Working with a simulator is just a much more accessible way to approach robotics.”

Next up, the BEHAVIOR team hopes to provide initial solutions to the benchmark while extending it with new tasks not currently benchmarked. According to the team, that effort will require contributions from the entire field: robotics, computer vision, computer graphics, and cognitive science. Other researchers are invited to try their own solutions; to that end, the current version of BEHAVIOR is open-source and publicly available at

“If you think about these one hundred activities at the level of detail we provide, you begin to comprehend how difficult — and important — benchmarking is,” says co-author Chengshu Li, a doctoral candidate in computer science. “In that regard, BEHAVIOR is not final. We will continue to iterate and add new tasks to our list.”
