We introduce Watch-And-Help (WAH), a challenge for testing social
intelligence in agents. In WAH, an AI agent needs to help a human-like agent
perform a complex household task efficiently. To succeed, the AI agent needs to i)
understand the underlying goal of the task by watching a single demonstration of the
human-like agent performing the same task (social perception), and ii) coordinate
with the human-like agent to solve the task in an unseen environment as fast as
possible (human-AI collaboration). For this challenge, we build VirtualHome-
Social, a multi-agent household environment, and provide a benchmark including
both planning-based and learning-based baselines. We evaluate the performance of AI
agents with the human-like agent as well as with real humans using objective
metrics and subjective user ratings. Experimental results demonstrate that the
proposed challenge and virtual environment enable a systematic and scalable
evaluation of important aspects of machine social intelligence.