Overview

VirtualHome is a platform to simulate complex household activities via programs. It lets you place agents in household environments and have them interact with the objects there, for example by picking up objects, switching appliances on and off, or opening appliances. The simulator is implemented in Unity and can be called through a simple Python API.

Concepts

Simulations in VirtualHome work through three components: agents, which represent human avatars that perform actions; environments, which represent different apartments with objects that agents can interact with; and programs, which define how agents interact with the environment.

Agents

An agent is a humanoid avatar that can interact with the environment and perform actions. Agents in VirtualHome have a NavMeshAgent component that allows them to navigate throughout the environment using shortest-path planning, avoid obstacles and turn smoothly. They also use inverse kinematics through RootMotion FinalIK to provide realistic animations when interacting with objects. You can add multiple agents to each simulation, all interacting at the same time.

Environments

VirtualHome is composed of 7 different environments where agents can interact. Each environment represents an indoor apartment featuring different rooms and populated with different interactive objects. While the 7 environments are fixed, you can programmatically add, remove or modify objects in them, providing different scenes to generate diverse videos or train your agents.

Objects

The environment is populated with 3D objects for agents to interact with. There are three classes of objects:

  • Static: objects that agents can touch and navigate to, but not change.
  • Interactable: objects whose state agents can change by interacting with them (open a cabinet, turn on the stove).
  • Grabbable: objects that agents can pick up and place.

EnvironmentGraph

Each environment in VirtualHome is represented by an EnvironmentGraph: a graph where every node represents an object in the environment and every edge represents a spatial relationship between two objects.

The nodes contain the object names, a numerical identifier to interact with them, their coordinates and object bounds, as well as their state. The graph can be used to query the state of the environment, but also to modify the environment before executing anything.
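To make the node and edge description above concrete, here is a minimal sketch that queries and modifies a small hand-written graph. The dictionary layout and the field names (`id`, `class_name`, `states`, `from_id`, `to_id`, `relation_type`), as well as the helpers `find_nodes`, `related` and `set_state`, are assumptions made for illustration and may differ from what the simulator returns.

```python
# Toy graph written by hand; field names are illustrative assumptions,
# not a guaranteed match for the simulator's output.
graph = {
    "nodes": [
        {"id": 1, "class_name": "kitchen", "states": []},
        {"id": 2, "class_name": "fridge", "states": ["CLOSED"]},
        {"id": 3, "class_name": "milk", "states": []},
    ],
    "edges": [
        {"from_id": 2, "to_id": 1, "relation_type": "INSIDE"},
        {"from_id": 3, "to_id": 2, "relation_type": "INSIDE"},
    ],
}


def find_nodes(graph, class_name):
    """Return every node whose object class matches class_name."""
    return [n for n in graph["nodes"] if n["class_name"] == class_name]


def related(graph, node_id, relation_type):
    """Return the ids that node_id points to through the given relation."""
    return [
        e["to_id"]
        for e in graph["edges"]
        if e["from_id"] == node_id and e["relation_type"] == relation_type
    ]


def set_state(graph, node_id, old, new):
    """Replace one state label on a node, modifying the graph in place."""
    for n in graph["nodes"]:
        if n["id"] == node_id:
            n["states"] = [new if s == old else s for s in n["states"]]


fridge = find_nodes(graph, "fridge")[0]
milk = find_nodes(graph, "milk")[0]

# Query: which object contains the milk?
container_ids = related(graph, milk["id"], "INSIDE")

# Modify: open the fridge before any program is executed.
set_state(graph, fridge["id"], "CLOSED", "OPEN")
```

A graph edited this way would then need to be sent back to the simulator for the change to take effect; that step is not shown here.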

Programs

Activities in VirtualHome are represented via Programs. A program is a sequence of instructions of the form:

 <char_id> [action] <object> (object_id)

Here char_id corresponds to the id of the agent that should perform the action, and object_id identifies the object instance to interact with when there is more than one object of a given class. VirtualHome is stateful while executing programs: if you execute one program after another, the environment state in the second call will depend on what agents did in the first program.

An example of a program could be:

program = ['<char0> [walk] <chair> (1)', '<char0> [sit] <chair> (1)'] 
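The instruction format can be checked before sending a program to the simulator. The following is a minimal sketch of a parser for single-agent instructions; the function `parse_instruction` and both regular expressions are hypothetical helpers written for this example, not part of the VirtualHome API, and they assume that object ids are non-negative integers.

```python
import re

# <char_id> [action] followed by zero or more "<object> (object_id)" pairs.
INSTRUCTION = re.compile(r"<(?P<char>[^>]+)>\s*\[(?P<action>\w+)\](?P<args>.*)")
ARGUMENT = re.compile(r"<(?P<name>[^>]+)>\s*\((?P<id>\d+)\)")


def parse_instruction(text):
    """Split one instruction string into agent, action and object arguments."""
    match = INSTRUCTION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid instruction: {text!r}")
    objects = [
        (arg.group("name"), int(arg.group("id")))
        for arg in ARGUMENT.finditer(match.group("args"))
    ]
    return {
        "char": match.group("char"),
        "action": match.group("action"),
        "objects": objects,
    }


program = ['<char0> [walk] <chair> (1)', '<char0> [sit] <chair> (1)']
parsed = [parse_instruction(line) for line in program]
```

Each parsed entry keeps the agent id, the action name, and a list of (object, object_id) pairs, which makes it easy to verify that every referenced object exists in the graph before execution.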

Agents can also execute multiple instructions at the same time. To do so, put all the actions you want to run simultaneously in a single instruction, one per agent, separated by |. For instance:

program = ['<char0> [walk] <chair> (1) | <char1> [walk] <kitchen> (1)',
           '<char0> [sit] <chair> (1)']
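Multi-agent lines like the one above can be assembled and taken apart with plain string operations. This is a small sketch; `join_parallel` and `split_parallel` are hypothetical helper names introduced for this example.

```python
def join_parallel(instructions):
    """Combine per-agent instructions into one simultaneous step."""
    return " | ".join(instructions)


def split_parallel(line):
    """Recover the per-agent instructions from one program line."""
    return [part.strip() for part in line.split("|")]


program = [
    # Step 1: both agents walk at the same time.
    join_parallel([
        "<char0> [walk] <chair> (1)",
        "<char1> [walk] <kitchen> (1)",
    ]),
    # Step 2: only char0 acts.
    "<char0> [sit] <chair> (1)",
]

steps = [split_parallel(line) for line in program]
```

Here `steps` holds one list per program line, with one entry for each agent acting in that step.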

UnityCommunication

All interactions in VirtualHome are done through the UnityCommunication object. To interact with the simulator, you open one or more simulator instances and connect to each of them with one of these objects. Each UnityCommunication object binds to a simulator through an HTTP port. You can use Ray to run multiple simulators at once and speed up the training of your agents.
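As a rough sketch of how a session might be driven, the helper below resets a simulator to one environment, adds an agent, and runs several programs in order. It takes any object that behaves like a UnityCommunication instance. The method names `reset`, `add_character` and `render_script` are assumed here and should be checked against the installed version of the Python API; `run_programs` itself is a hypothetical helper.

```python
def run_programs(comm, env_id, programs):
    """Run several programs back to back on one simulator connection.

    `comm` is assumed to expose reset, add_character and render_script,
    as a UnityCommunication object is expected to. The scene is reset
    only once, so each program starts from the state the previous one
    left behind.
    """
    comm.reset(env_id)      # load the chosen environment
    comm.add_character()    # place one agent in the scene
    results = []
    for program in programs:
        # Each call returns whatever the simulator reports for that program.
        results.append(comm.render_script(program))
    return results
```

Because the helper does not reset the scene between programs, it reflects the stateful behavior described earlier: an agent that sat down in the first program is still seated when the second one starts. Creating the connection object is left out, since the import path and constructor arguments depend on the installation.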