Introduction
======================

VirtualHome is a multi-agent platform to simulate activities in a household. Agents are represented as humanoid avatars, which can interact with the environment through high-level instructions. The platform lets you place agents in household environments and supports complex interactions with the objects in them, such as picking up objects, switching appliances on and off, or opening them. The simulator is implemented in Unity and can be called through a simple `Python API <https://github.com/xavierpuigf/virtualhome>`_.

.. image:: ../../images/banner.gif
   :width: 100%

You can use VirtualHome to render videos of human activities, or to train agents to perform complex tasks. VirtualHome also includes a Knowledge Base, providing instructions to perform a large set of activities.

Installation
======================

To interact with the simulator, simply clone the VirtualHome API repository:

.. code-block:: bash

   git clone https://github.com/xavierpuigf/virtualhome.git

And download the VirtualHome executable for your platform:

.. list-table::
   :widths: 25 25
   :header-rows: 1

   * - Operating System
     - Download Link
   * - Linux
     - `Download `_
   * - MacOS
     - `Download `_
   * - Windows
     - `Download `_

After downloading, you should be able to use the simulator. From the VirtualHome API repository, create a communication with the simulator:

.. code-block:: python

   # cd into the virtualhome repo
   from simulation.unity_simulator import comm_unity

   YOUR_FILE_NAME = "Your path to the simulator"
   comm = comm_unity.UnityCommunication(file_name=YOUR_FILE_NAME)

The simulator should have opened. Try getting an image of the current environment:

.. code-block:: python

   # Start the first environment
   comm.reset(0)

   # Get an image from the first camera
   success, image = comm.camera_image([0])

   # Check that the image exists
   print(image[0].shape)

Key Concepts
============

Simulations in VirtualHome work through three components: agents, representing human avatars that perform actions; environments, representing different apartments with objects that agents can interact with; and programs, which define how agents interact with the environment.

Agents
******

An agent is a humanoid avatar that can interact with the environment and perform actions. Agents in VirtualHome have a `NavMeshAgent component `_, which allows them to navigate throughout the environment using shortest-path planning, avoiding obstacles and turning smoothly. They also have inverse kinematics through `RootMotion FinalIK `_, providing realistic animations when interacting with objects. You can add multiple agents to each simulation, interacting at the same time.

Environments
************

VirtualHome is composed of 7 different environments where agents can interact. Each environment represents an indoor apartment featuring different rooms and populated with different interactive objects. While the 7 environments are fixed, you can programmatically add, remove or modify objects in them, providing different scenes to generate diverse videos or train your agents.

Objects
-------

The environment is populated with 3D objects for agents to interact with. There are three classes of objects:

* Static: objects that agents can touch and navigate to, but not change.
* Interactable: objects whose state agents can change by interacting with them (opening a cabinet, turning on the stove).
* Grabbable: objects that agents can pick up and place.

EnvironmentGraph
----------------

Each environment in VirtualHome is represented by an EnvironmentGraph: a graph where every node represents an object in the environment and edges represent spatial relationships. The nodes contain the object names, a numerical identifier used to interact with them, their coordinates and object bounds, as well as their state. The graph can be used to query the state of the environment, but also to modify the environment before executing anything.
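To make the structure concrete, here is a sketch of what a node and an edge look like. The field values are illustrative, but the keys shown (`id`, `class_name`, `states`, `from_id`, `to_id`, `relation_type`) are the ones used throughout this documentation; the exact set of additional fields may vary by simulator version.

.. code-block:: python

   # Illustrative node and edge, as returned by comm.environment_graph().
   # Values here are made up; extra fields (position, bounds) may also appear.
   node = {
       'id': 162,                  # numerical identifier used in programs
       'class_name': 'fridge',     # object name
       'states': ['CLOSED'],       # current object states
   }
   edge = {
       'from_id': 2010,            # id of the contained object
       'to_id': 162,               # id of the fridge
       'relation_type': 'INSIDE'   # spatial relationship
   }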
Programs
********

Activities in VirtualHome are represented via Programs. A program is a sequence of instructions of the form:

.. code-block:: python

   <char_id> [action] <object_name> (object_id)

Where `char_id` corresponds to the id of the agent that we want to perform the action, and `object_id` identifies the object instance to interact with, in case there are multiple objects of the class `object_name`. VirtualHome is stateful while executing programs, meaning that if you execute one program after another, the environment state in the second call will depend on what the agents did in the first program. An example of a program could be:

.. code-block:: python

   program = ['<char0> [walk] <chair> (1)', '<char0> [sit] <chair> (1)']

Agents can also execute multiple instructions at the same time: in each instruction, add all the actions and agents you want to interact, separated by `|`. For instance:

.. code-block:: python

   program = ['<char0> [walk] <chair> (1) | <char1> [walk] <chair> (1)', '<char0> [sit] <chair> (1)']

Quickstart
==========

We will show here how to get started with the simulator, and some of the things you can do with it. We recommend looking at the `notebook `_ that we provide in the `Python API <https://github.com/xavierpuigf/virtualhome>`_ repository for a more complete overview.

Installation
************

Follow the instructions in :ref:`Installation` to install the environment, and make sure you can run it.

Setting up an environment
*************************

We will start by opening a communication with VirtualHome and setting up the environment. For that, create a `UnityCommunication` object and reset the environment.

.. code-block:: python

   # cd into the virtualhome repo
   from simulation.unity_simulator import comm_unity

   YOUR_FILE_NAME = ""  # Your path to the simulator
   port = "8080"  # or your preferred port
   comm = comm_unity.UnityCommunication(
       file_name=YOUR_FILE_NAME,
       port=port
   )

   env_id = 0  # env_id ranges from 0 to 6
   comm.reset(env_id)

Visualizing the environment
---------------------------

We will now visualize the environment. Each environment has a set of cameras that we can use for visualization. We will select a few and take screenshots from there.

.. code-block:: python

   # Check the number of cameras
   s, cam_count = comm.camera_count()

   s, images = comm.camera_image([0, cam_count-1])

This will create two images, stored as a list in `images`. The last one corresponds to an overall view of the apartment.

.. image:: ../../images/doc/im1.png
   :width: 49 %
   :alt: First view of the apartment

.. image:: ../../images/doc/im2.png
   :width: 49 %
   :alt: Second view of the apartment

You can also add new cameras to the apartment, and visualize multiple modalities from those cameras.

.. code-block:: python

   # Add a camera at the specified rotation and position
   comm.add_camera(position=[-3, 2, -5], rotation=[10, 15, 0])

   # View the camera in different modes
   modes = ['normal', 'seg_class', 'surf_normals']
   images = []
   for mode in modes:
       s, im = comm.camera_image([cam_count], mode=mode)
       images.append(im[0])

The content of `images` will be:

.. image:: ../../images/doc/im_newcam_1.png
   :width: 30 %
   :alt: Normal view of the apartment

.. image:: ../../images/doc/im_newcam_2.png
   :width: 30 %
   :alt: Semantic segmentation view

.. image:: ../../images/doc/im_newcam_3.png
   :width: 30 %
   :alt: Surface normals view
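If you want to keep these screenshots, here is a minimal sketch for saving them to disk. It assumes, as the `.shape` check earlier suggests, that images come back as numpy arrays; depending on your simulator version the channel order may be BGR rather than RGB, in which case you would flip the last axis before saving.

.. code-block:: python

   # Minimal sketch: save the rendered modalities to disk.
   # Assumes images are numpy uint8 arrays; if your build returns
   # BGR images, flip the channels with im[..., ::-1] first.
   from PIL import Image

   for mode, im in zip(modes, images):
       Image.fromarray(im).save('camera_view_{}.png'.format(mode))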
Querying the scene
---------------------

We will see here how to get information about the environment beyond images. Environments in VirtualHome are represented as graphs. Let's start by querying the graph of the current scene.

.. code-block:: python

   # Reset the environment
   comm.reset()

   # Get the graph
   s, graph = comm.environment_graph()

Here, `graph` is a Python dictionary containing `nodes` and `edges`. `nodes` contains a list of all the objects in the environment, each represented by a dictionary with an `id` identifying the object, a `class_name` with the name of the object, and information about the object states, 3D information, etc. `edges` contains a list of edges representing spatial relationships between the objects.

Modifying the environment
-------------------------

You can use the graph to modify the environment. This will allow you to generate more diverse videos, or environments to train your agents. Let's try to place an object inside the fridge, and open the fridge.

.. code-block:: python

   # Get the fridge node
   fridge_node = [node for node in graph['nodes'] if node['class_name'] == 'fridge'][0]

   # Open it
   fridge_node['states'] = ['OPEN']

   # Create a new node
   new_node = {
       'id': 1000,
       'class_name': 'salmon',
       'states': []
   }

   # Add an edge placing the salmon inside the fridge
   new_edge = {'from_id': 1000, 'to_id': fridge_node['id'], 'relation_type': 'INSIDE'}
   graph['nodes'].append(new_node)
   graph['edges'].append(new_edge)

   # Update the environment
   comm.expand_scene(graph)

If you take an image from the same camera as before, the environment should look like this:

.. image:: ../../images/doc/im_newcam_modif.png
   :width: 30 %
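You can also confirm the change without rendering an image, by re-querying the graph with the calls shown above. This sketch assumes the simulator keeps the `id` we assigned to the new node.

.. code-block:: python

   # Re-query the graph and check that the scene was updated
   s, graph = comm.environment_graph()
   fridge_node = [node for node in graph['nodes'] if node['class_name'] == 'fridge'][0]
   print(fridge_node['states'])  # should include 'OPEN'

   # The salmon should now be inside the fridge
   salmon_edges = [edge for edge in graph['edges']
                   if edge['from_id'] == 1000 and edge['relation_type'] == 'INSIDE']
   print(salmon_edges)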
Generating Videos
******************

So far we have been setting up an environment without agents. Let's now add agents and generate videos with activities.

.. code-block:: python

   # Reset the environment
   comm.reset(0)
   comm.add_character('Chars/Female2')

   # Get the graph, and query the node ids for the salmon and the microwave
   s, g = comm.environment_graph()
   salmon_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'salmon'][0]
   microwave_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'microwave'][0]

   # Put the salmon in the microwave
   script = [
       '<char0> [walk] <salmon> ({})'.format(salmon_id),
       '<char0> [grab] <salmon> ({})'.format(salmon_id),
       '<char0> [open] <microwave> ({})'.format(microwave_id),
       '<char0> [putin] <salmon> ({}) <microwave> ({})'.format(salmon_id, microwave_id),
       '<char0> [close] <microwave> ({})'.format(microwave_id)
   ]
   comm.render_script(script, recording=True, frame_rate=10)

This should generate a video of the activity. We can also change the camera we are recording with, or record from multiple cameras, using the `camera_mode` argument of `render_script`, as sketched below.
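For instance, here is a minimal sketch re-rendering the same script from a camera behind the agent. `PERSON_FROM_BACK` is the mode used in the multi-agent example below; other mode names depend on your simulator version.

.. code-block:: python

   # Record the same activity from a camera that follows the agent.
   # 'PERSON_FROM_BACK' also appears in the multi-agent example below;
   # check your simulator version for the full list of camera modes.
   comm.render_script(
       script,
       recording=True,
       frame_rate=10,
       camera_mode=['PERSON_FROM_BACK']
   )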
Multiagent Videos
---------------------

We can also generate videos with multiple agents in them. A video will be generated for every agent.

.. code-block:: python

   # Reset the environment
   comm.reset(0)

   # Add two agents this time
   comm.add_character('Chars/Male2', initial_room='kitchen')
   comm.add_character('Chars/Female4', initial_room='bedroom')

   # Get the graph, and query the node ids for the salmon, microwave, glass, sink and faucet
   s, g = comm.environment_graph()
   salmon_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'salmon'][0]
   microwave_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'microwave'][0]
   glass_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'waterglass'][-1]
   sink_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'sink'][0]
   faucet_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'faucet'][-1]

   # First agent puts the salmon in the microwave; second agent puts a glass in the sink
   script = [
       '<char0> [walk] <salmon> ({}) | <char1> [walk] <waterglass> ({})'.format(salmon_id, glass_id),
       '<char0> [grab] <salmon> ({}) | <char1> [grab] <waterglass> ({})'.format(salmon_id, glass_id),
       '<char0> [open] <microwave> ({}) | <char1> [walk] <sink> ({})'.format(microwave_id, sink_id),
       '<char0> [putin] <salmon> ({}) <microwave> ({}) | <char1> [putback] <waterglass> ({}) <sink> ({})'.format(salmon_id, microwave_id, glass_id, sink_id),
       '<char0> [close] <microwave> ({}) | <char1> [switchon] <faucet> ({})'.format(microwave_id, faucet_id)
   ]
   comm.render_script(script, recording=True, frame_rate=10, camera_mode=["PERSON_FROM_BACK"])

The previous command will generate the frames of a video of the activity, recorded from behind each agent.

Interactive Agents
******************

So far we have seen how to generate videos, but we can use the same command to deploy or train agents in the environment. You can execute the previous instructions one by one, and get an observation or graph at every step. For that, you don't need to generate videos or animations, since they would slow down your agents. Use `skip_animation=True` to execute actions without animating them, and remember to turn off the recording mode as well.

.. code-block:: python

   # Reset the environment
   comm.reset(0)

   # Add two agents this time
   comm.add_character('Chars/Male2', initial_room='kitchen')
   comm.add_character('Chars/Female4', initial_room='bedroom')

   # Get the graph, and query the node ids for the salmon, microwave, glass, sink and faucet
   s, g = comm.environment_graph()
   salmon_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'salmon'][0]
   microwave_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'microwave'][0]
   glass_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'waterglass'][-1]
   sink_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'sink'][0]
   faucet_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'faucet'][-1]

   # Same activity as before, executed one instruction at a time
   script = [
       '<char0> [walk] <salmon> ({}) | <char1> [walk] <waterglass> ({})'.format(salmon_id, glass_id),
       '<char0> [grab] <salmon> ({}) | <char1> [grab] <waterglass> ({})'.format(salmon_id, glass_id),
       '<char0> [open] <microwave> ({}) | <char1> [walk] <sink> ({})'.format(microwave_id, sink_id),
       '<char0> [putin] <salmon> ({}) <microwave> ({}) | <char1> [putback] <waterglass> ({}) <sink> ({})'.format(salmon_id, microwave_id, glass_id, sink_id),
       '<char0> [close] <microwave> ({}) | <char1> [switchon] <faucet> ({})'.format(microwave_id, faucet_id)
   ]

   s, cc = comm.camera_count()
   for script_instruction in script:
       comm.render_script([script_instruction], recording=False, skip_animation=True)
       # Here you can get an observation, for instance
       s, im = comm.camera_image([cc-3])
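Besides images, you can query the environment graph after each instruction to build state-based observations for your agents. Here is a minimal sketch using only the calls introduced above.

.. code-block:: python

   # Sketch: step through the script and track symbolic state instead of images
   for script_instruction in script:
       comm.render_script([script_instruction], recording=False, skip_animation=True)

       # The graph reflects the effects of the instruction just executed
       s, g = comm.environment_graph()
       microwave_node = [node for node in g['nodes'] if node['id'] == microwave_id][0]
       print(script_instruction, '->', microwave_node['states'])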