We will show here how to get started with the simulator, and some of things you can do with it.
We recommend looking at the notebook that we provide in the Python API repository for a more complete overview.
Follow the instructions in Installation to install the environment, and make sure you can run it.
We will start by starting a communication with VirtualHome and setting up the scene. For that, create a UnityCommunication
object, and reset the environment.
# cd into virtualhome repo from simulation.unity_simulator import comm_unity YOUR_FILE_NAME = "" # Your path to the simulator port= "8080" # or your preferred port comm = comm_unity.UnityCommunication( file_name=YOUR_FILE_NAME, port=port ) env_id = 0 # env_id ranges from 0 to 6 comm.reset(env_id)
We will now be visualizing the scene. Each environment has a set of cameras that we can use to visualize the scene. We will select a few and take screenshots from there.
# Check the number of cameras s, cam_count = comm.camera_count() s, images = comm.camera_image([0, cam_count-1])
This will create two images, stored as a list in images
. The last corresponds to an overall view of the apartment
You can also add new cameras in the apartment, and visualize multiple modalities of those cameras
# Add a camera at the specified rotation and position comm.add_camera(position=[-3, 2, -5], rotation=[10, 15, 0]) # View camera from different modes modes = ['normal', 'seg_class', 'surf_normals'] images = [] for mode in modes: s, im = comm.camera_image([cam_count], mode=mode) images.append(im[0])
The content of images
will be
We will see here how to get information about the scene beyond images. Scenes in VirtualHome are represented as graphs. Let's start by querying the graph of the current scene.
# Reset the scene comm.reset() # Get graph s, graph = comm.environment_graph()
Here, graph
is a python dictionary, containing nodes
and edges
. The nodes
contain a list of all the objects in the scene, each represented by a dictionary with an id
to identify the object, class_name
with the name of the object, and information about the object states, 3D information etc. The edge
contains a list of edges representing spatial relationships between the objects.
You can use the graph to modify the scene. This will allow to generate more diverse videos, or environments to train your agents. Let's try here to place some object inside the fridge, and open it.
# Get the fridge node fridge_node = [node for node in graph['nodes'] if node['class_name'] == 'fridge'][0] # Open it fridge_node['states'] = ['OPEN'] # create a new node new_node = { 'id': 1000, 'class_name': 'salmon', 'states': [] } # Add an edge new_edge = {'from_id': 1000, 'to_id': fridge_node['id'], 'relation_type': 'INSIDE'} graph['nodes'].append(new_node) graph['edges'].append(new_edge) # update the scene comm.expand_scene(graph)
If you take an image from the same camera as before, the scene should look like this:
So far we have been setting up an environment without agents. Let's start adding agents and generate videos with activities.
# Reset the scene comm.reset(0) comm.add_character('Chars/Female2') # Get nodes for salmon and microwave salmon_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'salmon'][0] microwave_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'microwave'][0] # Put salmon in microwave script = [ '<char0> [walk] <salmon> ({})'.format(salmon_id), '<char0> [grab] <salmon> ({})'.format(salmon_id), '<char0> [open] <microwave> ({})'.format(microwave_id), '<char0> [putin] <salmon> ({}) <microwave> ({})'.format(salmon_id, microwave_id), '<char0> [close] <microwave> ({})'.format(microwave_id) ] comm.render_script(script, recording=True, frame_rate=10)
This should have generated the following video:
We can also change the camera we are recording with, or record from multiple cameras using camera_mode
argument, in render_script
.
We can also generate videos with multiple agents in them. A video will be generated for every agent
# Reset the scene comm.reset(0) # Add two agents this time comm.add_character('Chars/Male2', initial_room='kitchen') comm.add_character('Chars/Female4', initial_room='bedroom') # Get nodes for salmon and microwave, glass, faucet and sink salmon_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'salmon'][0] microwave_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'microwave'][0] glass_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'waterglass'][-1] sink_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'sink'][0] faucet_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'faucet'][-1] # Put salmon in microwave script = [ '<char0> [walk] <salmon> ({}) | <char1> [walk] <glass> ({})'.format(salmon_id, glass_id), '<char0> [grab] <salmon> ({}) | <char1> [grab] <glass> ({})'.format(salmon_id, glass_id), '<char0> [open] <microwave> ({}) | <char1> [walk] <sink> ({})'.format(microwave_id, sink_id), '<char0> [putin] <salmon> ({}) <microwave> ({}) | <char1> [putback] <glass> ({}) <sink> ({})'.format(salmon_id, microwave_id, glass_id, sink_id), '<char0> [close] <microwave> ({}) | <char1> [switchon] <faucet> ({})'.format(microwave_id, faucet_id) ] comm.render_script(script, recording=True, frame_rate=10, camera_mode=["PERSON_FROM_BACK"])
The previous command will generate frames corresponding to the following video
So far we have seen how to generate videos, but we can use the same command to deploy or train agents in the environment. You can execute the previous instructions one by one, and get an observation or graph at every step. or that, you don't need to generate videos or have animations, since it will slow down your agents. Use skip_animation=True
to generate actions without animating them. Remember to turn off the recording mode as well.
# Reset the scene comm.reset(0) # Add two agents this time comm.add_character('Chars/Male2', initial_room='kitchen') comm.add_character('Chars/Female4', initial_room='bedroom') # Get nodes for salmon and microwave, glass, faucet and sink salmon_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'salmon'][0] microwave_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'microwave'][0] glass_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'waterglass'][-1] sink_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'sink'][0] faucet_id = [node['id'] for node in g['nodes'] if node['class_name'] == 'faucet'][-1] # Put salmon in microwave script = [ '<char0> [walk] <salmon> ({}) | <char1> [walk] <glass> ({})'.format(salmon_id, glass_id), '<char0> [grab] <salmon> ({}) | <char1> [grab] <glass> ({})'.format(salmon_id, glass_id), '<char0> [open] <microwave> ({}) | <char1> [walk] <sink> ({})'.format(microwave_id, sink_id), '<char0> [putin] <salmon> ({}) <microwave> ({}) | <char1> [putback] <glass> ({}) <sink> ({})'.format(salmon_id, microwave_id, glass_id, sink_id), '<char0> [close] <microwave> ({}) | <char1> [switchon] <faucet> ({})'.format(microwave_id, faucet_id) ] s, cc = comm.camera_count() for script_instruction in script: comm.render_script([script_instruction], recording=False, skip_animation=True) # Here you can get an observation, for instance s, im = comm.camera_image([cc-3])