Conversational agents that can see

The development of conversational agents, whether as smart home devices, embedded in mobile devices, or in social robots, began in the world of text-only chatbots. Audio features came next, followed by context awareness through sensors and cloud knowledge, and the ability to return images in response to a query.

However, little attention has been paid to other conversational modalities, such as showing, pointing, or gesturing. Reliance on these modalities is especially strong in conversations with people who make limited use of verbal language, such as children and people with intellectual disability.

This project will seek to define a framework for conversational agents that respects people’s desire for diverse modalities. The student will implement a series of iterative prototypes to explore aspects such as intent recognition, the consequences of the system misinterpreting a user, and understanding in group conversations. Prototypes can range from Wizard-of-Oz setups (controlled by a research experimenter) to working prototypes (leveraging artificial intelligence algorithms and conversation models).
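To make the Wizard-of-Oz idea concrete, here is a minimal sketch in Python of how such a prototype might be structured. All names here are hypothetical and not part of the project brief: the "agent" the participant interacts with simply relays whatever reply the hidden experimenter (the wizard) chooses, while logging each exchange for later analysis.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class WizardOfOzAgent:
    # The wizard is any callable mapping a participant utterance to a reply.
    # In a real study this would be a human experimenter at a hidden console;
    # swapping it for a trained model turns this into a working prototype.
    wizard: Callable[[str], str]
    transcript: List[Tuple[str, str]] = field(default_factory=list)

    def respond(self, utterance: str) -> str:
        reply = self.wizard(utterance)
        self.transcript.append((utterance, reply))  # log for later analysis
        return reply


# Example: a scripted stand-in for the human operator.
def scripted_wizard(utterance: str) -> str:
    if "point" in utterance.lower():
        return "I can see what you are pointing at."
    return "Could you show me what you mean?"


agent = WizardOfOzAgent(wizard=scripted_wizard)
print(agent.respond("I'm pointing at the red block"))
```

Because the wizard is just a callable, the same harness supports the iterative progression described above: early studies plug in a human operator, later ones plug in a vision or language model, and the logged transcripts feed the analysis in both cases.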

Research activities

For this opportunity, we are seeking students who are enthusiastic about applying their skills in full-stack development, and eager to expand their knowledge of machine learning algorithms and human-computer interaction methods.

You will be expected to undertake the following tasks:

  • conduct a literature review
  • articulate a specific research question in relation to the topic
  • design and conduct a research study that considers the technology, the people, and the context, iteratively and in any appropriate order:
    • implement a prototype that users can engage with
    • observe participants as they engage with the prototype
    • analyse and discuss the research findings
  • consolidate the findings into a framework.


The outcomes of this project will include:

  • a framework for multi-modal conversational agents that can see
  • working prototypes or software that could continue to be used by research participants, or other researchers
  • research papers and conference presentations.

Skills and experience

For this project, successful applicants will be able to demonstrate:

  • a degree, or equivalent experience, in IT or software engineering, including experience developing online interfaces
  • excellent writing and communication skills
  • a creative mindset.

The following skills are highly desirable, but can be acquired during the project by a keen learner:

  • knowledge of User Experience (UX), Human-Computer Interaction (HCI), or Interaction Design
  • experience with machine vision and speech recognition algorithms.

Chief Investigators