With the rapid growth of computational technology, multi-aspect data has become ubiquitous. Multi-aspect data represents information from multiple perspectives or through multiple types of features. An example is social network data, which contain user profile descriptions in text form, user structural relationships in graph form, user comments in text form, and many others. Combining the different aspects of the data using inter-similarities, intra-similarities and aspect associations requires advanced methods. Methods are needed to project high-dimensional, sparse and complex text data to a lower-dimensional space such that the distances between points within and between views are preserved. The low-dimensional data representation (with selected features) can then be used in supervised and unsupervised learning.
This project will use factorisation methods and deep learning methods to exploit the properties of multi-aspect data. These two families of methods share strong similarities, such as the use of a loss function for learning from the data and of regularisation to constrain the relationships learned. We propose to explore the loss functions and regularisation terms that best represent multi-aspect data while addressing both scalability and accuracy. Adding more information usually increases accuracy; however, it then limits scalability. We will explore deep learning network architectures that can learn multi-aspect data with a trade-off between accuracy and scalability.
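The shared loss-plus-regularisation structure can be illustrated with a minimal joint matrix factorisation sketch: two aspects of the same samples are approximated through a common low-dimensional factor, and an L2 regulariser constrains the learned factors. All dimensions, the step size and the regularisation weight below are illustrative assumptions, not values from the project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic "aspects" (views) of the same 20 samples,
# generated from a shared low-dimensional structure.
n, d1, d2, k = 20, 30, 25, 5
H_true = rng.random((k, n))
X1 = rng.random((d1, k)) @ H_true  # aspect 1 (e.g. text features)
X2 = rng.random((d2, k)) @ H_true  # aspect 2 (e.g. graph features)

# Joint factorisation: X1 ~ W1 H, X2 ~ W2 H with a shared factor H.
W1, W2 = rng.random((d1, k)), rng.random((d2, k))
H = rng.random((k, n))
lam, lr = 0.01, 1e-4  # regularisation weight and step size (assumed)

def loss():
    # Reconstruction error of both aspects plus an L2 regulariser.
    return (np.sum((X1 - W1 @ H) ** 2) + np.sum((X2 - W2 @ H) ** 2)
            + lam * (np.sum(W1**2) + np.sum(W2**2) + np.sum(H**2)))

losses = [loss()]
for _ in range(200):
    R1, R2 = X1 - W1 @ H, X2 - W2 @ H            # residuals per aspect
    W1 -= lr * (-2 * R1 @ H.T + 2 * lam * W1)    # gradient steps
    W2 -= lr * (-2 * R2 @ H.T + 2 * lam * W2)
    H -= lr * (-2 * (W1.T @ R1 + W2.T @ R2) + 2 * lam * H)
    losses.append(loss())

print(losses[-1] < losses[0])  # the joint loss decreases
```

The key design choice this sketch captures is that the factor H is shared across aspects, so both views constrain the same low-dimensional representation while the regulariser keeps the learned factors small.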
The proposed methods will be applicable to a wide range of fields, such as vision, signal processing, bioinformatics, text mining, web mining and recommender systems. We will extend these methods to a specific application: chatbots.
There are three types of conversational agents:
- question answering agents
- task-oriented dialogue agents
- social chatbots for open-ended conversation
A chatbot needs to understand conversations in natural language and learn from interactions to increase its knowledge. While conversing, it needs to remember prior conversations as well as known facts and knowledge. Due to the complex nature of multi-faceted data, robust feature representations are required.
This project will exploit word embeddings and ontologies to provide constraints on instances and their relations when analysing a question and generating the corresponding answer. Deep learning methods will be investigated to generate answers that adhere to the given multi-aspect context and ontological constraints.
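As a toy illustration of combining embedding similarity with ontological constraints when selecting an answer, the sketch below first filters candidate entities by the types an ontology permits for the relation, then ranks the survivors by cosine similarity to the question word. The embeddings, ontology entries, and function names are all hypothetical assumptions for illustration, not part of the proposed system.

```python
import numpy as np

# Toy word embeddings (assumed, 3-dimensional for illustration).
emb = {
    "paris": np.array([0.9, 0.1, 0.0]),
    "france": np.array([0.8, 0.2, 0.1]),
    "red": np.array([0.0, 0.9, 0.3]),
    "capital": np.array([0.85, 0.15, 0.05]),
}

# Toy ontology: which entity types a relation may take as its object.
ontology = {"capital_of": {"City"}}
entity_type = {"paris": "City", "france": "Country", "red": "Colour"}

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def answer(question_word, relation, candidates):
    # 1. Ontological constraint: keep only candidates of an allowed type.
    valid = [c for c in candidates if entity_type[c] in ontology[relation]]
    # 2. Embedding similarity: rank valid candidates against the question.
    return max(valid, key=lambda c: cosine(emb[question_word], emb[c]))

print(answer("capital", "capital_of", ["paris", "france", "red"]))  # paris
```

Here the ontology prunes ontologically invalid answers ("france", "red") before the embedding similarity is ever consulted, which is one simple way constraints from a second data source can restrict what a learned model is allowed to output.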
The proposed methods will be applicable to other applications where the conversations generated by a deep network can be constrained or extended by other data sources.
- Publications on developed methods and systems.
- Machine learning software for handling multiple-aspect data sources.
- Chatbot system software.
- New collaboration across the centre and the broader Data Science community.
Potential for impact
In many scientific data analysis tasks, data are collected through various measuring methods, such as sensors, video and audio signals, and text records. An example is the medical domain, where data about a patient can be obtained from many sources, such as ultrasound images, magnetic resonance data, and medical staff's text notes. Utilising all of these data with multi-aspect learning algorithms can reveal highly accurate information in the form of classifiers or clusters. A similar analogy applies in the environmental area.
This project will have high societal impact. Various organisations collect data from different aspects. Methods developed in this project will enable such data to be analysed effectively and efficiently. Chatbots based on the developed methods could be deployed in several environments, including hospitals and nursing homes. This sector is known to struggle with a shortage of care staff. The proposed system could help overcome this problem and provide 24/7 care assistance to elderly people.
- Professor Richi Nayak (project leader)