Trajectory Based Human Behaviour Understanding

Understanding and predicting crowd behaviour in complex real-world scenarios has many applications, from designing intelligent security systems to deploying socially aware robots. Despite significant research interest in domains such as abnormal event detection, traffic flow estimation and behaviour prediction, accurately modelling and predicting crowd behaviour remains a challenging problem because of its complexity.

As humans, we possess an intuitive ability to navigate, which we master through years of practice; such complex dynamics cannot be captured with only a handful of handcrafted features. We believe that learning directly from the trajectories of the pedestrian of interest (i.e. the pedestrian whose trajectory we seek to predict) and of their neighbours holds the key to modelling this natural ability. In these projects we propose data-driven approaches that learn the relationship between neighbouring trajectories in an unsupervised manner. The resulting feature embeddings can be used for a variety of tasks, including future behaviour anticipation, abnormal event detection and group detection.

Soft + Hardwired Attention: An LSTM Framework for Human Trajectory Prediction and Abnormal Event Detection

In this project we propose a novel method to predict the future motion of a pedestrian given a short history of their own past behaviour and that of their neighbours.

A sample surveillance scene (on the left): The pedestrian of interest, whose trajectory is shown in green, has two neighbours (shown in purple) to the left, one in front and none on the right. Neighbourhood encoding scheme (on the right): Trajectory information is encoded with LSTM encoders. A soft attention context vector is used to embed the trajectory information from the pedestrian of interest, and a hardwired attention context vector is used for neighbouring trajectories. The merged context vector is then used to predict the future trajectory of the pedestrian of interest (shown in red).

To accomplish this task, we need to consider multiple feature sequences, namely the trajectory information from the pedestrian of interest and from their neighbours, when predicting the output sequence. Aligning all features together is not optimal because they have different degrees of influence (e.g. a person walking directly next to the target has greater influence than a person several metres away). However, aligning each input trajectory sequence with its own soft attention model is computationally expensive.


The novelty of the proposed method is a combined attention model that uses both “soft attention” and “hardwired” attention to map the trajectory information from the local neighbourhood to the future positions of the pedestrian of interest.

We illustrate how a simple approximation of the attention weights (i.e. hardwired weights) can be merged with soft attention weights, making our model applicable to challenging real-world scenarios with hundreds of neighbours.
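The merging of the two context vectors can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's implementation: it assumes the LSTM encoder hidden states are already computed, uses a dot-product score for the soft attention, and takes the hardwired weight to be the inverse of the distance to the pedestrian of interest (an assumed form, chosen only because closer neighbours should have greater influence). All function and parameter names are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def soft_attention_context(own_states, query):
    """Learned-style attention over the pedestrian of interest's own history.

    own_states: (T, H) encoder hidden states; query: (H,) decoder state.
    The dot-product score is an assumption made for this sketch.
    """
    weights = softmax(own_states @ query)   # (T,), sums to 1
    return weights @ own_states             # (H,)

def hardwired_weights(distances, eps=1e-6):
    """Fixed, non-learned weights: assumed here to be inverse distance."""
    return 1.0 / (distances + eps)

def hardwired_context(neighbour_states, distances):
    """Weighted sum over all neighbours and time steps.

    neighbour_states: (N, T, H); distances: (N, T), distance of each
    neighbour from the pedestrian of interest at each time step.
    """
    weights = hardwired_weights(distances)  # (N, T)
    return np.einsum("nt,nth->h", weights, neighbour_states)

def merged_context(own_states, query, neighbour_states, distances):
    """Concatenate the soft and hardwired context vectors."""
    c_soft = soft_attention_context(own_states, query)
    c_hard = hardwired_context(neighbour_states, distances)
    return np.concatenate([c_soft, c_hard])  # (2H,)
```

Because the hardwired weights are computed directly from distances rather than learned per neighbour, the cost of adding a neighbour in this sketch is a single weighted sum, which is what makes the scheme attractive for crowded scenes.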

We demonstrate that our approach is capable of learning the common patterns in human navigation behaviour and achieves improved predictions of pedestrian paths over the current state-of-the-art methods. Furthermore, we apply the proposed method to abnormal human behaviour detection.


GD-GAN: Generative Adversarial Networks for Trajectory Prediction and Group Detection in Crowds

Group detection has become an essential part of intelligent surveillance systems; however, the task presents several new challenges. Beyond identifying and tracking pedestrians in video, modelling the semantics of human social interaction and cultural gestures over a short sequence of clips is extremely challenging.

Most existing approaches to group detection rely on handcrafted, physics-based features, such as the relative distance between pedestrians, trajectory shape and motion-based features, to model social affinity. However, proximity does not always indicate group membership. For instance, two pedestrians sharing a common goal may start their trajectories at two distinct source positions yet meet in the middle. Hence, we believe that relying on a handful of handcrafted features is sub-optimal.

In this project we propose a deep learning algorithm that automatically learns these group attributes. We take inspiration from trajectory modelling, where the contextual information is derived from the local neighbourhood. We further augment this approach with a Generative Adversarial Network (GAN) learning pipeline, in which we learn a custom, task-specific loss function tailored to future trajectory prediction, learning to imitate complex human behaviours.
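To make the idea of a learned loss concrete, the sketch below shows the standard GAN losses, in which a discriminator scores trajectories as real or generated and its output serves as the training signal for the trajectory generator. This is the generic formulation (with the commonly used non-saturating generator loss), not necessarily the exact objective used in this project; the function names and the assumption that the discriminator outputs probabilities in (0, 1) are illustrative.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Standard GAN discriminator loss.

    d_real: discriminator outputs on observed (ground-truth) trajectories.
    d_fake: discriminator outputs on generated trajectories.
    Both are assumed to be probabilities in (0, 1).
    """
    real_term = -np.mean(np.log(d_real + eps))
    fake_term = -np.mean(np.log(1.0 - d_fake + eps))
    return float(real_term + fake_term)

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: reward trajectories the
    discriminator judges to be real."""
    return float(-np.mean(np.log(d_fake + eps)))
```

In this setup the generator is never compared with the ground truth through a fixed distance measure alone; the discriminator's judgement of how plausible a predicted trajectory looks plays the role of the loss function.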

The simplicity of the proposed framework offers direct transferability between environments, in contrast to supervised learning approaches, which require the group detection process to be re-trained whenever the surveillance scene changes. This ability results from the proposed deep feature learning framework, which learns the required group attributes automatically and achieves results competitive with the state of the art.