In medicine, endoscopy of the gastrointestinal (GI) tract plays an important role in helping domain experts locate abnormalities in a patient's GI tract. Such abnormalities may be symptoms of a life-threatening disease such as colorectal cancer.
This analysis is typically carried out manually by a medical expert; detecting critical symptoms therefore relies solely on the experience of the practitioner and is susceptible to human error. We thus seek to automate the process of endoscopic video analysis, providing support to human experts during diagnosis.
Most previous endoscopy analysis approaches extract a set of hand-crafted features and train models on them to detect abnormalities. For example, encoded image features have been obtained through bidirectional marginal Fisher analysis (BMFA), local binary patterns (LBP), and edge histograms. A limitation of these hand-crafted methods is that they depend heavily on the domain knowledge of the human designer, and as such risk discarding the information that best describes the image; an example of one such descriptor is sketched below.
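To make this concrete, the following is a minimal sketch of one such hand-crafted descriptor, a local binary pattern histogram, using scikit-image; the parameter choices (P=8, R=1, uniform codes) and the function name lbp_histogram are illustrative assumptions, not the settings of the cited works.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, p=8, r=1.0):
    """Uniform LBP histogram: one hand-crafted texture descriptor.

    Each pixel is encoded by thresholding its P circular neighbours
    at radius R against the centre pixel; the image is then summarised
    by the normalised histogram of those codes.
    """
    codes = local_binary_pattern(gray, P=p, R=r, method="uniform")
    n_bins = p + 2  # the "uniform" method yields P + 2 distinct codes
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
    return hist

# Stand-in for a grayscale endoscopy frame
gray = (np.random.rand(256, 256) * 255).astype(np.uint8)
print(lbp_histogram(gray).shape)  # (10,) fixed-length feature vector
```

The resulting fixed-length histogram is exactly the kind of designer-chosen summary that such pipelines feed to a classifier, and that a learned feature extractor replaces.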
In this work, we introduce a relational reasoning approach that models the relationships among individual features extracted by a pre-trained deep neural network. We extract features from the mid layers of a pre-trained deep model and pass them through a relational network, which considers all possible pairwise relationships among individual features to classify an endoscopy image. Our primary evaluation is performed on the KVASIR dataset, which contains endoscopic images spanning eight classes to detect. We also evaluate the proposed model on the Nerthus dataset to further demonstrate its effectiveness. On both datasets, the proposed method outperforms the existing state of the art.
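As an illustration of this pipeline, the following is a minimal PyTorch sketch in the style of a relation network, assuming a ResNet-18 backbone truncated at a mid layer, with each spatial location of the feature map treated as an "object" and small MLPs g and f as the relation and classification functions; the class name RelationNet, the layer cut, and all layer sizes are assumptions for illustration, not the exact architecture used here.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class RelationNet(nn.Module):
    """Sketch of a relational classifier over CNN feature maps.

    Every spatial feature vector is an "object"; an MLP g scores all
    ordered pairs of objects, the scores are summed, and an MLP f maps
    the aggregate relation representation to class logits.
    """

    def __init__(self, feat_dim=512, hidden=256, num_classes=8):
        super().__init__()
        # g: relation function applied to every pair of feature vectors
        self.g = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # f: classifier over the summed relation representation
        self.f = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, feats):
        # feats: (B, C, H, W) mid-layer feature map from a pre-trained CNN
        b, c, h, w = feats.shape
        objs = feats.flatten(2).transpose(1, 2)   # (B, N, C) with N = H*W
        n = objs.size(1)
        # Build all ordered pairs (o_i, o_j) by broadcasting
        oi = objs.unsqueeze(2).expand(b, n, n, c)
        oj = objs.unsqueeze(1).expand(b, n, n, c)
        pairs = torch.cat([oi, oj], dim=-1)       # (B, N, N, 2C)
        rel = self.g(pairs).sum(dim=(1, 2))       # (B, hidden)
        return self.f(rel)                        # (B, num_classes)

# Frozen backbone truncated before pooling (the layer choice is an assumption)
backbone = nn.Sequential(*list(models.resnet18(weights="DEFAULT").children())[:-2])
backbone.eval()
with torch.no_grad():
    feats = backbone(torch.randn(2, 3, 224, 224))  # (2, 512, 7, 7)
logits = RelationNet(feat_dim=512, num_classes=8)(feats)
print(logits.shape)  # torch.Size([2, 8])
```

Because g is applied to every ordered pair of spatial feature vectors, the model can capture relationships between distant image regions that a purely convolutional classification head would have to learn implicitly.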