BRAG Meeting – Thursday 26th June 2025

The fortnightly BRAG meeting will be held this Thursday, 26th June, at 1pm via Zoom and in GP-Y801. This week's presentation is by Abdur Rehman Khan.

Join Zoom Meeting
https://qut.zoom.us/j/82146137870?pwd=gCrTWz7wZ9cbXbYgYKooibH2jQlIVr.1&from=addon

Meeting ID: 821 4613 7870 / Passcode: 987984

Abdur’s Talk

Title: Multi-label deep neural framework for software source code vulnerability discovery

Abstract: Software vulnerabilities are flaws in code that attackers can exploit, leading to issues such as unauthorized access, privilege escalation, and remote code execution. The rise of vulnerable code, especially in open-source software, has resulted in financial losses, data breaches, and intellectual property infringements. The number of vulnerabilities has grown exponentially. Detecting source code vulnerabilities manually is resource-intensive, and with rapid software development, security is often overlooked. Recent automated methods for source code vulnerability discovery rely on machine learning and deep learning algorithms. However, their main challenge is how to represent source code. Source code can be represented either structurally, via abstract syntax trees, control-flow and data-flow graphs, dependency graphs and similar structures, or sequentially, as a sequence of tokens. Although significant research has incorporated these representations, they still suffer from a proliferation of structural and sequential features. Moreover, most of these approaches focus on coarse-grained binary vulnerability discovery, where only the presence of a vulnerability is detected, without any detail about its type. Existing methods that do identify vulnerability types frame the task as a multi-class problem and suffer from inadequate source code representation: they encode either the whole program or individual functions, resulting in irrelevant or insufficient feature representations, respectively. Furthermore, even though a single piece of source code can contain multiple vulnerabilities, existing methods in the literature remain limited to multi-class classification.
The challenges of multi-label vulnerability discovery include the lack of existing benchmarks, highly interrelated and similar source code tokens that contain multiple vulnerabilities, and the difficulty of finding source code representations effective enough for accurate vulnerability discovery. This research therefore extends the notion of multi-class vulnerability discovery to multi-label vulnerability discovery, investigates a balanced source code representation that can identify multiple vulnerability types, and presents a new benchmark for multi-label source code vulnerability discovery.